Datamio API is an open-source, self-hostable platform for audio processing. It provides endpoints for splitting audio files by silence detection, batch processing, and uploading datasets to Hugging Face Hub.

Is Datamio API free to use?

Yes, Datamio API is completely free and open source under the MIT license. You can self-host it on your own infrastructure with no usage limits.

How do I self-host Datamio API?

Clone the repository from GitHub, install dependencies with pip install -r requirements.txt, and run python server.py. Docker deployment is also supported with docker-compose up -d.

Open Source & Self-Hostable

Powerful Audio Processing API Platform

Name: Datamio API
Author: Datamio

Split audio files, detect speech segments, and upload datasets to Hugging Face. Open-source, self-hostable, and built for developers who value control.

100% Open Source

15+ API Endpoints

1-Click Self-Host Deploy

MIT License

Powerful Features

Everything you need to process audio files and manage datasets at scale

🎵 Audio Splitting

Automatically split audio files by silence detection using advanced VAD (Voice Activity Detection) algorithms. Perfect for creating training datasets.

📦 Batch Processing

Process multiple audio files simultaneously with our efficient batch endpoints. Track progress with real-time job status updates.

🤗 Hugging Face Integration

Seamlessly upload your processed audio datasets directly to Hugging Face Hub. Public or private repositories supported.

⚡ Async Job Processing

Long-running tasks are handled asynchronously. Submit jobs and check their status without blocking your application.

🔐 API Key Management

Secure your API with key-based authentication. Generate, revoke, and manage API keys with full admin controls.

📊 Progress Tracking

Monitor job progress in real-time with detailed status updates, segment counts, and comprehensive error reporting.

Self-Host Your Instance

Full control over your data. Deploy anywhere in minutes.

Why Self-Host?

Take complete ownership of your audio processing infrastructure. No vendor lock-in, no usage limits, no data leaving your servers.

✓ Data Privacy - Your audio files never leave your infrastructure
✓ No Rate Limits - Process as many files as your hardware allows
✓ Full Customization - Modify and extend the codebase freely
✓ Cost Effective - Run on your existing infrastructure
✓ MIT Licensed - Use commercially without restrictions

Clone from GitHub

Get the full source code and documentation

🐳 Docker Deploy

One-command deployment with Docker Compose

Quick Start

# Clone the repository
$ git clone https://github.com/jsbeaudry/datamio-py-api.git
$ cd datamio-py-api

# Install dependencies
$ pip install -r requirements.txt

# Start the server
$ python server.py

# Or use Docker
$ docker-compose up -d

# Server running at http://localhost:8001
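
After starting the server, a deployment script may want to wait until it answers before sending any work. The sketch below polls the /api/health endpoint listed in this document. The helper names `http_ok` and `wait_until_healthy`, the retry settings, and the assumption that a healthy server answers with HTTP 200 are illustrative and not taken from the project.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET to `url` answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_healthy(url, attempts=30, delay=1.0, probe=http_ok, sleep=time.sleep):
    """Call `probe(url)` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy("http://localhost:8001/api/health")` returns True as soon as the server responds and False if it never does within the allowed attempts. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.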

API Endpoints

RESTful API design with comprehensive documentation

💚 Health

GET /api/health - API health check

🎵 Audio Splitting & Transcription

POST /api/splits/file - Split a single audio file
POST /api/splits/batch - Split multiple audio files
POST /api/splits/file/job - Async split with transcription
POST /api/splits/batch/job - Async batch with transcription
GET /api/splits/job/{job_id} - Get split job status
DELETE /api/splits/job/{job_id} - Delete split job

🤗 Hugging Face Upload

POST /api/upload-audio-dataset - Upload dataset to Hugging Face
GET /api/job/{job_id} - Get upload job status
DELETE /api/job/{job_id} - Delete upload job

🔐 API Key Management

POST /api/keys - Generate new API key
GET /api/keys - List all API keys
GET /api/keys/{key_id} - Get key details
POST /api/keys/{key_id}/revoke - Revoke API key
DELETE /api/keys/{key_id} - Delete API key
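
Every endpoint in this list is a method plus a path under the same base URL, and the examples in this document authenticate with an X-API-Key header. The following is a minimal sketch of a helper that turns one entry of the list into a ready-to-send request using only the Python standard library. The name `build_request`, its parameters, and the job ID `abc123` are hypothetical, and the payload each endpoint expects is not specified here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"  # default address from the Quick Start


def build_request(method, path, api_key=None, payload=None, base_url=BASE_URL):
    """Build (but do not send) a urllib Request for one endpoint."""
    headers = {}
    if api_key is not None:
        headers["X-API-Key"] = api_key
    data = None
    if payload is not None:
        # JSON bodies are assumed, matching the examples in this document.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# "abc123" is a made-up job ID used only for illustration.
req = build_request("GET", "/api/splits/job/abc123", api_key="your-api-key")
print(req.get_method(), req.full_url)
```

The returned object can be passed to `urllib.request.urlopen` to perform the call. Building the request separately from sending it keeps the URL and header logic easy to inspect.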

Quick Start Examples

Get started with just a few lines of code

Split a single audio file into speech segments using VAD

import requests

response = requests.post(
    "http://localhost:8001/api/splits/file",
    headers={"X-API-Key": "your-api-key"},
    json={
        "audio_url": "https://example.com/audio.wav",
        "threshold": 0.5,
        "min_speech_duration_ms": 250
    }
)

segments = response.json()["segments"]
for seg in segments:
    print(f"Segment: {seg['start']:.2f}s - {seg['end']:.2f}s")
    print(f"  URL: {seg['url']}")
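
The `threshold` and `min_speech_duration_ms` parameters are typical of VAD pipelines that score short frames with a speech probability and then merge consecutive speech frames into segments. The sketch below illustrates that general idea on a list of made-up probabilities. It is not Datamio's implementation: the frame length, the function name `frames_to_segments`, and the exact meaning given to both parameters are assumptions.

```python
def frames_to_segments(probs, frame_ms=50, threshold=0.5, min_speech_duration_ms=250):
    """Group consecutive frames with probability >= threshold into (start_s, end_s) pairs.

    Runs shorter than min_speech_duration_ms are discarded.
    """
    segments = []
    start = None  # index of the first frame of the current speech run
    # A trailing -inf sentinel closes a run that reaches the end of the input.
    for i, p in enumerate(list(probs) + [float("-inf")]):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            duration_ms = (i - start) * frame_ms
            if duration_ms >= min_speech_duration_ms:
                segments.append((start * frame_ms / 1000, i * frame_ms / 1000))
            start = None
    return segments


# Hypothetical per-frame speech probabilities, one value per 50 ms frame.
probs = [0.1, 0.9, 0.9, 0.9, 0.9, 0.9, 0.2, 0.8, 0.1, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]
print(frames_to_segments(probs))  # → [(0.05, 0.3), (0.45, 0.75)]
```

The single high-probability frame at index 7 lasts only 50 ms, so it falls below the minimum duration and is dropped. Under this interpretation, raising `threshold` or `min_speech_duration_ms` yields fewer, more conservative segments.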

Process multiple audio files in a single batch request

import requests

audio_files = [
    "https://example.com/audio1.wav",
    "https://example.com/audio2.wav",
    "https://example.com/audio3.wav"
]

response = requests.post(
    "http://localhost:8001/api/splits/batch",
    headers={"X-API-Key": "your-api-key"},
    json={"audio_urls": audio_files}
)

results = response.json()["results"]
for url, data in results.items():
    print(f"{url}: {data['count']} segments")

Upload processed audio dataset to Hugging Face Hub

import requests

dataset = [
    {"audio": "https://example.com/audio1.wav", "text": "Hello world"},
    {"audio": "https://example.com/audio2.wav", "text": "How are you"}
]

response = requests.post(
    "http://localhost:8001/api/upload-audio-dataset",
    headers={"X-API-Key": "your-api-key"},
    json={
        "dataset": dataset,
        "datasetName": "my-audio-dataset",
        "token": "hf_your_token",
        "isPrivate": True
    }
)

job_id = response.json()["job_id"]
print(f"Upload job started: {job_id}")
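
Because the upload runs as a background job, it can be cheaper to catch malformed rows before submitting. The following is a small sketch of a pre-flight check, assuming each row needs an http(s) `audio` URL and a non-empty `text` string as in the example above. The function name `find_invalid_rows` and these rules are illustrative, since the server's own validation is not described here.

```python
from urllib.parse import urlparse


def find_invalid_rows(dataset):
    """Return a list of (index, reason) pairs for rows that look malformed."""
    problems = []
    for i, row in enumerate(dataset):
        if not isinstance(row, dict):
            problems.append((i, "row is not an object"))
            continue
        audio = row.get("audio")
        parsed = urlparse(audio) if isinstance(audio, str) else None
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append((i, "audio must be an http(s) URL"))
        text = row.get("text")
        if not isinstance(text, str) or not text.strip():
            problems.append((i, "text must be a non-empty string"))
    return problems


rows = [
    {"audio": "https://example.com/audio1.wav", "text": "Hello world"},
    {"audio": "audio2.wav", "text": "How are you"},  # not a full URL
]
print(find_invalid_rows(rows))
```

An empty result means every row passed these assumed checks and the dataset can be sent to /api/upload-audio-dataset.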

Split a single audio file using cURL

# Split a single audio file
curl -X POST http://localhost:8001/api/splits/file \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "audio_url": "https://example.com/audio.wav",
    "threshold": 0.5,
    "min_speech_duration_ms": 250,
    "output_format": "wav"
  }'

# Batch process multiple files
curl -X POST http://localhost:8001/api/splits/batch \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{"audio_urls": ["https://example.com/audio1.wav", "https://example.com/audio2.wav"]}'

Check API health and job status

# Check API health status
curl http://localhost:8001/api/health

# Get split job status
curl -H "X-API-Key: your-api-key" \
  http://localhost:8001/api/splits/job/{job_id}

# Get upload job status
curl -H "X-API-Key: your-api-key" \
  http://localhost:8001/api/job/{job_id}

# List all jobs (admin only)
curl -H "X-API-Key: admin-api-key" \
  http://localhost:8001/api/splits/jobs

Manage API keys (requires admin privileges)

# Create a new API key
curl -X POST http://localhost:8001/api/keys \
  -H "Content-Type: application/json" \
  -H "X-API-Key: admin-api-key" \
  -d '{"name": "my-app", "description": "Production key"}'

# List all API keys
curl -H "X-API-Key: admin-api-key" \
  http://localhost:8001/api/keys

# Revoke an API key
curl -X POST -H "X-API-Key: admin-api-key" \
  http://localhost:8001/api/keys/{key_id}/revoke

# Delete an API key
curl -X DELETE -H "X-API-Key: admin-api-key" \
  http://localhost:8001/api/keys/{key_id}

Split audio file using the Fetch API

const response = await fetch('http://localhost:8001/api/splits/file', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'your-api-key'
  },
  body: JSON.stringify({
    audio_url: 'https://example.com/audio.wav',
    threshold: 0.5,
    min_speech_duration_ms: 250
  })
});

const { segments } = await response.json();
console.log(`Found ${segments.length} segments`);

segments.forEach((seg, i) => {
  console.log(`  ${i + 1}. ${seg.start}s - ${seg.end}s`);
});

Process multiple audio files with async batch job

async function processBatch(audioUrls) {
  // Create async batch job
  const response = await fetch('http://localhost:8001/api/splits/batch/job', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': 'your-api-key'
    },
    body: JSON.stringify({ audio_urls: audioUrls })
  });

  const { job_id } = await response.json();
  console.log(`Batch job created: ${job_id}`);
  return job_id;
}

// Usage
const jobId = await processBatch([
  'https://example.com/audio1.wav',
  'https://example.com/audio2.wav'
]);

Poll for job completion with status updates

async function waitForJob(jobId, onProgress) {
  while (true) {
    const res = await fetch(
      `http://localhost:8001/api/splits/job/${jobId}`,
      { headers: { 'X-API-Key': 'your-api-key' } }
    );
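    // Stop on HTTP errors (e.g. a bad API key) instead of polling forever
    if (!res.ok) throw new Error(`Status request failed: HTTP ${res.status}`);
    const job = await res.json();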

    // Report progress
    onProgress?.(job.status, job.message);

    if (job.status === 'completed') return job.result;
    if (job.status === 'failed') throw new Error(job.error);

    await new Promise(r => setTimeout(r, 1000));
  }
}

// Usage
const result = await waitForJob(jobId, (status, msg) => {
  console.log(`[${status}] ${msg}`);
});

Create an async job that returns immediately with a job ID

import requests

# Create async job for single file (returns immediately)
response = requests.post(
    "http://localhost:8001/api/splits/file/job",
    headers={"X-API-Key": "your-api-key"},
    json={
        "audio_url": "https://example.com/long-audio.wav",
        "threshold": 0.5
    }
)

job = response.json()
print(f"Job ID: {job['job_id']}")
print(f"Status: {job['status']}")

# For batch processing, use /api/splits/batch/job instead

Poll job status until completion with progress updates

import requests
import time

job_id = "your-job-id"

# Poll for job completion
while True:
    status = requests.get(
        f"http://localhost:8001/api/splits/job/{job_id}",
        headers={"X-API-Key": "your-api-key"}
    ).json()

    print(f"[{status['status']}] {status['message']}")

    if status["status"] == "completed":
        segments = status["result"]["segments"]
        print(f"Done! Found {len(segments)} segments")
        break
    elif status["status"] == "failed":
        print(f"Error: {status['error']}")
        break

    time.sleep(2)  # Poll every 2 seconds

Complete reusable workflow: create job, poll, and return results

import requests
import time

def process_audio_async(audio_url, api_key, poll_interval=1):
    """Process an audio file asynchronously and wait for the results."""

    # Create the job
    job = requests.post(
        "http://localhost:8001/api/splits/file/job",
        headers={"X-API-Key": api_key},
        json={"audio_url": audio_url}
    ).json()

    job_id = job["job_id"]
    print(f"Created job: {job_id}")

    # Poll until completion
    while True:
        status = requests.get(
            f"http://localhost:8001/api/splits/job/{job_id}",
            headers={"X-API-Key": api_key}
        ).json()

        if status["status"] == "completed":
            return status["result"]
        elif status["status"] == "failed":
            raise RuntimeError(status["error"])

        time.sleep(poll_interval)

# Usage
result = process_audio_async("https://example.com/audio.wav", "your-key")
print(f"Got {result['count']} segments")
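
The polling loops in these examples run until the job reports `completed` or `failed`, so a stalled job would block the caller indefinitely. The following is a hedged variant with an overall deadline and a growing poll interval. Here `fetch_status` stands in for the GET /api/splits/job/{job_id} call, and the helper name `poll_until_done`, the timeout, and the backoff values are assumptions rather than project recommendations.

```python
import time


def poll_until_done(fetch_status, timeout=300.0, interval=1.0, max_interval=10.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call fetch_status() until the job completes, fails, or the timeout elapses.

    fetch_status must return a dict with a "status" key, like the job responses
    in the examples ("completed", "failed", or anything else while running).
    """
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        if job["status"] == "completed":
            return job.get("result")
        if job["status"] == "failed":
            raise RuntimeError(job.get("error", "job failed"))
        if clock() >= deadline:
            raise TimeoutError(f"job still {job['status']!r} after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)  # exponential backoff, capped
```

With `requests`, the status call can be supplied as a lambda, for example `poll_until_done(lambda: requests.get(url, headers={"X-API-Key": api_key}).json())`, where `url` is the job status URL. Doubling the interval reduces load on the server for long jobs while keeping short jobs responsive.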

Ready to Self-Host?

Clone the repository, deploy in minutes, and take full control of your audio processing pipeline.