Lyric Transcription¶

Extract lyrics from audio files using AI-powered speech recognition with multi-language support.

Endpoint¶

POST /api/v1/lyric_transcription/{model}

Parameters¶

Path Parameters¶

Name	Type	Required	Description
`model`	string	Yes	Model to use: `standard`

Request Body¶

Field	Type	Required	Description
`file`	binary	Yes	Audio file to analyze (mp3, wav, flac, m4a, aac, ogg)

Request Example¶

cURL¶

curl https://platform.mippia.com/api/v1/lyric_transcription/standard \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -X POST \
  -F "file=@/path/to/audio.mp3"

Python¶

import requests

url = "https://platform.mippia.com/api/v1/lyric_transcription/standard"
headers = {
    "Authorization": "Bearer YOUR_API_KEY"
}
files = {
    "file": open("/path/to/audio.mp3", "rb")
}

response = requests.post(url, headers=headers, files=files)
print(response.json())

Response (Initial)¶

{
  "taskId": "string"
}

Callback Response (Success)¶

{
  "task_id": "task_20251204052920_J8uNdq5z",
  "task_name": "lyric_transcription",
  "status": "success",
  "completed_at": "2025-12-04T05:30:15Z",
  "result_json": {
    "lyrics": "I walk alone through empty streets\nSearching for a love I cannot find\nThe stars above remind me of your eyes\nBut you are gone and left me here behind",
    "languages": [
      {
        "code": "en",
        "probability": 0.9234
      },
      {
        "code": "ko",
        "probability": 0.0521
      },
      {
        "code": "ja",
        "probability": 0.0245
      }
    ],
    "is_instrumental": false
  }
}

Callback Response (Instrumental Track)¶

When no vocals are detected, the track is classified as instrumental:

{
  "task_id": "task_20251204052920_K9vPdq6a",
  "task_name": "lyric_transcription",
  "status": "success",
  "completed_at": "2025-12-04T05:30:15Z",
  "result_json": {
    "lyrics": "",
    "languages": [
      {
        "code": "Inst",
        "probability": 1.0
      }
    ],
    "is_instrumental": true
  }
}

Callback Response (Failure)¶

{
  "task_id": "task_20251204052920_J8uNdq5z",
  "task_name": "lyric_transcription",
  "status": "failure",
  "completed_at": "2025-12-04T05:30:15Z",
  "result_json": {},
  "error": "Error message describing what went wrong"
}

Result Fields¶

Field	Type	Description
`task_id`	string	Unique task identifier
`task_name`	string	Always `lyric_transcription`
`status`	string	Task status: `pending`, `processing`, `success`, `failure`
`completed_at`	string	ISO 8601 completion timestamp
`result_json`	object	Transcription results
`error`	string	Error message (only present when `status` is `failed`)

Result JSON Fields¶

Field	Type	Description
`lyrics`	string	Extracted lyrics text (empty if instrumental)
`languages`	array	Detected languages with probabilities (top 3)
`is_instrumental`	boolean	`true` if no vocals detected

Language Object Fields¶

Field	Type	Description
`code`	string	ISO 639-1 language code (or `Inst` for instrumental)
`probability`	float	Detection confidence (0.0 - 1.0)

Supported Languages¶

The transcription system supports 18 languages:

Code	Language	Code	Language
`ko`	Korean	`ar`	Arabic
`en`	English	`hi`	Hindi
`zh`	Chinese	`th`	Thai
`ja`	Japanese	`vi`	Vietnamese
`es`	Spanish	`id`	Indonesian
`fr`	French	`tr`	Turkish
`de`	German	`pl`	Polish
`it`	Italian	`nl`	Dutch
`pt`	Portuguese	`ru`	Russian

Processing Pipeline¶

Vocal Separation: Audio is processed using Demucs to isolate vocals from instruments
Vocal Detection: Checks for presence of vocals using RMS and zero-crossing rate analysis
Language Detection: Multiple segments are analyzed to detect language distribution

Notes¶

Supported formats: mp3, wav, flac, m4a, aac, ogg
Processing time: 1-3 minutes typical (depends on audio length)
Instrumental detection: Tracks with no vocals or very short lyrics (<10 characters) are classified as instrumental
Multi-language support: The system can detect mixed-language lyrics and reports the top 3 languages by probability