Lyric Transcription
Extract lyrics from audio files using AI-powered speech recognition with multi-language support.
Endpoint
POST /api/v1/lyric_transcription/{model}
Parameters
Path Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model to use, for example `standard` (the value used in the request examples) |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| `file` | binary | Yes | Audio file to analyze (mp3, wav, flac, m4a, aac, ogg) |
Request Example
cURL
curl https://platform.mippia.com/api/v1/lyric_transcription/standard \
-H "Authorization: Bearer YOUR_API_KEY" \
-X POST \
-F "file=@/path/to/audio.mp3"
Python
import requests

url = "https://platform.mippia.com/api/v1/lyric_transcription/standard"
headers = {
    "Authorization": "Bearer YOUR_API_KEY"
}

# Open the audio file in binary mode; the context manager closes it after the upload
with open("/path/to/audio.mp3", "rb") as audio:
    files = {"file": audio}
    response = requests.post(url, headers=headers, files=files)

print(response.json())
Response (Initial)
{
"taskId": "string"
}
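The initial response carries only a task identifier; the transcription itself arrives later in a callback. A minimal sketch of reading that identifier from the decoded JSON body follows. `extract_task_id` is a hypothetical helper, and the fallback to `task_id` is an assumption based on the snake_case spelling used in the callback payloads.

```python
def extract_task_id(payload: dict) -> str:
    """Return the task identifier from the decoded initial response."""
    # "taskId" is the documented key; "task_id" is an assumed fallback
    # borrowed from the callback payloads.
    for key in ("taskId", "task_id"):
        value = payload.get(key)
        if isinstance(value, str) and value:
            return value
    raise ValueError("response does not contain a task identifier")
```

With the Python request example, this could be called as `extract_task_id(response.json())` and the returned value stored so the later callback can be matched to the upload.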
Callback Response (Success)
{
"task_id": "task_20251204052920_J8uNdq5z",
"task_name": "lyric_transcription",
"status": "success",
"completed_at": "2025-12-04T05:30:15Z",
"result_json": {
"lyrics": "I walk alone through empty streets\nSearching for a love I cannot find\nThe stars above remind me of your eyes\nBut you are gone and left me here behind",
"languages": [
{
"code": "en",
"probability": 0.9234
},
{
"code": "ko",
"probability": 0.0521
},
{
"code": "ja",
"probability": 0.0245
}
],
"is_instrumental": false
}
}
Callback Response (Instrumental Track)
When no vocals are detected, the track is classified as instrumental:
{
"task_id": "task_20251204052920_K9vPdq6a",
"task_name": "lyric_transcription",
"status": "success",
"completed_at": "2025-12-04T05:30:15Z",
"result_json": {
"lyrics": "",
"languages": [
{
"code": "Inst",
"probability": 1.0
}
],
"is_instrumental": true
}
}
Callback Response (Failure)
{
"task_id": "task_20251204052920_J8uNdq5z",
"task_name": "lyric_transcription",
"status": "failure",
"completed_at": "2025-12-04T05:30:15Z",
"result_json": {},
"error": "Error message describing what went wrong"
}
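The three callback payloads above share one envelope, so a receiver can branch on `status` before touching `result_json`. The sketch below parses a raw callback body under that assumption. `TranscriptionResult`, `TranscriptionFailed`, and `parse_callback` are illustrative names, not part of the API, and how the callback is delivered to your service is not covered here.

```python
import json
from dataclasses import dataclass


@dataclass
class TranscriptionResult:
    task_id: str
    lyrics: str
    is_instrumental: bool
    languages: list


class TranscriptionFailed(Exception):
    """Raised when the callback does not report status "success"."""


def parse_callback(body: str) -> TranscriptionResult:
    """Decode a callback body and return its transcription result."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        # Failure payloads carry a top-level "error" string.
        raise TranscriptionFailed(payload.get("error", "unknown error"))
    result = payload.get("result_json", {})
    return TranscriptionResult(
        task_id=payload["task_id"],
        lyrics=result.get("lyrics", ""),
        is_instrumental=bool(result.get("is_instrumental", False)),
        languages=result.get("languages", []),
    )
```

Treating every status other than `success` as a failure is a conservative choice for this sketch; the examples only show `success` and `failure`.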
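Result Fields
| Field | Type | Description |
|---|---|---|
| `task_id` | string | Unique task identifier |
| `task_name` | string | Always `lyric_transcription` |
| `status` | string | Task status (`success` or `failure` in the callback examples) |
| `completed_at` | string | ISO 8601 completion timestamp |
| `result_json` | object | Transcription results |
| `error` | string | Error message (only present when `status` is `failure`) |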
Result JSON Fields
| Field | Type | Description |
|---|---|---|
| `lyrics` | string | Extracted lyrics text (empty if instrumental) |
| `languages` | array | Detected languages with probabilities (top 3) |
| `is_instrumental` | boolean | `true` when the track is classified as instrumental |
Language Object Fields
| Field | Type | Description |
|---|---|---|
| `code` | string | ISO 639-1 language code (or `Inst` for instrumental tracks) |
| `probability` | float | Detection confidence (0.0 - 1.0) |
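Because `languages` is a list of code/probability pairs, a client usually wants the single most likely language. The sketch below picks it and treats the `Inst` marker as "no language"; `dominant_language` is a hypothetical helper name.

```python
from typing import Optional


def dominant_language(languages: list) -> Optional[str]:
    """Return the most probable language code, or None if there is none."""
    if not languages:
        return None
    # Do not rely on list order; pick the highest probability explicitly.
    top = max(languages, key=lambda entry: entry["probability"])
    # "Inst" marks an instrumental track rather than a spoken language.
    if top["code"] == "Inst":
        return None
    return top["code"]
```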
Supported Languages
The transcription system supports 18 languages:
| Code | Language | Code | Language |
|---|---|---|---|
| `ko` | Korean | `ar` | Arabic |
| `en` | English | `hi` | Hindi |
| `zh` | Chinese | `th` | Thai |
| `ja` | Japanese | `vi` | Vietnamese |
| `es` | Spanish | `id` | Indonesian |
| `fr` | French | `tr` | Turkish |
| `de` | German | `pl` | Polish |
| `it` | Italian | `nl` | Dutch |
| `pt` | Portuguese | `ru` | Russian |
Processing Pipeline
1. Vocal Separation: Audio is processed using Demucs to isolate vocals from instruments
2. Vocal Detection: Checks for presence of vocals using RMS and zero-crossing rate analysis
3. Language Detection: Multiple segments are analyzed to detect language distribution
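To make the vocal detection step more concrete, the sketch below computes the two measures it names, RMS energy and zero-crossing rate, on a list of float samples. How the service combines them is not documented: the thresholds and the `looks_like_vocals` rule are illustrative assumptions, not the service's actual logic.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of the samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def looks_like_vocals(samples, rms_threshold=0.01, zcr_max=0.3):
    """Assumed rule: enough energy and not noise-like."""
    if len(samples) < 2:
        return False
    # Both thresholds are placeholders chosen for illustration only.
    return rms(samples) >= rms_threshold and zero_crossing_rate(samples) <= zcr_max
```

Silence fails the energy check, while a signal that flips sign on nearly every sample fails the zero-crossing check, which is the intuition behind pairing the two measures.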
Notes
- Supported formats: mp3, wav, flac, m4a, aac, ogg
- Processing time: 1-3 minutes typical (depends on audio length)
- Instrumental detection: Tracks with no vocals or very short lyrics (<10 characters) are classified as instrumental
- Multi-language support: The system can detect mixed-language lyrics and reports the top 3 languages by probability
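Two of the notes above translate directly into client-side checks: rejecting unsupported file types before uploading, and mirroring the instrumental rule locally. The helper names below are hypothetical, and stripping whitespace before counting characters is an assumption; the service may count differently.

```python
from pathlib import Path

# Extensions taken from the supported formats list.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".aac", ".ogg"}


def is_supported_audio(path: str) -> bool:
    """Check the file extension against the supported formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def is_instrumental(lyrics: str, vocals_detected: bool) -> bool:
    """Mirror the documented rule: no vocals, or fewer than 10 characters."""
    # Whitespace stripping is an assumption made for this sketch.
    return (not vocals_detected) or len(lyrics.strip()) < 10
```

The extension check only inspects the file name, so a mislabeled file would still be rejected or accepted by the server on its own terms.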