Lyric Transcription

Extract lyrics from audio files using AI-powered speech recognition with multi-language support.

Endpoint

POST /api/v1/lyric_transcription/{model}

Parameters

Path Parameters

Name

Type

Required

Description

model

string

Yes

Model to use: standard

Request Body

Field

Type

Required

Description

file

binary

Yes

Audio file to analyze (mp3, wav, flac, m4a, aac, ogg)

Request Example

cURL

curl https://platform.mippia.com/api/v1/lyric_transcription/standard \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -X POST \
  -F "file=@/path/to/audio.mp3"

Python

import requests

url = "https://platform.mippia.com/api/v1/lyric_transcription/standard"
headers = {
    "Authorization": "Bearer YOUR_API_KEY"
}
files = {
    "file": open("/path/to/audio.mp3", "rb")
}

response = requests.post(url, headers=headers, files=files)
print(response.json())

Response (Initial)

{
  "taskId": "string"
}

Callback Response (Success)

{
  "task_id": "task_20251204052920_J8uNdq5z",
  "task_name": "lyric_transcription",
  "status": "success",
  "completed_at": "2025-12-04T05:30:15Z",
  "result_json": {
    "lyrics": "I walk alone through empty streets\nSearching for a love I cannot find\nThe stars above remind me of your eyes\nBut you are gone and left me here behind",
    "languages": [
      {
        "code": "en",
        "probability": 0.9234
      },
      {
        "code": "ko",
        "probability": 0.0521
      },
      {
        "code": "ja",
        "probability": 0.0245
      }
    ],
    "is_instrumental": false
  }
}

Callback Response (Instrumental Track)

When no vocals are detected, the track is classified as instrumental:

{
  "task_id": "task_20251204052920_K9vPdq6a",
  "task_name": "lyric_transcription",
  "status": "success",
  "completed_at": "2025-12-04T05:30:15Z",
  "result_json": {
    "lyrics": "",
    "languages": [
      {
        "code": "Inst",
        "probability": 1.0
      }
    ],
    "is_instrumental": true
  }
}

Callback Response (Failure)

{
  "task_id": "task_20251204052920_J8uNdq5z",
  "task_name": "lyric_transcription",
  "status": "failure",
  "completed_at": "2025-12-04T05:30:15Z",
  "result_json": {},
  "error": "Error message describing what went wrong"
}

Result Fields

Field

Type

Description

task_id

string

Unique task identifier

task_name

string

Always lyric_transcription

status

string

Task status: pending, processing, success, failure

completed_at

string

ISO 8601 completion timestamp

result_json

object

Transcription results

error

string

Error message (only present when status is failed)

Result JSON Fields

Field

Type

Description

lyrics

string

Extracted lyrics text (empty if instrumental)

languages

array

Detected languages with probabilities (top 3)

is_instrumental

boolean

true if no vocals detected

Language Object Fields

Field

Type

Description

code

string

ISO 639-1 language code (or Inst for instrumental)

probability

float

Detection confidence (0.0 - 1.0)

Supported Languages

The transcription system supports 18 languages:

Code

Language

Code

Language

ko

Korean

ar

Arabic

en

English

hi

Hindi

zh

Chinese

th

Thai

ja

Japanese

vi

Vietnamese

es

Spanish

id

Indonesian

fr

French

tr

Turkish

de

German

pl

Polish

it

Italian

nl

Dutch

pt

Portuguese

ru

Russian

Processing Pipeline

  1. Vocal Separation: Audio is processed using Demucs to isolate vocals from instruments

  2. Vocal Detection: Checks for presence of vocals using RMS and zero-crossing rate analysis

  3. Language Detection: Multiple segments are analyzed to detect language distribution

Notes

  • Supported formats: mp3, wav, flac, m4a, aac, ogg

  • Processing time: 1-3 minutes typical (depends on audio length)

  • Instrumental detection: Tracks with no vocals or very short lyrics (<10 characters) are classified as instrumental

  • Multi-language support: The system can detect mixed-language lyrics and reports the top 3 languages by probability