# Lyric Transcription

Extract lyrics from audio files using AI-powered speech recognition with multi-language support.

## Endpoint

```
POST /api/v1/lyric_transcription/{model}
```

## Parameters

### Path Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | Model to use: `standard` |

### Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `file` | binary | Yes | Audio file to analyze (mp3, wav, flac, m4a, aac, ogg) |

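Because the request body accepts only the formats listed above, it can be convenient to reject other files before uploading. The following is a minimal client-side sketch; `is_supported_audio` and `SUPPORTED_EXTENSIONS` are hypothetical names introduced here, not part of the API, and the check looks only at the file extension.

```python
from pathlib import Path

# Formats listed in the request body table above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".aac", ".ogg"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one of the documented formats.

    Hypothetical client-side helper: it does not inspect file contents,
    and the server remains the authority on what it accepts.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_audio("/music/Song.MP3")` is true because the comparison is case-insensitive, while a path ending in `.txt` is rejected.
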
## Request Example

### cURL

```bash
curl https://platform.mippia.com/api/v1/lyric_transcription/standard \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -X POST \
  -F "file=@/path/to/audio.mp3"
```

### Python

```python
import requests

url = "https://platform.mippia.com/api/v1/lyric_transcription/standard"
headers = {
    "Authorization": "Bearer YOUR_API_KEY"
}
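# Open the file in a context manager so the handle is closed after the upload
with open("/path/to/audio.mp3", "rb") as audio_file:
    response = requests.post(url, headers=headers, files={"file": audio_file})

response.raise_for_status()  # surface HTTP errors before parsing the body
print(response.json())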
```

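## Response (Initial)

The request returns a task ID immediately; the transcription result is delivered later in a callback payload.

```json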
{
  "taskId": "string"
}
```

## Callback Response (Success)

```json
{
  "task_id": "task_20251204052920_J8uNdq5z",
  "task_name": "lyric_transcription",
  "status": "success",
  "completed_at": "2025-12-04T05:30:15Z",
  "result_json": {
    "lyrics": "I walk alone through empty streets\nSearching for a love I cannot find\nThe stars above remind me of your eyes\nBut you are gone and left me here behind",
    "languages": [
      {
        "code": "en",
        "probability": 0.9234
      },
      {
        "code": "ko",
        "probability": 0.0521
      },
      {
        "code": "ja",
        "probability": 0.0245
      }
    ],
    "is_instrumental": false
  }
}
```

## Callback Response (Instrumental Track)

When no vocals are detected, the track is classified as instrumental:

```json
{
  "task_id": "task_20251204052920_K9vPdq6a",
  "task_name": "lyric_transcription",
  "status": "success",
  "completed_at": "2025-12-04T05:30:15Z",
  "result_json": {
    "lyrics": "",
    "languages": [
      {
        "code": "Inst",
        "probability": 1.0
      }
    ],
    "is_instrumental": true
  }
}
```

## Callback Response (Failure)

```json
{
  "task_id": "task_20251204052920_J8uNdq5z",
  "task_name": "lyric_transcription",
  "status": "failure",
  "completed_at": "2025-12-04T05:30:15Z",
  "result_json": {},
  "error": "Error message describing what went wrong"
}
```

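The three callback shapes above can be handled with a single branch on `status` and `is_instrumental`. The sketch below assumes the payload has already been parsed into a Python dict; `extract_lyrics` and `TranscriptionFailed` are hypothetical names used for illustration, and how the callback reaches your code (web framework, queue, and so on) is outside its scope.

```python
from typing import Optional


class TranscriptionFailed(Exception):
    """Raised when a callback reports status "failure" (hypothetical helper type)."""


def extract_lyrics(payload: dict) -> Optional[str]:
    """Return lyrics from a callback payload, or None for an instrumental track.

    Assumes the payload follows the documented callback shape.
    """
    status = payload.get("status")
    if status == "failure":
        # The "error" field carries the failure message.
        raise TranscriptionFailed(payload.get("error", "unknown error"))
    if status != "success":
        # "pending" or "processing": no result is available yet.
        raise ValueError(f"task not finished: status={status!r}")

    result = payload.get("result_json", {})
    if result.get("is_instrumental"):
        return None
    return result.get("lyrics", "")
```

Returning `None` for instrumental tracks, rather than an empty string, lets callers tell "no vocals detected" apart from other outcomes without re-reading the flag.
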
## Result Fields

| Field | Type | Description |
| --- | --- | --- |
| `task_id` | string | Unique task identifier |
| `task_name` | string | Always `lyric_transcription` |
| `status` | string | Task status: `pending`, `processing`, `success`, `failure` |
| `completed_at` | string | ISO 8601 completion timestamp |
| `result_json` | object | Transcription results (empty object when `status` is `failure`) |
| `error` | string | Error message (only present when `status` is `failure`) |

### Result JSON Fields

| Field | Type | Description |
| --- | --- | --- |
| `lyrics` | string | Extracted lyrics text (empty if instrumental) |
| `languages` | array | Detected languages with probabilities (up to 3; a single `Inst` entry for instrumental tracks) |
| `is_instrumental` | boolean | `true` if no vocals detected |

### Language Object Fields

| Field | Type | Description |
| --- | --- | --- |
| `code` | string | ISO 639-1 language code (or `Inst` for instrumental) |
| `probability` | float | Detection confidence (0.0 - 1.0) |

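A common use of the `languages` array is to pick one primary language for a track. The sketch below is one way to do that; `dominant_language` and its `min_probability` threshold are illustrative assumptions, not values defined by the API.

```python
from typing import Optional


def dominant_language(result_json: dict, min_probability: float = 0.5) -> Optional[str]:
    """Return the most probable language code, or None if instrumental or unsure.

    `min_probability` is an illustrative cutoff chosen for this sketch.
    """
    if result_json.get("is_instrumental"):
        return None
    languages = result_json.get("languages", [])
    if not languages:
        return None
    # Do not rely on the array being pre-sorted; pick the maximum explicitly.
    best = max(languages, key=lambda lang: lang["probability"])
    if best["code"] == "Inst" or best["probability"] < min_probability:
        return None
    return best["code"]
```

Applied to the `result_json` from the success example, this returns `"en"`; applied to the instrumental example, it returns `None`.
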
## Supported Languages

The transcription system supports 18 languages:

| Code | Language | Code | Language |
| --- | --- | --- | --- |
| `ko` | Korean | `ar` | Arabic |
| `en` | English | `hi` | Hindi |
| `zh` | Chinese | `th` | Thai |
| `ja` | Japanese | `vi` | Vietnamese |
| `es` | Spanish | `id` | Indonesian |
| `fr` | French | `tr` | Turkish |
| `de` | German | `pl` | Polish |
| `it` | Italian | `nl` | Dutch |
| `pt` | Portuguese | `ru` | Russian |

## Processing Pipeline

1. **Vocal Separation**: Audio is processed using Demucs to isolate vocals from instruments
2. **Vocal Detection**: Checks for presence of vocals using RMS and zero-crossing rate analysis
3. **Language Detection**: Multiple segments are analyzed to detect language distribution

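To make the vocal detection step more concrete, the sketch below computes the two features it names, RMS energy and zero-crossing rate, on a block of samples. It is an illustration of the features only, not the service's implementation: the function names and both thresholds (`rms_floor`, `zcr_ceiling`) are assumptions chosen for the example.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of a block of samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def looks_like_vocals(
    samples: Sequence[float],
    rms_floor: float = 0.01,    # illustrative: below this, treat as silence
    zcr_ceiling: float = 0.3,   # illustrative: above this, treat as noise-like
) -> bool:
    """Heuristic: enough energy to be audible, and not dominated by noise."""
    return rms(samples) > rms_floor and zero_crossing_rate(samples) < zcr_ceiling
```

With these example thresholds, a block of silence fails the energy check, and a signal that flips sign on every sample fails the zero-crossing check.
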
## Notes

- **Supported formats**: mp3, wav, flac, m4a, aac, ogg
- **Processing time**: Typically 1 to 3 minutes, depending on audio length
- **Instrumental detection**: Tracks with no vocals or very short lyrics (<10 characters) are classified as instrumental
- **Multi-language support**: The system can detect mixed-language lyrics and reports the top 3 languages by probability