Transcribe Audio

Transcribe recorded meetings, speeches, or interviews into text so they are easier to reference and search. This Enrichment can also be used to create closed captions for videos or to make audio content accessible to people who are deaf or hard of hearing. Once the audio is converted to text, other Enrichments can run tasks on it such as sentiment analysis and topic modeling.

Parameters

  • Source Column: The column name containing the audio files you want to transcribe. Defaults to content.
  • Destination Column: The column name that holds the transcription. Defaults to transcription.
  • Transcription Model: The speech-to-text model used for audio transcription. Defaults to whisper-1.
  • Prompt: Text to guide the model’s style. The prompt language should match the audio language. This is an optional field.
  • Language: The language of the audio being transcribed, given as a language code such as en. Defaults to en.
  • Credential ID: The connector from your Mantium account. This is a required field.
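
The parameters and their defaults can be summarized as a small configuration object. This is only an illustrative sketch: the class name TranscribeAudioConfig and its field names are hypothetical and are not part of Mantium's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscribeAudioConfig:
    """Hypothetical container mirroring the parameters listed above."""

    credential_id: str                          # required, so it has no default
    source_column: str = "content"              # column holding the audio files
    destination_column: str = "transcription"   # column that receives the text
    transcription_model: str = "whisper-1"
    prompt: Optional[str] = None                # optional style guidance
    language: str = "en"


# Only the credential must be supplied; every other field uses its default.
config = TranscribeAudioConfig(credential_id="OpenAI")
print(config.source_column, config.destination_column, config.language)
```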

Usage

To use the Transcribe Audio transformation, you need a valid API key configured in Mantium for the third-party service (e.g., OpenAI) you want to use. If you don't have one, see the guide here.

To use this Mantium Enrichment, follow these steps:

  1. Configure the Source Column parameter by selecting the column containing the audio files you want to transcribe.
  2. Configure the Destination Column parameter by specifying the new name for the column that will hold the transcriptions.
  3. Configure the Transcription Model parameter by selecting the model to use for the transcription.
  4. Optionally enter Prompt text to guide the model’s style. Make sure the prompt language matches the audio language.
  5. Configure the Language parameter by selecting the language of the audio being transcribed.
  6. Configure the Credential ID parameter by selecting the appropriate credential from the list of available credentials in your Mantium account.
  7. Run the transformation by clicking the Save and Run Transforms button. The resulting dataset will have a new column with the specified Destination Column name which will contain the transcribed text for each audio file in the source column.
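
Conceptually, the steps above read each value from the source column, transcribe it, and write the result to a new destination column. The sketch below illustrates that data flow only; it is not Mantium's implementation. The function transcribe_column and the stub fake_transcribe are hypothetical names, and the real speech-to-text call is replaced by a stand-in so the example runs offline.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


def transcribe_column(
    rows: List[Row],
    transcribe: Callable[..., str],
    source_column: str = "content",
    destination_column: str = "transcription",
    model: str = "whisper-1",
    prompt: Optional[str] = None,
    language: str = "en",
) -> List[Row]:
    """Return copies of the rows with the transcription added under destination_column."""
    result = []
    for row in rows:
        text = transcribe(
            row[source_column], model=model, prompt=prompt, language=language
        )
        # Build a new dict so the input rows are left unchanged.
        result.append({**row, destination_column: text})
    return result


def fake_transcribe(audio_file: str, **_options) -> str:
    # Stand-in for the real speech-to-text service; it only echoes the file name.
    return f"<transcript of {audio_file}>"


rows = [{"call_id": "1", "audio_file": "call_1_audio.mp3"}]
result = transcribe_column(rows, fake_transcribe, source_column="audio_file")
print(result)
```

Passing the transcription function in as an argument keeps the column logic separate from whichever third-party service the credential points to.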

Example 1: Transcribing Customer Service Calls

Suppose you have a dataset containing audio recordings of customer service calls, and you want to analyze the calls for specific keywords. You can use the Transcribe Audio transformation to convert the audio files to text, and then search for the keywords in the transcriptions.

Sample Dataset:

call_id | audio_file
1       | call_1_audio.mp3
2       | call_2_audio.mp3

Parameters:

Source Column: audio_file
Destination Column: transcription
Transcription Model: whisper-1
Language: en    # English
Credential ID: OpenAI

Expected Result Dataset:

call_id | audio_file       | transcription
1       | call_1_audio.mp3 | "Hello, I have a problem with my recent order..."
2       | call_2_audio.mp3 | "Hi, I'd like to return an item I purchased..."

In this example, a new column called "transcription" is created, containing the transcribed text for each audio file in the "audio_file" column. The transcriptions are generated using the whisper-1 model from OpenAI.
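
Once the transcriptions exist, the keyword search mentioned in this example can be done with ordinary text matching. The following is a minimal sketch that assumes the result dataset has been loaded as a list of dictionaries. The helper find_keywords and the keyword list are hypothetical, and matching is a simple case-insensitive substring test.

```python
from typing import Dict, List

rows = [
    {"call_id": "1", "audio_file": "call_1_audio.mp3",
     "transcription": "Hello, I have a problem with my recent order..."},
    {"call_id": "2", "audio_file": "call_2_audio.mp3",
     "transcription": "Hi, I'd like to return an item I purchased..."},
]


def find_keywords(
    rows: List[Dict[str, str]],
    keywords: List[str],
    column: str = "transcription",
) -> Dict[str, List[str]]:
    """Map each call_id to the keywords that occur in its transcription."""
    matches = {}
    for row in rows:
        text = row[column].lower()
        found = [kw for kw in keywords if kw.lower() in text]
        if found:
            matches[row["call_id"]] = found
    return matches


keyword_hits = find_keywords(rows, ["order", "return", "refund"])
print(keyword_hits)  # {'1': ['order'], '2': ['return']}
```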

Example 2: Improving Transcription Quality with Prompts

Suppose you want to improve the quality of the transcripts generated by the Whisper API. A prompt can nudge the model toward a specific style or vocabulary, for example the correct spelling of a product name. The effect is usually small.

Sample Dataset:

call_id | audio_file       | transcription
1       | call_1_audio.mp3 | "So if I create a new data set, when I name this Antium dataset..."
2       | call_2_audio.mp3 | "Hi, I'd like to return an item I purchased..."

Parameters:

Source Column: audio_file
Destination Column: transcription with prompt
Transcription Model: whisper-1
Prompt: The video describes a bug in a Mantium Dataset.
Language: en    # English
Credential ID: OpenAI

Expected Result Dataset:

call_id | audio_file       | transcription | transcription with prompt
1       | call_1_audio.mp3 | "So if I create a new data set, when I name this Antium dataset..." | "So if I create a new data set, when I name this Mantium dataset..."
2       | call_2_audio.mp3 | "Hi, I'd like to return an item I purchased..." | "Hi, I'd like to return an item I purchased..."

In this example, a new column called "transcription with prompt" is created, containing the transcribed text for each audio file in the "audio_file" column as guided by the prompt. In the first row the prompt corrects the misheard "Antium" to "Mantium"; the second row is unchanged. The transcriptions are generated using the whisper-1 model from OpenAI.
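
To see what a prompt changed, the two transcription columns can be compared word by word. The sketch below uses Python's standard difflib module on the example rows. The helper changed_words is a hypothetical name, and the rows are assumed to be loaded as dictionaries.

```python
import difflib
from typing import List, Tuple

rows = [
    {"call_id": "1",
     "transcription": "So if I create a new data set, when I name this Antium dataset...",
     "transcription with prompt": "So if I create a new data set, when I name this Mantium dataset..."},
    {"call_id": "2",
     "transcription": "Hi, I'd like to return an item I purchased...",
     "transcription with prompt": "Hi, I'd like to return an item I purchased..."},
]


def changed_words(before: str, after: str) -> List[Tuple[str, str]]:
    """Return (old, new) word runs that differ between two transcripts."""
    old_words, new_words = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


prompt_diffs = {
    row["call_id"]: changed_words(row["transcription"], row["transcription with prompt"])
    for row in rows
}
print(prompt_diffs)  # {'1': [('Antium', 'Mantium')], '2': []}
```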