How to Summarize Large Texts from Audio Transcriptions Using Mantium

In this tutorial, we will learn how to transcribe a podcast (audio file) into text and then use Mantium's platform to summarize the transcription. We will use Mantium's audio file data connector together with the Transcribe Audio and Summarize Text transformations to process the text and save the summarized output.

Mantium makes it easy to load audio data using the connector, preprocess, and perform different transformations with just a few clicks.

Objective

Our goal is to use the Mantium platform to import an audio file, convert the audio to text with the Transcribe Audio transform, preprocess the text, and apply the Summarize Text transformation.

Prerequisites

  • Take a few moments to set up the API key for OpenAI.
  • A podcast audio file (mp3).
    This tutorial uses a podcast audio file (MP3) from Apple Podcast, titled "Surviving ChatGPT with Christian Hubicki" by Software Engineering Daily. You can find the podcast here. Feel free to use an audio file of your choice.
  • If you already have a transcription of the audio file, for example in a Word document, jump to the Import Document with Transcription section of this tutorial.

Import Audio Files

  1. Navigate to the Data Sources section by clicking Data Source on the left navigation bar.
  2. Click Add Data Source, and select the File Data Connector from the list.
  3. Provide the information to label the Data Source and click Save and Test.
  4. On the next page, the sync job will start automatically. If it doesn't, click on Manual Sync at the top right corner to perform the initial sync.
  5. Wait a few moments for the sync to be completed, and navigate to Files to upload your podcast file.
  6. Click on the Finish and Sync button to complete the upload process.

At this point, we have successfully imported the audio file (.mp3). The next step is to create a custom dataset from the import.

Create New Dataset

Datasets serve as the central workspace where you can apply transformations and enrichments to data retrieved from various sources, enabling you to modify and analyze the data without impacting the original information.

To create a new dataset:

  1. After the sync is completed, click on the Create Custom Datasets button in the Data Source section.
  2. Alternatively, you can create datasets by navigating to the Datasets section on the left pane.
  3. Provide a Dataset name and select where the data comes from (e.g., the Data Connector used earlier).
  4. Click on Save to save your configuration and wait for the job to complete.

Apply Transcribe Audio Transformation

After creating datasets from the podcast file, it's time to apply transformations that will convert the audio file to text.

To do this:

  1. Navigate to Transforms in the Datasets section and select the Transcribe Audio transform.
  2. Configure the transformation with the following parameters:
Source Column: content 
Destination Column: transcription  
LLM Model: Whisper 1 
Prompt Template: "The audio contains a podcast on ChatGPT"  
Language: en
Credential ID: OpenAI

  3. Optionally, click Save and Run Transforms to complete the process and generate the transcription.
  4. Alternatively, instead of stopping at Step 3, you can continue to add more transform steps by clicking the Plus sign (+).

Note that the OpenAI Whisper endpoint (the transcription model) accepts files of up to 25 MB. If your file is larger, Mantium handles the splitting for you.
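
As a rough illustration of the size limit, the sketch below estimates how many pieces a file would need so that each one stays within 25 MB. It assumes the limit means 25 * 1024 * 1024 bytes, and the function name chunks_needed is hypothetical. It does not describe how Mantium actually splits audio, which may cut on time or silence rather than on raw bytes.

```python
import math

# Assumption: the 25 MB limit is read as 25 * 1024 * 1024 bytes.
WHISPER_LIMIT_BYTES = 25 * 1024 * 1024


def chunks_needed(file_size_bytes: int, limit_bytes: int = WHISPER_LIMIT_BYTES) -> int:
    """Estimate how many pieces a file needs so each piece is at most limit_bytes."""
    if file_size_bytes < 0:
        raise ValueError("file size cannot be negative")
    if file_size_bytes == 0:
        return 0
    # Round up: a file just over the limit already needs two pieces.
    return math.ceil(file_size_bytes / limit_bytes)
```

For a local file, you could pass os.path.getsize(path) as the first argument to see whether it would need splitting at all.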

Expected dataset

If you stopped at Step 3, here is the expected dataset with the audio transcription.

Import Document with Transcription

If you have a Word document with an audio transcription, you can upload it to Mantium following this guide. Then continue to the next section to generate summaries.

Generate Summaries

To generate summaries of the text content, we will apply the following transformations.

Split Text Transformation

Before generating a summary, we need to split the text using the steps below. The Split Text transform will divide the transcription into different rows in the dataset, splitting it by words.

  1. At the bottom of the Transcribe Audio step, click on the Plus sign (+) to add a new transform step.
  2. Select the Split Text transform from the list of transforms.
  3. Enter the configuration parameters as shown below.
Source Column: transcription  
Destination Column: segmented_text
Split By: Word
Split Length: 500 
Split Overlap: 0
Split Respect Sentence Boundary: true
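
To make the parameters above concrete, here is a minimal Python sketch of word-based splitting with no overlap that avoids breaking sentences. It is an illustration only, not Mantium's implementation: the function name split_by_words and the naive sentence detection (a period, question mark, or exclamation mark followed by whitespace) are assumptions.

```python
import re


def split_by_words(text: str, split_length: int = 500,
                   respect_sentences: bool = True) -> list[str]:
    """Split text into segments of roughly split_length words, with no overlap.

    When respect_sentences is True, sentences are never cut in half, so a
    single sentence longer than split_length becomes its own oversized segment.
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")

    if respect_sentences:
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        units = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    else:
        units = text.split()

    segments: list[str] = []
    current: list[str] = []
    count = 0
    for unit in units:
        n_words = len(unit.split())
        # Start a new segment when adding this unit would exceed the limit.
        if current and count + n_words > split_length:
            segments.append(" ".join(current))
            current, count = [], 0
        current.append(unit)
        count += n_words
    if current:
        segments.append(" ".join(current))
    return segments
```

Each returned string corresponds to one row in the segmented_text column. With a Split Length of 500, an hour-long transcription would typically produce a few dozen rows.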

Summarize Text Transformation

Before you Save and Run the transform, you need to add the Summarize Text transform step.

To do this:

  1. Click on the Plus sign (+) under the Split Text transform card to add a new transform.
  2. Select the Summarize Text transform from the list of transforms.
  3. Enter the configuration parameters as shown below.
Source Column: segmented_text  
Destination Column: summary 
LLM Model: Gpt-3.5-turbo  
Prompt Template: "Summarize this podcast transcription: $field_source_column Summary:"  
Credential ID: OpenAI

Notice that the Source Column here is the column containing the text from the "Split Text" transform step. Also, ensure that you have connected your OpenAI credentials to the platform.
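
The $field_source_column placeholder in the prompt template is presumably replaced with the text of each row's source column before the prompt is sent to the model. The sketch below imitates that substitution with Python's string.Template, which happens to use the same $name syntax. Whether Mantium uses this mechanism internally is an assumption, and build_prompt is a hypothetical helper.

```python
from string import Template

# The prompt template from the configuration above.
PROMPT_TEMPLATE = Template(
    "Summarize this podcast transcription: $field_source_column Summary:"
)


def build_prompt(segment: str) -> str:
    """Fill the template with one row of segmented text."""
    return PROMPT_TEMPLATE.substitute(field_source_column=segment)


if __name__ == "__main__":
    print(build_prompt("The hosts discuss how ChatGPT changes teaching."))
```

Because the template is applied once per row, each 500-word segment gets its own prompt and its own summary.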

At this point, you can either view the summary of each segment of the split transcription, or add another transform step (the Combine Rows transform) to merge the segment summaries into a single combined summary.

  4. Click the Save and Run Transforms button to complete the process, or skip to the next section to combine the rows of each summary.

Combine Rows Transformation

  1. Click on the Plus sign (+) under the Summarize Text transform card to add a new transform.
  2. Select the Combine Rows transform from the list of transforms.
  3. Enter the configuration parameters as shown below.
Row Group Identifiers:
  - type: Identity
    source_column: file_name
    target_column: file_name
Source Column: summary
Destination Column: combined_summary
Combiner Function: Concatenate

  4. Finally, click the Save and Run Transforms button to complete the process.
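
The Combine Rows configuration above can be pictured as a group-and-concatenate operation: rows that share the same file_name are merged, and their summary values are joined into combined_summary. The following Python sketch illustrates the idea under stated assumptions. The function combine_rows, the list-of-dicts row format, and the single-space separator are hypothetical and are not taken from Mantium.

```python
def combine_rows(rows, group_column="file_name", source_column="summary",
                 destination_column="combined_summary", separator=" "):
    """Concatenate source_column values of rows sharing the same group_column."""
    grouped = {}
    for row in rows:
        # dicts keep insertion order, so groups and summaries stay in row order.
        grouped.setdefault(row[group_column], []).append(row[source_column])
    return [
        {group_column: key, destination_column: separator.join(values)}
        for key, values in grouped.items()
    ]


if __name__ == "__main__":
    rows = [
        {"file_name": "podcast.mp3", "summary": "Part one summary."},
        {"file_name": "podcast.mp3", "summary": "Part two summary."},
    ]
    print(combine_rows(rows))
```

Using file_name as the identity key means each uploaded audio file ends up with exactly one combined summary row, even if several files are processed in the same dataset.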

Expected Dataset

Once the job completes, the dataset contains the combined summary text. Click on the Export button at the top right corner to export the transformed dataset.

Conclusion

This tutorial has guided you through the process of transcribing a podcast audio file into text and summarizing the content using Mantium's platform. By following these steps, you can now easily transcribe and summarize any podcast or audio file, enhancing your understanding of the content and saving valuable time.