Convert Microsoft PowerPoint to Text

Convert Microsoft PowerPoint files (.pptx) to plain text that can be used for purposes such as indexing, searching, or analyzing the content of the document. This enrichment is useful for applications that need the text content of a Microsoft PowerPoint file but not its original layout or formatting.

Parameters

  • Source Column: The column name containing the Microsoft PowerPoint files you want to extract text from. Defaults to content.
  • Destination Column: The column name that holds the extracted text. Defaults to text.
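
How the two parameters relate can be sketched in Python. This is an illustration only, not Mantium's implementation: `PptxToTextParams`, `apply_transform`, and the `extract` callable are hypothetical names, and each dataset row is modeled as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class PptxToTextParams:
    # Hypothetical container; the defaults mirror the documented ones.
    source_column: str = "content"
    destination_column: str = "text"


def apply_transform(
    rows: List[Dict[str, Any]],
    extract: Callable[[bytes], str],
    params: PptxToTextParams = PptxToTextParams(),
) -> List[Dict[str, Any]]:
    """Return copies of the rows with the extracted text added.

    The source column is read and left untouched; the destination
    column is created (or overwritten) with the extractor's output.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row[params.destination_column] = extract(row[params.source_column])
        result.append(new_row)
    return result
```

With the defaults, a row such as `{"content": <pptx bytes>}` gains a `text` key, while the original `content` value is kept.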

Usage

To use the Convert Microsoft PowerPoint (.pptx) to Text transformation in Mantium, follow these steps:

  1. Configure the Source Column parameter by selecting the column that contains the Microsoft PowerPoint files to be converted.
  2. Configure the Destination Column parameter by specifying the name of the new column that will be created with the extracted text data.
  3. Run the transformation by clicking the Save and Run Transforms button. The resulting dataset will have the specified Microsoft PowerPoint files converted to text and stored in the new column.

Example 1: Extracting Text Data from Microsoft PowerPoint Files

Suppose we have a dataset with a column called 'PPTX File' that contains Microsoft PowerPoint files in binary format:

<binary application/vnd.openxmlformats-officedocument.presentationml.presentation>

If we want to extract the text data from the Microsoft PowerPoint files and create a new column called 'Text Data', we can use the Convert Microsoft PowerPoint (.pptx) to Text transformation.

We would configure the transformation as follows (see image below):

Source Column: PPTX File  
Destination Column: Text Data

[Image: Transformation to Convert PPTX file to Text]

The resulting dataset would look like this:

PPTX File          | Text Data
<PPTX binary data> | <Text extracted from PPTX file>
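
To show what extracting text from a .pptx file involves, here is a minimal Python sketch that uses only the standard library. It does not describe how Mantium performs the conversion. It relies on the fact that a .pptx file is a ZIP archive whose slides are stored as XML under `ppt/slides/`, with visible text held in `<a:t>` elements. The function name `pptx_bytes_to_text` is hypothetical, and the sketch deliberately ignores speaker notes, charts, and the slide order defined in the presentation manifest.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used by paragraphs (<a:p>) and text runs (<a:t>).
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_PATTERN = re.compile(r"ppt/slides/slide(\d+)\.xml")


def pptx_bytes_to_text(data: bytes) -> str:
    """Return slide text, one paragraph per line, with formatting dropped."""
    lines = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Keep only slide parts and sort them by file number. This usually
        # matches presentation order, but that is an assumption of this sketch.
        slides = []
        for name in archive.namelist():
            match = SLIDE_PATTERN.fullmatch(name)
            if match:
                slides.append((int(match.group(1)), name))
        for _, name in sorted(slides):
            root = ET.fromstring(archive.read(name))
            for paragraph in root.iter(f"{A_NS}p"):
                # Join the runs of one paragraph into a single line.
                text = "".join(run.text or "" for run in paragraph.iter(f"{A_NS}t"))
                if text.strip():
                    lines.append(text)
    return "\n".join(lines)
```

Applied to the example above, the value for 'Text Data' would be `pptx_bytes_to_text(row["PPTX File"])` for each row.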