- Source Column: The column name containing the Microsoft Word files you want to extract text from. Defaults to
- Destination Column: The column name that holds the extracted text. Defaults to
To use the Convert Microsoft Word (.docx) to Text transformation in Mantium, follow these steps:
- Configure the Source Column parameter by selecting the column that contains the Microsoft Word files to be converted.
- Configure the Destination Column parameter by specifying the name of the new column that will be created with the extracted text data.
- Run the transformation by clicking the Save and Run Transforms button. The resulting dataset will have the specified Microsoft Word files converted to text and stored in the new column.
Suppose we have a dataset with a column called 'Docx File' that contains Microsoft Word files in binary format.
If we want to extract the text data from the Microsoft Word files and create a new column called 'Text', we can use the Convert Microsoft Word (.docx) to Text transformation.
We would configure the transformation as follows:
- Source Column: content
- Destination Column: text
The resulting dataset would look like this:
Docx File, Text
<Docx binary data>, <Text extracted from Docx file>
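For readers who want a feel for what this kind of conversion involves, here is a minimal Python sketch using only the standard library. It is not Mantium's implementation, which is not documented here. The function names `docx_bytes_to_text`, `convert_column`, and `make_sample_docx` are hypothetical, and the sketch assumes each row is a dictionary whose source column holds the raw bytes of a .docx file. It reads only the main document body, so headers, footers, and footnotes are ignored.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_bytes_to_text(data: bytes) -> str:
    """Return the paragraph text of a .docx file given as raw bytes."""
    # A .docx file is a ZIP archive; the main body lives in word/document.xml.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> elements.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def convert_column(rows, source="content", destination="text"):
    """Hypothetical helper: copy each row and add the extracted text column."""
    return [{**row, destination: docx_bytes_to_text(row[source])} for row in rows]


def make_sample_docx(paragraphs):
    """Build a stripped-down in-memory archive for demonstration only.

    It contains just word/document.xml, so it is enough for this sketch
    but is not a complete file that Microsoft Word would open.
    """
    body = "".join(f"<w:p><w:r><w:t>{p}</w:t></w:r></w:p>" for p in paragraphs)
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<w:document xmlns:w="{W_NS[1:-1]}"><w:body>{body}</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [{"content": make_sample_docx(["First paragraph", "Second paragraph"])}]
    for row in convert_column(rows):
        print(row["text"])
```

The `source` and `destination` arguments mirror the Source Column and Destination Column parameters from the example configuration above. The original binary column is kept alongside the new text column, matching the resulting dataset shown in the example.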