Word Documents (.docx)

Mantium supports Microsoft Word documents in the .docx format.

Features

  • Text Extraction: Mantium extracts the text from Word documents, which makes a document's content easier to analyze.
  • Text Analysis: Mantium analyzes a document's text in a human-readable way, so users can pull out specific elements such as headings or paragraphs.
  • Table Extraction: Mantium extracts tables from Word documents, giving users a complete view of the data each table contains.
  • Image Extraction: Mantium extracts images from Word documents, which makes the images in a document easier to access and analyze.
  • Font Support: Mantium handles a variety of font types, so documents that use different fonts can still be accessed and analyzed.
  • Style Support: Mantium extracts and analyzes the styles used in Word documents, giving users a complete view of how a document is formatted.
  • Metadata Extraction: Mantium extracts the metadata stored in Word documents (such as title and author), which makes it easier to access and analyze.
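To make the extraction features above more concrete, here is a minimal sketch of what pulling text, headings, tables, and metadata out of a .docx can look like. It is not Mantium's implementation and does not use python-docx. It relies only on the Python standard library and on the fact that a .docx file is a ZIP package of XML parts: the body lives in word/document.xml and core metadata in docProps/core.xml. The names `read_docx` and `paragraph_text` are hypothetical, and the sketch assumes a well-formed file. Heading paragraphs typically carry a style ID such as `Heading1` in `w:pStyle`, which is how the sketch exposes headings.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by WordprocessingML and the OPC core-properties part.
NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}
W = "{%s}" % NS["w"]


def paragraph_text(p):
    """Concatenate the text of every w:t element inside a w:p element."""
    return "".join(t.text or "" for t in p.iter(W + "t"))


def read_docx(source):
    """Hypothetical helper: return paragraphs, tables and core metadata of a .docx.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        body = ET.fromstring(zf.read("word/document.xml")).find("w:body", NS)
        names = zf.namelist()
        core = ET.fromstring(zf.read("docProps/core.xml")) if "docProps/core.xml" in names else None

    paragraphs, tables = [], []
    # Iterate over the direct children of w:body so document order is kept.
    # Block-level content controls (w:sdt) and section properties are skipped.
    for el in body:
        if el.tag == W + "p":
            style = el.find("w:pPr/w:pStyle", NS)
            paragraphs.append({
                "style": style.get(W + "val") if style is not None else None,
                "text": paragraph_text(el),
            })
        elif el.tag == W + "tbl":
            # Each row becomes a list of cell strings. Merged cells are not
            # expanded and nested tables inside a cell are ignored.
            rows = []
            for tr in el.findall("w:tr", NS):
                rows.append([
                    "\n".join(paragraph_text(p) for p in tc.findall("w:p", NS))
                    for tc in tr.findall("w:tc", NS)
                ])
            tables.append(rows)

    metadata = {}
    if core is not None:
        for key in ("dc:title", "dc:creator", "cp:lastModifiedBy"):
            el = core.find(key, NS)
            if el is not None:
                metadata[key.split(":")[1]] = el.text
    return {"paragraphs": paragraphs, "tables": tables, "metadata": metadata}
```

For example, `read_docx("contract.docx")["tables"]` would return each table as a list of rows of cell strings. Because the sketch deliberately skips block-level content controls, nested tables, headers, footers, and images, it also illustrates why those structures need extra handling in any real extractor.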

Limitations

Mantium's Word Document support has the following limitations:

  • Compatibility: Only .docx files are supported. Legacy .doc files (the binary format used by Word 97-2003) are not; convert them to .docx before uploading.
  • Image Extraction: Some images may not be extracted. For example, a document may embed images in a format that python-docx does not support.
  • Table Extraction: Tables may not be extracted correctly from every document, particularly tables with complex structures such as merged or nested cells.
  • Content Control Extraction: Content controls (such as drop-down lists, date pickers, and check boxes) are not currently extracted.
  • Macro Execution: Macros in Word documents are not executed.
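Because .docx and .doc are different container formats (a .docx file is a ZIP package of XML parts, while a legacy .doc file is a binary OLE2 compound file), a quick local check before uploading can catch unsupported files. The sketch below is a hypothetical helper that is not part of Mantium. It uses only the Python standard library, identifies the format from the file's bytes, lists the image file extensions stored under word/media/ (which may help spot unusual image formats), and flags a VBA project part, which macro-enabled documents carry.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Every legacy binary .doc file (an OLE2 compound file) starts with these 8 bytes.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def precheck_word_file(data: bytes) -> dict:
    """Hypothetical pre-upload check: classify a Word file and list its embedded media."""
    result = {"format": "unknown", "supported": False, "media_types": [], "has_vba": False}
    if data.startswith(OLE2_SIGNATURE):
        result["format"] = "doc"
        return result
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return result
    with zipfile.ZipFile(buf) as zf:
        names = zf.namelist()
    # A WordprocessingML package always contains its main part at word/document.xml.
    if "word/document.xml" not in names:
        return result
    result["format"] = "docx"
    result["supported"] = True
    # Images embedded in a .docx are normally stored under word/media/.
    result["media_types"] = sorted(
        {PurePosixPath(n).suffix.lower() for n in names if n.startswith("word/media/")}
    )
    # Macro-enabled documents include a VBA project part; macros are never run anyway.
    result["has_vba"] = "word/vbaProject.bin" in names
    return result
```

For example, `precheck_word_file(Path("contract.docx").read_bytes())` (with `Path` from `pathlib`) would report `"format": "docx"` for a valid package and `"format": "doc"` for a legacy file that needs converting first.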

Usage

To use the Word Documents Data Connector in Mantium, follow these steps:

  1. Click Data Source on the left navigation bar to go to the Data Sources section.
  2. In the top-right corner, select Add Data Source.
  3. From the list of Data Sources, select the Word Documents Data Connector.
  4. Enter the details used to label the Data Source, then wait for the job to complete.
  5. Upload the Word document (.docx) that contains the data you want to analyze, such as articles or contracts.
  6. Click the Finish and Sync button to finalize the setup and synchronize the data.