Remove Entities

Remove entities from text.


  • Source Column: The name of the column containing the text from which you want to remove entities. Defaults to content.
  • Destination Column: The name of the column that will hold the text with entities removed. Defaults to removed_entities.
  • Fix Unicode: Repair broken or mis-encoded Unicode characters. Defaults to true.
  • To ASCII: Convert non-ASCII characters to ASCII equivalents. Defaults to true.
  • Remove URLs: Replace URLs with an empty string. Defaults to false.
  • Remove Emails: Replace email addresses with an empty string. Defaults to false.
  • Remove Phone Numbers: Replace phone numbers with an empty string. Defaults to false.
  • Remove Numbers: Replace numbers with an empty string. Defaults to false.
  • Remove Digits: Replace digits with an empty string. Defaults to false.
  • Remove Currency Symbols: Replace currency symbols with an empty string. Defaults to false.
  • Remove Punctuation: Replace punctuation with an empty string. Defaults to false.
  • Remove Emojis: Replace emojis with an empty string. Defaults to false.
  • Remove Header and Footer: Remove header and footer text. Defaults to false.
  • Remove Substrings: Remove specific substrings from the text. To remove several substrings, click the "Add +" button once for each one. Defaults to [].
  • Language: The language used for language-dependent text normalization. English ('en') and German ('de') are currently supported. Defaults to en.
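The character-level options above (To ASCII, Remove Punctuation, Remove Emojis, Remove Substrings) can be approximated with the Python standard library. This is a minimal sketch under our own assumptions, not Mantium's implementation: the function name `clean_characters` and its keyword arguments are hypothetical, and treating every Unicode "Symbol, other" (`So`) character as an emoji is only a rough heuristic.

```python
from __future__ import annotations

import unicodedata


def clean_characters(
    text: str,
    *,
    to_ascii: bool = True,
    remove_punctuation: bool = False,
    remove_emojis: bool = False,
    remove_substrings: list[str] | None = None,
) -> str:
    """Hypothetical sketch of character-level cleanup (not Mantium's code)."""
    # Remove Substrings: delete each listed substring verbatim.
    for sub in remove_substrings or []:
        if sub:
            text = text.replace(sub, "")
    # Remove Emojis (heuristic): drop characters in the Unicode "So" category.
    if remove_emojis:
        text = "".join(ch for ch in text if unicodedata.category(ch) != "So")
    # Remove Punctuation: drop characters in any Unicode "P*" category.
    if remove_punctuation:
        text = "".join(
            ch for ch in text if not unicodedata.category(ch).startswith("P")
        )
    # To ASCII: decompose accented letters (é -> e + combining accent),
    # then drop every character that still has no ASCII form.
    if to_ascii:
        text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return text


print(repr(clean_characters("Café ☕ déjà vu!")))
# 'Cafe  deja vu!'
```

Substrings are removed first so that a listed substring still matches before accents or punctuation inside it are altered; with To ASCII enabled, emojis disappear even when Remove Emojis is off, because they have no ASCII form.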


To use this Mantium Transform, follow these steps:

  1. Specify the Source Column parameter with the name of the column that contains the text to remove entities from.
  2. Specify the Destination Column parameter with the name of the column that will hold the transformed content with entities removed.
  3. Enable the parameters for the entity types you want to remove (for example, Remove URLs or Remove Emails).
  4. Optionally, click Add + once for each specific substring you want removed from the text.
  5. Optionally, set the Language parameter for language-dependent text normalization. English ('en') and German ('de') are currently supported; 'en' is the default.
  6. Run the transformation by clicking the Save and Run Transforms button. The resulting dataset will have a new column with the specified name containing the transformed text.
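The column behaviour described in these steps (read from the Source Column, write the cleaned text to a new Destination Column, keep the original text) can be pictured with a small Python sketch. It assumes the dataset is a list of dictionaries; `run_transform` and the lambda cleaner are hypothetical stand-ins, since Mantium's own execution model is not described here.

```python
from __future__ import annotations

from typing import Callable


def run_transform(
    rows: list[dict[str, str]],
    clean: Callable[[str], str],
    source_column: str = "content",
    destination_column: str = "removed_entities",
) -> list[dict[str, str]]:
    """Return copies of `rows` with the cleaned text added under `destination_column`."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy, so the source column and input rows stay unchanged
        new_row[destination_column] = clean(row[source_column])
        result.append(new_row)
    return result


# Hypothetical cleaner: strip one fixed substring, as a Remove Substrings entry might.
rows = [{"feedback": "Great product! [auto-reply]"}]
out = run_transform(rows, lambda text: text.replace(" [auto-reply]", ""), source_column="feedback")
print(out)
# [{'feedback': 'Great product! [auto-reply]', 'removed_entities': 'Great product!'}]
```

The defaults mirror the documented Source Column (content) and Destination Column (removed_entities) defaults, so only the source column needs to be overridden for a "feedback" dataset.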

Example 1: Remove URLs, Emails, and Phone Numbers

Suppose you have a dataset of customer feedback and you want to remove URLs, emails, and phone numbers from the "feedback" column.

Row 1
  feedback: Great product! Check out my website: Contact me at [email protected] or 555-1234.
Row 2
  feedback: I love this! Email me at [email protected] for more info. My website is

Parameters (YAML):

Use these parameters as the transform configuration in the Mantium app.

  - name: Remove Entities
    source_column: feedback
    destination_column: removed_entities
    remove_urls: true
    remove_emails: true
    remove_phone_numbers: true

Expected Result Dataset:

Row 1
  feedback: Great product! Check out my website: Contact me at [email protected] or 555-1234.
  removed_entities: Great product! Check out my website: . Contact me at or .
Row 2
  feedback: I love this! Email me at [email protected] for more info. My website is
  removed_entities: I love this! Email me at for more info. My website is .
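The URL, email, and phone-number options used in this example can be approximated with regular expressions. The sketch below is hypothetical: `remove_contact_entities` and all three patterns are our own simplifications rather than the patterns Mantium uses, the phone pattern only targets simple formats such as 555-1234, the sample addresses are made up (example.com), and collapsing the double spaces left by deletions is an assumption based on the expected output above.

```python
import re

# Hypothetical, simplified patterns; real-world detection is much messier.
# URLs: stop before trailing punctuation such as the "." ending a sentence.
URL_RE = re.compile(r"(?:https?://|www\.)\S+?(?=[.,!?;:]?(?:\s|$))")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Phones: at least 7 characters of digits, spaces, dots, dashes, or parentheses.
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d ().-]{5,}\d(?!\w)")


def remove_contact_entities(
    text: str,
    *,
    remove_urls: bool = False,
    remove_emails: bool = False,
    remove_phone_numbers: bool = False,
) -> str:
    """Replace URLs, emails, and phone numbers with an empty string (sketch)."""
    if remove_urls:
        text = URL_RE.sub("", text)
    if remove_emails:
        text = EMAIL_RE.sub("", text)
    if remove_phone_numbers:
        text = PHONE_RE.sub("", text)
    # Assumption: collapse the double spaces left behind by the deletions.
    return re.sub(r" {2,}", " ", text)


sample = "Visit https://example.com. Contact me at jane@example.com or 555-1234."
print(remove_contact_entities(
    sample, remove_urls=True, remove_emails=True, remove_phone_numbers=True
))
# Visit . Contact me at or .
```

The lookahead in `URL_RE` is what keeps the sentence-ending period in place, which is why the output reads "Visit . Contact" in the same shape as the expected result above.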

Example 2: Remove Numbers, Digits, and Currency Symbols

Suppose you have a dataset of product descriptions and you want to remove numbers, digits, and currency symbols from the "description" column.

Row 1
  description: Buy 2 Get 1 Free! Men's Stylish Shoes - $99.99 only!
Row 2
  description: Limited Time Offer: Women's Handbag for just $59.95! 50% off!

Parameters (YAML):

Use these parameters as the transform configuration in the Mantium app.

  - name: Remove Entities
    source_column: description
    destination_column: removed_entities
    remove_numbers: true
    remove_digits: true
    remove_currency_symbols: true

Expected Result Dataset:

Row 1
  description: Buy 2 Get 1 Free! Men's Stylish Shoes - $99.99 only!
  removed_entities: Buy Get Free! Men's Stylish Shoes - only!
Row 2
  description: Limited Time Offer: Women's Handbag for just $59.95! 50% off!
  removed_entities: Limited Time Offer: Women's Handbag for just ! % off!
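The three options in this example can be sketched in Python so that it reproduces the expected results above. The function `remove_numeric_entities` is hypothetical, and so is the split between the two numeric options: here Remove Numbers deletes a whole number such as 99.99 as one unit, while Remove Digits deletes individual digit characters. Treating Unicode category `Sc` as "currency symbol" and collapsing double spaces are also our assumptions.

```python
import re
import unicodedata

# Assumption: a "number" is a run of digits with optional . or , groups (2, 99.99, 1,000).
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def remove_numeric_entities(
    text: str,
    *,
    remove_numbers: bool = False,
    remove_digits: bool = False,
    remove_currency_symbols: bool = False,
) -> str:
    """Sketch of Remove Numbers / Remove Digits / Remove Currency Symbols."""
    if remove_currency_symbols:
        # Unicode category "Sc" covers currency symbols such as $, €, £, ¥.
        text = "".join(ch for ch in text if unicodedata.category(ch) != "Sc")
    if remove_numbers:
        text = NUMBER_RE.sub("", text)  # whole numbers, including decimals
    if remove_digits:
        text = re.sub(r"\d", "", text)  # any remaining individual digits
    # Assumption: collapse the double spaces left behind by the deletions.
    return re.sub(r" {2,}", " ", text)


for row in [
    "Buy 2 Get 1 Free! Men's Stylish Shoes - $99.99 only!",
    "Limited Time Offer: Women's Handbag for just $59.95! 50% off!",
]:
    print(remove_numeric_entities(
        row, remove_numbers=True, remove_digits=True, remove_currency_symbols=True
    ))
# Buy Get Free! Men's Stylish Shoes - only!
# Limited Time Offer: Women's Handbag for just ! % off!
```

Under this sketch the two numeric options differ only on separators: with just `remove_digits=True`, "$99.99" becomes "$." because the decimal point survives, whereas `remove_numbers=True` alone leaves "$". The "%" sign is punctuation (category `Po`), not a currency symbol, so it stays unless Remove Punctuation is also enabled.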