Remove Entities
Remove entities in text
Parameters
- Source Column: The column name containing the text and entities you want to remove. Defaults to content.
- Destination Column: The column name that holds the text without entities. Defaults to removed_entities.
- Fix Unicode: Convert Unicode characters to ASCII equivalents. Defaults to true.
- To ASCII: Convert non-ASCII characters to ASCII equivalents. Defaults to true.
- Remove URLs: Replace URLs with empty string. Defaults to false.
- Remove Emails: Replace emails with empty string. Defaults to false.
- Remove Phone Numbers: Replace phone numbers with empty string. Defaults to false.
- Remove Numbers: Replace numbers with empty string. Defaults to false.
- Remove Digits: Replace digits with empty string. Defaults to false.
- Remove Currency Symbols: Replace currency symbols with empty string. Defaults to false.
- Remove Punctuation: Replace punctuation with empty string. Defaults to false.
- Remove Emojis: Replace emojis with empty string. Defaults to false.
- Remove Header and Footer: Remove header and footer text. Defaults to false.
- Remove Substrings: Remove specific substrings from the text. You can add multiple substrings to remove by clicking on the "Add +" button. Defaults to [].
- Language: Define the language to perform language-dependent text normalization. English ('en') and German ('de') are currently supported. Defaults to en.
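As a quick reference, the parameters and their defaults can be written down as a small Python dataclass. This is only an illustrative sketch, not Mantium's implementation: the class name RemoveEntitiesParams is hypothetical, and every key that does not appear in the YAML examples on this page (fix_unicode, to_ascii, remove_punctuation, remove_emojis, remove_header_footer, remove_substrings, language) is an assumed snake_case spelling of the parameter label.

```python
from dataclasses import dataclass, field


@dataclass
class RemoveEntitiesParams:
    """Hypothetical container mirroring the documented defaults."""
    source_column: str = "content"
    destination_column: str = "removed_entities"
    fix_unicode: bool = True             # assumed key name
    to_ascii: bool = True                # assumed key name
    remove_urls: bool = False
    remove_emails: bool = False
    remove_phone_numbers: bool = False
    remove_numbers: bool = False
    remove_digits: bool = False
    remove_currency_symbols: bool = False
    remove_punctuation: bool = False     # assumed key name
    remove_emojis: bool = False          # assumed key name
    remove_header_footer: bool = False   # assumed key name
    remove_substrings: list = field(default_factory=list)  # assumed key name
    language: str = "en"                 # assumed key name

    def __post_init__(self):
        # Only English and German are listed as supported.
        if self.language not in ("en", "de"):
            raise ValueError(f"unsupported language: {self.language!r}")


# Override only what differs from the defaults.
params = RemoveEntitiesParams(source_column="feedback", remove_urls=True)
```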
Usage
To use this Mantium Transform, follow these steps:
- Specify the Source Column parameter with the name of the column that contains the text to remove entities from.
- Specify the Destination Column parameter with the name of the column that will hold the transformed content with entities removed.
- Select the parameters for the entities you want to remove.
- Click on Add + to add each substring you want removed from the text.
- Optionally, define the language to perform language-dependent text normalization. English ('en') and German ('de') are currently supported, and 'en' is the default option.
- Run the transformation by clicking the Save and Run Transforms button. The resulting dataset will have a new column with the specified name containing the transformed text.
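The steps above amount to reading one column, cleaning its text, and writing the result to a new column while leaving the source untouched. A minimal Python sketch of that flow, using the Remove Substrings option as the cleaning step, might look like the following. The function names apply_transform and remove_substrings and the sample row are hypothetical; the actual transform runs inside the Mantium app.

```python
def remove_substrings(text, substrings):
    """Delete every occurrence of each listed substring from text."""
    for s in substrings:
        text = text.replace(s, "")
    return text


def apply_transform(rows, source_column="content",
                    destination_column="removed_entities", substrings=()):
    """Return new rows with the cleaned text in destination_column."""
    return [
        {**row, destination_column: remove_substrings(row[source_column], substrings)}
        for row in rows
    ]


rows = [{"id": 1, "content": "DRAFT: Quarterly report DRAFT"}]
result = apply_transform(rows, substrings=["DRAFT: ", " DRAFT"])
print(result[0]["removed_entities"])  # Quarterly report
```

The source column is copied through unchanged, which matches the description of the resulting dataset gaining a new column.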
Example 1: Remove URLs, Emails, and Phone Numbers
Suppose you have a dataset of customer feedback and you want to remove URLs, emails, and phone numbers from the "feedback" column.
ID | Feedback |
---|---|
1 | Great product! Check out my website: http://www.example.com/. Contact me at [email protected] or 555-1234. |
2 | I love this! Email me at [email protected] for more info. My website is https://www.jane-shop.com/. |
Parameters (YAML):
Use the parameters as the configuration in the Mantium app.
transform:
  name: Remove Entities
  parameters:
    source_column: feedback
    destination_column: removed_entities
    remove_urls: true
    remove_emails: true
    remove_phone_numbers: true
Expected Result Dataset:
ID | Feedback | Removed_Entities |
---|---|---|
1 | Great product! Check out my website: http://www.example.com/. Contact me at [email protected] or 555-1234. | Great product! Check out my website: . Contact me at or . |
2 | I love this! Email me at [email protected] for more info. My website is https://www.jane-shop.com/. | I love this! Email me at for more info. My website is . |
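To make the behavior in this example concrete, here is a rough Python approximation using regular expressions. It is a sketch under assumptions, not the transform's actual implementation: the patterns are deliberately simple, the function name strip_contacts is hypothetical, the email address jane@example.com is a placeholder, and collapsing repeated spaces is a choice made here so the output matches how the table renders.

```python
import re

# Stop the URL before trailing sentence punctuation so "site/." keeps its period.
URL_RE = re.compile(r"https?://\S+?(?=[.,;!?]?(?:\s|$))")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Only matches the short NNN-NNNN form used in this example.
PHONE_RE = re.compile(r"\b\d{3}-\d{4}\b")


def strip_contacts(text):
    """Replace URLs, emails, and phone numbers with an empty string."""
    for pattern in (URL_RE, EMAIL_RE, PHONE_RE):
        text = pattern.sub("", text)
    # Collapse the double spaces left behind by the removals.
    return re.sub(r" {2,}", " ", text)


feedback = ("Great product! Check out my website: http://www.example.com/. "
            "Contact me at jane@example.com or 555-1234.")
print(strip_contacts(feedback))
# Great product! Check out my website: . Contact me at or .
```

Real-world phone numbers and URLs come in many more shapes than these patterns cover, so treat them as illustrative only.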
Example 2: Remove Numbers, Digits, and Currency Symbols
Suppose you have a dataset of product descriptions and you want to remove numbers, digits, and currency symbols from the "description" column.
ID | Description |
---|---|
1 | Buy 2 Get 1 Free! Men's Stylish Shoes - $99.99 only! |
2 | Limited Time Offer: Women's Handbag for just $59.95! 50% off! |
Parameters (YAML):
Use the parameters as the configuration in the Mantium app.
transform:
  name: Remove Entities
  parameters:
    source_column: description
    destination_column: removed_entities
    remove_numbers: true
    remove_digits: true
    remove_currency_symbols: true
Expected Result Dataset:
ID | Description | Removed_Entities |
---|---|---|
1 | Buy 2 Get 1 Free! Men's Stylish Shoes - $99.99 only! | Buy Get Free! Men's Stylish Shoes - only! |
2 | Limited Time Offer: Women's Handbag for just $59.95! 50% off! | Limited Time Offer: Women's Handbag for just ! % off! |
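The result above can be approximated in a few lines of Python. This is a hedged sketch rather than the transform's real logic: the function name strip_numbers_and_currency is hypothetical, numbers are matched with a simple pattern that treats "99.99" as one number, currency symbols are detected through the Unicode "Sc" category, and repeated spaces are collapsed so the output matches the table.

```python
import re
import unicodedata

# Digits with optional decimal or thousands separators, e.g. 2, 99.99, 1,000.
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def strip_numbers_and_currency(text):
    """Replace numbers and currency symbols with an empty string."""
    text = NUMBER_RE.sub("", text)
    # Unicode category "Sc" covers currency symbols such as $, €, and £.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Sc")
    # Collapse the double spaces left behind by the removals.
    return re.sub(r" {2,}", " ", text)


description = "Buy 2 Get 1 Free! Men's Stylish Shoes - $99.99 only!"
print(strip_numbers_and_currency(description))
# Buy Get Free! Men's Stylish Shoes - only!
```

Note that the percent sign survives, as in the second row of the table, because it is punctuation rather than a currency symbol.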