Detect PII

Detect whether text contains Personally Identifiable Information (PII) and return True or False. PII is any data that can be used to identify a specific individual, such as a name, address, social security number, or email address. Exposure of PII can lead to privacy breaches, identity theft, and other forms of fraud. With this Enrichment, organizations can scan their text data for PII and take appropriate measures to safeguard it, which helps prevent data breaches and protects the privacy of individuals.

Use the Prompt Template to guide the PII detection. In the template, the $field_source_column notation refers to the source column, and the $column_<column_name> notation refers to any other column in your dataset.
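
The placeholder notation can be pictured with Python's string.Template, which happens to use the same $name syntax. This is only a sketch for intuition: how Mantium performs the substitution internally is not described here, and the render_prompt helper, the sample row, and the subject column are hypothetical names chosen for illustration.

```python
from string import Template

def render_prompt(template_text, row, source_column):
    """Fill $field_source_column and $column_<name> placeholders from one row.

    Hypothetical helper; not part of the Mantium API.
    """
    values = {"field_source_column": row[source_column]}
    # Every other column is exposed as $column_<column_name>.
    for name, value in row.items():
        if name != source_column:
            values[f"column_{name}"] = value
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(template_text).safe_substitute(values)

# Made-up row for illustration.
row = {"subject": "Broken item", "body": "My address on file is 123 Sesame Street."}
prompt = render_prompt(
    "Subject: $column_subject\nDocument: $field_source_column",
    row,
    source_column="body",
)
print(prompt)
```

Here the source column is body, so its text replaces $field_source_column, while the subject column is reachable as $column_subject.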

Parameters

Listed below are the parameters that can be used with this Mantium Transform:

  • Source Column: The column name containing the text you want to check for PII. Defaults to content.
  • Destination Column: The name of the column that will hold the True or False result. This is a required field. Defaults to PII_detected.
  • LLM Model: The large language model used for PII Detection in text. Defaults to gpt-turbo-3.5.
  • Prompt Template: The template that guides the identification of PII in text. Defaults to: You can only respond with the word "True" or "False", where your answer indicates whether the text in the user's message contains PII. Do not explain your answer, and do not use punctuation. Your task is to identify whether the text extracted from your company files contains sensitive PII information that should not be shared with the broader company. Here are some things to look out for: - An email address that identifies a specific person in either the local-part or the domain - The postal address of a private residence (must include at least a street name) - The postal address of a public place (must include either a street name or business name) - Notes about hiring decisions with mentioned names of candidates. The user will send a document for you to analyze. Document: $field_source_column
  • Credential ID: The connector from your Mantium account. This is a required field.

Usage

To use the PII Detection transformation, you need a valid API key configured in Mantium for the third-party service (e.g., OpenAI) you want to use. If you don't have one, see the guide here.

To use this Mantium Enrichment, follow these steps:

  1. Specify the Source Column parameter with the name of the column that contains the text to check for PII.
  2. Specify the Destination Column parameter with the name of the column that will hold the True or False value indicating whether PII was detected.
  3. Specify the LLM Model parameter with the name of the large language model to use for PII detection.
  4. Specify the Prompt Template parameter if desired.
  5. Specify the Credential ID parameter with the connector from your Mantium account.
  6. Run the transformation by clicking the Save and Run Transforms button. The resulting dataset will have a new column with the specified name containing the PII result.
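
Conceptually, the steps above amount to rendering the prompt for each row, asking the model, and writing its True or False reply into the destination column. The following is a minimal sketch of that flow under stated assumptions, not Mantium's implementation: detect_pii and fake_model are hypothetical names, the model call is replaced by a trivial stand-in so the example runs offline, and the sample rows are made up.

```python
from string import Template

def detect_pii(rows, source_column, destination_column, prompt_template, ask_model):
    """Return copies of rows with a "True"/"False" string in destination_column.

    Hypothetical sketch; ask_model is any callable that maps a prompt to a reply.
    """
    results = []
    for row in rows:
        prompt = Template(prompt_template).safe_substitute(
            field_source_column=row[source_column]
        )
        # Tolerate stray whitespace, a trailing period, or odd casing in the reply.
        reply = ask_model(prompt).strip().strip(".").capitalize()
        if reply not in ("True", "False"):
            raise ValueError(f"unexpected model reply: {reply!r}")
        results.append({**row, destination_column: reply})
    return results

def fake_model(prompt):
    """Stand-in for the LLM call (assumption): flags text that mentions a street."""
    return "True" if "Street" in prompt else "False"

rows = [
    {"body": "Can you tell me more about the smartwatch?"},
    {"body": "My address on file is 123 Sesame Street Columbus, OH."},
]
flagged = detect_pii(rows, "body", "PII_detected", "Document: $field_source_column", fake_model)
print([row["PII_detected"] for row in flagged])
```

Rejecting any reply other than True or False keeps a malformed model answer from silently landing in the destination column.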

Example 1: Customer support inquiries

Suppose you have an email inbox where customers submit feedback and inquiries, and you want to flag any messages that contain PII.

email | subject | body
[email protected] | New order | Hi, I am interested in purchasing your new smartwatch. Can you provide me with more information about its features and pricing?
[email protected] | Broken item | My name is John and I recently purchased your new vacuum cleaner, but it stopped working after just one use. I am extremely disappointed and would like a replacement or refund. My address on file is 123 Sesame Street Columbus, OH.
[email protected] | Great product | I have been using your mobile banking app for a few months now and I think it's fantastic

Parameters (YAML):

Use these parameters as the configuration in the Mantium app.

transform:
  name: Detect PII
  parameters:
    source_column: body
    destination_column: PII_detected
    llm_model: gpt-turbo-3.5
    prompt_template: "You can only respond with the word \"True\" or \"False\", where your answer indicates whether the text in the user's message contains PII. Do not explain your answer, and do not use punctuation. Your task is to identify whether the text extracted from your company files contains sensitive PII information that should not be shared with the broader company. Here are some things to look out for: - An email address that identifies a specific person in either the local-part or the domain - The postal address of a private residence (must include at least a street name) - The postal address of a public place (must include either a street name or business name) - Notes about hiring decisions with mentioned names of candidates. The user will send a document for you to analyze. Document: $field_source_column"
    credential_id: your_credential_id_here

Expected Result Dataset:

email | subject | body | PII_detected
[email protected] | New order | Hi, I am interested in purchasing your new smartwatch. Can you provide me with more information about its features and pricing? | False
[email protected] | Broken item | My name is John and I recently purchased your new vacuum cleaner, but it stopped working after just one use. I am extremely disappointed and would like a replacement or refund. My address on file is 123 Sesame Street Columbus, OH. | True
[email protected] | Great product | I have been using your mobile banking app for a few months now and I think it's fantastic | False
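
Once the PII_detected column is populated, flagged rows can be separated for review or redaction. The sketch below assumes the result dataset has been exported as a list of dictionaries and that the column holds the text True or False; both are assumptions, and split_by_pii is a hypothetical helper rather than a Mantium function. Only the subject and flag columns are shown to keep it short.

```python
def is_flagged(row, flag_column="PII_detected"):
    """Treat the text "True" (any casing) or a boolean True as a PII flag."""
    return str(row[flag_column]).strip().lower() == "true"

def split_by_pii(rows, flag_column="PII_detected"):
    """Return (flagged, clean) lists, preserving the original row order."""
    flagged = [row for row in rows if is_flagged(row, flag_column)]
    clean = [row for row in rows if not is_flagged(row, flag_column)]
    return flagged, clean

# Abbreviated copy of the expected result dataset above.
result_rows = [
    {"subject": "New order", "PII_detected": "False"},
    {"subject": "Broken item", "PII_detected": "True"},
    {"subject": "Great product", "PII_detected": "False"},
]

flagged, clean = split_by_pii(result_rows)
print([row["subject"] for row in flagged])
```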