Coalesce Columns
Coalesce one or more columns into a Destination Column. This Transformation iterates through a dataset to fill out a new column. For each row in the Source Columns, the first cell value found that is not None will be returned to the new Destination Column. If all cell values are None, the Transformation will return None to the Destination Column. Note that the order of the Source Columns in your dataset matters for this Transformation. This can be particularly useful in data analysis and reporting, where missing data can skew results or make it difficult to draw accurate conclusions.
Parameters
- Source Columns: The column names to be coalesced. Defaults to
[]
. - Destination Column: The column name that holds the coalesced content. Defaults to
coalesced
.
Usage
To use the Coalesce Columns transformation, you will need to follow these steps:
- Specify the Source Column parameter with the names of the columns to be coalesced, in the order in which they should be checked.
- Define the Destination Column parameter with the name of the new column that will hold the coalesced content.
- Run the transformation by clicking the Save and Run Transforms button.
Example 1: Coalescing Contact Information
Suppose you have a dataset of customer contact information with separate columns for home phone, work phone, and mobile phone numbers. You want to create a new column called "primary_phone" that contains the first available phone number for each customer.
ID | Name | Home_Phone | Work_Phone | Mobile_Phone |
---|---|---|---|---|
1 | John | 555-123-4567 | None | None |
2 | Jane | None | 555-987-6543 | 555-987-0001 |
Parameters (YAML):
transform:
name: Coalesce Columns
parameters:
source_columns:
- Home_Phone
- Work_Phone
- Mobile_Phone
destination_column: primary_phone
Expected Result Dataset:
ID | Name | Home_Phone | Work_Phone | Mobile_Phone | Primary_Phone |
---|---|---|---|---|---|
1 | John | 555-123-4567 | None | None | 555-123-4567 |
2 | Jane | None | 555-987-6543 | 555-987-0001 | 555-987-6543 |
Example 2: Coalescing Preferred Contact Method
Suppose you have a dataset of customer contact information with separate columns for email, phone, and postal address. You want to create a new column called "preferred_contact" that contains the first available contact method for each customer.
ID | Name | Phone | Postal_Address | |
---|---|---|---|---|
1 | John | [email protected] | None | 123 Main St, Anytown |
2 | Jane | None | 555-987-6543 | 456 Elm St, Sometown |
Parameters:
transform:
name: Coalesce Columns
parameters:
source_columns:
- Email
- Phone
- Postal_Address
destination_column: preferred_contact
ID | Name | Phone | Postal_Address | Preferred_Contact | |
---|---|---|---|---|---|
1 | John | [email protected] | None | 123 Main St, Anytown | [email protected] |
2 | Jane | None | 555-987-6543 | 456 Elm St, Sometown | 555-987-6543 |
Updated over 1 year ago