Combine Rows

Aggregate data from multiple rows into a single row. This transformation is similar to a SQL group_by operator.

Parameters

The Combine Rows transformation has three required parameters:

  • Row Group Identifiers: The sets of identifier fields used to identify unique row groups. Defaults to [].
    Each identifier requires the following options:
    • type: The type of row group. Examples: Week, Day, Identity.
    • Source Column: The column that will be used to identify the group.
    • Target Column: The column that will store the row group label.
  • Source Column: The column name containing the data you want to combine. Defaults to content.
  • Destination Column: The column name that will hold the combined data. Defaults to combined_data.
  • Combiner Function: Specifies how the data is combined. This is a required field. Currently, the available options are Concatenate or Count.

Usage

To use the Combine Rows transformation in Mantium, follow these steps:

  1. Select the transformation from the list of available transformations in the Mantium user interface.
  2. Click on the 'Add +' button next to the Row Group Identifier parameter to configure a row group identifier.
  3. Select a row group identifier type. For instance, if you want to group rows by month, select Month and select a source column with date data.
  4. Select the source column that you want to use for grouping.
  5. Enter a destination column that will store the row group label.
  6. Repeat steps 2-5 to add additional columns to be used for grouping.
  7. Configure the Source Column parameter by selecting the column that you want to combine.
  8. Configure the Destination Column parameter by specifying the name of the new column that will be created with the combined data.
  9. Select the Combiner Function: Concatenate or Count.
  10. Run the transformation by clicking the Save and Run Transforms button. The resulting dataset will have the specified rows combined.

Example 1: Combining Favorite Superheroes by Year and Name

Use Case

Suppose we have a dataset containing people's favorite superheroes, and we want to combine the rows based on the same year and name, creating a new column called 'Heroes' that shows all the favorite superheroes for each group.

Sample Dataset

NameDateGenderOccupationFavorite Superhero
John2023-02-01MaleEngineerBatman
Jane2022-10-06FemaleDoctorIronman
John2023-05-02MaleEngineerShe-Hulk
Jill2024-08-09FemaleDoctorSuperman

Config as YAML

Row Group Identifiers:
  - type: Year
    source_column: Date
    destination_column: Year
  - type: Identity
    source_column: Name
    target_column: Name
Source Column: Favorite Superhero
Destination Column: Heroes
Combiner Function: Concatenate

Expected Result Dataset

YearNameHeroes
2022JaneIronman
2023JohnBatman; She-Hulk
2024JillSuperman

Example 2: Counting Superheroes by Occupation

Use Case

Suppose we have the same dataset as in Example 1 and we want to count the number of superheroes listed for each occupation.

Sample Dataset

NameDateGenderOccupationFavorite Superhero
John2023-02-01MaleEngineerBatman
Jane2022-10-06FemaleDoctorIronman
John2023-05-02MaleEngineerShe-Hulk
Jill2024-08-09FemaleDoctorSuperman

In Example 2, we are working with the same dataset as in Example 1, which contains information about people's favorite superheroes. The goal of this example is to count the number of superheroes listed for each occupation in the dataset.

The Combine Rows transformation is configured with the following parameters:

  • Row Group Identifiers: We are using the Identity type to group rows by the "Occupation" column. The resulting dataset will have one row for each unique occupation.
  • Source Column: We choose the "Favorite Superhero" column as the source column. This column contains the superhero names we want to count for each occupation.
  • Target Column: We create a new column called "Superhero Count" to store the count of superheroes for each unique occupation.
  • Combiner Function: We select the "Count" combiner function to count the number of superhero entries for each occupation group.

When the transformation is executed, it groups the dataset by the unique occupation values (in this case, "Engineer" and "Doctor") and counts the number of superhero entries in each group. The resulting dataset shows the count of superheroes for each occupation:

Config as YAML

Row Group Identifiers:
  - type: Identity
    source_column: Occupation
    target_column: Occupation
Source Column: Favorite Superhero
Target Column: Superhero Count
Combiner Function: Count
Combine Rows

Expected Result Dataset

OccupationSuperhero Count
Engineer2
Doctor2