Reformat CSV

Reformatting comma-separated values (CSV) with | between cells and . between rows can make the data easier for Large Language Models to interpret.

Parameters

The Reformat CSV transformation has four parameters:

  • Source Column: The name of the column containing the comma-separated values you want to reformat. Defaults to content.
  • Destination Column: The name of the column that will hold the reformatted values. Defaults to reformatted_csv.
  • Cell Delimiter: The delimiter used to separate cells in the reformatted values. Defaults to | (pipe).
  • Row Delimiter: The delimiter used to separate rows in the reformatted values. Defaults to . (period).
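The following is a minimal sketch of what the Cell Delimiter and Row Delimiter parameters do, written with Python's standard csv module. It is not Mantium's implementation: reformat_csv is a hypothetical helper, and it assumes the default row delimiter is a bare period with no surrounding spaces.

```python
import csv
import io


def reformat_csv(text: str, cell_delimiter: str = "|", row_delimiter: str = ".") -> str:
    """Hypothetical helper: parse CSV text, then re-join it with custom delimiters.

    Not Mantium's code; it only illustrates the Cell Delimiter and
    Row Delimiter parameters under the assumptions stated above.
    """
    # csv.reader keeps quoted values (including embedded commas) as single cells.
    rows = csv.reader(io.StringIO(text))
    # Blank lines come back as empty lists, so they are skipped.
    return row_delimiter.join(cell_delimiter.join(row) for row in rows if row)


print(reformat_csv("ID,Name,Price\n1,Laptop,899.99"))
# ID|Name|Price.1|Laptop|899.99
```

With the defaults, the row boundary and the decimal point in 899.99 use the same character, so a different Row Delimiter may be clearer for numeric data.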

Usage

To use the Reformat CSV transformation in Mantium, follow these steps:

  1. Configure the Source Column parameter by selecting the column you want to reformat.
  2. Configure the Destination Column parameter by specifying the name of the new column that will be created with the reformatted CSV content.
  3. (Optional) Configure the Cell Delimiter parameter by specifying a delimiter other than the default '|'.
  4. (Optional) Configure the Row Delimiter parameter by specifying a delimiter other than the default '.'.
  5. Run the transformation by clicking the Save and Run Transforms button. The resulting reformatted CSV will be created as a new column in Mantium.
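As a rough illustration of steps 1 through 5, the sketch below applies the reformatting to every record of a small in-memory dataset and writes the result to a new column, leaving the source column unchanged. The run_transform and reformat_csv functions and the list-of-dicts dataset are hypothetical stand-ins; Mantium's transforms are configured and run through its interface, not through this code.

```python
import csv
import io


def reformat_csv(text, cell_delimiter="|", row_delimiter="."):
    # Hypothetical helper: parse CSV, re-join cells and rows with new delimiters.
    rows = csv.reader(io.StringIO(text))
    return row_delimiter.join(cell_delimiter.join(row) for row in rows if row)


def run_transform(records, source_column="content", destination_column="reformatted_csv",
                  cell_delimiter="|", row_delimiter="."):
    """Hypothetical stand-in for "Save and Run Transforms": adds destination_column."""
    return [
        {**record, destination_column: reformat_csv(record[source_column],
                                                     cell_delimiter, row_delimiter)}
        for record in records
    ]


dataset = [{"content": "ID,Name\n1,Laptop"}]
result = run_transform(dataset)
print(result[0]["reformatted_csv"])  # ID|Name.1|Laptop
```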

Example 1: Reformatting Comma-separated Values to Tab-separated Values

Suppose the 'content' column of a dataset contains the following comma-separated values (CSV):

ID,Name,Price
1,Laptop,899.99
2,Tablet,499.99
3,Smartphone,699.99

To reformat the CSV data to tab-separated values (TSV), configure the transformation as follows:

  • Source Column: content
  • Destination Column: reformatted_content
  • Cell Delimiter: \t (tab character)
  • Row Delimiter: \n (newline character)

The resulting reformatted content would look like this (cells are separated by tab characters):

ID	Name	Price
1	Laptop	899.99
2	Tablet	499.99
3	Smartphone	699.99
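The same result can be reproduced outside Mantium with Python's csv module. This is a sketch for checking expected output, not the transformation's actual code: it parses the 'content' value as CSV, then joins cells with a tab and rows with a newline.

```python
import csv
import io

content = "ID,Name,Price\n1,Laptop,899.99\n2,Tablet,499.99\n3,Smartphone,699.99\n"

# Parse the CSV text into rows of cells, skipping blank lines.
rows = [row for row in csv.reader(io.StringIO(content)) if row]
# Cell Delimiter = tab, Row Delimiter = newline.
reformatted_content = "\n".join("\t".join(row) for row in rows)
print(reformatted_content)
```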

Example 2: Reformatting CSV Data with Custom Delimiters

Suppose the 'content' column contains the following data, which uses a custom delimiter (the pipe character |) instead of commas:

order_id|customer_id|product_id|quantity
1001|1|1|2
1002|2|3|1
1003|3|2|4

To reformat the data with custom delimiters, configure the transformation as follows:

  • Source Column: content
  • Destination Column: reformatted_content
  • Cell Delimiter: ; (semicolon)
  • Row Delimiter: \n (newline character)

The resulting reformatted dataset would look like this:

order_id;customer_id;product_id;quantity
1001;1;1;2
1002;2;3;1
1003;3;2;4
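A standard-library sketch of this example follows. It assumes the input cells are split on the pipe character, which is passed explicitly here through csv.reader's delimiter argument; how Mantium detects a non-comma input delimiter is not covered by this page.

```python
import csv
import io

content = "order_id|customer_id|product_id|quantity\n1001|1|1|2\n1002|2|3|1\n1003|3|2|4\n"

# Assumption: input cells are separated by "|" rather than ",".
rows = [row for row in csv.reader(io.StringIO(content), delimiter="|") if row]
# Cell Delimiter = semicolon, Row Delimiter = newline.
reformatted_content = "\n".join(";".join(row) for row in rows)
print(reformatted_content)
```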

Example 3: Reformatting Data with Quotes

Suppose the 'content' column contains the following data, where values that contain commas are enclosed in double quotes:

ID,Name,Address
1,John Doe,"123 Main St, Suite 400"
2,Jane Smith,"456 Elm St, Apartment 20A"

To reformat this data as tab-separated values, configure the transformation as follows:

  • Source Column: content
  • Destination Column: reformatted_content
  • Cell Delimiter: \t (tab character)
  • Row Delimiter: \n (newline character)

The resulting reformatted content would look like this (cells are separated by tab characters):

ID	Name	Address
1	John Doe	123 Main St, Suite 400
2	Jane Smith	456 Elm St, Apartment 20A

The Reformat CSV transformation treats each quoted value as a single cell: commas inside the quotes are not read as cell boundaries, and the enclosing quotes are removed from the output.
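To see why quote handling matters, the sketch below compares Python's csv module, which follows the usual CSV quoting rules, with a naive split on commas. It only illustrates the behavior described above; it is not Mantium's parser.

```python
import csv
import io

content = (
    "ID,Name,Address\n"
    '1,John Doe,"123 Main St, Suite 400"\n'
    '2,Jane Smith,"456 Elm St, Apartment 20A"\n'
)

# csv.reader keeps a quoted value as one cell and strips the quotes.
rows = [row for row in csv.reader(io.StringIO(content)) if row]
# A naive split on "," breaks the quoted address into two cells.
naive = content.splitlines()[1].split(",")

reformatted_content = "\n".join("\t".join(row) for row in rows)
print(reformatted_content)
print(len(rows[1]), len(naive))  # 3 4
```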