Reformat CSV
Reformatting comma-separated values to use | between cells and . between lines can make them easier for large language models to interpret.
Parameters
The Reformat CSV transformation has four parameters:
- Source Column: The name of the column containing the comma-separated values you want to reformat. Defaults to 'content'.
- Destination Column: The name of the column that will hold the reformatted values. Defaults to 'reformatted_csv'.
- Cell Delimiter: The delimiter used to separate cells in the reformatted values. Defaults to '|'.
- Row Delimiter: The delimiter used to separate rows in the reformatted values. Defaults to '.'.
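The effect of the two delimiter parameters can be sketched in a few lines of Python. This is a minimal illustration built on the standard csv module, not Mantium's implementation; the function name reformat_csv and its argument names are hypothetical, and only the default values ('|' and '.') come from the parameter list above.

```python
import csv
import io


def reformat_csv(text, cell_delimiter="|", row_delimiter="."):
    """Parse comma-separated text and rejoin it with the given delimiters.

    Hypothetical helper for illustration; defaults mirror the documented ones.
    """
    rows = csv.reader(io.StringIO(text))
    # Skip blank lines so they do not produce empty rows in the output.
    return row_delimiter.join(cell_delimiter.join(row) for row in rows if row)


sample = "ID,Name\n1,Laptop\n2,Tablet"
print(reformat_csv(sample))  # ID|Name.1|Laptop.2|Tablet
```

Note that with the default row delimiter, a cell that itself contains a period (such as a decimal price) is indistinguishable from a row break in the output, which is one reason to choose a different Row Delimiter for numeric data.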
Usage
To use the Reformat CSV transformation in Mantium, follow these steps:
- Configure the Source Column parameter by selecting the column you want to reformat.
- Configure the Destination Column parameter by specifying the name of the new column that will be created with the reformatted CSV content.
- (Optional) Configure the Cell Delimiter parameter by specifying a delimiter other than the default '|'.
- (Optional) Configure the Row Delimiter parameter by specifying a delimiter other than the default '.'.
- Run the transformation by clicking the Save and Run Transforms button. The resulting reformatted CSV will be created as a new column in Mantium.
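Outside the Mantium interface, the steps above correspond roughly to reading one column and writing a new one. The sketch below assumes the dataset is a list of dictionaries; the function reformat_column and that data layout are hypothetical and only meant to show how the source and destination columns relate.

```python
import csv
import io


def reformat_column(records, source_column="content",
                    destination_column="reformatted_csv",
                    cell_delimiter="|", row_delimiter="."):
    """Return copies of the records with a reformatted destination column added."""
    result = []
    for record in records:
        rows = csv.reader(io.StringIO(record[source_column]))
        reformatted = row_delimiter.join(
            cell_delimiter.join(row) for row in rows if row
        )
        # Copy the record so the source column is left untouched.
        result.append({**record, destination_column: reformatted})
    return result


records = [{"content": "a,b\n1,2"}]
print(reformat_column(records))
```

The source column is kept as-is and the reformatted text is added alongside it, matching the description of the result being created as a new column.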
Example 1: Reformatting Comma-separated Values to Tab-separated Values
Suppose we have a dataset with the following comma-separated values (CSV) data in the 'content' column:
ID,Name,Price
1,Laptop,899.99
2,Tablet,499.99
3,Smartphone,699.99
To reformat the CSV data to tab-separated values (TSV), configure the transformation as follows:
- Source Column: content
- Destination Column: reformatted_content
- Cell Delimiter: \t (tab character)
- Row Delimiter: \n (newline character)
The resulting reformatted dataset, with cells separated by tab characters, would look like this:
ID	Name	Price
1	Laptop	899.99
2	Tablet	499.99
3	Smartphone	699.99
Example 2: Reformatting CSV Data with Custom Delimiters
Suppose we have a dataset with the following data using a custom delimiter (the pipe character, |) in the 'content' column:
order_id|customer_id|product_id|quantity
1001|1|1|2
1002|2|3|1
1003|3|2|4
To reformat the data with custom delimiters, configure the transformation as follows:
- Source Column: content
- Destination Column: reformatted_content
- Cell Delimiter: ; (semicolon)
- Row Delimiter: \n (newline character)
The resulting reformatted dataset would look like this:
order_id;customer_id;product_id;quantity
1001;1;1;2
1002;2;3;1
1003;3;2;4
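The same conversion can be reproduced with Python's csv module as a rough check. This page does not say how the transformation recognizes a non-comma input delimiter, so the sketch below assumes the input delimiter is known in advance and passes it explicitly; that is an assumption of this example, not documented Mantium behavior.

```python
import csv
import io

source = (
    "order_id|customer_id|product_id|quantity\n"
    "1001|1|1|2\n"
    "1002|2|3|1\n"
    "1003|3|2|4"
)

# Assumption: the input delimiter is known in advance and passed explicitly.
rows = csv.reader(io.StringIO(source), delimiter="|")

# Rejoin cells with ';' and rows with a newline, as configured in the example.
reformatted = "\n".join(";".join(row) for row in rows)
print(reformatted)
```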
Example 3: Reformatting Data with Quotes
Suppose we have a dataset with the following data using quotes for values containing commas, stored in the 'content' column:
ID,Name,Address
1,John Doe,"123 Main St, Suite 400"
2,Jane Smith,"456 Elm St, Apartment 20A"
To reformat the data with quotes, configure the transformation as follows:
- Source Column: content
- Destination Column: reformatted_content
- Cell Delimiter: \t (tab character)
- Row Delimiter: \n (newline character)
The resulting reformatted dataset, with cells separated by tab characters, would look like this:
ID	Name	Address
1	John Doe	123 Main St, Suite 400
2	Jane Smith	456 Elm St, Apartment 20A
The Reformat CSV transformation correctly handles values enclosed in quotes, ensuring accurate parsing and reformatting of the dataset.
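Quote-aware parsing of this kind can be illustrated with Python's csv module, which treats a quoted value as a single cell even when it contains a comma. This is a sketch of the general technique, not a description of how Mantium parses the data internally.

```python
import csv
import io

source = (
    "ID,Name,Address\n"
    '1,John Doe,"123 Main St, Suite 400"\n'
    '2,Jane Smith,"456 Elm St, Apartment 20A"'
)

rows = list(csv.reader(io.StringIO(source)))

# The quoted address stays a single cell even though it contains a comma.
assert rows[1] == ["1", "John Doe", "123 Main St, Suite 400"]

# Rejoin with tab and newline delimiters, as configured in Example 3.
reformatted = "\n".join("\t".join(row) for row in rows)
print(reformatted)
```

A naive split on commas would instead break each address into two cells, which is why a real CSV parser is preferable to plain string splitting here.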