How to Transform Datasets and Ship to Pinecone with Mantium

Learn to import data from a data source, generate embeddings, and export the transformed data to Pinecone using Mantium.

Introduction

In this tutorial, you will use Mantium to import data from a file, generate embeddings for its text, and export those embeddings to Pinecone, where they can be stored and queried for further analysis or machine learning work.

Objective

By the end, you will have a Mantium dataset with an embeddings column and a Pinecone index populated from that dataset.

Video

We understand that sometimes it's easier to learn by watching rather than reading. If you prefer a more visual explanation, feel free to check out our accompanying video tutorial below. If you prefer reading or are unable to watch the video, please continue with the text documentation.

Prerequisites

Upload the File to Mantium

Upload a simple text file to Mantium using the File Data Source:

  1. Navigate to the Data Sources section by clicking "Data Source" on the left pane.
  2. In the top right corner, click "Add Data Source" and select "File" as the data source.
  3. Provide the Data Source name, and set a sync frequency (12 hours is okay).
  4. On the following screen, click on the "Files" tab to upload the text file.
  5. Wait for the job to complete, and you will have your file uploaded to the Data Source that you created earlier.

Create a New Dataset

Datasets serve as the central workspace where you can apply transformations and enrichments to data retrieved from various sources, enabling you to modify and analyze the data without impacting the original information.

Here, we create a new dataset so that we can apply the embeddings transformation without changing the original text file.

A simple way to create a new dataset is by using the Create Custom Datasets button in the Data Source section.

Another approach is:

  1. Navigate to the Datasets section by clicking Datasets on the left pane.
  2. In the top right corner, click on New Datasets.
  3. Provide a name, and select where the data comes from (the Data Source you created earlier).

After you save the form, you will see a new interface where you can perform transformations, add destinations, and monitor logs.

Apply Embeddings Transformation

Mantium can send data to vector databases such as Pinecone for efficient storage and querying. In this step, we apply the embedding transformation, which creates a new column of embeddings that can then be shipped to Pinecone.

  1. In the custom dataset that you created above, click on the Transforms tab.
  2. You can view the dataset's columns in the Dataset preview pane, and build transformations below the pane.
  3. On the right side, select the Generate Embeddings transform, and it will show in the Transform Builder section.
  4. Provide the Generate Embeddings transform parameters, as described in this guide.
  5. Click the Save and Run Transforms button to confirm the transformation process.
  6. When the job is done, you will see a new embedding column containing the embeddings.
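
To make the effect of this step concrete, the sketch below mimics what an embeddings transform does to a dataset: every row keeps its original columns and gains one new column holding a fixed-length vector. It is an illustration only. The functions `toy_embedding` and `add_embedding_column`, the column names, and the hashing scheme are all hypothetical stand-ins, not Mantium's implementation; a real transform would call an embedding model instead.

```python
import hashlib
import math


def toy_embedding(text, dim=8):
    """Hypothetical stand-in for an embedding model: hash each token into a bucket."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = digest[0] % dim                      # which dimension the token lands in
        sign = 1.0 if digest[1] % 2 == 0 else -1.0    # signed count reduces collisions
        vec[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    # Scale to unit length when possible, as many embedding models do
    return [x / norm for x in vec] if norm else vec


def add_embedding_column(rows, source_column="text", target_column="embedding", dim=8):
    """Return new rows with an extra column; the input rows are left unchanged."""
    return [{**row, target_column: toy_embedding(row[source_column], dim)} for row in rows]


rows = [
    {"text": "Mantium ships embeddings to Pinecone"},
    {"text": "Datasets leave source files untouched"},
]
transformed = add_embedding_column(rows)
print(sorted(transformed[0].keys()))
```

Because the function builds new rows rather than editing the input, the original data stays intact, which mirrors how a dataset transform leaves the source file untouched.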

Ship Embeddings to Pinecone

Now that we have the embeddings of the text data, the next step is to export them to Pinecone for efficient storage and querying.

  1. The first thing to do is to set up your Pinecone account, and connect it with your Mantium account.
  2. Navigate to Destination in the Dataset section to add a destination.
  3. Provide your Pinecone details in the form, and click the Save button to ship the embeddings to Pinecone.
  4. In your Pinecone account, check your Index to confirm that the embeddings are now in Pinecone.
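
The final check can also be done from a script instead of the Pinecone console. The sketch below assumes a recent version of the official `pinecone` Python package (one that exposes the `Pinecone` class) and reads the vector count from `describe_index_stats()`. The helper names `connect_to_index`, `total_vectors`, and `confirm_export` are hypothetical, as are the API key and index name you pass in.

```python
def connect_to_index(api_key, index_name):
    """Open a handle to an existing Pinecone index (requires the `pinecone` package)."""
    # Imported lazily so the helpers below can be used without the package installed
    from pinecone import Pinecone
    return Pinecone(api_key=api_key).Index(index_name)


def total_vectors(index):
    """Read the total number of vectors reported by the index statistics."""
    stats = index.describe_index_stats()
    # Accept either a plain dict or a response object with the same field
    if isinstance(stats, dict):
        return stats.get("total_vector_count", 0)
    return getattr(stats, "total_vector_count", 0)


def confirm_export(index, expected_rows):
    """Return True if the index holds at least as many vectors as rows exported."""
    return total_vectors(index) >= expected_rows
```

A typical call would be `confirm_export(connect_to_index(api_key, "my-index"), expected_rows=100)`, where `"my-index"` and the row count are placeholders for your own values. Index statistics may take a short time to reflect a recent export, so a count that is too low immediately after shipping is worth rechecking before assuming the export failed.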

Conclusion

Congratulations! You have now successfully transformed datasets and shipped the embeddings to Pinecone using Mantium. You can now use Pinecone for efficient storage and querying of your data. Feel free to explore the capabilities of Mantium and Pinecone further to enhance your skills and understanding.