How to Transform Datasets and Ship to Pinecone with Mantium
Learn to import data from a data source, generate embeddings, and export the transformed data to Pinecone using Mantium.
In this tutorial, you will learn how to use Mantium to import data, generate embeddings for the text, and export them to Pinecone for further analysis or machine learning purposes.
If you prefer a more visual explanation, check out the accompanying video tutorial below. If you prefer reading or are unable to watch the video, continue with the text documentation.
Prerequisites
- API keys for OpenAI and Pinecone
- An existing Pinecone index
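When you create the Pinecone index, its dimension has to match the length of the vectors produced by the embedding model, otherwise the vectors cannot be stored. The sketch below checks that match. It assumes an OpenAI embedding model; this tutorial does not say which model Mantium uses, so the model names and the helper `index_matches_model` are illustrative only.

```python
# Default output sizes of some OpenAI embedding models.
# Which model Mantium's transform uses is an assumption here.
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def index_matches_model(index_dimension: int, model: str) -> bool:
    """Return True if an index of this dimension can store the model's vectors."""
    if model not in EMBEDDING_DIMENSIONS:
        raise ValueError(f"unknown embedding model: {model}")
    return EMBEDDING_DIMENSIONS[model] == index_dimension


if __name__ == "__main__":
    # Hypothetical index created with dimension 1536.
    print(index_matches_model(1536, "text-embedding-ada-002"))
```

If the check fails, recreate the index with the right dimension before running the steps that follow.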
Upload the File to Mantium
Upload a simple text file to Mantium using the File Data Source:
- Navigate to the Data Sources section by clicking "Data Source" on the left pane.
- On the top right corner, click on "Add Data Source" and select "File" as the data source.
- Provide the Data Source name, and set a sync frequency (12 hours is okay).
- On the following screen, click on the "Files" tab to upload the text file.
- Wait for the job to complete, and you will have your file uploaded to the Data Source that you created earlier.
Create a New Dataset
Datasets are the central workspace where you apply transformations and enrichments to data retrieved from your sources, so you can modify and analyze the data without affecting the original information.
Here we create a new dataset so that the embeddings transformation can be applied without changing the original text file.
A simple way to create a new dataset is by using the "Create Custom Datasets" button in the Data Source section.
Another approach is:
- Navigate to the Datasets section by clicking "Datasets" on the left pane.
- On the top right corner, click on
- Provide a name, and select where the data comes from (Data Source in Step 1).
After you save the form, you will be presented with a new interface where you can perform transformations, add destinations, and monitor logs.
Apply Embeddings Transformation
Mantium offers the ability to send the data to vector databases like Pinecone for efficient storage and querying. In this step, we will apply the embedding transformation to create a new column with embeddings that can be shipped to Pinecone for further usage.
- In the custom datasets that you created above, click on the
- You can view the dataset's columns in the "Dataset preview" pane, and perform the transformations under the pane.
- On the right side, select the "Generate Embeddings" transform, and it will show in the
- Provide the "Generate Embeddings" transform parameters, as described in this guide.
- Click the "Save and Run Transforms" button to confirm the transformation process.
- When the job is done, you will see the "embedding" column created with the embeddings.
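Conceptually, the transform adds one new column to every row while leaving the existing columns as they are. The sketch below illustrates that idea only; it is not Mantium's implementation. The `toy_embed` function is a placeholder that derives a short pseudo-vector from a hash so the example runs offline (a real run would call an embedding model instead), and the `id` and `text` column names are assumptions.

```python
import hashlib
from typing import Callable, Dict, List

Row = Dict[str, object]


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Placeholder, NOT a real embedding model.

    Produces a deterministic pseudo-vector from a SHA-256 digest so the
    sketch can run without network access or API keys (dim must be <= 32).
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def add_embedding_column(
    rows: List[Row],
    text_column: str = "text",  # assumed name of the text column
    embed: Callable[[str], List[float]] = toy_embed,
) -> List[Row]:
    """Return new rows with an added 'embedding' column; inputs stay unchanged."""
    return [{**row, "embedding": embed(str(row[text_column]))} for row in rows]


rows = [
    {"id": "1", "text": "Mantium imports data from a file."},
    {"id": "2", "text": "Embeddings are shipped to Pinecone."},
]
transformed = add_embedding_column(rows)
```

Each row in `transformed` now carries an `embedding` list alongside its original columns, which mirrors what the dataset preview shows after the job completes.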
Ship Embeddings to Pinecone
Now that we have the embeddings of the text data, the next step is to export them to Pinecone for efficient storage and querying.
- The first thing to do is to set up your Pinecone account, and connect it with your Mantium account.
- Navigate to Destination in the Dataset section to add a destination.
- Provide your Pinecone details in the form, and click the "Save" button to ship the embeddings to Pinecone.
- In your Pinecone account, check your index to confirm that the embeddings are now in Pinecone.
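Mantium handles the export for you, but it can help to know roughly what ends up in the index. Pinecone stores each vector as a record with an `id`, a list of `values`, and optional `metadata`. The sketch below reshapes dataset rows into that form and splits them into batches. How Mantium maps columns internally is not described in this tutorial, so the column names, the batch size of 100, and the helper names are assumptions.

```python
from typing import Dict, Iterator, List

Row = Dict[str, object]


def to_pinecone_records(
    rows: List[Row],
    id_column: str = "id",  # assumed name of the ID column
    vector_column: str = "embedding",
) -> List[Dict[str, object]]:
    """Reshape rows into Pinecone-style records: id, values, metadata."""
    records = []
    for row in rows:
        # Every column other than the ID and the vector becomes metadata.
        metadata = {
            key: value
            for key, value in row.items()
            if key not in (id_column, vector_column)
        }
        records.append(
            {
                "id": str(row[id_column]),
                "values": list(row[vector_column]),
                "metadata": metadata,
            }
        )
    return records


def batched(items: List, size: int = 100) -> Iterator[List]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


rows = [{"id": 1, "text": "hello", "embedding": [0.1, 0.2, 0.3]}]
records = to_pinecone_records(rows)
for batch in batched(records):
    # With the Pinecone Python client, this is where a call such as
    # index.upsert(vectors=batch) would go.
    pass
```

Batching keeps each request small, which is why bulk uploads are usually sent in chunks rather than in one call.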
Congratulations! You have now successfully transformed datasets and shipped the embeddings to Pinecone using Mantium. You can now use Pinecone for efficient storage and querying of your data. Feel free to explore the capabilities of Mantium and Pinecone further to enhance your skills and understanding.
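To give a feel for what querying the stored embeddings means, the sketch below ranks a handful of records by cosine similarity to a query vector. It is a plain-Python illustration of the idea, not Pinecone's implementation. It assumes the index uses the cosine metric (Pinecone also supports others), and the tiny two-dimensional vectors are made up for readability.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: List[float], records: List[Dict[str, object]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [
        (str(record["id"]), cosine_similarity(query, record["values"]))
        for record in records
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up records in the id/values shape used by vector databases.
records = [
    {"id": "a", "values": [1.0, 0.0]},
    {"id": "b", "values": [0.0, 1.0]},
    {"id": "c", "values": [1.0, 1.0]},
]
results = top_k([1.0, 0.0], records, k=2)
```

In practice the query vector must come from the same embedding model that produced the stored vectors, otherwise the similarity scores are not meaningful.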