How to Chat with your Data using Mantium
In this tutorial, we'll guide you through the process of connecting your data in Mantium to OpenAI's ChatGPT Plugin. By doing so, you'll be able to interact with your data directly within the ChatGPT interface using the Mantium Plugin. Let's explore the steps to achieve this integration.
Introduction
OpenAI plugins give Large Language Models access to up-to-date information and to data outside their training set. With the Mantium Plugin Wizard, you can build custom plugins for your own use case by leveraging Mantium's data pipeline to bring your own data into ChatGPT.
This document provides a step-by-step guide. We will focus on creating a dataset in Mantium, setting up plugins, and then chatting with your data.
Objective
Our goal is to use the PDF Data Connector in Mantium's platform to import PDFs from selected ArXiv papers, extract the text, set up plugins, and then query the documents to answer questions, generate content, and provide summarization—all within ChatGPT.
Video
We understand that sometimes it's easier to learn by watching rather than reading. If you prefer a more visual explanation, feel free to check out our accompanying video tutorial below. If you prefer reading or are unable to watch the video, please continue with the text documentation.
Prerequisites
Take a few moments to set up the API key for OpenAI.
Download Dataset
To follow along with this tutorial, download the following papers from https://arxiv.org/.
Import Data from ArXiv Using the PDF Data Connector
- Navigate to the Data Sources section by clicking Data Source on the left navigation bar.
- Click Add Data Source, and select the PDF Data Connector from the Data Sources list.
- Provide the information to label the Data Source, and click Save and Test.
- On the next page, the sync job will start automatically. If it doesn't, click on “Manual Sync” at the top right corner to perform the initial sync.
- Wait a few moments for the sync to be completed, and navigate to Files to upload your papers in .pdf format.
- Click on the Finish and Sync button to complete the upload process.
- At this point, we have successfully imported PDFs of ArXiv papers using the PDF Data Connector.
Create New Dataset
Datasets serve as the central workspace where you can apply transformations and enrichments to data retrieved from various sources, enabling you to modify and analyze the data without impacting the original information.
To create a new dataset:
- After the sync is completed, click on the Create Custom Datasets button in the Data Source section.
- Alternatively, you can create datasets by navigating to the Datasets section on the left pane.
- Provide a Dataset name, and select where the data comes from (PDF Data Connector).
- Click on Save to save your configuration, and wait for the job to complete.
See an example of the ArXiv dataset below. Notice that it has a text column containing the text extracted from the PDF files (the Convert PDF to Text transform ran automatically).

Setup Plugins
Mantium setup
Quick Warning
- If you select the Standard option and have previously created a split_content column, make sure to pick this same split_content column in subsequent steps rather than the original text column. This prevents the unnecessary expansion of your dataset and keeps your OpenAI usage costs in check.
- Select the Advanced option to prepare your dataset if you already have embeddings or have already completed the steps above.
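To see why picking the original text column a second time expands the dataset, here is a minimal Python sketch. It is an illustration only: the functions `split_text` and `expand_rows`, the `chunk_index` field, and the word-window splitting rule are assumptions made for this example, not Mantium's actual implementation.

```python
def split_text(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into word windows of at most max_words, sharing `overlap` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def expand_rows(rows: list[dict], column: str, **kwargs) -> list[dict]:
    """Emit one output row per chunk of row[column] (hypothetical splitting step)."""
    out = []
    for row in rows:
        for i, chunk in enumerate(split_text(row[column], **kwargs)):
            out.append({**row, "chunk_index": i, "split_content": chunk})
    return out


# One 500-word document, split into 200-word chunks without overlap.
rows = [{"file": "paper.pdf", "text": " ".join(["word"] * 500)}]
once = expand_rows(rows, "text", max_words=200, overlap=0)
print(len(once))  # 3

# Splitting again on the original text column re-splits the full text per row.
print(len(expand_rows(once, "text", max_words=200, overlap=0)))  # 9

# Splitting again on split_content leaves the row count unchanged.
print(len(expand_rows(once, "split_content", max_words=200, overlap=0)))  # 3
```

Under these assumptions, every extra row is another piece of text sent for embedding, which is where the additional OpenAI usage would come from.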
Now that you have your PDFs in Mantium, the next step is to set up a Mantium Plugin for your use case in ChatGPT in a few steps.
- On the Mantium platform, navigate to the App section at the left navigation pane.
- At the top right corner, click on New App.
- Now, you have the option to select an Existing Dataset or select a New or Existing Data Source. For this example, since we already have our PDF dataset, we will choose the Existing Dataset from the list of options.
- Next, you have the option to create a new OpenAI credential or select existing credentials. If you have one, select the credential.
- Next, select the Standard option for Safe Defaults.
- After this, choose the column where you have the text data. In this example, it is the text column. Click on Next and wait for the job process to complete.
Resulting Datasets
Notice that Mantium has automatically created a couple more columns for you. The embeddings column is what we are going to ship to our Managed Vector Database (Redis) or the User Managed Database (Pinecone).
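As a rough intuition for what a vector database does with the embeddings column, the sketch below ranks rows by cosine similarity to a query vector. The three-dimensional vectors, the row contents, and the helper names `cosine_similarity` and `top_k` are made up for illustration. Real embeddings have far more dimensions, and Redis or Pinecone use indexes rather than a full scan, so treat this as a toy model and not a description of how Mantium queries its database.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding: list[float], rows: list[dict], k: int = 2) -> list[dict]:
    """Return the k rows whose embeddings are most similar to the query."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(query_embedding, row["embeddings"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical dataset rows with toy 3-dimensional embeddings.
rows = [
    {"split_content": "data augmentation in SSL", "embeddings": [0.9, 0.1, 0.0]},
    {"split_content": "public model checkpoints", "embeddings": [0.1, 0.9, 0.1]},
    {"split_content": "training hyperparameters", "embeddings": [0.7, 0.2, 0.3]},
]
query = [1.0, 0.0, 0.0]  # stands in for the embedding of the user's question

for row in top_k(query, rows, k=2):
    print(row["split_content"])
```

The rows returned by such a search are the passages a plugin would hand back to ChatGPT as context for its answer.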

- You don't require any setup to use the Managed Vector Database (Redis), so you can continue with the setup by selecting this option. Alternatively, you can follow this guide to set up the User Managed Database (Pinecone).
- Almost there! Now all you need is to provide the details of your Plug-in, exactly as you would want it to look in the ChatGPT Interface. (See image below)

- Create your Plug-in connection details by clicking the Create button.
OpenAI setup
Now that you have your Plug-in credentials, the following steps will guide you on how to set it up in the ChatGPT interface.
Note that you must have access to OpenAI plugins to complete this stage. If you don't have access, join the OpenAI waitlist.
- Navigate to your ChatGPT account - https://chat.openai.com/
- In a blank chat, click on the Plugins dropdown, and select Plugin store.
- On the bottom right corner of the Plugin Store page, select Develop your own plugin.
- From the Mantium setup, copy and paste the Plug-In URL into the Enter your website domain field.
- Now, click on the Find manifest file button. Wait for OpenAI to validate the manifest file, then click on Next.
- Complete the installation process, and enter your access token (from the previous Mantium setup) to install the plugin.
Congratulations! You have completed the setup process!
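If you are curious what the Find manifest file step looks for, ChatGPT plugins publish a manifest at the path `/.well-known/ai-plugin.json` under the plugin's domain. The sketch below builds that URL and checks a manifest for the top-level fields listed in OpenAI's plugin documentation as we recall them. The example domain, the manifest values, and the helper names are hypothetical; Mantium generates and hosts the real manifest for you, so this is only an aid to understanding.

```python
import json

# Top-level manifest fields, as recalled from OpenAI's plugin documentation.
REQUIRED_FIELDS = (
    "schema_version",
    "name_for_model",
    "name_for_human",
    "description_for_model",
    "description_for_human",
    "auth",
    "api",
    "logo_url",
    "contact_email",
    "legal_info_url",
)


def manifest_url(plugin_url: str) -> str:
    """Location where ChatGPT looks for the manifest under a plugin domain."""
    return plugin_url.rstrip("/") + "/.well-known/ai-plugin.json"


def missing_manifest_fields(manifest: dict) -> list[str]:
    """Return the required top-level fields that are absent from the manifest."""
    return [field for field in REQUIRED_FIELDS if field not in manifest]


# Hypothetical manifest; every value here is a placeholder.
example_manifest = json.loads("""
{
  "schema_version": "v1",
  "name_for_model": "PDFPlugin",
  "name_for_human": "PDF Plugin",
  "description_for_model": "Search the user's ArXiv PDFs for relevant passages.",
  "description_for_human": "Chat with your ArXiv papers.",
  "auth": {"type": "user_http", "authorization_type": "bearer"},
  "api": {"type": "openapi", "url": "https://plugin.example.com/openapi.json"},
  "logo_url": "https://plugin.example.com/logo.png",
  "contact_email": "support@example.com",
  "legal_info_url": "https://example.com/legal"
}
""")

print(manifest_url("https://plugin.example.com/"))
print(missing_manifest_fields(example_manifest))
```

The bearer-token `auth` entry in the example is an assumption that matches the access token you enter during installation.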

Chat with your PDFs
Now, let's start using our newly created plugin. Feel free to copy the prompt examples below.
Prompt 1 - Ask questions
Using the PDFPlugin plugin, what is the role of data augmentation in A
Cook’s Guide to Successful SSL Training and Deployment?
Result in ChatGPT

Prompt 2 - Generate summaries
Using the PDFPlugin plugin, summarize the content on the Publicly Available Model
Checkpoints or APIs in the Survey of Large Language Models.
Result in ChatGPT

Prompt 3 - Generate YouTube scripts using your own document as a reference.
Using the PDFPlugin plugin, summarize the content on the Publicly Available Model Checkpoints or APIs
in the Survey of Large Language Models in a successful Youtuber's script style.
Highlight the important technical information for the technical audience
Result in ChatGPT
Notice the highlighted text in the image; it confirms that the script was generated using the document "Survey of Large Language Models" as a reference.
Conclusion
In this guide, we successfully tackled the challenge of accessing and querying information from academic papers using Large Language Models (LLMs). By integrating OpenAI plugins with the Mantium plugin wizard, we created a seamless workflow to import PDFs, extract text, and interact with the data using ChatGPT. As a result, we unlocked the ability to answer questions, generate summaries, and create content directly from the papers, providing valuable insights and enhancing our work with AI-powered capabilities.