How to Chat with your Data using Mantium

In this tutorial, we'll guide you through the process of connecting your data in Mantium to OpenAI's ChatGPT Plugin. By doing so, you'll be able to interact with your data directly within the ChatGPT interface using the Mantium Plugin. Let's explore the steps to achieve this integration.

Introduction

OpenAI plugins let ChatGPT access up-to-date information and external data, extending what the underlying large language model can do on its own. By connecting ChatGPT to these extensions, you can adapt it to your own use case. With the Mantium Plugin Wizard, you can build a custom plugin that uses Mantium's data pipeline to bring your own data into ChatGPT.

This document provides a step-by-step guide: we will create a dataset in Mantium, set up a plugin, and then chat with your data.

Objective

Our goal is to use the PDF Data Connector in Mantium's platform to import PDFs of selected ArXiv papers, extract their text, set up a plugin, and then query the documents to answer questions, generate content, and produce summaries, all within ChatGPT.

Prerequisites

Before you begin, create an OpenAI API key. You will need it when you add an OpenAI credential in Mantium during the plugin setup.

Download Dataset

To follow along with this tutorial, download the following papers from https://arxiv.org/.
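If you prefer to script the download, the sketch below fetches PDFs using arXiv's public `https://arxiv.org/pdf/<id>` URL pattern. It assumes new-style arXiv identifiers (for example `2304.12210`); the paper IDs are passed on the command line because you choose them yourself, and the helper names `arxiv_pdf_url` and `download_papers` are hypothetical, not part of Mantium.

```python
import re
import sys
import urllib.request
from pathlib import Path

# Assumption: new-style arXiv identifiers such as 2304.12210 or 2304.12210v2.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def arxiv_pdf_url(paper_id: str) -> str:
    """Build the public PDF URL for a new-style arXiv identifier."""
    if not ARXIV_ID.match(paper_id):
        raise ValueError(f"not a new-style arXiv id: {paper_id!r}")
    return f"https://arxiv.org/pdf/{paper_id}"


def download_papers(paper_ids, out_dir="papers"):
    """Download each paper as <id>.pdf into out_dir (requires network access)."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for paper_id in paper_ids:
        dest = target / f"{paper_id}.pdf"
        # urlretrieve writes the HTTP response body to dest.
        urllib.request.urlretrieve(arxiv_pdf_url(paper_id), dest)
        saved.append(dest)
    return saved


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python download_papers.py <arxiv-id> [<arxiv-id> ...]
    for path in download_papers(sys.argv[1:]):
        print("saved", path)
```

The downloaded files in the `papers` folder are what you upload through the PDF Data Connector in the next section.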

Import Data from ArXiv Using the PDF Data Connector

  1. Navigate to the Data Sources section by clicking Data Source on the left navigation bar.
  2. Click Add Data Source, and select the PDF Data Connector from the Data Sources list.
  3. Provide the information to label the Data Source, and click Save and Test.
  4. On the next page, the sync job starts automatically. If it doesn't, click Manual Sync in the top right corner to perform the initial sync.
  5. Wait a few moments for the sync to complete, then navigate to Files and upload your papers in .pdf format.
  6. Click on the Finish and Sync button to complete the upload process.
  7. At this point, we have successfully imported PDFs of ArXiv papers using the PDF Data Connector.
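Before uploading in step 5, a quick local check can catch files that are not real PDFs (for example, an HTML error page saved with a .pdf extension). The sketch below relies only on the fact that well-formed PDF files begin with the `%PDF-` header; `find_invalid_pdfs` is a hypothetical helper, not a Mantium feature, and the default `papers` folder name is an assumption.

```python
import sys
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # well-formed PDF files start with this header


def find_invalid_pdfs(folder):
    """Return names of .pdf files in folder whose first bytes are not a PDF header."""
    invalid = []
    for path in sorted(Path(folder).glob("*.pdf")):
        with path.open("rb") as fh:
            if fh.read(len(PDF_MAGIC)) != PDF_MAGIC:
                invalid.append(path.name)
    return invalid


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "papers"
    bad = find_invalid_pdfs(folder)
    print("Files to re-download:", bad or "none")
```

Any file it reports should be downloaded again before you upload it, since the Convert PDF to Text step can only extract text from genuine PDFs.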

Create New Dataset

Datasets serve as the central workspace where you can apply transformations and enrichments to data retrieved from various sources, enabling you to modify and analyze the data without impacting the original information.

To create a new dataset:

  1. After the sync completes, click the Create Custom Datasets button in the Data Source section.
  2. Alternatively, you can create datasets by navigating to the Datasets section on the left pane.
  3. Provide a Dataset name, and select where the data comes from (PDF Data Connector).
  4. Click on Save to save your configuration, and wait for the job to complete.

See an example of the ArXiv dataset below. Notice that one column contains the text extracted from the PDF files; the Convert PDF to Text transform ran automatically.

Setup Plugins

Mantium setup

Quick Warning

  • If you select the Standard option and have previously created a split_content column, make sure to select that split_content column in the subsequent steps instead of the original text column. This prevents unnecessary expansion of your dataset and keeps your OpenAI usage costs in check.
  • Select the Advanced option instead if your dataset already contains embeddings or you have already completed the steps above.
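To see why the choice of column matters, consider what a split step does: it turns each text value into several chunk rows. The sketch below uses a simple fixed-size character splitter and assumes that every chunk row keeps the full original text column; both are illustrative assumptions, since Mantium's Standard option may use a different chunk size or strategy. Under those assumptions, re-splitting the text column multiplies the rows, while reusing split_content does not.

```python
def split_text(text, chunk_size=1000):
    """Cut text into fixed-size character chunks (illustrative splitter, not Mantium's)."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return chunks or [""]


def split_rows(rows, column, chunk_size=1000):
    """Emit one row per chunk of `column`, copying the other columns unchanged."""
    out = []
    for row in rows:
        for chunk in split_text(row[column], chunk_size):
            out.append({**row, "split_content": chunk})
    return out


# Assumption: after splitting, every chunk row still carries the full `text` value.
doc = {"file": "paper.pdf", "text": "x" * 5000}
once = split_rows([doc], "text")                  # 5 chunk rows
again_text = split_rows(once, "text")             # 25 rows: each row re-splits the whole text
again_split = split_rows(once, "split_content")   # 5 rows: existing chunks are reused
print(len(once), len(again_text), len(again_split))  # 5 25 5
```

Every extra row is another piece of text to embed, which is where the additional OpenAI usage comes from.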

Now that your PDFs are in Mantium, the next step is to set up a Mantium plugin for your use case in ChatGPT, which takes only a few steps.

  1. On the Mantium platform, navigate to the App section in the left navigation pane.
  2. In the top right corner, click New App.
  3. You can now select either an Existing Dataset or a New or Existing Data Source. Since we already have our PDF dataset, choose Existing Dataset from the list of options.
  4. Next, either create a new OpenAI credential or select an existing one. If you already have a credential, select it.
  5. Next, select the Standard option, which applies safe defaults.
  6. Then choose the column that contains your text data (in this example, the text column). Click Next and wait for the job to complete.

Resulting Datasets
Notice that Mantium has automatically created a few more columns for you. The embeddings column is what we will send to either the Managed Vector Database (Redis) or the User Managed Database (Pinecone).
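At query time, a vector database compares the embedding of your question with the stored embeddings and returns the closest chunks, which can then be passed to ChatGPT as context. The sketch below shows that idea with brute-force cosine similarity over toy three-dimensional vectors; the row layout reuses the split_content and embeddings column names from this tutorial, but real embeddings have far more dimensions, and Redis or Pinecone use their own indexes rather than a Python loop.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, rows, k=2):
    """Return the split_content of the k rows whose embeddings best match the query."""
    scored = sorted(rows, key=lambda r: cosine(query_vec, r["embeddings"]), reverse=True)
    return [r["split_content"] for r in scored[:k]]


# Toy rows: the vectors are made up for illustration, not real model output.
rows = [
    {"split_content": "data augmentation in SSL", "embeddings": [0.9, 0.1, 0.0]},
    {"split_content": "model checkpoints and APIs", "embeddings": [0.0, 0.2, 0.9]},
    {"split_content": "augmentation strategies", "embeddings": [0.8, 0.3, 0.1]},
]
print(top_k([1.0, 0.2, 0.0], rows))
# ['data augmentation in SSL', 'augmentation strategies']
```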

  1. The Managed Vector Database (Redis) requires no setup, so you can select this option and continue. Alternatively, you can follow this guide to set up the User Managed Database (Pinecone).
  2. Almost there! Now provide the details of your plugin exactly as you want them to appear in the ChatGPT interface. (See image below)
ChatGPT Plugin
  3. Create your plugin connection details by clicking the Create button.

OpenAI Setup

Now that you have your plugin credentials, the following steps show how to set up the plugin in the ChatGPT interface.

Note that you must have access to OpenAI plugins to complete this stage. If you don't have access, join the OpenAI waitlist here.

  1. Navigate to your ChatGPT account: https://chat.openai.com/
  2. In a blank chat, click the Plugins dropdown and select Plugin store.
  3. In the bottom right corner of the Plugin store page, select Develop your own plugin.
  4. From the Mantium setup, copy the Plug-In URL and paste it into the Enter your website domain field.
  5. Click the Find manifest file button. Wait for OpenAI to validate the manifest file, then click Next.
  6. Complete the installation process, entering your access token (from the previous Mantium setup) when prompted to install the plugin.
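When you click Find manifest file, ChatGPT looks for the plugin manifest at the `/.well-known/ai-plugin.json` path of the domain you entered. If validation fails, fetching the manifest yourself is a quick way to confirm the URL is reachable. The sketch below builds that URL from the Plug-In URL and checks for the top-level fields described in OpenAI's plugin manifest format; the exact manifest Mantium generates may differ, and `manifest_url`, `missing_fields`, and `check_plugin` are hypothetical helpers.

```python
import json
import urllib.request
from urllib.parse import urlsplit

# Top-level fields described in OpenAI's ai-plugin.json manifest format.
REQUIRED_FIELDS = {
    "schema_version", "name_for_human", "name_for_model",
    "description_for_human", "description_for_model",
    "auth", "api", "logo_url", "contact_email", "legal_info_url",
}


def manifest_url(plugin_url):
    """Map a plugin URL or bare domain to its well-known manifest location."""
    if "://" not in plugin_url:
        plugin_url = "https://" + plugin_url
    parts = urlsplit(plugin_url)
    return f"{parts.scheme}://{parts.netloc}/.well-known/ai-plugin.json"


def missing_fields(manifest):
    """List required top-level fields absent from a parsed manifest dict."""
    return sorted(REQUIRED_FIELDS - set(manifest))


def check_plugin(plugin_url):
    """Fetch the manifest (network access required) and report missing fields."""
    with urllib.request.urlopen(manifest_url(plugin_url), timeout=10) as resp:
        manifest = json.load(resp)
    return missing_fields(manifest)
```

For example, `check_plugin("<your Plug-In URL>")` returns an empty list when all of these fields are present; an HTTP error instead points to a URL or access problem rather than a manifest problem.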

Congratulations! You have completed the setup process!

Chat with your PDFs

Now, let's start using our newly created plugin. Feel free to copy the prompt examples below.

Prompt 1 - Ask questions

Using the PDFPlugin plugin, what is the Role of Data-Augmentation in A Cook’s Guide to Successful SSL Training and Deployment?

Result in ChatGPT

Prompt 2 - Generate summaries

Using the PDFPlugin plugin, summarize the content on the Publicly Available Model
Checkpoints or APIs in the Survey of Large Language Models. 

Result in ChatGPT

Prompt 3 - Generate YouTube scripts using your own document as a reference

Using the PDFPlugin plugin, summarize the content on the Publicly Available Model Checkpoints or APIs
in the Survey of Large Language Models in the style of a successful YouTuber's script.
Highlight the important technical information for a technical audience.

Result in ChatGPT

Notice the highlighted text in the image; it confirms that the script was generated using the document "Survey of Large Language Models" as a reference.

Conclusion

In this guide, we tackled the challenge of accessing and querying information in academic papers with large language models (LLMs). By combining OpenAI plugins with the Mantium Plugin Wizard, we built a workflow that imports PDFs, extracts their text, and lets you interact with that data in ChatGPT. As a result, you can answer questions, generate summaries, and create content directly from the papers.