OpenAI plugins give large language models access to up-to-date information and extend what they can do. By connecting ChatGPT to these extensions, you can integrate its core system with your own tools. With the Mantium Plugin Wizard, you can build custom plugins for your specific use case, using Mantium's data pipeline to bring your own data into ChatGPT.
This document is a step-by-step guide to doing that. It covers creating a dataset in Mantium, setting up plugins, and then chatting with your data.
Our goal is to use the PDF Data Connector in Mantium's platform to import selected arXiv papers as PDFs, extract their text, set up plugins, and then query the documents to answer questions, generate content, and produce summaries, all within ChatGPT.
We understand that it is sometimes easier to learn by watching than by reading. If you prefer a more visual explanation, check out the accompanying video tutorial below. If you prefer reading or cannot watch the video, continue with the written documentation.
Before you begin, take a moment to set up your OpenAI API key.
To follow along with this tutorial, download the following papers from arXiv (https://arxiv.org/).
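If you would rather script the download than click through arXiv, the sketch below fetches PDFs with Python's standard library. It assumes arXiv's `https://arxiv.org/pdf/<id>` URL pattern; `ARXIV_IDS`, `download_papers`, the User-Agent string, and the `arxiv_pdfs` folder are hypothetical names chosen for illustration. The ID list is left empty on purpose: fill in the identifiers of the papers you chose.

```python
import urllib.request
from pathlib import Path

# Hypothetical placeholder: fill in the arXiv IDs of the papers you chose,
# in the form "YYMM.NNNNN". Left empty, the script downloads nothing.
ARXIV_IDS: list[str] = []


def arxiv_pdf_url(arxiv_id: str) -> str:
    """Build the PDF URL for an arXiv identifier (assumes arxiv.org/pdf/<id>)."""
    return f"https://arxiv.org/pdf/{arxiv_id.strip()}"


def pdf_filename(arxiv_id: str) -> str:
    """Make a local filename; old-style IDs such as 'hep-th/9901001' contain '/'."""
    return arxiv_id.strip().replace("/", "_") + ".pdf"


def download_papers(ids: list[str], out_dir: str = "arxiv_pdfs") -> list[Path]:
    """Download each paper into out_dir and return the saved paths."""
    if not ids:
        return []
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for arxiv_id in ids:
        dest = target / pdf_filename(arxiv_id)
        # A descriptive User-Agent is a courtesy when scripting downloads.
        request = urllib.request.Request(
            arxiv_pdf_url(arxiv_id),
            headers={"User-Agent": "pdf-tutorial-downloader"},
        )
        with urllib.request.urlopen(request, timeout=60) as response:
            dest.write_bytes(response.read())
        saved.append(dest)
    return saved


if __name__ == "__main__":
    for path in download_papers(ARXIV_IDS):
        print(f"saved {path}")
```

The saved files are ordinary PDFs, so you upload them through the PDF Data Connector exactly as you would files downloaded by hand.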
- Navigate to the Data Sources section by clicking Data Source on the left navigation bar.
- Click Add Data Source, and select the PDF Data Connector from the Data Sources list.
- Provide the information to label the Data Source, and click Save and Test.
- On the next page, the sync job starts automatically. If it doesn't, click “Manual Sync” in the top right corner to perform the initial sync.
- Wait a few moments for the sync to complete, then navigate to Files to upload your papers.
- Click the Finish and Sync button to complete the upload process.
- At this point, we have successfully imported the PDFs of the arXiv papers using the PDF Data Connector.
Datasets serve as the central workspace where you can apply transformations and enrichments to data retrieved from various sources, enabling you to modify and analyze the data without impacting the original information.
To create a new dataset:
- After the sync completes, click the Create Custom Datasets button in the Data Source section.
- Alternatively, you can create datasets by navigating to the Datasets section on the left pane.
- Provide a Dataset name, and select where the data comes from (PDF Data Connector).
- Click on Save to save your configuration, and wait for the job to complete.
See an example of the arXiv dataset below. Notice that it has a column containing the text of the PDF files (the Convert PDF to Text transform ran automatically).
- If you select the Standard option and have previously created a split_content column, make sure to pick that same split_content column in subsequent steps rather than the original text column. This prevents unnecessary expansion of your dataset and keeps your OpenAI usage costs in check.
- Select the Advanced option if you already have embeddings.
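To see why reusing split_content matters, here is a minimal sketch. It rests on a hypothetical model of the split step (Mantium's actual transform may behave differently): splitting emits one row per chunk and copies every other column, including the full original text, onto each chunk row. Under that assumption, splitting the text column a second time multiplies the rows, while splitting split_content leaves the count unchanged. The functions `split_text` and `explode_split`, the fixed character chunk size, and the sample data are all illustrative.

```python
def split_text(text: str, chunk_size: int) -> list[str]:
    """Fixed-size character chunks (a stand-in for the real splitter)."""
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


def explode_split(rows: list[dict], column: str, chunk_size: int) -> list[dict]:
    """Emit one output row per chunk of `column`, copying every other field."""
    out = []
    for row in rows:
        for chunk in split_text(row[column], chunk_size):
            new_row = dict(row)  # copy so the input rows stay untouched
            new_row["split_content"] = chunk
            out.append(new_row)
    return out


# One hypothetical paper whose extracted text is 1,000 characters long.
papers = [{"title": "paper-a", "text": "x" * 1000}]

first = explode_split(papers, "text", 250)                # first split
from_text = explode_split(first, "text", 250)             # re-split the text column
from_split = explode_split(first, "split_content", 250)   # reuse split_content

print(len(first), len(from_text), len(from_split))  # 4 16 4
```

In this toy case, re-splitting the text column leaves 16 chunks of 250 characters (4,000 characters) to embed instead of 4 chunks (1,000 characters). Because OpenAI bills embeddings by the amount of text sent, that is roughly four times the cost for the same content.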
Please follow the link below to find instructions on how to create your Mantium apps.
There are two ways to interact with your app in ChatGPT:
- Use Mantium's ChatGPT plugin to interact with your app (recommended).
- Set up your own OpenAI ChatGPT plugin, if you have developer access (that is, you are able to create plugins in ChatGPT).
Please follow the link below to find instructions on how to set up the official Mantium plugin.
Please follow the link below to find instructions on how to set up your own plugin.
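For orientation when setting up your own plugin: OpenAI's plugin documentation described each ChatGPT plugin with an `ai-plugin.json` manifest, served at `/.well-known/ai-plugin.json`, whose `api` entry points to an OpenAPI spec for your service. The sketch below builds such a manifest in Python. Every value (names, descriptions, URLs, email) is a hypothetical placeholder, `check_manifest` is an illustrative helper rather than OpenAI's validator, and the linked instructions remain the authoritative source for what your plugin needs.

```python
import json

# Hypothetical values: replace with your own plugin's details and URLs.
BASE_URL = "https://your-plugin.example.com"

manifest = {
    "schema_version": "v1",
    "name_for_human": "PDF Papers",
    "name_for_model": "pdf_papers",
    "description_for_human": "Ask questions about your uploaded PDF papers.",
    "description_for_model": (
        "Search the user's PDF dataset and return relevant passages "
        "for answering questions, summarizing, and generating content."
    ),
    "auth": {"type": "none"},
    "api": {"type": "openapi", "url": f"{BASE_URL}/openapi.yaml"},
    "logo_url": f"{BASE_URL}/logo.png",
    "contact_email": "support@example.com",
    "legal_info_url": f"{BASE_URL}/legal",
}

# Keys this sketch treats as required (an assumption for illustration).
REQUIRED = (
    "schema_version", "name_for_human", "name_for_model",
    "description_for_human", "description_for_model", "auth", "api",
)


def check_manifest(m: dict) -> list[str]:
    """Return the names of any required keys missing from the manifest."""
    return [key for key in REQUIRED if key not in m]


if __name__ == "__main__":
    assert not check_manifest(manifest)
    # This JSON is what you would serve at /.well-known/ai-plugin.json.
    print(json.dumps(manifest, indent=2))
```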
Now, let's interact with the app. Feel free to copy the prompt examples below.
Prompt 1 - Ask questions
Using the PDFPlugin plugin, what is the role of data augmentation in A Cook’s Guide to Successful SSL Training and Deployment?
Result in ChatGPT
Prompt 2 - Generate summaries
Using the PDFPlugin plugin, summarize the content on the Publicly Available Model Checkpoints or APIs in the Survey of Large Language Models.
Result in ChatGPT
Prompt 3 - Generate YouTube scripts using your own document as a reference.
Using the PDFPlugin plugin, summarize the content on the Publicly Available Model Checkpoints or APIs in the Survey of Large Language Models in a successful YouTuber's script style. Highlight the important technical information for the technical audience.
Result in ChatGPT
Notice the highlighted text in the image; it confirms that the script was generated using the document "Survey of Large Language Models" as a reference.
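The three prompts share one structure: they name the plugin up front (presumably to steer ChatGPT toward it), then state the task, the source document, and optionally a style. If you reuse prompts often, a small helper keeps that structure consistent. `build_prompt` below is a hypothetical convenience, not part of Mantium or ChatGPT; with the arguments shown, it reproduces Prompt 2 verbatim.

```python
def build_prompt(plugin: str, task: str, document: str, style: str = "") -> str:
    """Compose a prompt: plugin name, task, source document, optional style."""
    prompt = f"Using the {plugin} plugin, {task} in the {document}"
    if style:
        prompt += f" {style}"
    return prompt + "."


prompt_2 = build_prompt(
    "PDFPlugin",
    "summarize the content on the Publicly Available Model Checkpoints or APIs",
    "Survey of Large Language Models",
)
print(prompt_2)
```

Passing `style="in a successful YouTuber's script style"` yields the first sentence of Prompt 3; add any extra instructions, such as the request to highlight technical information, after it.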
In this guide, we tackled the challenge of accessing and querying information in academic papers with large language models (LLMs). By combining OpenAI plugins with the Mantium Plugin Wizard, we built a workflow that imports PDFs, extracts their text, and lets us interact with the data in ChatGPT. As a result, we can answer questions, generate summaries, and create content directly from the papers.