Ultimate guide to a Query Generation Model (GPT-3)

Tirth Patel
9 min read · Dec 20, 2021

If a worm with 302 neurons is conscious, says philosopher David Chalmers, “then I am open to the idea that GPT-3 with 175bn parameters is conscious too.”

What is GPT-3?

Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text. It is the third-generation language prediction model in the GPT-n series (and the successor to GPT-2) created by OpenAI, a San Francisco-based artificial intelligence research laboratory. The full version of GPT-3 has 175 billion machine learning parameters and was trained on large text datasets containing hundreds of billions of words. GPT-3 was introduced in May 2020 and was in beta testing as of July 2020; it is part of a trend in natural language processing (NLP) toward systems built on pre-trained language representations. Before its release, the largest language model was Microsoft’s Turing NLG, introduced in February 2020 with 17 billion parameters — less than a tenth of GPT-3’s. Based on the original paper that introduced the model, GPT-3 was trained using a combination of the following large text datasets:

  • Common Crawl
  • WebText2
  • Books1
  • Books2
  • Wikipedia Corpus

The final dataset contained a large portion of web pages from the internet, a giant collection of books, and all of Wikipedia. Researchers used this dataset with hundreds of billions of words to train GPT-3 to generate text in English and several other languages.

The quality of the text generated by GPT-3 is so high that it is difficult to distinguish from that written by a human, which has both benefits and risks. Thirty-one OpenAI researchers and engineers presented the original May 28, 2020 paper introducing GPT-3. In their paper, they warned of GPT-3’s potential dangers and called for research to mitigate risk.

Why is it so powerful?

GPT-3 has made headlines since last summer because it can perform a wide variety of natural language tasks and produces human-like text. The tasks that GPT-3 can perform include, but are not limited to:

  • Text classification (e.g. sentiment analysis)
  • Question answering
  • Text generation
  • Text summarization
  • Named-entity recognition
  • Language translation

Based on the tasks that GPT-3 can perform, we can think of it as a model that performs reading comprehension and writing tasks at a near-human level, except that it has seen more text than any human will ever read in a lifetime. This is exactly why GPT-3 is so powerful. Entire startups have been built on GPT-3, because it works as a general-purpose Swiss Army knife for solving a wide variety of problems in natural language processing.

Consider some of the limitations of GPT-3 listed below:

  • GPT-3 lacks long-term memory — the model does not learn anything from long-term interactions like humans.
  • Lack of interpretability — this is a problem that affects extremely large and complex models in general. GPT-3 is so large that it is difficult to interpret or explain the output that it produces.
  • Limited input size — transformers have a fixed maximum input size, so GPT-3 cannot handle prompts longer than about 2,048 tokens (a few pages of text).
  • Slow inference time — because GPT-3 is so large, it takes more time for the model to produce predictions.
  • GPT-3 suffers from bias — all models are only as good as the data that was used to train them and GPT-3 is no exception. This paper, for example, demonstrates that GPT-3 and other large language models contain anti-Muslim bias.

While GPT-3 is powerful, it still has limitations that keep it far from being a perfect language model or an example of artificial general intelligence (AGI).

How does the transformer language model work?

Transformer models are a type of neural network designed to process sequences — transforming an input sequence to an output sequence. In the case of a language model, these are sequences of words. In fact, the acronym GPT actually stands for “Generative Pre-trained Transformer.”

You can think about the specific type of language model that GPT-3 uses by dividing it up into two parts. The encoder translates the initial sequence into a vector, a list of numbers that a computer can easily work with, and the decoder decodes that information into the output sequence. (GPT-3 itself uses only the decoder stack of the original Transformer, but the encoder/decoder picture is a helpful mental model.)

However, the key part of transformers is a process called attention, which enables the model to understand which words are most important to consider when evaluating which sequence to produce. In each iteration of the encoder, numerical weights are assigned to each word, which are then analyzed by the decoder, thereby determining the key aspects of the input text upon which the output should be based.
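To make that weighting concrete, here is a toy scaled dot-product self-attention in NumPy. This is an illustrative sketch of the mechanism, not GPT-3’s actual implementation; the dimensions and random inputs are made up:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax along the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Each output row is a weighted average of the value vectors,
    # with weights derived from query-key similarity.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores)   # one weight per input position
    return weights @ V, weights

# Three "words", each represented by a 4-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out, w = attention(x, x, x)     # self-attention: Q = K = V
print(w)                        # each row of weights sums to 1
```

The weight matrix `w` is exactly the “numerical weights assigned to each word” described above: for every output position it says how much attention to pay to each input position.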

Engines Available

The OpenAI API is powered by a family of models with different capabilities and price points. Engines describe and provide access to these models.

Base Series

The base GPT-3 models can understand and generate natural language. OpenAI offers four base models, called davinci, curie, babbage, and ada, with different levels of power suitable for different tasks. Davinci is the most capable model, and Ada is the fastest.

While Davinci is generally the most capable, the other models can perform certain tasks extremely well with significant speed or cost advantages. For example, Curie can perform many of the same tasks as Davinci, but faster and for 1/10th the cost.

Davinci is the most capable engine and can perform any task the other models can perform and often with less instruction. For applications requiring a lot of understanding of the content, like summarization for a specific audience and creative content generation, Davinci is going to produce the best results. These increased capabilities require more compute resources, so Davinci costs more per API call and is not as fast as the other engines. Another area where Davinci shines is in understanding the intent of text. Davinci is quite good at solving many kinds of logic problems and explaining the motives of characters. Davinci has been able to solve some of the most challenging AI problems involving cause and effect.

Use cases:

  • Complex intent
  • cause and effect
  • summarization for audience
  • creative content generation

Curie is extremely powerful, yet very fast. While Davinci is stronger when it comes to analyzing complicated text, Curie is quite capable for many nuanced tasks like sentiment classification and summarization. Curie is also quite good at answering questions and performing Q&A and as a general service chatbot.

Use cases:

  • Language translation
  • complex classification
  • text sentiment
  • summarization
  • Q&A system/ChatBot

Babbage can perform straightforward tasks like simple classification. It’s also quite capable at Semantic Search — ranking how well documents match up with search queries.

Use cases:

  • Moderate classification
  • semantic search classification

Ada is usually the fastest model and can perform tasks like parsing text, address correction and certain kinds of classification tasks that don’t require too much nuance. Ada’s performance can often be improved by providing more context.

Use cases:

  • Parsing text
  • simple classification
  • address correction
  • keywords extraction

Let’s look at the code for predicting Gremlin queries.

Trust me, it’s quite easy. We create a GPT class to which we can add training examples, and set the GPT engine, temperature, and maximum tokens. Once the engine is configured, we make a call to the API to get the output.

gpt.py

import openai


def set_openai_key(key):
    openai.api_key = key


class Example:
    def __init__(self, inp, out):
        self.input = inp
        self.output = out

    def get_input(self):
        return self.input

    def get_output(self):
        return self.output

    def format(self):
        return f"input: {self.input}\noutput: {self.output}\n"


class GPT:
    def __init__(self, engine='davinci',
                 temperature=0.5,
                 max_tokens=100):
        self.examples = []
        self.engine = engine
        self.temperature = temperature
        self.max_tokens = max_tokens

    def add_example(self, ex):
        assert isinstance(ex, Example), "Please create an Example object."
        self.examples.append(ex.format())

    def get_prime_text(self):
        return '\n'.join(self.examples) + '\n'

    def get_engine(self):
        return self.engine

    def get_temperature(self):
        return self.temperature

    def get_max_tokens(self):
        return self.max_tokens

    def craft_query(self, prompt):
        return self.get_prime_text() + "input: " + prompt + "\n"

    def submit_request(self, prompt):
        response = openai.Completion.create(engine=self.get_engine(),
                                            prompt=self.craft_query(prompt),
                                            max_tokens=self.get_max_tokens(),
                                            temperature=self.get_temperature(),
                                            top_p=1,
                                            n=1,
                                            stream=False,
                                            stop="\ninput:")
        return response

    def get_top_reply(self, prompt):
        response = self.submit_request(prompt)
        return response['choices'][0]['text']

Now, for prediction we need to send an API call to OpenAI’s GPT model. Before that, we need to add examples.

gpt-train.ipynb

import json
import openai

from gpt import GPT
from gpt import Example

openai.api_key = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

gpt = GPT(engine="davinci",
          temperature=0.5,
          max_tokens=100)

# Counting vertices
gpt.add_example(Example('Count the number of Vertices in the graph.',
                        'g.V().count()'))
gpt.add_example(Example('Count Vertices',
                        'g.V().count()'))
gpt.add_example(Example('Count the Vertices in the database.',
                        'g.V().count()'))
gpt.add_example(Example('Could you please count the number of Vertices?',
                        'g.V().count()'))
gpt.add_example(Example('Do you think you can count the number of Vertices for me?',
                        'g.V().count()'))

# Listing vertices by label
gpt.add_example(Example('Please give me list of persons from the graph.',
                        'g.V().hasLabel("persons")'))
gpt.add_example(Example('Please give me list of cities from the graph.',
                        'g.V().hasLabel("cities")'))
gpt.add_example(Example('Please give me list of companies from the graph.',
                        'g.V().hasLabel("companies")'))
gpt.add_example(Example('Could you fetch the list of persons from the database?',
                        'g.V().hasLabel("persons")'))
gpt.add_example(Example('Could you fetch the list of companies from the database?',
                        'g.V().hasLabel("companies")'))
gpt.add_example(Example('Could you fetch the list of cities from the database?',
                        'g.V().hasLabel("cities")'))

# Greater than
gpt.add_example(Example('fetch me the list of persons having age greater than 50.',
                        'g.V().hasLabel("persons").has("age", gt(50))'))
gpt.add_example(Example('give me the list of students having weight greater than 30.',
                        'g.V().hasLabel("students").has("weight", gt(30))'))
gpt.add_example(Example('could you provide me the list of animals having height greater than 150.',
                        'g.V().hasLabel("animals").has("height", gt(150))'))

# Lesser than
gpt.add_example(Example('could you provide me the list of persons having height lesser than 30.',
                        'g.V().hasLabel("persons").has("height", lt(30))'))
gpt.add_example(Example('give me list of companies having revenue less than 1 billion.',
                        'g.V().hasLabel("companies").has("revenue", lt(1))'))
gpt.add_example(Example('could you provide me the list of companies having Net income less than 500 billion.',
                        'g.V().hasLabel("companies").has("Net income", lt(500))'))

# Financial queries
gpt.add_example(Example('give me the revenue of microsoft for the years above 2010',
                        'g.V().has("companies","Microsoft").has("years", gt(2010)).values("revenue")'))
gpt.add_example(Example('give me the gross margin of facebook for the years below 2015',
                        'g.V().has("companies","Facebook").has("years", lt(2015)).values("gross margin")'))
gpt.add_example(Example('fetch the Net income of Apple for the years above 2000',
                        'g.V().has("companies","Apple").has("years", gt(2000)).values("Net income")'))
gpt.add_example(Example('give me the diluted earnings per share of microsoft for the year 2020',
                        'g.V().has("companies","Microsoft").has("years", 2020).values("diluted earnings per share")'))
gpt.add_example(Example('provide me with the total assets of Merrill Lynch for the year 2019',
                        'g.V().has("companies","Merrill Lynch").has("years", 2019).values("total assets")'))
gpt.add_example(Example('fetch me the operating income of TD Ameritrade for the year 2015',
                        'g.V().has("companies","TD Ameritrade").has("years", 2015).values("operating income")'))

To make predictions, we send a request to the API.

prompt1 = "Count the vertices of cosmos db database"
output1 = gpt.submit_request(prompt1)
output1.choices[0].text

(Output for prompt1)

prompt2 = "get the list of cities from the graph"
output2 = gpt.submit_request(prompt2)
output2.choices[0].text

(Output for prompt2)

Below are some other testing examples, and our model seems to work well at converting English commands to Gremlin queries.
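The output screenshots themselves are not reproduced here, but the structure of the few-shot prompt being sent can be checked offline. The snippet below mirrors how craft_query in gpt.py assembles the prompt (a self-contained sketch, re-inlining the formatting logic and reusing two of the examples above):

```python
# Two (input, output) pairs, as added via gpt.add_example above.
examples = [
    ("Count Vertices", "g.V().count()"),
    ("get the list of cities from the graph", 'g.V().hasLabel("cities")'),
]

def craft_query(examples, prompt):
    # Mirrors GPT.craft_query: primed examples followed by the new input.
    prime = "\n".join(f"input: {i}\noutput: {o}\n" for i, o in examples) + "\n"
    return prime + "input: " + prompt + "\n"

query = craft_query(examples, "Count the vertices of cosmos db database")
print(query)
```

The completion engine simply continues this text after the final `input:` line, and the `stop="\ninput:"` setting cuts it off before it starts inventing a new example.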

Building Knowledge Graph on Cosmos DB

Below is the code for pushing the financial entities into Cosmos DB on Azure.
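The original code listing is not embedded here, so the following is only a sketch of what such a loader could look like: it builds Gremlin addV() queries from entity dictionaries and then submits them. The entity fields and figures are illustrative, the endpoint, database, and container names are placeholders, and the commented-out submission step assumes the third-party gremlinpython package:

```python
# Illustrative financial entities; the labels, fields, and numbers are made up.
entities = [
    {"label": "companies", "name": "Microsoft", "years": 2020, "revenue": 143},
    {"label": "companies", "name": "Apple", "years": 2020, "revenue": 274},
]

def to_addv_query(entity):
    # Build a Gremlin addV() traversal for one entity; every field
    # except the label becomes a vertex property.
    props = "".join(
        f".property('{k}', '{v}')" for k, v in entity.items() if k != "label"
    )
    return f"g.addV('{entity['label']}')" + props

queries = [to_addv_query(e) for e in entities]
print(queries[0])

# Submitting the queries needs the gremlinpython package and real
# Cosmos DB credentials, roughly like this:
#
# from gremlin_python.driver import client, serializer
# gc = client.Client(
#     "wss://<account>.gremlin.cosmos.azure.com:443/", "g",
#     username="/dbs/<database>/colls/<graph>",
#     password="<primary-key>",
#     message_serializer=serializer.GraphSONSerializersV2d0())
# for q in queries:
#     gc.submit(q).all().result()
```

Cosmos DB’s Gremlin API accepts these traversals as plain strings, which is why simple string assembly is enough here.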

Now that we have the knowledge graph and the automated querying model, we can test our custom queries and check the results.

I hope you have enjoyed and understood the article. We can use this approach to train models that generate queries in other languages (MongoDB, SQL) and build an intelligent database querying system.
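As an illustration, the same few-shot scaffolding could be primed with SQL pairs instead of Gremlin ones; the table and column names below are invented for the sketch:

```python
# Sketch: priming the same Example-based setup with SQL instead of Gremlin.
class Example:
    def __init__(self, inp, out):
        self.input, self.output = inp, out

    def format(self):
        return f"input: {self.input}\noutput: {self.output}\n"

sql_examples = [
    Example("Count the number of employees.",
            "SELECT COUNT(*) FROM employees;"),
    Example("List employees older than 50.",
            "SELECT * FROM employees WHERE age > 50;"),
]

# Same prime-text assembly as GPT.get_prime_text in gpt.py.
prime = "\n".join(e.format() for e in sql_examples) + "\n"
print(prime)
```

Nothing about the GPT class is Gremlin-specific; only the examples decide which query language the model imitates.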
