Mansi Phute

Investigating and Leveraging Knowledge in Large Language Models

, 2023


Large Language Models (LLMs) are trained on large corpora of text. This enables them to generate text effectively while retaining implicit knowledge, such as factual associations, grammar, and contextual information, as well as reasoning, logical, and mathematical abilities. As a result, the question "What knowledge do LLMs contain?" has gained increasing relevance. In this context, "knowledge" is defined as the set of questions an LLM can answer correctly without accessing external context or knowledge bases. In this project we will investigate this question by extracting a large-scale knowledge graph from an LLM, one we can confidently say represents the knowledge present inside the model, and then analysing this graph to gain insights into the LLM and to explain the text it generates in response to particular prompts.
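The extraction pipeline described above can be sketched as follows. This is a minimal illustration, not the project's actual method: `query_llm` is a hypothetical stand-in for a real model call (here it returns canned completions so the example runs end to end), and the prompts and relations are invented for demonstration.

```python
# Hypothetical stand-in for a real LLM call; a real pipeline would query
# a language model here. Canned answers keep this sketch self-contained.
def query_llm(prompt: str) -> str:
    canned = {
        "Paris is the capital of": "France",
        "The Eiffel Tower is located in": "Paris",
    }
    return canned.get(prompt, "")

def extract_triples(subject_relation_pairs):
    """Turn (subject, relation) prompts into (subject, relation, object) triples."""
    triples = []
    for subject, relation in subject_relation_pairs:
        obj = query_llm(f"{subject} {relation}")
        if obj:  # keep only prompts the model completes with an answer
            triples.append((subject, relation, obj))
    return triples

def build_graph(triples):
    """Adjacency-list knowledge graph: node -> list of (relation, node) edges."""
    graph = {}
    for s, r, o in triples:
        graph.setdefault(s, []).append((r, o))
        graph.setdefault(o, [])  # ensure objects also appear as nodes
    return graph

queries = [
    ("Paris", "is the capital of"),
    ("The Eiffel Tower", "is located in"),
]
kg = build_graph(extract_triples(queries))
```

Because no external context is supplied in the prompts, each edge in the resulting graph corresponds to a fact the model can produce on its own, which is exactly the working definition of "knowledge" used here.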

This question is particularly interesting given the wide adoption of LLMs as writing tools. There have been cases where individuals mistakenly assumed that everything an LLM generates is true. This is not helped by the black-box nature of LLMs: we lack any method of attributing the text they generate to their training data, so there is no way of telling when an LLM hallucinates. Knowledge graphs extracted from LLMs can give us more insight into this phenomenon and help explain LLM-generated text. They can also help us assess how accurate the knowledge inside LLMs is, enabling us to train better models, and they give us a lightweight representation of the connections an LLM has learned.