Photo by Joshua Sortino / Unsplash

Llama Index

Apr 16, 2023
GitHub - jerryjliu/llama_index: LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLM’s with external data.
LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLM's with external data. - GitHub - jerryjliu/llama_index: LlamaIndex (GPT Index) is a project that provid…

🗂️ LlamaIndex 🦙 (GPT Index)

⚠️ NOTE: We are rebranding GPT Index as LlamaIndex! We will carry out this transition gradually.

2/25/2023: By default, our docs/notebooks/instructions now reference "LlamaIndex"
instead of "GPT Index".

2/19/2023: By default, our docs/notebooks/instructions now use the llama-index package. However the gpt-index package still exists as a duplicate!

2/16/2023: We have a duplicate llama-index pip package. Simply replace all imports of gpt_index with llama_index if you choose to pip install llama-index.

LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLM's with external data.





LlamaHub (community library of data loaders):

🚀 Overview

NOTE: This README is not updated as frequently as the documentation. Please check out the documentation above for the latest updates!


  • LLMs are a phenomenonal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.
  • How do we best augment LLMs with our own private data?
  • One paradigm that has emerged is in-context learning (the other is finetuning), where we insert context into the input prompt. That way,
    we take advantage of the LLM's reasoning capabilities to generate a response.

To perform LLM's data augmentation in a performant, efficient, and cheap manner, we need to solve two components:

  • Data Ingestion
  • Data Indexing

Proposed Solution

That's where the LlamaIndex comes in. LlamaIndex is a simple, flexible interface between your external data and LLMs. It provides the following tools in an easy-to-use fashion:

  • Offers data connectors to your existing data sources and data formats (API's, PDF's, docs, SQL, etc.)
  • Provides indices over your unstructured and structured data for use with LLM's.
    These indices help to abstract away common boilerplate and pain points for in-context learning:
    • Storing context in an easy-to-access format for prompt insertion.
    • Dealing with prompt limitations (e.g. 4096 tokens for Davinci) when context is too big.
    • Dealing with text splitting.
  • Provides users an interface to query the index (feed in an input prompt) and obtain a knowledge-augmented output.
  • Offers you a comprehensive toolset trading off cost and performance.

💡 Contributing

Interesting in contributing? See our Contribution Guide for more details.

📄 Documentation

Full documentation can be found here:

Please check it out for the most up-to-date tutorials, how-to guides, references, and other resources!

💻 Example Usage

pip install llama-index

Examples are in the examples folder. Indices are in the indices folder (see list of indices below).

To build a simple vector store index:

import os

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader('data').load_data()
index = GPTSimpleVectorIndex.from_documents(documents)

To save to and load from disk:

# save to disk
# load from disk
index = GPTSimpleVectorIndex.load_from_disk('index.json')

To query:


🔧 Dependencies

The main third-party package requirements are tiktoken, openai, and langchain.

All requirements should be contained within the file. To run the package locally without building the wheel, simply run pip install -r requirements.txt.

📖 Citation

Reference to cite if you use LlamaIndex in a paper:

author = {Liu, Jerry},
doi = {10.5281/zenodo.1234},
month = {11},
title = {{LlamaIndex}},
url = {},
year = {2022}