LlamaIndex Integration

Anyparser integrates with LlamaIndex as a document reader, so parsed files can feed directly into your RAG (Retrieval-Augmented Generation) applications. This guide covers how to use Anyparser with both Python and JavaScript versions of LlamaIndex.

Installation

pip install anyparser-llamaindex

Basic Usage

The following example loads a PDF with Anyparser, builds a LlamaIndex vector index from it, and runs a query:

from anyparser_llamaindex import AnyparserReader
# On llama-index 0.10 and later, import from llama_index.core instead
from llama_index import VectorStoreIndex

# Initialize the reader
reader = AnyparserReader(api_key="your-api-key")

# Load documents
documents = reader.load_data("document.pdf")

# Create index from documents
index = VectorStoreIndex.from_documents(documents)

# Query your documents
query_engine = index.as_query_engine()
response = query_engine.query("What is this document about?")
print(response)

Advanced Configuration

The reader accepts options that control the output format, what is extracted, and how text is chunked:

reader = AnyparserReader(
    api_key="your-api-key",
    format="markdown",   # Output format
    model="text",        # Model type
    image=True,          # Extract images
    table=True,          # Extract tables
    encoding="utf-8",    # Specify encoding
    chunk_size=1000,     # Size of text chunks
    chunk_overlap=200,   # Overlap between chunks
)
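To make the relationship between chunk_size and chunk_overlap concrete, here is a minimal sketch of overlapping fixed-size chunking. It is not Anyparser's actual splitter: the function name chunk_text is hypothetical, and it assumes sizes are measured in characters, which the options above do not specify.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters (assumed unit),
    where each window repeats the last chunk_overlap characters of the
    previous one. Illustrative only, not Anyparser's implementation."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

With the values used above (1000 and 200), each chunk would start 800 characters after the previous one under this scheme, so text near a chunk boundary appears in two chunks.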

Metadata Handling

Anyparser automatically extracts and preserves metadata for LlamaIndex:

# Load documents with metadata
documents = reader.load_data(
    "document.pdf",
    include_metadata=True,  # Include document metadata
)

# Access metadata
for doc in documents:
    print(f"File: {doc.metadata['filename']}")
    print(f"Pages: {doc.metadata['total_pages']}")
    print(f"Format: {doc.metadata['format']}")
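Indexing a metadata dictionary with square brackets raises KeyError when a key is absent. Whether every key is always present is not stated here, so a defensive variant may be useful. The sketch below works on a plain dict with the three keys shown above; the helper name describe_metadata and the fallback values are illustrative assumptions.

```python
def describe_metadata(metadata: dict) -> str:
    """Format the metadata keys used in this guide, tolerating missing ones.
    The fallback strings are arbitrary placeholders, not Anyparser values."""
    filename = metadata.get("filename", "<unknown>")
    total_pages = metadata.get("total_pages", "?")
    fmt = metadata.get("format", "?")
    return f"File: {filename} | Pages: {total_pages} | Format: {fmt}"


print(describe_metadata({"filename": "document.pdf", "total_pages": 12, "format": "markdown"}))
print(describe_metadata({"filename": "notes.txt"}))  # missing keys fall back to "?"
```

In a loop over loaded documents this would be called as describe_metadata(doc.metadata).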

Batch Processing

Process multiple documents efficiently:

# Load multiple documents
documents = reader.load_data([
    "document1.pdf",
    "document2.docx",
    "document3.txt",
])

# Create index from batch
index = VectorStoreIndex.from_documents(documents)
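When the files live in a folder, the path list can be built rather than typed out. The following is a small standard-library sketch; collect_paths is a hypothetical helper, and the extension filter simply mirrors the three file types in the example above rather than a confirmed list of supported formats.

```python
from pathlib import Path


def collect_paths(folder, extensions=(".pdf", ".docx", ".txt")):
    """Return a sorted list of file paths in `folder` (non-recursive)
    whose suffix matches one of `extensions`, case-insensitively."""
    root = Path(folder)
    return sorted(
        str(path)
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )
```

The result could then be passed to the reader as reader.load_data(collect_paths("docs")), assuming the list form shown above.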

Error Handling

Wrap calls to load_data in a try/except block so that a failed parse does not crash your application:

try:
    documents = reader.load_data("document.pdf")
except Exception as e:
    print(f"Error loading document: {e}")
    # Handle error appropriately
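For batches, one bad file should not necessarily discard the rest. The sketch below loads files one at a time and records failures separately. The helper load_each is hypothetical, and it assumes the loader callable returns a list of documents for a single path, as reader.load_data does in the examples above; the specific exception types Anyparser raises are not documented here, so it catches Exception broadly.

```python
from typing import Callable, Iterable


def load_each(paths: Iterable[str], load_one: Callable[[str], list]):
    """Call load_one(path) for every path, collecting the documents that
    load and mapping each failing path to its error message."""
    documents = []
    failures = {}
    for path in paths:
        try:
            documents.extend(load_one(path))
        except Exception as exc:  # broad on purpose: error types are unknown
            failures[path] = str(exc)
    return documents, failures
```

A call such as documents, failures = load_each(paths, reader.load_data) would then let you index the documents that parsed and report or retry the paths in failures.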