LangChain Integration

Anyparser provides dedicated LangChain integration packages for both Python and JavaScript, so you can add document parsing to your LangChain applications. The examples on this page use the Python package.

Installation

pip install anyparser_langchain

Basic Usage

Here’s how to use Anyparser as a document loader in LangChain:

from anyparser_langchain import AnyparserLoader

# Initialize the loader with your API key
loader = AnyparserLoader(
    file_path="document.pdf",
    anyparser_api_key="your-api-key",
    format="markdown"  # LangChain works best with markdown
)

# Load the document
documents = loader.load()

# Use the documents in your LangChain pipeline
for doc in documents:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
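
The example above hardcodes the API key for brevity. A minimal sketch of reading it from the environment instead is shown here. The helper name `build_loader_kwargs` and the environment variable name `ANYPARSER_API_KEY` are assumptions made for illustration, not part of the Anyparser package; only the three keyword arguments come from the example above.

```python
import os


def build_loader_kwargs(file_path, env_var="ANYPARSER_API_KEY"):
    """Collect keyword arguments for AnyparserLoader.

    Hypothetical helper: the variable name ANYPARSER_API_KEY is an
    assumption, so use whatever name your deployment defines.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "file_path": file_path,
        "anyparser_api_key": api_key,
        "format": "markdown",
    }


# Usage (requires anyparser_langchain to be installed):
# from anyparser_langchain import AnyparserLoader
# loader = AnyparserLoader(**build_loader_kwargs("document.pdf"))
```

Keeping the key out of source code means it cannot be committed to version control by accident.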

Advanced Configuration

You can customize the Anyparser loader with various options:

loader = AnyparserLoader(
    file_path="document.pdf",
    anyparser_api_key="your-api-key",
    format="markdown",
    image=True,       # Extract images
    table=True,       # Extract tables
    encoding="utf-8"  # Specify encoding
)

Using with LangChain Chains

Integrate Anyparser-loaded documents into LangChain chains:

from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from anyparser_langchain import AnyparserLoader

# Load documents
loader = AnyparserLoader(
    file_path="document.pdf",
    anyparser_api_key="your-api-key"
)
documents = loader.load()

# Split text into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
texts = text_splitter.split_documents(documents)

# Create embeddings and store in vector database
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(texts, embeddings)

# Create a question-answering chain
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)

# Query your documents
response = qa_chain.run("What is this document about?")
print(response)
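
To make the `chunk_size` and `chunk_overlap` settings used above more concrete, here is a simplified, pure-Python illustration of overlapping chunks. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and sentences); the function name `sliding_chunks` is hypothetical and only shows how consecutive chunks share text.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows that overlap by chunk_overlap characters.

    Simplified illustration only; it ignores word and paragraph boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    # Each new chunk starts this many characters after the previous one
    step = chunk_size - chunk_overlap
    # Stop once the remaining text is already covered by the previous chunk
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


sample = "".join(str(i % 10) for i in range(2600))
chunks = sliding_chunks(sample)
print(len(chunks))                           # 3
print(chunks[0][-200:] == chunks[1][:200])   # True
```

With a chunk size of 1000 and an overlap of 200, each chunk repeats the last 200 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.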

Error Handling

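Wrap document loading in a try/except block so that failures are reported instead of crashing your application. The same pattern applies to later processing steps:

from anyparser_langchain import AnyparserLoader

try:
    loader = AnyparserLoader(
        file_path="document.pdf",
        anyparser_api_key="your-api-key"
    )
    documents = loader.load()
except Exception as e:
    # Catch-all for illustration; narrow this to specific exception types where you can
    print(f"Error loading document: {e}")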
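
Because loading a document involves a network call, transient failures may be worth retrying. The following is a minimal sketch of a retry wrapper with exponential backoff. The helper `load_with_retry` is hypothetical and not part of the Anyparser package, and since this page does not document which exception types the loader raises, the sketch retries on any `Exception`, which you may want to narrow.

```python
import time


def load_with_retry(load_fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call load_fn, retrying with exponential backoff if it raises.

    Hypothetical helper: load_fn is any zero-argument callable,
    for example loader.load.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return load_fn()
        except Exception as exc:  # assumption: exact exception types are unknown
            last_error = exc
            if attempt < attempts - 1:
                # Wait 1x, 2x, 4x, ... base_delay between attempts
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"Loading failed after {attempts} attempts") from last_error


# Usage (requires anyparser_langchain to be installed):
# documents = load_with_retry(loader.load)
```

The `sleep` parameter exists so the delay can be replaced in tests; in normal use the default `time.sleep` applies.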