LangChain Integration
Anyparser provides dedicated LangChain integration packages for both Python and JavaScript, enabling you to easily incorporate document parsing into your LangChain applications.
Installation
pip install anyparser_langchain
npm install @anyparser/langchain
# or
yarn add @anyparser/langchain
Basic Usage
Here’s how to use Anyparser as a document loader in LangChain:
from anyparser_langchain import AnyparserLoader

# Initialize the loader with your API key
loader = AnyparserLoader(
    file_path="document.pdf",
    anyparser_api_key="your-api-key",
    format="markdown"  # LangChain works best with markdown
)

# Load the document
documents = loader.load()

# Use the documents in your LangChain pipeline
for doc in documents:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")

import { AnyparserLoader } from "@anyparser/langchain";

// Initialize the loader with your API key
const loader = new AnyparserLoader({
  filePath: "document.pdf",
  anyparserApiKey: "your-api-key",
  format: "markdown" // LangChain works best with markdown
});

// Load the document
const documents = await loader.load();

// Use the documents in your LangChain pipeline
for (const doc of documents) {
  console.log("Content:", doc.pageContent);
  console.log("Metadata:", doc.metadata);
}
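
The examples above hardcode the API key for brevity. Below is a minimal sketch of reading it from an environment variable instead. The variable name ANYPARSER_API_KEY and the helper get_api_key are assumptions made for illustration and are not part of the Anyparser packages.

```python
import os


def get_api_key(env_var: str = "ANYPARSER_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The default variable name is hypothetical; use whatever name
    your deployment defines.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Set the {env_var} environment variable before loading documents."
        )
    return api_key
```

The returned value could then be passed as anyparser_api_key when constructing the loader, which keeps the key out of source control.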
Advanced Configuration
You can customize the Anyparser loader with various options:
loader = AnyparserLoader(
    file_path="document.pdf",
    anyparser_api_key="your-api-key",
    format="markdown",
    image=True,        # Extract images
    table=True,        # Extract tables
    encoding="utf-8"   # Specify encoding
)

const loader = new AnyparserLoader({
  filePath: "document.pdf",
  anyparserApiKey: "your-api-key",
  format: "markdown",
  image: true,       // Extract images
  table: true,       // Extract tables
  encoding: "utf-8"  // Specify encoding
});
Using with LangChain Chains
Integrate Anyparser-loaded documents into LangChain chains:
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from anyparser_langchain import AnyparserLoader

# Load documents
loader = AnyparserLoader(
    file_path="document.pdf",
    anyparser_api_key="your-api-key"
)
documents = loader.load()

# Split text into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
texts = text_splitter.split_documents(documents)

# Create embeddings and store in vector database
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(texts, embeddings)

# Create a question-answering chain
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)

# Query your documents
response = qa_chain.run("What is this document about?")
print(response)

import { RetrievalQAChain } from "langchain/chains";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { Chroma } from "langchain/vectorstores/chroma";
import { OpenAI } from "langchain/llms/openai";
import { AnyparserLoader } from "@anyparser/langchain";

// Load documents
const loader = new AnyparserLoader({
  filePath: "document.pdf",
  anyparserApiKey: "your-api-key"
});
const documents = await loader.load();

// Split text into chunks
const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200
});
const texts = await textSplitter.splitDocuments(documents);

// Create embeddings and store in vector database
const embeddings = new OpenAIEmbeddings();
const vectorstore = await Chroma.fromDocuments(texts, embeddings);

// Create a question-answering chain
const model = new OpenAI();
const chain = RetrievalQAChain.fromLLM(
  model,
  vectorstore.asRetriever()
);

// Query your documents
const response = await chain.call({
  query: "What is this document about?"
});
console.log(response);
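
To make the chunk size of 1000 and chunk overlap of 200 used above more concrete, here is a simplified sketch of overlapping chunking. It is a plain fixed-width sliding window written for illustration. It is not the algorithm RecursiveCharacterTextSplitter uses, which prefers to break on separators such as paragraphs and newlines. The function name chunk_text is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-width chunks that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With these settings each chunk begins 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next. That repetition helps a retriever keep context that would otherwise be cut at a chunk boundary.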
Error Handling
Wrap document loading, and any processing that follows it, in error handling so that failures are reported clearly:
try:
    loader = AnyparserLoader(
        file_path="document.pdf",
        anyparser_api_key="your-api-key"
    )
    documents = loader.load()
except Exception as e:
    print(f"Error loading document: {e}")

try {
  const loader = new AnyparserLoader({
    filePath: "document.pdf",
    anyparserApiKey: "your-api-key"
  });
  const documents = await loader.load();
} catch (error) {
  console.error("Error loading document:", error.message);
}
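
Because loading depends on a remote API, transient failures may be worth retrying before giving up. The following is a minimal sketch of a generic retry wrapper. The helper load_with_retry and its parameters are hypothetical and not part of the Anyparser packages, and it catches Exception broadly because the specific exception types the loader raises are not documented here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def load_with_retry(load: Callable[[], T], attempts: int = 3, delay_seconds: float = 1.0) -> T:
    """Call load() up to `attempts` times, waiting longer after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(1, attempts + 1):
        try:
            return load()
        except Exception as exc:
            # On the final attempt, let the original error propagate.
            if attempt == attempts:
                raise
            print(f"Attempt {attempt} failed ({exc}); retrying...")
            # Linear backoff: wait 1x, 2x, 3x... the base delay.
            time.sleep(delay_seconds * attempt)

    raise RuntimeError("unreachable")  # Keeps type checkers satisfied.
```

It could be used by passing the loader's bound method, for example load_with_retry(loader.load). In practice you would likely narrow the except clause to network-related errors so that permanent problems, such as a missing file or an invalid API key, fail immediately.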