# LangChain Integration
Anyparser provides dedicated LangChain integration packages for both Python and JavaScript, enabling you to easily incorporate document parsing into your LangChain applications.
## Installation

**Python:**

```shell
pip install anyparser_langchain
```

**JavaScript:**

```shell
npm install @anyparser/langchain
# or
yarn add @anyparser/langchain
```

## Basic Usage
Here’s how to use Anyparser as a document loader in LangChain:
**Python:**

```python
from anyparser_langchain import AnyparserLoader

# Initialize the loader with your API key
loader = AnyparserLoader(
    file_path="document.pdf",
    anyparser_api_key="your-api-key",
    format="markdown"  # LangChain works best with markdown
)

# Load the document
documents = loader.load()

# Use the documents in your LangChain pipeline
for doc in documents:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
```

**JavaScript:**

```javascript
import { AnyparserLoader } from "@anyparser/langchain";

// Initialize the loader with your API key
const loader = new AnyparserLoader({
  filePath: "document.pdf",
  anyparserApiKey: "your-api-key",
  format: "markdown"  // LangChain works best with markdown
});

// Load the document
const documents = await loader.load();

// Use the documents in your LangChain pipeline
for (const doc of documents) {
  console.log("Content:", doc.pageContent);
  console.log("Metadata:", doc.metadata);
}
```
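The loader above handles a single file. To parse several files, one option is simply to create one loader per path and concatenate the results. A minimal sketch — `load_all` and `make_loader` are hypothetical helpers, not part of the package; the loader factory is injected so the pattern does not depend on any one loader class:

```python
def load_all(paths, make_loader):
    """Load many files by creating one loader per path (hypothetical helper)."""
    documents = []
    for path in paths:
        # make_loader(path) returns any object with a .load() method
        documents.extend(make_loader(path).load())
    return documents

# Usage (assuming AnyparserLoader as shown above):
# docs = load_all(
#     ["report.pdf", "notes.docx"],
#     lambda path: AnyparserLoader(file_path=path, anyparser_api_key="your-api-key"),
# )
```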
## Advanced Configuration

You can customize the Anyparser loader with various options:
**Python:**

```python
loader = AnyparserLoader(
    file_path="document.pdf",
    anyparser_api_key="your-api-key",
    format="markdown",
    image=True,        # Extract images
    table=True,        # Extract tables
    encoding="utf-8"   # Specify encoding
)
```

**JavaScript:**

```javascript
const loader = new AnyparserLoader({
  filePath: "document.pdf",
  anyparserApiKey: "your-api-key",
  format: "markdown",
  image: true,        // Extract images
  table: true,        // Extract tables
  encoding: "utf-8"   // Specify encoding
});
```

## Using with LangChain Chains
Integrate Anyparser-loaded documents into LangChain chains:
**Python:**

```python
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from anyparser_langchain import AnyparserLoader

# Load documents
loader = AnyparserLoader(
    file_path="document.pdf",
    anyparser_api_key="your-api-key"
)
documents = loader.load()

# Split text into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
texts = text_splitter.split_documents(documents)

# Create embeddings and store in vector database
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(texts, embeddings)

# Create a question-answering chain
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)

# Query your documents
response = qa_chain.run("What is this document about?")
print(response)
```

**JavaScript:**

```javascript
import { RetrievalQAChain } from "langchain/chains";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { Chroma } from "langchain/vectorstores/chroma";
import { OpenAI } from "langchain/llms/openai";
import { AnyparserLoader } from "@anyparser/langchain";

// Load documents
const loader = new AnyparserLoader({
  filePath: "document.pdf",
  anyparserApiKey: "your-api-key"
});
const documents = await loader.load();

// Split text into chunks
const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200
});
const texts = await textSplitter.splitDocuments(documents);

// Create embeddings and store in vector database
const embeddings = new OpenAIEmbeddings();
const vectorstore = await Chroma.fromDocuments(texts, embeddings);

// Create a question-answering chain
const model = new OpenAI();
const chain = RetrievalQAChain.fromLLM(
  model,
  vectorstore.asRetriever()
);

// Query your documents
const response = await chain.call({
  query: "What is this document about?"
});
console.log(response);
```
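In the chain example, `chunk_size` and `chunk_overlap` control how loaded documents are sliced before embedding; overlapping windows preserve context across chunk boundaries. `RecursiveCharacterTextSplitter` splits recursively on separators, but the basic overlap arithmetic can be illustrated with a naive fixed-window splitter (an illustration only, not the LangChain implementation):

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Naive fixed-window chunking with overlap (illustration only)."""
    step = chunk_size - chunk_overlap  # each window starts `step` chars after the last
    return [
        text[i:i + chunk_size]
        for i in range(0, max(len(text) - chunk_overlap, 1), step)
    ]

# A 2500-character document with chunk_size=1000 / chunk_overlap=200
# yields three windows, and consecutive windows share 200 characters.
chunks = chunk_text("x" * 2500)
print(len(chunks))  # → 3
```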
## Error Handling

Implement proper error handling for both document loading and processing:
**Python:**

```python
try:
    loader = AnyparserLoader(
        file_path="document.pdf",
        anyparser_api_key="your-api-key"
    )
    documents = loader.load()
except Exception as e:
    print(f"Error loading document: {str(e)}")
```

**JavaScript:**

```javascript
try {
  const loader = new AnyparserLoader({
    filePath: "document.pdf",
    anyparserApiKey: "your-api-key"
  });
  const documents = await loader.load();
} catch (error) {
  console.error("Error loading document:", error.message);
}
```
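The snippets above cover the loading stage. For the processing stage (embedding, retrieval, chain calls), transient API errors are common, so a small retry wrapper can help. A minimal sketch — `with_retries` is a hypothetical helper, `call` stands in for any operation such as `qa_chain.run(...)`, and the backoff values are illustrative:

```python
import time

def with_retries(call, attempts=3, backoff_seconds=1.0):
    """Run `call`, retrying with exponential backoff on failure (sketch)."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == attempts:
                raise  # out of retries; surface the original error
            time.sleep(backoff_seconds * 2 ** (attempt - 1))

# Usage (hypothetical):
# answer = with_retries(lambda: qa_chain.run("What is this document about?"))
```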