📄 Documents
- PDF (with OCR support)
- Microsoft Word (DOCX, DOC)
- Rich Text (RTF)
- Plain Text (TXT)
Anyparser's parsing engine handles PDFs, Word documents, rich text, and plain text, from simple text extraction to more complex document understanding. The SDK returns structured, analysis-ready data in a few lines of code, with table detection, image extraction, and format preservation.
Let’s start with a simple example of parsing a PDF document:
Python:

    import asyncio

    from anyparser_core import Anyparser


    async def main():
        # Initialize the parser
        parser = Anyparser()

        # Parse a document
        result = await parser.parse("docs/sample.pdf")

        # Access the parsed content
        print(f"File: {result.original_filename}")
        print(f"Characters: {result.total_characters}")
        print(f"Content:\n{result.markdown}")


    asyncio.run(main())
TypeScript:

    import { Anyparser } from '@anyparser/core';

    async function main() {
      // Initialize the parser
      const parser = new Anyparser();

      // Parse a document
      const result = await parser.parse('docs/sample.pdf');

      // Access the parsed content
      console.log(`File: ${result.originalFilename}`);
      console.log(`Characters: ${result.totalCharacters}`);
      console.log(`Content:\n${result.markdown}`);
    }

    main().catch(console.error);
Customize the parsing behavior to match your needs:
Python:

    from anyparser_core import Anyparser, AnyparserOption

    options = AnyparserOption(
        format="json",  # Output format
        image=True,     # Extract images
        table=True,     # Extract tables
    )

    parser = Anyparser(options)
TypeScript:

    import { Anyparser, AnyparserOption } from '@anyparser/core';

    const options = {
      format: 'json', // Output format
      image: true,    // Extract images
      table: true     // Extract tables
    };

    const parser = new Anyparser(options);
See the API Reference for more details.
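One way to keep option choices consistent is to derive them from the intended use. The sketch below assumes only the `format`, `image`, and `table` keyword arguments shown above and the three format values this page mentions (json, markdown, html). The `USE_CASE_FORMATS` mapping and the `build_option_kwargs` helper are hypothetical names for illustration, not part of the SDK.

```python
# Hypothetical helper: none of these names come from anyparser_core.
USE_CASE_FORMATS = {
    "structured": "json",   # structured data processing
    "rag": "markdown",      # RAG pipelines
    "preview": "html",      # quick viewing
}


def build_option_kwargs(use_case, image=False, table=False):
    """Return keyword arguments intended for AnyparserOption(**kwargs)."""
    try:
        output_format = USE_CASE_FORMATS[use_case]
    except KeyError:
        known = ", ".join(sorted(USE_CASE_FORMATS))
        raise ValueError(
            f"unknown use case {use_case!r}; expected one of: {known}"
        ) from None
    return {"format": output_format, "image": image, "table": table}


kwargs = build_option_kwargs("rag", table=True)
print(kwargs)
```

The resulting dictionary could then be passed along as `AnyparserOption(**kwargs)`, assuming the constructor accepts these keywords as in the example above.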
Process multiple documents efficiently in a single request:
Python:

    import asyncio

    from anyparser_core import Anyparser

    parser = Anyparser()
    files = ["docs/sample1.pdf", "docs/sample2.docx", "docs/sample3.png"]


    async def main():
        # Parse every file in one call
        result = await parser.parse(files)

        # Print basic metadata for each parsed document
        for doc in result:
            print(f"Processing: {doc.original_filename}")
            print(f"Type: {doc.file_type}")
            print(f"Size: {doc.file_size} bytes")
            print(f"Characters: {doc.total_characters}")
            print("---")


    asyncio.run(main())
TypeScript:

    import { Anyparser } from '@anyparser/core';

    const parser = new Anyparser();
    const files = ['docs/sample1.pdf', 'docs/sample2.docx', 'docs/sample3.png'];

    async function main() {
      // Parse every file in one call
      const result = await parser.parse(files);

      // Print basic metadata for each parsed document
      for (const doc of result) {
        console.log(`Processing: ${doc.originalFilename}`);
        console.log(`Type: ${doc.fileType}`);
        console.log(`Size: ${doc.fileSize} bytes`);
        console.log(`Characters: ${doc.totalCharacters}`);
        console.log('---');
      }
    }

    main().catch(console.error);
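After a batch call, it can help to roll the per-document fields up into a summary. The sketch below is a minimal illustration that uses only the result attributes printed in the batch example (`file_type`, `file_size`, `total_characters`). `FakeDoc` is a hypothetical stand-in so the snippet runs without the SDK, and its sample values (including the `file_type` strings) are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in carrying the fields used in the batch example."""
    original_filename: str
    file_type: str
    file_size: int
    total_characters: int


def summarize_by_type(docs):
    """Aggregate document count, bytes, and characters per file type."""
    summary = defaultdict(lambda: {"count": 0, "bytes": 0, "characters": 0})
    for doc in docs:
        entry = summary[doc.file_type]
        entry["count"] += 1
        entry["bytes"] += doc.file_size
        entry["characters"] += doc.total_characters
    return dict(summary)


# Made-up values, for illustration only.
docs = [
    FakeDoc("sample1.pdf", "pdf", 1200, 300),
    FakeDoc("sample2.docx", "docx", 800, 150),
    FakeDoc("sample4.pdf", "pdf", 500, 100),
]

for file_type, totals in summarize_by_type(docs).items():
    print(file_type, totals)
```

With real results, the list returned by `parser.parse(files)` would take the place of `docs`, assuming each item exposes the same attributes.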
Choose the Right Format
- json for structured data processing
- markdown for RAG pipelines
- html for quick viewing
Optimize Performance
Handle Errors
Monitor Usage
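For the error-handling point, one possible pattern is to parse files one at a time and collect failures instead of letting a single bad file stop the run. This is a sketch under assumptions: the exception types the SDK raises are not described on this page, so it catches `Exception` broadly. `parse_each` and `fake_parse` are hypothetical names, and `fake_parse` stands in for `parser.parse` so the snippet runs on its own.

```python
import asyncio


async def parse_each(parse, paths):
    """Parse paths one at a time, collecting failures instead of stopping."""
    results, failures = {}, {}
    for path in paths:
        try:
            results[path] = await parse(path)
        except Exception as exc:  # broad on purpose: SDK error types are an unknown here
            failures[path] = exc
    return results, failures


async def fake_parse(path):
    """Hypothetical stand-in for parser.parse; fails for '.bad' paths."""
    if path.endswith(".bad"):
        raise ValueError(f"cannot parse {path}")
    return f"parsed:{path}"


results, failures = asyncio.run(parse_each(fake_parse, ["a.pdf", "b.bad"]))
print("ok:", list(results))
print("failed:", {path: str(exc) for path, exc in failures.items()})
```

Passing `parser.parse` in place of `fake_parse` would apply the same flow to real documents. The trade-off is one request per file instead of a single batch request.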