Parse Documents

Parse documents with Anyparser's parsing engine. The SDK handles PDFs, Word documents, rich text, and more, covering everything from simple text extraction to complex document understanding. With a few lines of code you get structured, analysis-ready data, along with table detection, image extraction, and format preservation for document automation at scale.

Quick Start

Let’s start with a simple example of parsing a PDF document:

Python
JavaScript

from anyparser_core import Anyparser
import asyncio

async def main():
    # Initialize the parser
    parser = Anyparser()

    # Parse a document
    result = await parser.parse("docs/sample.pdf")

    # Access the parsed content
    print(f"File: {result.original_filename}")
    print(f"Characters: {result.total_characters}")
    print(f"Content:\n{result.markdown}")

asyncio.run(main())

import { Anyparser } from '@anyparser/core';

async function main() {
  // Initialize the parser
  const parser = new Anyparser();

  // Parse a document
  const result = await parser.parse('docs/sample.pdf');

  // Access the parsed content
  console.log(`File: ${result.originalFilename}`);
  console.log(`Characters: ${result.totalCharacters}`);
  console.log(`Content:\n${result.markdown}`);
}

main().catch(console.error);

Supported File Types

📄 Documents

PDF (with OCR support)
Microsoft Word (DOCX, DOC)
Rich Text (RTF)
Plain Text (TXT)

🖼️ Images

PNG
JPEG/JPG
TIFF
WebP

See our OCR Guide for image processing details.

🌐 Web Content

HTML pages
Web URLs
Dynamic content

Check our Web Crawling Guide for more.

🚀 Coming Soon

PowerPoint (PPTX, PPT)
Excel (XLSX, XLS)
Audio transcription
Video transcription
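A pre-flight check can keep unsupported files out of a request. The sketch below is an illustration only: `SUPPORTED_EXTENSIONS` and `split_by_support` are hypothetical names, not part of the SDK, and the extension set is simply transcribed from the lists above, so the service itself may accept more or fewer types.

```python
from pathlib import Path

# Assumption: extensions transcribed from the lists above, not from the SDK.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".rtf", ".txt",    # documents
    ".png", ".jpeg", ".jpg", ".tiff", ".webp",  # images
    ".html",                                    # saved web pages
}


def split_by_support(paths):
    """Return (supported, unsupported) lists, judging by file extension only."""
    supported, unsupported = [], []
    for p in paths:
        ext = Path(p).suffix.lower()  # case-insensitive: ".PDF" matches ".pdf"
        (supported if ext in SUPPORTED_EXTENSIONS else unsupported).append(p)
    return supported, unsupported


ok, skipped = split_by_support(["docs/a.PDF", "docs/b.docx", "slides/c.pptx", "notes"])
print(ok)       # ['docs/a.PDF', 'docs/b.docx']
print(skipped)  # ['slides/c.pptx', 'notes']
```

The check looks only at the file name, so a mislabeled file would still pass; treat it as a convenience filter rather than validation.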

Configuration Options

Customize the parsing behavior to match your needs:

Python
JavaScript

from anyparser_core import Anyparser, AnyparserOption

options = AnyparserOption(
    format="json",     # Output format
    image=True,        # Extract images
    table=True         # Extract tables
)

parser = Anyparser(options)

import { Anyparser } from '@anyparser/core';

const options = {
  format: 'json',    // Output format
  image: true,       // Extract images
  table: true        // Extract tables
};

const parser = new Anyparser(options);

See the API Reference for more details.

Batch Processing

Process multiple documents efficiently in a single request:

Python
JavaScript

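import asyncio
from anyparser_core import Anyparser

# Reuse the parser from the earlier examples, or create one here
parser = Anyparser()

files = ["docs/sample1.pdf", "docs/sample2.docx", "docs/sample3.png"]

async def main():
    # Pass the whole list to parse() to process it in a single request
    result = await parser.parse(files)

    # Iterate over the parsed documents
    for doc in result:
        print(f"Processing: {doc.original_filename}")
        print(f"Type: {doc.file_type}")
        print(f"Size: {doc.file_size} bytes")
        print(f"Characters: {doc.total_characters}")
        print("---")

asyncio.run(main())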

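import { Anyparser } from '@anyparser/core';

// Reuse the parser from the earlier examples, or create one here
const parser = new Anyparser();

const files = ['docs/sample1.pdf', 'docs/sample2.docx', 'docs/sample3.png'];

async function main() {
  // Pass the whole array to parse() to process it in a single request
  const result = await parser.parse(files);

  // Iterate over the parsed documents
  for (const doc of result) {
    console.log(`Processing: ${doc.originalFilename}`);
    console.log(`Type: ${doc.fileType}`);
    console.log(`Size: ${doc.fileSize} bytes`);
    console.log(`Characters: ${doc.totalCharacters}`);
    console.log('---');
  }
}

main().catch(console.error);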
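For long file lists, it can help to split the work into several smaller requests. This guide does not state a per-request file limit, so the batch size below is an arbitrary placeholder, and `chunked` is a hypothetical helper rather than an SDK function. A minimal sketch:

```python
from typing import Iterator, List, Sequence


def chunked(paths: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


files = [f"docs/sample{i}.pdf" for i in range(1, 8)]  # seven example paths

for batch in chunked(files, size=3):  # size=3 is an assumed value
    # In real use, each batch would be sent with: await parser.parse(batch)
    print(len(batch), batch)
```

Each yielded batch is a plain list, so it can be passed to `parser.parse` in the same way as the `files` list above.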

Best Practices

Choose the Right Format
- Use json for structured data processing
- Use markdown for RAG pipelines
- Use html for quick viewing

Optimize Performance
- Process documents in batches when possible
- Consider implementing application-level caching

Handle Errors
- Catch and handle exceptions raised by parse calls
- Use retries for transient failures
- Monitor processing status
- Log errors with enough context to identify the affected file

Monitor Usage
- Track API consumption in Anyparser Studio
- Set up usage alerts
- Monitor processing times
- Optimize based on analytics
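The advice on retrying transient failures can be sketched as a small wrapper with exponential backoff. This is an illustration under assumptions: `with_retries` is a hypothetical helper, and this guide does not document which exceptions the SDK raises, so the `retry_on` tuple is a guess that should be adjusted to the errors you actually observe. The demo uses a simulated call so it runs without the SDK.

```python
import asyncio
import random


async def with_retries(call, attempts=3, base_delay=0.5,
                       retry_on=(ConnectionError, TimeoutError)):
    """Await call() up to `attempts` times, backing off between tries.

    Assumption: `retry_on` lists errors treated as transient; the SDK's
    real exception types are not documented here.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each time (0.5s, 1s, 2s, ...), plus a little jitter
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 10))


async def demo():
    calls = 0

    async def flaky_parse():
        # Stand-in for a parse call that fails twice, then succeeds
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("simulated transient failure")
        return "parsed"

    result = await with_retries(flaky_parse, base_delay=0.01)
    print(result, "after", calls, "calls")  # parsed after 3 calls


asyncio.run(demo())
```

With the parser from the Quick Start, a call might look like `result = await with_retries(lambda: parser.parse("docs/sample.pdf"))`. Errors outside `retry_on` propagate immediately, which keeps permanent failures such as a missing file from being retried.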

OCR Processing - Extract text from images and scanned documents
Web Crawling - Process web content and HTML pages
API Reference - Complete API documentation
Studio Dashboard - Monitor usage and performance
Security Best Practices - Secure your implementation