🎯 Core API
API Reference
Dive into the comprehensive API documentation for Anyparser’s Python and JavaScript SDKs. Built with developers in mind, this reference guide provides detailed insights into every class, method, and configuration option available in our SDKs. Whether you’re implementing basic document processing or building complex document intelligence workflows, you’ll find everything you need to leverage Anyparser’s full capabilities in your applications.
Core Classes
Anyparser Class
The main entry point for interacting with Anyparser’s document processing capabilities.
```python
from anyparser_core import Anyparser, AnyparserOption

# Initialize with options
parser = Anyparser(
    options=AnyparserOption(
        api_key="your-api-key",
        format="markdown",
    )
)

# Or use environment variables
parser = Anyparser()  # Uses ANYPARSER_API_KEY from env
```
```typescript
import { Anyparser, AnyparserOption } from '@anyparser/core';

// Initialize with options
const parser = new Anyparser({
  apiKey: 'your-api-key',
  format: 'markdown'
});

// Or use environment variables
const parser = new Anyparser(); // Uses ANYPARSER_API_KEY from env
```
Methods
All you need is the parse method; Anyparser does not export anything else.
Method | Description | Example |
---|---|---|
parse(input) | Process a single document or URL | See examples |
parse(inputs) | Process multiple items in one call | See batch processing |
Parse Method
The primary method for processing documents:
```python
# Single document
result = await parser.parse("document.png")

# Multiple documents
results = await parser.parse(["doc1.pdf", "doc2.docx"])

# Web URL
result = await parser.parse("https://example.com")
```
```typescript
// Single document
const result = await parser.parse('document.png');

// Multiple documents
const results = await parser.parse(['doc1.pdf', 'doc2.docx']);

// Web URL
const result = await parser.parse(new URL('https://example.com'));
```
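Putting initialization and parsing together, the snippet below is a minimal end-to-end sketch. It assumes ANYPARSER_API_KEY is set in the environment and that passing a list of inputs returns a list of result objects (see batch processing); the file names are illustrative.

```python
import asyncio

from anyparser_core import Anyparser, AnyparserOption


async def main() -> None:
    # Assumes ANYPARSER_API_KEY is set in the environment
    parser = Anyparser(options=AnyparserOption(format="markdown"))

    # A list input is assumed to yield a list of results
    results = await parser.parse(["doc1.pdf", "doc2.docx"])
    for item in results:
        print(item.markdown)


asyncio.run(main())
```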
Configuration Options
Comprehensive configuration options for customizing parser behavior:
```python
@dataclass
class AnyparserOption:
    api_url: Optional[str] = None
    api_key: Optional[str] = None
    format: Literal["json", "markdown", "html"] = "json"
    model: Literal["text", "ocr", "vlm", "lam", "crawler"] = "text"
    encoding: Literal["utf-8", "latin1"] = "utf-8"
    image: Optional[bool] = None
    table: Optional[bool] = None
    files: Optional[Union[str, List[str]]] = None
    ocr_language: Optional[List[OcrLanguage]] = None
    ocr_preset: Optional[OcrPreset] = None
    url: Optional[str] = None
    max_depth: Optional[int] = None
    max_executions: Optional[int] = None
    strategy: Optional[Literal["LIFO", "FIFO"]] = None
    traversal_scope: Optional[Literal["subtree", "domain"]] = None
```
```typescript
interface AnyparserOption {
  apiUrl?: URL;
  apiKey?: string;
  format?: 'json' | 'markdown' | 'html';
  model?: 'text' | 'ocr' | 'vlm' | 'lam' | 'crawler';
  encoding?: 'utf-8' | 'latin1';
  image?: boolean;
  table?: boolean;
  files?: string | string[];
  ocrLanguage?: OcrLanguageType[];
  ocrPreset?: OcrPresetType;
  url?: string;
  maxDepth?: number;
  maxExecutions?: number;
  strategy?: 'LIFO' | 'FIFO';
  traversalScope?: 'subtree' | 'domain';
}
```
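For example, a web crawl can be configured entirely through these fields. This is a minimal sketch using only the options documented above; the start URL and limits are illustrative values.

```python
from anyparser_core import Anyparser, AnyparserOption

# Crawl up to 50 pages, two levels deep, staying within the start URL's subtree.
# FIFO visits pages in the order they are discovered.
parser = Anyparser(
    options=AnyparserOption(
        model="crawler",
        format="markdown",
        url="https://example.com/docs",
        max_depth=2,
        max_executions=50,
        strategy="FIFO",
        traversal_scope="subtree",
    )
)
```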
Option Fields
Field | Type | Default | Description |
---|---|---|---|
api_url | URL | Environment variable | API endpoint URL |
api_key | string | Environment variable | API authentication key |
format | string | "json" | Output format ("json", "markdown", "html") |
model | string | "text" | Processing model ("text", "ocr", "vlm", "lam", "crawler") |
encoding | string | "utf-8" | Text encoding ("utf-8", "latin1") |
image | boolean | None | Enable image extraction |
table | boolean | None | Enable table extraction |
files | string \| string[] | None | File path(s) to process |
ocr_language | List[OcrLanguage] | None | OCR language settings |
ocr_preset | OcrPreset | None | OCR preset configuration |
url | string | None | Start URL for crawling |
max_depth | number | None | Maximum crawl depth |
max_executions | number | None | Maximum pages to process |
strategy | string | None | Crawl strategy ("LIFO", "FIFO") |
traversal_scope | string | None | Crawl scope ("subtree", "domain") |
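As a concrete example, enabling image and table extraction needs only the boolean flags above. This is a minimal sketch; which models honor these flags is not specified here, so treat the combination as illustrative.

```python
from anyparser_core import Anyparser, AnyparserOption

# Extract embedded images and tables alongside the document text
parser = Anyparser(
    options=AnyparserOption(
        format="markdown",
        image=True,
        table=True,
    )
)
```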
Response Types
Common fields returned for all processed documents:
```python
@dataclass
class AnyparserResultBase:
    rid: str
    original_filename: str
    checksum: str
    total_characters: Optional[int]
    markdown: Optional[str]
```
```typescript
interface AnyparserResultBase {
  rid: string
  originalFilename: string
  checksum: string
  totalCharacters?: number
  markdown?: string
}
```
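Because every result carries these base fields, generic post-processing can be written once. A minimal sketch, again assuming a list input yields a list of results:

```python
results = await parser.parse(["doc1.pdf", "doc2.docx"])

for item in results:
    # Fields from AnyparserResultBase are available on every result type
    print(f"{item.original_filename} ({item.checksum}): "
          f"{item.total_characters} characters")
```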
Image Reference
```python
@dataclass
class AnyparserImageReference:
    base64_data: str
    display_name: str
    image_index: int
    page: Optional[int]
```
```typescript
interface AnyparserImageReference {
  base64Data: string
  displayName: string
  page?: number
  imageIndex: number
}
```
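Since base64_data carries the image bytes as a base64 string, writing an extracted image to disk is a one-liner with the standard library. A minimal sketch, assuming image is a populated AnyparserImageReference:

```python
import base64


def save_image(image) -> None:
    # display_name is reused as the output file name
    data = base64.b64decode(image.base64_data)
    with open(image.display_name, "wb") as fh:
        fh.write(data)
```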
PDF Result
Additional fields for PDF processing:
```python
@dataclass
class AnyparserPdfPage:
    page_number: int
    markdown: str
    text: str
    images: List[str]


@dataclass
class AnyparserPdfResult(AnyparserResultBase):
    total_items: int = 0
    items: List[AnyparserPdfPage] = field(default_factory=list)
```
```typescript
interface AnyparserPdfPage {
  pageNumber: number;
  markdown?: string;
  text?: string;
  images?: AnyparserImageReference[];
}

interface AnyparserPdfResult extends AnyparserResultBase {
  totalItems?: number;
  items?: AnyparserPdfPage[];
}
```
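The per-page breakdown can then be walked directly. A minimal sketch, assuming result is an AnyparserPdfResult returned from parsing a single PDF:

```python
result = await parser.parse("document.pdf")

# Each item is one page of the PDF
for page in result.items:
    print(f"--- page {page.page_number} ---")
    print(page.markdown)
```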
Crawl Result
Fields specific to web crawling:
```python
@dataclass
class AnyparserCrawlDirectiveBase:
    type: Literal["HTTP Header", "HTML Meta", "Combined", "Unknown"]
    priority: int
    name: Optional[str]
    noindex: Optional[bool]
    nofollow: Optional[bool]
    crawl_delay: Optional[int]
    unavailable_after: Optional[datetime]


@dataclass
class AnyparserCrawlDirective(AnyparserCrawlDirectiveBase):
    underlying: List[AnyparserCrawlDirectiveBase]
    type: Literal["Combined"]
    name: Optional[None]


@dataclass
class AnyparserUrl:
    url: str
    status_code: int
    status_message: str
    politeness_delay: int
    total_characters: int
    markdown: str
    directive: AnyparserCrawlDirective
    title: Optional[str]
    crawled_at: Optional[str]
    images: List[AnyparserImageReference]
    text: Optional[str]


@dataclass
class AnyparserRobotsTxtDirective:
    user_agent: str
    disallow: List[str]
    allow: List[str]
    crawl_delay: Optional[int]


@dataclass
class AnyparserCrawlResult:
    rid: str
    start_url: str
    total_characters: int
    total_items: int
    markdown: str
    items: List[AnyparserUrl]
    robots_directive: AnyparserRobotsTxtDirective
```
```typescript
interface AnyparserCrawlDirectiveBase {
  type: 'HTTP Header' | 'HTML Meta' | 'Combined'
  priority: number
  name?: string
  noindex?: boolean
  nofollow?: boolean
  crawlDelay?: number
  unavailableAfter?: Date
}

interface AnyparserCrawlDirective extends AnyparserCrawlDirectiveBase {
  underlying: AnyparserCrawlDirectiveBase[]
  type: 'Combined'
  name: undefined
}

interface AnyparserUrl {
  url: URL
  title?: string
  crawledAt?: string
  statusCode: number
  statusMessage: string
  directive: AnyparserCrawlDirective
  totalCharacters?: number
  markdown?: string
  images?: AnyparserImageReference[]
  text?: string
  politenessDelay: number
}

interface AnyparserRobotsTxtDirective {
  userAgent: string
  disallow: Set<string>
  allow: Set<string>
  crawlDelay?: number
}

interface AnyparserCrawlResult {
  rid: string;
  startUrl: URL;
  totalCharacters: number;
  totalItems: number;
  markdown: string;
  items?: AnyparserUrl[];
  robotsDirective: AnyparserRobotsTxtDirective;
}
```
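A crawl result aggregates one AnyparserUrl per fetched page, so per-page inspection looks like the sketch below. It assumes the parser was configured with model="crawler" as shown earlier.

```python
result = await parser.parse("https://example.com")

# robots.txt rules that governed the crawl
print("user-agent:", result.robots_directive.user_agent)

# One AnyparserUrl entry per crawled page
for page in result.items:
    if page.status_code == 200 and not page.directive.noindex:
        print(page.url, page.title, page.total_characters)
```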
Error Handling
All API calls may raise exceptions (Python) or throw errors (JavaScript) for the following; see the handling sketch after the list:
- Invalid API credentials
- Invalid options
- Network errors
- Rate limiting
- Server errors
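This reference does not list the SDK's concrete exception classes, so the sketch below conservatively catches the base exception type; substitute the SDK's specific error types where known.

```python
import asyncio

from anyparser_core import Anyparser


async def main() -> None:
    parser = Anyparser()  # Uses ANYPARSER_API_KEY from env
    try:
        result = await parser.parse("document.pdf")
    except Exception as exc:
        # Covers invalid credentials/options, network errors,
        # rate limiting, and server errors listed above
        print(f"Parse failed: {exc}")
        raise
    print(result.markdown)


asyncio.run(main())
```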
Related Resources
- Getting Started Guide - Quick start guide
- Parse Documents - Document processing guide
- OCR Guide - OCR processing details
- Web Crawling - Web content processing
- Studio Dashboard - Usage monitoring and management
- Security Guide - Security best practices