
API Reference

Dive into the comprehensive API documentation for Anyparser’s Python and JavaScript SDKs. Built with developers in mind, this reference guide provides detailed insights into every class, method, and configuration option available in our SDKs. Whether you’re implementing basic document processing or building complex document intelligence workflows, you’ll find everything you need to leverage Anyparser’s full capabilities in your applications.


Core Classes

Anyparser Class

The main entry point for interacting with Anyparser’s document processing capabilities.

from anyparser_core import Anyparser, AnyparserOption

# Initialize with options
parser = Anyparser(
    options=AnyparserOption(
        api_key="your-api-key",
        format="markdown"
    )
)

# Or use environment variables
parser = Anyparser()  # Uses ANYPARSER_API_KEY from env

Methods

The parse method is all you need; the Anyparser class does not export anything else.

Method         Description                Example
parse(input)   Process a document or URL  See examples
parse(inputs)  Process multiple items     See batch processing

Parse Method

The primary method for processing documents:

# Single document
result = await parser.parse("document.png")
# Multiple documents
results = await parser.parse(["doc1.pdf", "doc2.docx"])
# Web URL
result = await parser.parse("https://example.com")
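Because parse is a coroutine, it must be awaited inside an event loop. The sketch below shows the driver pattern with a hypothetical stub class standing in for a live Anyparser instance (so it runs without network access); a real script would build the parser as shown above and call parser.parse the same way:

```python
import asyncio

# Hypothetical stub standing in for Anyparser; replace with the real
# parser = Anyparser(...) in an actual script.
class StubParser:
    async def parse(self, inputs):
        # Echo back one fake markdown result per input
        if isinstance(inputs, str):
            inputs = [inputs]
        return [f"# Parsed {name}" for name in inputs]

async def main():
    parser = StubParser()
    single = await parser.parse("document.png")
    batch = await parser.parse(["doc1.pdf", "doc2.docx"])
    return single, batch

# asyncio.run drives the event loop from synchronous code
single, batch = asyncio.run(main())
```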

Configuration Options

Comprehensive configuration options for customizing parser behavior:

from anyparser_core import AnyparserOption

# Defaults shown as values; types shown in comments
options = AnyparserOption(
    api_url=None,          # Optional[str]
    api_key=None,          # Optional[str]
    format="json",         # Literal["json", "markdown", "html"]
    model="text",          # Literal["text", "ocr", "vlm", "lam", "crawler"]
    encoding="utf-8",      # Literal["utf-8", "latin1"]
    image=None,            # Optional[bool]
    table=None,            # Optional[bool]
    files=None,            # Optional[Union[str, List[str]]]
    ocr_language=None,     # Optional[List[OcrLanguage]]
    ocr_preset=None,       # Optional[OcrPreset]
    url=None,              # Optional[str]
    max_depth=None,        # Optional[int]
    max_executions=None,   # Optional[int]
    strategy=None,         # Optional[Literal["LIFO", "FIFO"]]
    traversal_scope=None   # Optional[Literal["subtree", "domain"]]
)

Option Fields

Field            Type                 Default               Description
api_url          URL                  Environment variable  API endpoint URL
api_key          string               Environment variable  API authentication key
format           string               "json"                Output format ("json", "markdown", "html")
model            string               "text"                Processing model ("text", "ocr", "vlm", "lam", "crawler")
encoding         string               "utf-8"               Text encoding ("utf-8", "latin1")
image            boolean              None                  Enable image extraction
table            boolean              None                  Enable table extraction
files            string or List[str]  None                  Input files to process
ocr_language     List[OcrLanguage]    None                  OCR language settings
ocr_preset       OcrPreset            None                  OCR preset configuration
url              string               None                  Start URL for crawling
max_depth        number               None                  Maximum crawl depth
max_executions   number               None                  Maximum pages to process
strategy         string               None                  Crawl strategy ("LIFO", "FIFO")
traversal_scope  string               None                  Crawl scope ("subtree", "domain")
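As an illustration of how these fields combine, a crawler run can be configured entirely through the options object (the values below are illustrative, not recommendations):

```python
from anyparser_core import Anyparser, AnyparserOption

# Illustrative crawler configuration; tune the limits to your site
options = AnyparserOption(
    api_key="your-api-key",
    model="crawler",
    format="markdown",
    url="https://example.com",
    max_depth=2,               # follow links at most two levels deep
    max_executions=50,         # stop after 50 pages
    strategy="FIFO",           # process pages in discovery order
    traversal_scope="subtree"  # stay under the start URL's path
)
parser = Anyparser(options=options)
```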

Response Types

Common fields returned for all processed documents:

@dataclass
class AnyparserResultBase:
    rid: str
    original_filename: str
    checksum: str
    total_characters: Optional[int]
    markdown: Optional[str]
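A sketch of consuming these common fields; the dataclass below mirrors the definition above so the snippet is self-contained, whereas a real result comes back from parser.parse:

```python
from dataclasses import dataclass
from typing import Optional

# Local mirror of the documented base result type
@dataclass
class AnyparserResultBase:
    rid: str
    original_filename: str
    checksum: str
    total_characters: Optional[int]
    markdown: Optional[str]

result = AnyparserResultBase(
    rid="r-123",
    original_filename="report.pdf",
    checksum="abc123",
    total_characters=1024,
    markdown="# Report",
)

# markdown is Optional, so guard against formats that omit it
body = result.markdown or ""
```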

Image Reference

@dataclass
class AnyparserImageReference:
    base64_data: str
    display_name: str
    image_index: int
    page: Optional[int]

PDF Result

Additional fields for PDF processing:

@dataclass
class AnyparserPdfPage:
    page_number: int
    markdown: str
    text: str
    images: List[str]

@dataclass
class AnyparserPdfResult(AnyparserResultBase):
    items: List[AnyparserPdfPage]
    total_items: int = 0
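One common task is stitching the per-page results back into a single document. A self-contained sketch, using local mirrors of the page and result types above (only the fields used here):

```python
from dataclasses import dataclass, field
from typing import List

# Local mirrors of the documented PDF types
@dataclass
class AnyparserPdfPage:
    page_number: int
    markdown: str
    text: str
    images: List[str]

@dataclass
class AnyparserPdfResult:
    items: List[AnyparserPdfPage] = field(default_factory=list)
    total_items: int = 0

result = AnyparserPdfResult(
    items=[
        AnyparserPdfPage(1, "# Page 1", "Page 1", []),
        AnyparserPdfPage(2, "# Page 2", "Page 2", ["img_0"]),
    ],
    total_items=2,
)

# Stitch per-page markdown into one document
full_markdown = "\n\n".join(page.markdown for page in result.items)

# Find pages that carry extracted images
pages_with_images = [p.page_number for p in result.items if p.images]
```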

Crawl Result

Fields specific to web crawling:

@dataclass
class AnyparserCrawlDirectiveBase:
    type: Literal["HTTP Header", "HTML Meta", "Combined", "Unknown"]
    priority: int
    name: Optional[str]
    noindex: Optional[bool]
    nofollow: Optional[bool]
    crawl_delay: Optional[int]
    unavailable_after: Optional[datetime]

@dataclass
class AnyparserCrawlDirective(AnyparserCrawlDirectiveBase):
    underlying: List[AnyparserCrawlDirectiveBase]
    type: Literal["Combined"]
    name: Optional[None]

@dataclass
class AnyparserUrl:
    url: str
    status_code: int
    status_message: str
    politeness_delay: int
    total_characters: int
    markdown: str
    directive: AnyparserCrawlDirective
    title: Optional[str]
    crawled_at: Optional[str]
    images: List[AnyparserImageReference]
    text: Optional[str]

@dataclass
class AnyparserRobotsTxtDirective:
    user_agent: str
    disallow: List[str]
    allow: List[str]
    crawl_delay: Optional[int]

@dataclass
class AnyparserCrawlResult:
    rid: str
    start_url: str
    total_characters: int
    total_items: int
    markdown: str
    items: List[AnyparserUrl]
    robots_directive: AnyparserRobotsTxtDirective
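Since each crawled page carries its own HTTP status, a typical post-processing step is filtering the result down to successfully fetched pages. A self-contained sketch, using minimal local mirrors of the crawl types above (only the fields used here):

```python
from dataclasses import dataclass
from typing import List

# Minimal mirrors of the documented crawl types
@dataclass
class AnyparserUrl:
    url: str
    status_code: int
    markdown: str

@dataclass
class AnyparserCrawlResult:
    start_url: str
    items: List[AnyparserUrl]

result = AnyparserCrawlResult(
    start_url="https://example.com",
    items=[
        AnyparserUrl("https://example.com/", 200, "# Home"),
        AnyparserUrl("https://example.com/missing", 404, ""),
        AnyparserUrl("https://example.com/docs", 200, "# Docs"),
    ],
)

# Keep only successfully fetched pages before downstream processing
ok_pages = [item for item in result.items if item.status_code == 200]
corpus = "\n\n".join(item.markdown for item in ok_pages)
```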

Error Handling

All API calls may raise exceptions for:

  • Invalid API credentials
  • Invalid options
  • Network errors
  • Rate limiting
  • Server errors
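The SDK's exact exception classes are not listed here, so a defensive pattern is to catch broadly and retry transient failures with exponential backoff, re-raising on the final attempt. A sketch; flaky_parse is a stand-in for parser.parse so the example runs without the SDK:

```python
import asyncio

async def parse_with_retry(parse_fn, inputs, attempts=3, base_delay=0.01):
    """Retry a parse coroutine with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return await parse_fn(inputs)
        except Exception:
            # Last attempt: re-raise so the caller sees the real error
            # (invalid credentials, rate limits, server errors, ...)
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)

# Stand-in for parser.parse that fails once, then succeeds
calls = {"n": 0}

async def flaky_parse(inputs):
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient network error")
    return f"parsed {inputs}"

outcome = asyncio.run(parse_with_retry(flaky_parse, "document.pdf"))
```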