CrewAI Integration

Anyparser integrates with CrewAI to enable document parsing capabilities in your AI agent crews. This guide shows you how to use Anyparser with CrewAI agents.

Installation

pip install anyparser-crewai

Basic Usage

Here’s how to initialize the Anyparser tool and parse a document with it directly:

import os

from anyparser_crewai import AnyparserFormatEnum, AnyparserModelEnum, FileReadTool

# Initialize the tool with JSON format
tool = FileReadTool(
    api_key=os.getenv("ANYPARSER_API_KEY"),
    api_url=os.getenv("ANYPARSER_API_URL"),
    format=AnyparserFormatEnum.JSON,
    model=AnyparserModelEnum.TEXT,
)

# Read a DOCX file
result = tool.run("docs/sample.docx")

# Print the result
print("DOCX content in JSON format:")
print(result)
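
Because the snippet above reads its credentials with os.getenv, a missing variable silently becomes None and only surfaces later as an API error. The helper below is a minimal sketch of a fail-fast check; require_env is a hypothetical name introduced here, not part of anyparser_crewai.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical usage before constructing the tool:
# api_key = require_env("ANYPARSER_API_KEY")
# api_url = require_env("ANYPARSER_API_URL")
```

Raising early keeps a configuration problem separate from a genuine parsing failure.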

Tool Configuration

Configure the Anyparser tool with various options:

from anyparser_crewai import AnyparserTool

parser_tool = AnyparserTool(
    api_key="your-api-key",
    format="markdown",   # Output format
    model="text",        # Model type
    image=True,          # Extract images
    table=True,          # Extract tables
    encoding="utf-8",    # Specify encoding
)

Multi-Agent Document Processing

Create a specialized crew for document processing:

from textwrap import dedent

from anyparser_crewai import AnyparserTool
from crewai import Agent, Crew, Task

# Document processor agent (the only agent that needs the parsing tool)
doc_processor = Agent(
    role="Document Processor",
    goal="Extract and structure document content accurately",
    backstory=dedent("""
        You are an expert in document processing, capable of extracting
        text, tables, and images from various document formats.
    """),
    tools=[AnyparserTool(api_key="your-api-key")],
)

# Content analyzer agent
analyzer = Agent(
    role="Content Analyzer",
    goal="Analyze and summarize document content",
    backstory=dedent("""
        You are an expert content analyst, skilled at understanding
        and summarizing complex documents.
    """),
)

# Quality checker agent
quality_checker = Agent(
    role="Quality Checker",
    goal="Ensure accuracy of extracted content",
    backstory=dedent("""
        You are a detail-oriented quality assurance specialist,
        ensuring extracted content matches the original document.
    """),
)

# Create tasks for the crew.
# Recent CrewAI releases require expected_output on every Task;
# the wording below is illustrative.
tasks = [
    Task(
        description="Extract content from document.pdf",
        expected_output="The full extracted content of the document",
        agent=doc_processor,
    ),
    Task(
        description="Analyze and summarize the extracted content",
        expected_output="A concise summary of the document",
        agent=analyzer,
    ),
    Task(
        description="Verify accuracy of extraction and analysis",
        expected_output="A list of any discrepancies found",
        agent=quality_checker,
    ),
]

# Create and run the crew
crew = Crew(
    agents=[doc_processor, analyzer, quality_checker],
    tasks=tasks,
)
result = crew.kickoff()

Error Handling

Implement proper error handling in your agents:

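from anyparser_crewai import AnyparserTool
from crewai import Agent, Crew, Task

try:
    # Initialize the tool
    parser_tool = AnyparserTool(api_key="your-api-key")

    # Create the agent (CrewAI agents also need a backstory)
    doc_processor = Agent(
        role="Document Processor",
        goal="Process documents accurately",
        backstory="You are an expert in document processing.",
        tools=[parser_tool],
    )

    # Create and run tasks (expected_output wording is illustrative)
    task = Task(
        description="Process document.pdf",
        expected_output="The extracted content of the document",
        agent=doc_processor,
    )
    crew = Crew(
        agents=[doc_processor],
        tasks=[task],
    )
    result = crew.kickoff()
except Exception as e:
    print(f"Error in document processing: {e}")
    # Handle error appropriately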
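
One way to "handle the error appropriately" is to retry transient failures before giving up. The following is a minimal sketch of exponential backoff around any zero-argument callable; run_with_retry and its parameters are hypothetical names introduced here, not part of CrewAI or anyparser_crewai.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(), retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller
            delay = base_delay * 2 ** (attempt - 1)  # 1x, 2x, 4x, ...
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
    raise AssertionError("unreachable")


# Hypothetical usage with the crew defined above:
# result = run_with_retry(crew.kickoff, attempts=3)
```

Retrying a whole crew repeats every agent step, so in practice it may be cheaper to wrap only the parsing call. Catching a narrower exception type is also preferable once the relevant error classes are known.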

Tool Functions

The Anyparser tool provides these functions to CrewAI agents:

# Available tool functions
functions = {
    "parse_document": {
        "description": "Parse a document and extract its content",
        "parameters": {
            "file_path": "Path to the document file",
            "format": "Output format (json/markdown/text)",
            "extract_images": "Whether to extract images",
            "extract_tables": "Whether to extract tables"
        }
    },
    "get_document_metadata": {
        "description": "Get metadata about a document",
        "parameters": {
            "file_path": "Path to the document file"
        }
    }
}
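
Agents sometimes call tools with misspelled or unsupported arguments. The sketch below checks a call against the parameter names listed above before it is sent. validate_call is a hypothetical helper, and treating file_path as required is an assumption, since the listing does not say which parameters are mandatory.

```python
# Parameter names copied from the listing above.
FUNCTIONS = {
    "parse_document": {"file_path", "format", "extract_images", "extract_tables"},
    "get_document_metadata": {"file_path"},
}
ALLOWED_FORMATS = {"json", "markdown", "text"}


def validate_call(function_name: str, arguments: dict) -> None:
    """Raise ValueError if the call does not match the documented parameters."""
    if function_name not in FUNCTIONS:
        raise ValueError(f"Unknown function: {function_name}")
    unknown = set(arguments) - FUNCTIONS[function_name]
    if unknown:
        raise ValueError(f"Unexpected parameters: {sorted(unknown)}")
    # Assumption: file_path is mandatory for both functions.
    if "file_path" not in arguments:
        raise ValueError("file_path is required")
    fmt = arguments.get("format")
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported format: {fmt}")


# Passes silently:
validate_call("parse_document", {"file_path": "document.pdf", "format": "json"})
```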

Best Practices

When using Anyparser with CrewAI:

  1. Agent Design

    • Create specialized agents for different aspects of document processing
    • Write clear agent goals and backstories
    • Define specific tasks for each agent
    • Ensure proper tool configuration
  2. Crew Organization

    • Structure crews based on document processing needs
    • Define clear task dependencies
    • Set appropriate task priorities
    • Monitor crew performance
  3. Error Management

    • Handle API errors gracefully
    • Implement retry logic
    • Log error messages
    • Provide fallback behavior
  4. Resource Management

    • Monitor API usage
    • Implement rate limiting
    • Cache results when appropriate
    • Clean up resources
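
The caching and rate-limiting points above can be combined in a small wrapper. This is a minimal sketch under stated assumptions: CachedParser is a hypothetical class, the wrapped parse callable takes a file path (as tool.run does in Basic Usage), and a fixed minimum interval between calls stands in for whatever limits your Anyparser plan actually enforces.

```python
import hashlib
import time
from pathlib import Path
from typing import Any, Callable, Dict, Optional


class CachedParser:
    """Wrap a parse function with a content-hash cache and a minimum call interval."""

    def __init__(self, parse: Callable[[str], Any], min_interval: float = 1.0) -> None:
        self._parse = parse
        self._min_interval = min_interval
        self._cache: Dict[str, Any] = {}
        self._last_call: Optional[float] = None

    def parse(self, path: str) -> Any:
        # Key the cache on file contents so an edited file is parsed again.
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        if digest in self._cache:
            return self._cache[digest]  # Cache hit: no API call
        self._wait_for_slot()
        result = self._parse(path)
        self._cache[digest] = result
        return result

    def _wait_for_slot(self) -> None:
        # Sleep just long enough to keep calls at least min_interval apart.
        if self._last_call is not None:
            remaining = self._min_interval - (time.monotonic() - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


# Hypothetical usage with the tool from Basic Usage:
# cached = CachedParser(tool.run, min_interval=1.0)
# result = cached.parse("docs/sample.docx")
```

The cache key ignores output format and model, so use one wrapper per configured tool. The cache lives in memory only and is lost when the process exits.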