CrewAI Integration
Anyparser integrates with CrewAI to enable document parsing capabilities in your AI agent crews. This guide shows you how to use Anyparser with CrewAI agents.
Installation
```bash
pip install anyparser-crewai
```
Basic Usage
Here’s how to initialize the Anyparser file-reading tool and parse a document directly, before giving the tool to an agent:
```python
import os

from anyparser_crewai import AnyparserFormatEnum, AnyparserModelEnum, FileReadTool

# Initialize the tool with JSON format
tool = FileReadTool(
    api_key=os.getenv("ANYPARSER_API_KEY"),
    api_url=os.getenv("ANYPARSER_API_URL"),
    format=AnyparserFormatEnum.JSON,
    model=AnyparserModelEnum.TEXT,
)

# Read a DOCX file
result = tool.run("docs/sample.docx")

# Print the result
print("DOCX content in JSON format:")
print(result)
```
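With the JSON format selected, `result` is presumably a JSON string that you will want to load into Python objects before further processing. The sketch below assumes exactly that and nothing about the payload's structure: `load_parsed_output` is a hypothetical helper that decodes the string and falls back to the raw value if it is not valid JSON. The sample string is made up for illustration and is not a real Anyparser response.

```python
import json
from typing import Any


def load_parsed_output(result: Any) -> Any:
    """Decode a JSON-formatted parse result, falling back to the raw value.

    Assumption (hypothetical helper): the tool returns a JSON string when the
    JSON format is selected. The shape of the decoded payload is not assumed.
    """
    if not isinstance(result, str):
        # Already a Python object (or something we cannot decode); pass it through.
        return result
    try:
        return json.loads(result)
    except json.JSONDecodeError:
        # Not valid JSON (e.g. markdown or plain-text output); keep the text.
        return result


# Stand-in string instead of a real API response (made up for illustration)
sample = '[{"filename": "sample.docx", "text": "Hello"}]'
parsed = load_parsed_output(sample)
print(type(parsed).__name__)  # list
```

Falling back to the raw value keeps the helper safe to call even if you later switch the tool to markdown or text output.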
Tool Configuration
Configure the Anyparser tool with various options:
```python
from anyparser_crewai import AnyparserTool

parser_tool = AnyparserTool(
    api_key="your-api-key",
    format="markdown",  # Output format
    model="text",       # Model type
    image=True,         # Extract images
    table=True,         # Extract tables
    encoding="utf-8",   # Specify encoding
)
```
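Rather than hard-coding `"your-api-key"`, you can read settings from the environment, as the Basic Usage example does with `ANYPARSER_API_KEY`. This is a minimal sketch: `tool_options_from_env` is a hypothetical helper, and `ANYPARSER_FORMAT`, `ANYPARSER_IMAGE`, and `ANYPARSER_TABLE` are made-up variable names. It only checks `format` against the json/markdown/text values listed under Tool Functions; the other values are passed through as in the example above.

```python
import os
from typing import Any, Mapping, Optional

# Output formats named in the tool's parameter description (json/markdown/text).
SUPPORTED_FORMATS = {"json", "markdown", "text"}


def tool_options_from_env(env: Optional[Mapping[str, str]] = None) -> dict[str, Any]:
    """Build keyword arguments for AnyparserTool from environment variables.

    ANYPARSER_API_KEY comes from the Basic Usage example; ANYPARSER_FORMAT,
    ANYPARSER_IMAGE and ANYPARSER_TABLE are hypothetical variable names.
    """
    env = os.environ if env is None else env
    api_key = env.get("ANYPARSER_API_KEY")
    if not api_key:
        raise ValueError("ANYPARSER_API_KEY is not set")

    fmt = env.get("ANYPARSER_FORMAT", "markdown").lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format {fmt!r}; expected one of {sorted(SUPPORTED_FORMATS)}")

    def flag(name: str, default: bool) -> bool:
        # Unset means use the default; "1", "true", "yes" (any case) mean True.
        raw = env.get(name)
        return default if raw is None else raw.strip().lower() in {"1", "true", "yes"}

    return {
        "api_key": api_key,
        "format": fmt,
        "model": "text",
        "image": flag("ANYPARSER_IMAGE", True),
        "table": flag("ANYPARSER_TABLE", True),
        "encoding": "utf-8",
    }
```

The resulting dict can then be unpacked into the constructor, for example `AnyparserTool(**tool_options_from_env())`, so a missing key or a typo in the format fails before any API call is made.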
Multi-Agent Document Processing
Create a specialized crew for document processing:
```python
from textwrap import dedent

from anyparser_crewai import AnyparserTool
from crewai import Agent, Crew, Task

# Document processor agent
doc_processor = Agent(
    role="Document Processor",
    goal="Extract and structure document content accurately",
    backstory=dedent("""
        You are an expert in document processing, capable of extracting
        text, tables, and images from various document formats.
    """),
    tools=[AnyparserTool(api_key="your-api-key")],
)

# Content analyzer agent
analyzer = Agent(
    role="Content Analyzer",
    goal="Analyze and summarize document content",
    backstory=dedent("""
        You are an expert content analyst, skilled at understanding
        and summarizing complex documents.
    """),
)

# Quality checker agent
quality_checker = Agent(
    role="Quality Checker",
    goal="Ensure accuracy of extracted content",
    backstory=dedent("""
        You are a detail-oriented quality assurance specialist, ensuring
        extracted content matches the original document.
    """),
)

# Create tasks for the crew (CrewAI requires an expected_output on every Task)
tasks = [
    Task(
        description="Extract content from document.pdf",
        expected_output="The text, tables, and image references extracted from document.pdf",
        agent=doc_processor,
    ),
    Task(
        description="Analyze and summarize the extracted content",
        expected_output="A concise summary of the document's key points",
        agent=analyzer,
    ),
    Task(
        description="Verify accuracy of extraction and analysis",
        expected_output="A list of discrepancies, or confirmation that none were found",
        agent=quality_checker,
    ),
]

# Create and run the crew
crew = Crew(
    agents=[doc_processor, analyzer, quality_checker],
    tasks=tasks,
)

result = crew.kickoff()
```
Error Handling
Implement proper error handling in your agents:
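```python
from anyparser_crewai import AnyparserTool
from crewai import Agent, Crew, Task

try:
    # Initialize the tool
    parser_tool = AnyparserTool(api_key="your-api-key")

    # Create the agent (role, goal, and backstory are all required)
    doc_processor = Agent(
        role="Document Processor",
        goal="Process documents accurately",
        backstory="You extract content from documents carefully and completely.",
        tools=[parser_tool],
    )

    # Create and run tasks
    task = Task(
        description="Process document.pdf",
        expected_output="The extracted content of document.pdf",
        agent=doc_processor,
    )

    crew = Crew(
        agents=[doc_processor],
        tasks=[task],
    )

    result = crew.kickoff()
except Exception as e:
    print(f"Error in document processing: {e}")
    # Handle error appropriately
```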
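The `try`/`except` block above reports a failure once and moves on. For transient problems such as network timeouts, retrying with exponential backoff before giving up is often enough. This is a minimal sketch, assuming failures surface as ordinary Python exceptions; `with_retries` is a hypothetical helper, not part of `anyparser_crewai` or CrewAI.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff and jitter (hypothetical helper).

    The delay before retry n (1-based) is base_delay * 2**(n - 1), plus up to
    10% random jitter. The last exception is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Inside the `try` block you could then call `result = with_retries(crew.kickoff)`. Narrowing `retry_on` to the exception types you actually see for transient failures keeps configuration errors, such as an invalid API key, from being retried pointlessly.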
Tool Functions
The Anyparser tool provides these functions to CrewAI agents:
```python
# Available tool functions
functions = {
    "parse_document": {
        "description": "Parse a document and extract its content",
        "parameters": {
            "file_path": "Path to the document file",
            "format": "Output format (json/markdown/text)",
            "extract_images": "Whether to extract images",
            "extract_tables": "Whether to extract tables",
        },
    },
    "get_document_metadata": {
        "description": "Get metadata about a document",
        "parameters": {
            "file_path": "Path to the document file",
        },
    },
}
```
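Descriptions like these can be used to catch malformed calls before they reach the API. This is a minimal sketch; `validate_tool_call` is a hypothetical helper that only checks function and parameter names against a dictionary shaped like the listing above. It does not check types or decide which parameters are required.

```python
from typing import Any, Mapping


def validate_tool_call(
    functions: Mapping[str, Mapping[str, Any]],
    name: str,
    args: Mapping[str, Any],
) -> list[str]:
    """Return a list of problems with a proposed tool call (empty if none).

    Hypothetical helper: checks that the function exists and that every
    argument name is a declared parameter.
    """
    if name not in functions:
        return [f"unknown function {name!r}"]
    declared = functions[name].get("parameters", {})
    return [f"unexpected parameter {key!r} for {name}" for key in args if key not in declared]


# Usage with the get_document_metadata entry from the listing above
functions = {
    "get_document_metadata": {
        "description": "Get metadata about a document",
        "parameters": {"file_path": "Path to the document file"},
    }
}
print(validate_tool_call(functions, "get_document_metadata", {"file_path": "a.pdf"}))  # []
```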
Best Practices
When using Anyparser with CrewAI:
- Agent Design
  - Create specialized agents for different aspects of document processing
  - Write clear agent goals and backstories
  - Define specific tasks for each agent
  - Ensure proper tool configuration
- Crew Organization
  - Structure crews based on document processing needs
  - Define clear task dependencies
  - Set appropriate task priorities
  - Monitor crew performance
- Error Management
  - Handle API errors gracefully
  - Implement retry logic
  - Log error messages
  - Provide fallback behavior
- Resource Management
  - Monitor API usage
  - Implement rate limiting
  - Cache results when appropriate
  - Clean up resources
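The Resource Management practices (rate limiting, caching, cleanup) can be combined in a small wrapper around whatever callable performs the parse, such as `tool.run` from the Basic Usage example. This is a minimal sketch: `CachedRateLimitedParser` is a hypothetical helper, the one-call-per-interval policy is an assumption rather than a documented Anyparser limit, and the cache lives in memory only.

```python
import os
import time
from typing import Any, Callable, Dict, Optional, Tuple


class CachedRateLimitedParser:
    """Wrap a parse function with a result cache and a minimum call interval.

    Hypothetical helper. The cache key is (absolute path, modification time,
    size), so an edited file is parsed again. Calls that miss the cache are
    spaced at least min_interval seconds apart.
    """

    def __init__(
        self,
        parse: Callable[[str], Any],
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._parse = parse
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None
        self._cache: Dict[Tuple[str, float, int], Any] = {}

    def __call__(self, path: str) -> Any:
        stat = os.stat(path)
        key = (os.path.abspath(path), stat.st_mtime, stat.st_size)
        if key in self._cache:
            return self._cache[key]  # cache hit: no API call, no waiting
        if self._last_call is not None:
            # Wait out the remainder of the interval since the previous call.
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        result = self._parse(path)
        self._cache[key] = result
        return result

    def clear(self) -> None:
        """Drop cached results, e.g. when a crew run finishes."""
        self._cache.clear()
```

A possible usage is `parse = CachedRateLimitedParser(tool.run, min_interval=1.0)` followed by `parse("docs/sample.docx")`; repeated requests for an unchanged file are then served from memory, and `parse.clear()` releases the cached results.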