Agent Architecture

The BMLibrarian agents module provides a modular, extensible architecture for AI-powered biomedical literature tasks. The system is built around a common BaseAgent class with specialized agents for specific tasks.

Architecture Principles

1. Separation of Concerns

Each agent has a single, well-defined responsibility:

Agent                 Responsibility
--------------------  -----------------------------------------------
QueryAgent            Natural language to PostgreSQL query conversion
DocumentScoringAgent  Document relevance assessment (1-5 scale)
CitationFinderAgent   Extract relevant passages and citations
ReportingAgent        Synthesize citations into medical reports
CounterfactualAgent   Generate contradictory evidence questions

2. Common Foundation

All agents inherit from BaseAgent which provides:

  • Ollama client management and connection handling
  • Standardized error handling patterns
  • Callback system for progress updates and UI integration
  • Model configuration and parameter management
  • Connection testing utilities

3. Maintainability

  • Each agent class is kept under 500 lines
  • Clear separation between core and specialized features
  • Comprehensive unit testing with mocked dependencies
  • Backward compatibility layer for existing code

Directory Structure

src/bmlibrarian/agents/
├── __init__.py              # Public API exports
├── base.py                  # BaseAgent abstract class
├── query_agent.py           # Natural language query conversion
├── scoring_agent.py         # Document relevance scoring
├── citation_agent.py        # Citation extraction from documents
├── reporting_agent.py       # Report synthesis and formatting
├── counterfactual_agent.py  # Counterfactual analysis
├── queue_manager.py         # SQLite-based task queue system
└── orchestrator.py          # Multi-agent workflow coordination

Base Agent Class

Abstract Base Class Design

from abc import ABC, abstractmethod
from typing import Optional, Callable

class BaseAgent(ABC):
    def __init__(
        self,
        model: str,
        host: str,
        temperature: float,
        top_p: float,
        callback: Optional[Callable]
    ):
        # Common initialization shared by all agents
        self.model = model
        self.host = host
        self.temperature = temperature
        self.top_p = top_p
        self.callback = callback

    @abstractmethod
    def get_agent_type(self) -> str:
        """Must be implemented by subclasses"""
        pass

    # Common functionality
    def test_connection(self) -> bool: ...
    def get_available_models(self) -> list[str]: ...
    def _make_ollama_request(self, messages: list, **options) -> str: ...
    def _call_callback(self, step: str, data: str) -> None: ...

Key Features

Standardized Ollama Integration:

  • Consistent client initialization
  • Error handling for connection issues
  • Model availability checking

Callback System:

  • Progress updates for UI integration
  • Error-tolerant callback execution
  • Step-based progress tracking
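
The error-tolerant callback execution described above can be sketched as follows (a hypothetical helper modeled on the described contract, not the actual BaseAgent code):

```python
from typing import Callable, Optional

def call_callback(callback: Optional[Callable[[str, str], None]],
                  step: str, data: str) -> None:
    """Invoke the callback, but never let a callback error break agent work."""
    if callback is None:
        return
    try:
        callback(step, data)
    except Exception:
        pass  # swallow callback errors so the agent continues

events = []
call_callback(lambda s, d: events.append((s, d)), "search", "query built")
call_callback(None, "search", "no listener registered")

def broken(step: str, data: str) -> None:
    raise RuntimeError("UI went away")

# A failing callback is tolerated: the call returns normally.
call_callback(broken, "search", "error is tolerated")
```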

Configuration Management:

  • Model parameters with sensible defaults
  • Option overrides for specific requests
  • Host and model configurability
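
A minimal sketch of the defaults-plus-overrides pattern (the parameter names here are assumptions for illustration, not the library's actual defaults):

```python
# Agent-level defaults, overridable per request
AGENT_DEFAULTS = {"temperature": 0.1, "top_p": 0.9}

def merge_options(defaults: dict, overrides: dict) -> dict:
    """Request-level options take precedence over the agent's defaults."""
    return {**defaults, **overrides}

# A single request can override one parameter while keeping the rest
options = merge_options(AGENT_DEFAULTS, {"temperature": 0.0})
```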

Specialized Agents

QueryAgent

Purpose: Convert natural language questions to PostgreSQL to_tsquery format

Key Methods:

def convert_question(self, question: str) -> str:
    """Core conversion functionality"""

def find_abstracts(self, ...) -> Generator:
    """Integrated search with database"""

Error Handling:

  • Input validation (empty questions)
  • Ollama connection errors
  • Query format validation with warnings
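
The empty-question validation can be illustrated with a small stand-alone helper (hypothetical; QueryAgent's actual checks may differ):

```python
def validate_question(question: str) -> str:
    """Reject empty or whitespace-only questions before calling the model."""
    if not isinstance(question, str) or not question.strip():
        raise ValueError("Question must be a non-empty string")
    return question.strip()
```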

DocumentScoringAgent

Purpose: Evaluate document relevance with scores and reasoning

Key Methods:

def evaluate_document(self, question: str, document: Dict) -> ScoringResult:
    """Single document evaluation"""

def batch_evaluate_documents(self, ...) -> List[Tuple]:
    """Efficient multi-document scoring"""

def get_top_documents(self, ...) -> List:
    """Ranked selection with filtering"""

Response Structure:

class ScoringResult(TypedDict):
    score: int      # 1-5 relevance score
    reasoning: str  # Explanation for the score
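
For example, batch results can be filtered by score threshold (illustrative data only; real results come from batch_evaluate_documents):

```python
from typing import List, Tuple, TypedDict

class ScoringResult(TypedDict):
    score: int      # 1-5 relevance score
    reasoning: str  # Explanation for the score

# Hypothetical scored documents for illustration
scored: List[Tuple[dict, ScoringResult]] = [
    ({"title": "Statins and LDL"}, {"score": 5, "reasoning": "directly on topic"}),
    ({"title": "Crop yields"}, {"score": 1, "reasoning": "unrelated domain"}),
]

# Keep only documents scoring above the threshold
high_scoring = [(doc, res) for doc, res in scored if res["score"] > 3]
```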

CitationFinderAgent

Purpose: Extract relevant passages and citations from documents

Key Methods:

def extract_citation(self, question: str, document: Dict) -> Citation:
    """Extract single citation"""

def process_scored_documents_for_citations(self, ...) -> List[Citation]:
    """Batch processing with progress tracking"""

ReportingAgent

Purpose: Synthesize citations into medical publication-style reports

Report Features:

  • Professional medical writing style
  • Evidence strength assessment
  • Vancouver-style reference formatting
  • Methodology notes and quality controls
  • Structured markdown output
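
The Vancouver-style formatting can be sketched as a simplified helper (the ReportingAgent's real formatter handles more fields and edge cases):

```python
from typing import List

def format_vancouver(number: int, authors: List[str], title: str,
                     journal: str, year: int) -> str:
    """Minimal Vancouver-style reference: numbered, authors, title, journal, year."""
    return f"{number}. {', '.join(authors)}. {title}. {journal}. {year}."

ref = format_vancouver(1, ["Smith J", "Lee K"],
                       "Exercise and heart health", "J Cardiol", 2021)
```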

CounterfactualAgent

Purpose: Analyze documents and generate research questions that probe for contradictory evidence

Response Structure:

class CounterfactualAnalysis(TypedDict):
    document_title: str
    main_claims: List[str]
    counterfactual_questions: List[CounterfactualQuestion]
    overall_assessment: str
    confidence_level: str
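
As an illustration, a populated analysis might look like this (the CounterfactualQuestion fields shown are assumptions, since the source only names the type):

```python
from typing import List, TypedDict

class CounterfactualQuestion(TypedDict):
    # Hypothetical fields for illustration only
    question: str
    target_claim: str

class CounterfactualAnalysis(TypedDict):
    document_title: str
    main_claims: List[str]
    counterfactual_questions: List[CounterfactualQuestion]
    overall_assessment: str
    confidence_level: str

analysis: CounterfactualAnalysis = {
    "document_title": "Exercise and cardiovascular health",
    "main_claims": ["Regular exercise lowers blood pressure"],
    "counterfactual_questions": [
        {
            "question": "Are there populations in which exercise raises cardiovascular risk?",
            "target_claim": "Regular exercise lowers blood pressure",
        }
    ],
    "overall_assessment": "Plausible claims; contradictory-evidence search warranted",
    "confidence_level": "medium",
}
```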

Integration Patterns

Database Integration

from bmlibrarian.agents import QueryAgent
from bmlibrarian.database import find_abstracts

query_agent = QueryAgent()
for doc in query_agent.find_abstracts("COVID vaccines"):
    print(doc['title'])

Combined Workflows

from bmlibrarian.agents import (
    QueryAgent,
    DocumentScoringAgent,
    CitationFinderAgent,
    ReportingAgent,
    CounterfactualAgent
)

# Instantiate each agent used in the workflow below
query_agent = QueryAgent()
scoring_agent = DocumentScoringAgent()
citation_agent = CitationFinderAgent()
reporting_agent = ReportingAgent()
counterfactual_agent = CounterfactualAgent()

question = "What are the cardiovascular benefits of exercise?"

# 1. Search for documents
documents = list(query_agent.find_abstracts(question))

# 2. Score documents for relevance
scored_docs = scoring_agent.batch_evaluate_documents(question, documents)

# 3. Extract citations from high-scoring documents
high_scoring = [(doc, result) for doc, result in scored_docs if result['score'] > 3]
citations = citation_agent.process_scored_documents_for_citations(
    user_question=question,
    scored_documents=high_scoring
)

# 4. Generate comprehensive report
report = reporting_agent.synthesize_report(question, citations)

# 5. Analyze for contradictory evidence
counterfactual = counterfactual_agent.analyze_document(
    document_content=report,
    document_title=f"Research Report: {question}"
)

Callback Integration

def progress_callback(step: str, data: str):
    print(f"[{step}] {data}")

agent = QueryAgent(callback=progress_callback)
results = agent.find_abstracts("heart disease")

Extending the Architecture

Adding New Agents

  1. Inherit from BaseAgent:
from bmlibrarian.agents.base import BaseAgent

class ResearchAgent(BaseAgent):
    def get_agent_type(self) -> str:
        return "research_agent"

    def summarize_documents(self, documents: list) -> str:
        # Implementation here
        pass
  2. Add to __init__.py:
from .research_agent import ResearchAgent
__all__.append("ResearchAgent")
  3. Write Tests:
class TestResearchAgent:
    def test_summarize_documents(self):
        # Test implementation
        pass

Testing Strategy

Running Tests

# Run all agent tests
uv run pytest tests/test_agents.py

# Run with coverage
uv run pytest tests/test_agents.py --cov=bmlibrarian.agents

# Run specific test classes
uv run pytest tests/test_query_agent.py::TestQueryAgent

Unit Testing Approach

  1. Mock Ollama Dependencies - No external dependencies for testing
  2. Comprehensive Coverage - All public methods tested
  3. Integration Testing - Optional with real database
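
The mocking approach can be sketched without the library (convert_question below is a stand-in for the real agent method, not BMLibrarian code):

```python
from unittest.mock import MagicMock

# A fake Ollama client lets agent logic run without a live server
client = MagicMock()
client.chat.return_value = {"message": {"content": "heart & disease"}}

def convert_question(client, question: str) -> str:
    """Stand-in for the core model call inside a query agent."""
    response = client.chat(model="test-model",
                           messages=[{"role": "user", "content": question}])
    return response["message"]["content"]

result = convert_question(client, "What causes heart disease?")
```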

Security and Privacy

Input Validation

  • Non-empty input requirements
  • Type checking for complex parameters
  • SQL injection prevention through parameterized queries
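
The parameterized-query pattern can be demonstrated with SQLite's in-memory database (BMLibrarian targets PostgreSQL, where the placeholder is %s rather than ?, but the binding principle is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (title TEXT)")
conn.execute("INSERT INTO documents VALUES (?)", ("COVID vaccine trial",))

# Binding the value keeps attacker input out of the SQL text entirely
malicious = "x'; DROP TABLE documents; --"
rows = conn.execute("SELECT title FROM documents WHERE title = ?",
                    (malicious,)).fetchall()

# The table is untouched and the hostile string matched nothing
count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
```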

Local Processing

  • All AI processing happens locally through Ollama
  • No data sent to external services
  • User questions and document content remain private