
Architecture

BMLibrarian is a modular Python library for managing biomedical literature databases. Its migration system keeps the database schema consistent across environments and versions.

System Overview

graph TB
    subgraph "User Interfaces"
        GUI[Qt GUI]
        CLI[CLI Tools]
    end

    subgraph "Core Services"
        WF[Workflow Engine]
        AG[Agent Orchestrator]
        CFG[Configuration]
    end

    subgraph "AI Agents"
        QA[QueryAgent]
        SA[ScoringAgent]
        CA[CitationAgent]
        RA[ReportingAgent]
        CFA[CounterfactualAgent]
    end

    subgraph "Data Layer"
        DB[(PostgreSQL)]
        VEC[pgvector]
        MIG[Migrations]
    end

    subgraph "External"
        OL[Ollama LLM]
    end

    GUI --> WF
    CLI --> WF
    WF --> AG
    AG --> QA
    AG --> SA
    AG --> CA
    AG --> RA
    AG --> CFA
    QA --> DB
    SA --> DB
    CA --> DB
    QA --> OL
    SA --> OL
    CA --> OL
    RA --> OL
    CFA --> OL
    DB --> VEC
    DB --> MIG
    CFG --> AG

Key Components

1. MigrationManager

Location: src/bmlibrarian/migrations.py

The core handler for database schema management:

  • Database connection management
  • Migration discovery and validation
  • Schema application and tracking
  • Checksum verification

from bmlibrarian.migrations import MigrationManager

manager = MigrationManager(
    host="localhost",
    port=5432,
    user="username",
    password="password",
    database="knowledgebase"
)

# Initialize new database
manager.initialize_database()

# Apply pending migrations
count = manager.apply_pending_migrations()

2. CLI Interface

Location: src/bmlibrarian/cli.py

Command-line tools providing:

  • migrate init - Initialize database with baseline
  • migrate apply - Apply pending migrations
  • Argument parsing with validation
  • Environment variable integration

# Initialize database
bmlibrarian migrate init --database knowledgebase

# Apply migrations
bmlibrarian migrate apply
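
As a rough illustration of how nested subcommands with validated arguments and an environment-variable fallback can be wired together, here is a minimal argparse sketch. It is not the contents of cli.py: the function name build_parser and the --user flag are hypothetical, and only migrate init, migrate apply, --database and POSTGRES_USER come from this page.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for `bmlibrarian migrate init|apply` (illustrative only)."""
    parser = argparse.ArgumentParser(prog="bmlibrarian")
    commands = parser.add_subparsers(dest="command", required=True)

    migrate = commands.add_parser("migrate", help="Manage database migrations")
    actions = migrate.add_subparsers(dest="action", required=True)

    for name, help_text in (
        ("init", "Initialize database with baseline"),
        ("apply", "Apply pending migrations"),
    ):
        action = actions.add_parser(name, help=help_text)
        action.add_argument("--database", default=None, help="Target database name")
        # Hypothetical flag: falls back to POSTGRES_USER when not given.
        action.add_argument("--user", default=os.environ.get("POSTGRES_USER"))
    return parser


args = build_parser().parse_args(["migrate", "init", "--database", "knowledgebase"])
print(args.command, args.action, args.database)
```

Declaring the subparsers as required makes argparse reject a missing or unknown action before any database work starts.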

3. App Integration

Location: src/bmlibrarian/app.py

Application-level functions:

  • initialize_app() - Auto-apply migrations at startup
  • get_database_connection() - Get configured connection

from bmlibrarian.app import initialize_app, get_database_connection

# Initialize on startup
initialize_app()

# Get database connection
conn = get_database_connection()

Design Principles

Idempotent Operations

All migration operations can be safely repeated:

# Safe to run multiple times
manager.apply_pending_migrations()  # Returns 0 if nothing to apply
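
One way to see why repeating the call is safe: if pending work is computed as the available files minus those already recorded as applied, a second run finds nothing left to do. The sketch below assumes that approach; pending_migrations is a hypothetical helper, not part of MigrationManager.

```python
def pending_migrations(available, applied):
    """Return migration files not yet applied, in filename order."""
    applied_set = set(applied)
    return [name for name in sorted(available) if name not in applied_set]


available = [
    "001_baseline_schema.sql",
    "002_add_embeddings.sql",
    "003_add_indexes.sql",
]
applied = []  # stands in for the rows of the tracking table

first_run = pending_migrations(available, applied)
applied.extend(first_run)  # record everything that was just applied
second_run = pending_migrations(available, applied)

print(len(first_run), len(second_run))
```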

Atomic Transactions

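Each migration runs as a single transaction:

# Migration either fully succeeds or fully rolls back
try:
    manager.apply_migration(migration_file)
except Exception as exc:
    # The transaction has already been rolled back; report and re-raise
    print(f"Migration {migration_file} failed and was rolled back: {exc}")
    raise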
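
The all-or-nothing behaviour can be illustrated without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, with explicit BEGIN/COMMIT/ROLLBACK; apply_atomically is a hypothetical helper, and the real MigrationManager may manage its transactions differently.

```python
import sqlite3


def apply_atomically(conn, statements):
    """Run all statements in one transaction; undo everything if any fails."""
    conn.execute("BEGIN")
    try:
        for statement in statements:
            conn.execute(statement)
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


# isolation_level=None hands transaction control to the explicit statements above.
conn = sqlite3.connect(":memory:", isolation_level=None)

# The second statement fails because the table already exists,
# so the first CREATE TABLE is rolled back as well.
ok = apply_atomically(
    conn,
    ["CREATE TABLE a (id INTEGER)", "CREATE TABLE a (id INTEGER)"],
)
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(ok, tables)
```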

Checksum Validation

Applied migrations must not be modified; a stored checksum detects any change:

# A SHA-256 hash is stored with each applied migration
# A mismatch on a later check means the file changed after it was applied
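
A minimal sketch of such a check, using Python's hashlib. The helper names are hypothetical and the library's own hashing details (for example, how file encoding or line endings are handled) may differ. A SHA-256 hex digest is 64 characters long, which fits a VARCHAR(64) column.

```python
import hashlib


def migration_checksum(sql_text: str) -> str:
    """Return the SHA-256 hex digest of a migration's contents."""
    return hashlib.sha256(sql_text.encode("utf-8")).hexdigest()


def is_unmodified(sql_text: str, stored_checksum: str) -> bool:
    """Compare the current contents against the checksum stored at apply time."""
    return migration_checksum(sql_text) == stored_checksum


original = "CREATE TABLE example (id SERIAL PRIMARY KEY);"
stored = migration_checksum(original)  # what would be saved when applying

print(is_unmodified(original, stored))
print(is_unmodified(original + " -- edited later", stored))
```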

Ordered Execution

Migrations use numeric prefixes for ordering:

001_baseline_schema.sql
002_add_embeddings.sql
003_add_indexes.sql
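
A short sketch of ordering by numeric prefix. The helper migration_number is hypothetical. Parsing the prefix as an integer, rather than sorting filenames as plain strings, keeps the order correct even if a prefix is not zero-padded.

```python
import re


def migration_number(filename: str) -> int:
    """Extract the leading numeric prefix, e.g. 2 from '002_add_embeddings.sql'."""
    match = re.match(r"(\d+)_", filename)
    if match is None:
        raise ValueError(f"No numeric prefix in {filename!r}")
    return int(match.group(1))


files = ["003_add_indexes.sql", "001_baseline_schema.sql", "002_add_embeddings.sql"]
ordered = sorted(files, key=migration_number)
print(ordered)
```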

Database Foundation

Migration Tracking Table

CREATE TABLE bmlibrarian_migrations (
    id SERIAL PRIMARY KEY,
    filename VARCHAR(255) NOT NULL UNIQUE,
    checksum VARCHAR(64) NOT NULL,
    applied_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Features:

  • Prevents duplicate application
  • Maintains application order history
  • Detects unauthorized modifications

Required Extensions

-- pgvector is installed under the extension name "vector"
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

Security Model

Least Privilege

Minimal required permissions for operations:

-- Application user permissions (assumes tables live in the "public" schema)
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO app_user;

Credential Protection

Environment variables for sensitive data:

export POSTGRES_USER=username
export POSTGRES_PASSWORD=password
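
On the Python side, these variables can be read at startup so that credentials never appear in source code. The sketch below is an assumption about how that might look; load_credentials is a hypothetical helper, and only the two variable names are taken from this page.

```python
import os


def load_credentials(env=None):
    """Read database credentials from the environment, failing early if unset."""
    env = os.environ if env is None else env
    required = ("POSTGRES_USER", "POSTGRES_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"user": env["POSTGRES_USER"], "password": env["POSTGRES_PASSWORD"]}


# A plain dict stands in for os.environ here so the example is self-contained.
creds = load_credentials({"POSTGRES_USER": "username", "POSTGRES_PASSWORD": "password"})
print(sorted(creds))
```

Failing early with the names of the missing variables tends to be easier to diagnose than a connection error later on.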

SQL Injection Prevention

Parameterized queries throughout:

cursor.execute(
    "SELECT * FROM articles WHERE pmid = %s",
    (pmid,)
)
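
To show what parameter binding protects against, the sketch below runs the same hostile input through string interpolation and through a bound parameter. It uses the built-in sqlite3 module as a stand-in for PostgreSQL, so the placeholder is ? rather than %s, and the two-column articles table is a hypothetical simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (pmid TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("111", "First"), ("222", "Second")],
)

hostile = "111' OR '1'='1"

# Unsafe: the input becomes part of the SQL text and changes the WHERE clause.
unsafe_rows = conn.execute(
    f"SELECT title FROM articles WHERE pmid = '{hostile}'"
).fetchall()

# Safe: the input is bound as a value and compared literally.
safe_rows = conn.execute(
    "SELECT title FROM articles WHERE pmid = ?",
    (hostile,),
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```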

Module Dependencies

graph LR
    CLI[cli.py] --> MIG[migrations.py]
    APP[app.py] --> MIG
    MIG --> DB[(PostgreSQL)]
    CLI --> ENV[Environment]
    APP --> ENV

Testing Requirements

  • 98%+ code coverage target
  • Unit tests with mocked dependencies
  • Integration tests with real PostgreSQL
  • Performance benchmarks for large datasets

# Run tests with coverage
uv run pytest tests/ --cov=bmlibrarian --cov-report=html

Future Roadmap

Feature               Status   Description
Rollback support      Planned  Reverse migration capability
Parallel execution    Planned  Concurrent migration application
Configuration files   Planned  YAML/JSON configuration
Migration generation  Planned  Auto-generate from schema diff