Architecture¶
BMLibrarian is a modular Python library for managing a biomedical literature database. Its migration system keeps the database schema consistent across environments and versions.
System Overview¶
graph TB
subgraph "User Interfaces"
GUI[Qt GUI]
CLI[CLI Tools]
end
subgraph "Core Services"
WF[Workflow Engine]
AG[Agent Orchestrator]
CFG[Configuration]
end
subgraph "AI Agents"
QA[QueryAgent]
SA[ScoringAgent]
CA[CitationAgent]
RA[ReportingAgent]
CFA[CounterfactualAgent]
end
subgraph "Data Layer"
DB[(PostgreSQL)]
VEC[pgvector]
MIG[Migrations]
end
subgraph "External"
OL[Ollama LLM]
end
GUI --> WF
CLI --> WF
WF --> AG
AG --> QA
AG --> SA
AG --> CA
AG --> RA
AG --> CFA
QA --> DB
SA --> DB
CA --> DB
QA --> OL
SA --> OL
CA --> OL
RA --> OL
CFA --> OL
DB --> VEC
DB --> MIG
CFG --> AG
Key Components¶
1. MigrationManager¶
Location: src/bmlibrarian/migrations.py
The core handler for database schema management:
- Database connection management
- Migration discovery and validation
- Schema application and tracking
- Checksum verification
from bmlibrarian.migrations import MigrationManager

manager = MigrationManager(
    host="localhost",
    port=5432,
    user="username",
    password="password",
    database="knowledgebase",
)

# Initialize new database
manager.initialize_database()

# Apply pending migrations
count = manager.apply_pending_migrations()
2. CLI Interface¶
Location: src/bmlibrarian/cli.py
Command-line tools providing:
- migrate init - Initialize database with baseline
- migrate apply - Apply pending migrations
- Argument parsing with validation
- Environment variable integration
# Initialize database
bmlibrarian migrate init --database knowledgebase
# Apply migrations
bmlibrarian migrate apply
3. App Integration¶
Location: src/bmlibrarian/app.py
Application-level functions:
- initialize_app() - Auto-apply migrations at startup
- get_database_connection() - Get configured connection
from bmlibrarian.app import initialize_app, get_database_connection
# Initialize on startup
initialize_app()
# Get database connection
conn = get_database_connection()
Design Principles¶
Idempotent Operations¶
All migration operations can be safely repeated. Re-running them against an up-to-date database applies nothing new.
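As a minimal sketch of this property, and not BMLibrarian's actual code, the example below uses a plain set as a stand-in for the tracking table. The helper `apply_pending` and the migration filenames are hypothetical and exist only for illustration.

```python
def apply_pending(available, applied):
    """Apply every migration not yet recorded; return how many were applied.

    `available` is an ordered list of filenames and `applied` is a set that
    stands in for the tracking table. Both are illustrative assumptions.
    """
    count = 0
    for name in available:
        if name in applied:
            continue  # already recorded, so re-running is a no-op
        # A real implementation would execute the migration's SQL here.
        applied.add(name)
        count += 1
    return count


applied = set()
available = ["001_baseline.sql", "002_add_index.sql"]  # hypothetical names
first_run = apply_pending(available, applied)   # applies both migrations
second_run = apply_pending(available, applied)  # nothing left to apply
print(first_run, second_run)
```

The second call returns zero because every filename is already recorded, which is the behaviour that makes repeated runs safe.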
Atomic Transactions¶
Each migration is a complete unit:
# Migration either fully succeeds or fully rolls back
try:
    manager.apply_migration(migration_file)
except Exception:
    # Transaction automatically rolled back
    pass
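The all-or-nothing behaviour can be illustrated with a small runnable sketch. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server. The function `apply_migration` here is a hypothetical helper, not the `MigrationManager` method, and the table names are invented.

```python
import sqlite3


def apply_migration(conn, statements):
    """Run all statements in one transaction; undo everything on failure."""
    conn.execute("BEGIN")
    try:
        for stmt in statements:
            conn.execute(stmt)
    except sqlite3.Error:
        if conn.in_transaction:
            conn.execute("ROLLBACK")  # discard the partial work
        return False
    conn.execute("COMMIT")
    return True


# isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)

# The second statement fails, so the CREATE TABLE before it is rolled back.
failed = apply_migration(conn, [
    "CREATE TABLE docs (id INTEGER PRIMARY KEY)",
    "INSERT INTO missing_table VALUES (1)",
])
tables_after_failure = [
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
]

# A migration whose statements all succeed is committed as a whole.
succeeded = apply_migration(conn, ["CREATE TABLE docs (id INTEGER PRIMARY KEY)"])
tables_after_success = [
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
]
print(failed, tables_after_failure, succeeded, tables_after_success)
```

PostgreSQL also supports transactional DDL, so the same pattern applies there with a PostgreSQL driver in place of `sqlite3`.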
Checksum Validation¶
Applied migrations cannot be modified. Each file's checksum is recorded when it is applied, so a later edit to that file is detected.
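A sketch of how such a check might work is shown below. It assumes SHA-256, which fits the 64-character checksum column of the tracking table but is not confirmed by this page. The helper names `file_checksum` and `verify_unchanged` are hypothetical.

```python
import hashlib


def file_checksum(content: bytes) -> str:
    """Return the SHA-256 hex digest (64 characters) of a migration's bytes."""
    return hashlib.sha256(content).hexdigest()


def verify_unchanged(content: bytes, recorded: str) -> bool:
    """True if the current content still matches the recorded checksum."""
    return file_checksum(content) == recorded


original = b"CREATE TABLE example (id SERIAL PRIMARY KEY);"
recorded = file_checksum(original)   # stored when the migration is applied
edited = original + b" -- edited later"

unchanged_ok = verify_unchanged(original, recorded)  # content matches
edited_ok = verify_unchanged(edited, recorded)       # content was altered
print(len(recorded), unchanged_ok, edited_ok)
```

Any change to an applied file, even a trailing comment, produces a different digest and fails the comparison.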
Ordered Execution¶
Migrations use numeric filename prefixes to fix the order in which they are applied.
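The sketch below shows one way to order files by their numeric prefix. The filenames and the helper `migration_order` are hypothetical. It sorts on the parsed integer rather than on the raw string, since a plain string sort would place `010_...` before `2_...`.

```python
import re


def migration_order(filename: str) -> int:
    """Extract the leading numeric prefix used as the sort key."""
    match = re.match(r"(\d+)", filename)
    if match is None:
        raise ValueError(f"no numeric prefix: {filename!r}")
    return int(match.group(1))


# Hypothetical filenames, deliberately out of order and unevenly padded.
files = ["010_add_embeddings.sql", "2_add_authors.sql", "001_baseline.sql"]
ordered = sorted(files, key=migration_order)
print(ordered)
```

Zero-padding every prefix to the same width avoids the difference between numeric and string ordering altogether.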
Database Foundation¶
Migration Tracking Table¶
CREATE TABLE bmlibrarian_migrations (
id SERIAL PRIMARY KEY,
filename VARCHAR(255) NOT NULL UNIQUE,
checksum VARCHAR(64) NOT NULL,
applied_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Features:
- Prevents duplicate application
- Maintains application order history
- Detects unauthorized modifications
Required Extensions¶
CREATE EXTENSION IF NOT EXISTS vector;  -- pgvector registers under the name "vector"
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
Security Model¶
Least Privilege¶
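The application user is granted only the permissions its operations require:
-- Application user permissions (the "public" schema is assumed here)
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO app_user;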
Credential Protection¶
Sensitive values such as database credentials are read from environment variables instead of being hard-coded.
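A minimal sketch of this pattern follows. The variable names (`POSTGRES_HOST`, `POSTGRES_USER`, and so on) and the helper `load_db_settings` are assumptions made for illustration, so check the project's configuration reference for the names it actually reads.

```python
import os


def load_db_settings(env=None):
    """Build connection settings from environment variables.

    The variable names are hypothetical. User and password have no
    defaults, so a missing credential raises KeyError instead of
    silently falling back to a guessable value.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("POSTGRES_HOST", "localhost"),
        "port": int(env.get("POSTGRES_PORT", "5432")),
        "user": env["POSTGRES_USER"],
        "password": env["POSTGRES_PASSWORD"],
        "database": env.get("POSTGRES_DB", "knowledgebase"),
    }


# A dict is passed here instead of os.environ so the example is repeatable.
settings = load_db_settings(
    {"POSTGRES_USER": "username", "POSTGRES_PASSWORD": "password"}
)
print(settings["host"], settings["port"], settings["database"])
```

The resulting dictionary has the same keys as the `MigrationManager` constructor arguments shown earlier on this page.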
SQL Injection Prevention¶
Queries are parameterized throughout, so user-supplied values are never spliced into SQL text.
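The sketch below illustrates the idea with Python's built-in `sqlite3` module as a stand-in for PostgreSQL. The `documents` table and its contents are invented. `sqlite3` uses `?` placeholders, whereas PostgreSQL drivers such as psycopg use `%s`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO documents (title) VALUES (?)", ("Aspirin and stroke",))

# Input crafted to break out of a string literal if it were concatenated.
hostile = "x'; DROP TABLE documents; --"

# The driver sends the value separately from the SQL text, so it is
# compared as a plain string and never parsed as SQL.
rows = conn.execute(
    "SELECT id FROM documents WHERE title = ?", (hostile,)
).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
print(rows, remaining)
```

The hostile string matches no rows and the table is left intact.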
Module Dependencies¶
graph LR
CLI[cli.py] --> MIG[migrations.py]
APP[app.py] --> MIG
MIG --> DB[(PostgreSQL)]
CLI --> ENV[Environment]
APP --> ENV
Testing Requirements¶
- 98%+ code coverage target
- Unit tests with mocked dependencies
- Integration tests with real PostgreSQL
- Performance benchmarks for large datasets
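As an illustration of a unit test with mocked dependencies, the sketch below replaces the database connection with a `unittest.mock.MagicMock`. The function `count_applied` is a hypothetical helper and not part of BMLibrarian. Only the table name comes from this page.

```python
from unittest.mock import MagicMock


def count_applied(conn):
    """Return how many migrations the tracking table records."""
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM bmlibrarian_migrations")
    return cur.fetchone()[0]


# The mock stands in for a real connection, so no database is needed.
conn = MagicMock()
conn.cursor.return_value.fetchone.return_value = (3,)

applied_count = count_applied(conn)

# Verify both the result and the SQL that was sent to the cursor.
conn.cursor.return_value.execute.assert_called_once_with(
    "SELECT COUNT(*) FROM bmlibrarian_migrations"
)
print(applied_count)
```

Integration tests would run the same function against a real PostgreSQL instance in place of the mock.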
Future Roadmap¶
| Feature | Status | Description |
|---|---|---|
| Rollback support | Planned | Reverse migration capability |
| Parallel execution | Planned | Concurrent migration application |
| Configuration files | Planned | YAML/JSON configuration |
| Migration generation | Planned | Auto-generate from schema diff |