Scaling Guide

Contract Lucidity is designed to scale from a single-server demo to an enterprise deployment serving thousands of users. This guide covers vertical scaling, horizontal scaling, and the architectural patterns that support each.

Vertical Scaling

The simplest way to increase capacity: give your existing server more resources.

Worker Concurrency

The most impactful vertical scaling lever is CELERY_CONCURRENCY -- the number of document processing threads running in the worker container. This is set as an environment variable (default: 2).

```shell
# In .env or docker-compose.yml
CELERY_CONCURRENCY=8
```

Or override in docker-compose.yml:

```yaml
cl-worker:
  command: celery -A app.celery_app worker --loglevel=info --concurrency=8
```

After changing, restart the worker:

```shell
docker compose restart cl-worker
```

Concurrency Sizing Matrix

| Celery Workers | RAM | CPUs | Typical Throughput | Use Case |
|---|---|---|---|---|
| 2 | 4 GB | 2 | ~5 docs/hour | Demo / small team (< 10 users) |
| 4 | 8 GB | 2-4 | ~12 docs/hour | Small firm (10-50 users) |
| 8 | 16 GB | 4-8 | ~25 docs/hour | Mid-size firm (50-200 users) |
| 16 | 32 GB | 8+ | ~50 docs/hour | Am Law 200 (200-500 users) |
| 32+ | 64 GB+ | 16+ | ~100+ docs/hour | Am Law 100 (500+ users) |
Throughput Depends on AI Provider

The throughput numbers above assume your AI provider's rate limits can sustain the load. Each document makes 3-6 AI API calls. At 16 concurrent workers, you need at minimum ~100 RPM from your AI provider. See the AI Provider docs for rate limit details per tier.
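The ~100 RPM figure can be sanity-checked with a worst-case back-of-envelope calculation: assume every worker issues all of a document's AI calls within the same minute.

```shell
# Worst-case burst estimate: every worker fires all of one document's
# AI calls inside the same minute. Numbers taken from the text above.
WORKERS=16
MAX_CALLS_PER_DOC=6
echo "Peak RPM needed: ~$(( WORKERS * MAX_CALLS_PER_DOC ))"
# → Peak RPM needed: ~96
```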

Memory Considerations

Each Celery worker process consumes approximately:

| Component | Memory per Worker |
|---|---|
| Base Python process | ~150 MB |
| Document text in memory | ~10-50 MB (depends on document size) |
| AI SDK overhead | ~50 MB |
| **Total per worker** | ~250-350 MB |

Formula: Required RAM = (CELERY_CONCURRENCY × 350 MB) + 2 GB (OS + other containers)

For example, 8 workers: (8 × 350 MB) + 2000 MB = 4800 MB ≈ 5 GB minimum
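The formula above can be wrapped in a small shell snippet for quick sizing checks:

```shell
# Estimate minimum RAM for a given concurrency, using the per-worker
# upper bound (~350 MB) and fixed overhead (~2 GB) from the tables above.
CELERY_CONCURRENCY=8
PER_WORKER_MB=350
OVERHEAD_MB=2000
echo "Minimum RAM: $(( CELERY_CONCURRENCY * PER_WORKER_MB + OVERHEAD_MB )) MB"
# → Minimum RAM: 4800 MB
```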

warning

Setting CELERY_CONCURRENCY higher than your available CPU cores will cause contention and may slow down processing rather than speed it up. The extraction stage (OCR via Tesseract) is CPU-intensive.

Horizontal Scaling

When a single server reaches its limits, scale horizontally by adding more instances.

Multiple Worker Instances

The easiest horizontal scaling path. Celery workers are stateless and compete for tasks from the same Redis queue.

```yaml
# docker-compose.override.yml for multiple workers
services:
  cl-worker-1:
    extends:
      service: cl-worker
    container_name: cl-worker-1
    environment:
      - CELERY_CONCURRENCY=8

  cl-worker-2:
    extends:
      service: cl-worker
    container_name: cl-worker-2
    environment:
      - CELERY_CONCURRENCY=8

  cl-worker-3:
    extends:
      service: cl-worker
    container_name: cl-worker-3
    environment:
      - CELERY_CONCURRENCY=8
```

Requirements for multi-worker scaling:

  • All workers must share the same Redis instance (broker)
  • All workers must share the same PostgreSQL database
  • All workers must have access to the same document storage volume (/data/storage)

Shared Storage is Critical

If workers cannot access the same /data/storage path, the extraction stage will fail with "Package not found at /data/storage/...". Use NFS, EFS (AWS), Azure Files, or a similar shared filesystem.
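One quick way to verify the mount really is shared is to write a marker file from one worker and read it from another (a sketch, not a built-in check; container names follow the multi-worker example above):

```shell
# Write a marker from worker 1, read it from worker 2, then clean up.
docker compose exec cl-worker-1 sh -c 'echo ok > /data/storage/.share-check'
docker compose exec cl-worker-2 cat /data/storage/.share-check  # should print "ok"
docker compose exec cl-worker-1 rm /data/storage/.share-check
```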

Multiple Backend Instances

The backend is stateless (sessions use JWT tokens, not server-side state). Add instances behind a load balancer:

```yaml
services:
  cl-backend-1:
    extends:
      service: cl-backend
    container_name: cl-backend-1

  cl-backend-2:
    extends:
      service: cl-backend
    container_name: cl-backend-2
```

Migration Safety

When running multiple backend instances, only one should run database migrations on startup. Use a leader election mechanism or run migrations manually before scaling:

```shell
docker exec cl-backend-1 alembic upgrade head
```

Then start additional instances with migrations disabled (or accept that redundant migration runs are safe -- Alembic uses a version table to prevent re-running).
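To confirm which revision the database is on (for example, after running migrations manually), Alembic's `current` command can be run from any backend container:

```shell
# Show the applied Alembic revision (container name from the example above)
docker exec cl-backend-1 alembic current
```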

Frontend Scaling

The Next.js frontend is stateless. Scale by adding instances behind a load balancer:

```yaml
services:
  cl-frontend-1:
    extends:
      service: cl-frontend
    container_name: cl-frontend-1
    ports:
      - "3001:3000"

  cl-frontend-2:
    extends:
      service: cl-frontend
    container_name: cl-frontend-2
    ports:
      - "3002:3000"
```

Place behind a reverse proxy (Nginx, Caddy, Traefik) or cloud load balancer.
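A minimal sketch of an Nginx configuration fronting the two instances above (ports taken from the compose example; TLS, timeouts, and health checks omitted):

```nginx
upstream cl_frontend {
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
}

server {
    listen 80;

    location / {
        proxy_pass http://cl_frontend;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```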

Database Scaling

Connection Pooling

As you add backend and worker instances, database connections multiply. PostgreSQL's default max_connections (100) can be exhausted.

Options:

  1. Increase max_connections in PostgreSQL config (simple but limited)
  2. Use PgBouncer as a connection pooler (recommended for > 8 total service instances)
```yaml
# Add PgBouncer to docker-compose
cl-pgbouncer:
  image: edoburu/pgbouncer:latest
  container_name: cl-pgbouncer
  environment:
    DATABASE_URL: "postgresql://cl_user:cl_password_change_me@cl-postgres:5432/contract_lucidity"
    MAX_CLIENT_CONN: 500
    DEFAULT_POOL_SIZE: 25
    POOL_MODE: transaction
  ports:
    - "6432:6432"
  depends_on:
    - cl-postgres
  networks:
    - cl-network
```

Then set POSTGRES_HOST=cl-pgbouncer and POSTGRES_PORT=6432 in your .env so application traffic routes through the pooler.

Read Replicas

For read-heavy workloads (large teams viewing documents simultaneously), offload read queries to PostgreSQL replicas:

info

Read replicas require application-level routing (separate connection strings for reads vs writes). This is not currently built into CL but can be implemented with a PostgreSQL proxy like PgPool-II or at the infrastructure level with AWS RDS read replicas or Azure read replicas.

Cloud-Specific Scaling Patterns

AWS

| Component | Service | Scaling Method |
|---|---|---|
| Frontend | ECS Fargate / EKS | Auto-scaling based on CPU |
| Backend | ECS Fargate / EKS | Auto-scaling based on request count |
| Worker | ECS Fargate / EKS | Auto-scaling based on Redis queue depth |
| Database | RDS PostgreSQL | Vertical (instance class) + read replicas |
| Storage | EFS | Automatic (shared across instances) |
| Redis | ElastiCache | Vertical (node type) |

Azure

| Component | Service | Scaling Method |
|---|---|---|
| Frontend | Azure Container Apps | Auto-scaling based on HTTP traffic |
| Backend | Azure Container Apps | Auto-scaling based on HTTP traffic |
| Worker | Azure Container Apps | KEDA scaling based on Redis queue length |
| Database | Azure Database for PostgreSQL Flexible Server | Vertical + read replicas |
| Storage | Azure Files (Premium) | Shared across instances |
| Redis | Azure Cache for Redis | Vertical (tier) |

Kubernetes (Any Cloud)

```yaml
# HPA for worker pods based on Redis queue depth
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cl-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cl-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: redis_celery_queue_length
        target:
          type: AverageValue
          averageValue: "5"
```

Benchmarking

Before scaling, establish baselines:

```shell
# Measure pipeline throughput
# Upload 10 test documents and measure total time
START=$(date +%s)
# ... upload documents ...
# ... wait for all to complete ...
END=$(date +%s)
echo "Throughput: 10 documents in $((END-START)) seconds"

# Monitor during load test
docker stats --no-stream --filter "name=cl-"
```

| Metric | How to Measure | Target |
|---|---|---|
| Pipeline throughput | Documents completed per hour | Scales linearly with workers |
| API response time (p95) | Load testing with k6/vegeta | < 500ms for read endpoints |
| Time to first result | Upload to COMPLETE | < 3 min for a 20-page document |
| Concurrent users | Load test with realistic browsing | Scale frontend/backend instances |
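As a starting point for the p95 latency check, a vegeta run against a read endpoint might look like the following (the endpoint path and port are assumptions; adjust to your deployment):

```shell
# Hypothetical load test against a read endpoint with vegeta
echo "GET http://localhost:8000/api/documents" | \
  vegeta attack -rate=50 -duration=30s | \
  vegeta report
```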