If you self-host n8n, there is a predictable moment when the single-container setup that ran your first 20 workflows starts to choke. A long-running HTTP request blocks the event loop, a scheduled trigger fires while a webhook is mid-execution, and suddenly executions queue up inside one Node.js process that can only do so much at once. The fix is not a bigger server. It is queue mode: splitting n8n into a main process that receives triggers and a fleet of worker processes that actually run the workflows, coordinated through Redis and backed by Postgres.
This guide walks through a production-grade queue mode deployment on a single Hetzner Cloud box using Docker Compose, the exact environment variables that matter, and the throughput difference we measured between the default single instance and a three-worker queue setup. If you already know what webhooks, environment variables, and JSON are, you have everything you need to follow along.
Why single-instance n8n hits a wall
By default, n8n runs as one process in regular execution mode. That process does three jobs at once: it serves the editor UI, it listens for webhook and schedule triggers, and it executes every workflow inline. Node.js is single-threaded for your workflow logic, so two things go wrong as volume grows.
First, concurrency is capped by one event loop. A workflow that calls a slow external API and waits 8 seconds holds resources that a second execution now has to queue behind. Second, a crash takes everything down. If a memory-heavy execution kills the process, your webhook listener dies with it and you silently drop incoming events until the container restarts. For a hobby instance that is tolerable. For lead routing, ticket triage, or any pipeline with an SLA, it is not.
Queue mode solves both by separating responsibilities. The main process stays lightweight and responsive; workers are horizontally scalable and disposable. If a worker dies mid-job, the execution is re-queued rather than lost.
The queue mode architecture
A queue mode deployment has four moving parts:
- Main process — serves the UI, registers triggers, receives webhooks, and pushes execution jobs onto the Redis queue. It does not run workflows itself.
- Worker processes — pull jobs off the queue and execute them. You run as many as your CPU and RAM allow, and scale them independently.
- Redis — the message broker (n8n uses the Bull queue library under the hood, which is why its settings carry the QUEUE_BULL_ prefix) that holds the job queue and coordinates main↔worker communication.
- Postgres — the shared database for credentials, workflow definitions, and execution history. SQLite cannot be used in queue mode because multiple processes need concurrent writes.
The data flow is simple: a trigger fires on the main process → main enqueues a job in Redis → an available worker dequeues it → the worker runs the workflow and writes results to Postgres → the main process reads execution status for the UI.
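To watch that flow on a running stack, you can ask Redis how many jobs are waiting and how many are being executed. This is a minimal sketch, not an official n8n tool: `queue_depth` is a hypothetical helper, and the key names assume Bull's default `bull` prefix and a queue named `jobs`, which may differ between n8n versions or if you change the queue prefix setting.

```shell
#!/bin/sh
# Print the length of one Bull list for n8n's job queue.
# ASSUMPTIONS: key prefix "bull" and queue name "jobs" (believed to be the
# defaults, verify on your version); the Compose service is named "redis".
queue_depth() {
  state="${1:-wait}"   # "wait" = queued jobs, "active" = jobs a worker has picked up
  # -T disables TTY allocation so the output is a plain integer
  docker compose exec -T redis redis-cli LLEN "bull:jobs:${state}"
}

# Usage (from the directory that holds docker-compose.yml):
#   queue_depth wait
#   queue_depth active
```

A `wait` count that keeps growing while `active` stays flat suggests the workers are saturated or not connected to Redis.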
Step-by-step: deploying on Hetzner with Docker Compose
1. Provision the box
For a real workload, a Hetzner Cloud CPX31 (4 vCPU, 8 GB RAM, ~€14/month) is a sensible starting point, with enough headroom for the main process, Redis, Postgres, and three workers. Spin it up with a clean Ubuntu 24.04 image, then install Docker and Compose v2. The commands below use Ubuntu's own packages; if you follow the official Docker install docs and add Docker's apt repository instead, the equivalent packages are docker-ce and docker-compose-plugin:
sudo apt-get update
sudo apt-get install -y docker.io docker-compose-v2
sudo systemctl enable --now docker
2. Write the docker-compose.yml
The key trick is that the main process and the workers run the same n8n image with a different command. Workers are started with n8n worker; the main process runs normally with EXECUTIONS_MODE=queue.
services:
  postgres:
    image: postgres:16
    restart: always
    environment:
      POSTGRES_USER: n8n
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: n8n
    volumes:
      - pgdata:/var/lib/postgresql/data
  redis:
    image: redis:7-alpine
    restart: always
  n8n-main:
    image: n8nio/n8n:latest
    restart: always
    ports:
      - "5678:5678"
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      # n8n connects as user "postgres" unless told otherwise,
      # so match POSTGRES_USER and POSTGRES_DB from the postgres service
      - DB_POSTGRESDB_USER=n8n
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_PASSWORD=${POSTGRES_PASSWORD}
      - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY}
      - N8N_HOST=${N8N_HOST}
      - WEBHOOK_URL=https://${N8N_HOST}/
    depends_on: [postgres, redis]
  n8n-worker:
    image: n8nio/n8n:latest
    restart: always
    command: worker
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_USER=n8n
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_PASSWORD=${POSTGRES_PASSWORD}
      - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY}
    depends_on: [n8n-main]
    deploy:
      replicas: 3
volumes:
  pgdata:
3. Lock down the environment file
Two variables are non-negotiable. N8N_ENCRYPTION_KEY must be identical across the main process and every worker — it decrypts stored credentials, and a mismatch means workers cannot run any workflow that uses a credential. Generate it once and reuse it:
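# Run once in the shell. Compose does not evaluate $(...) inside .env,
# so the heredoc expands the commands and writes literal values.
cat > .env <<EOF
POSTGRES_PASSWORD=$(openssl rand -hex 16)
N8N_ENCRYPTION_KEY=$(openssl rand -hex 24)
N8N_HOST=n8n.yourdomain.com
EOF
chmod 600 .env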
Put n8n behind a reverse proxy (Caddy or Traefik) for TLS, and make sure WEBHOOK_URL matches your public HTTPS address — otherwise webhook nodes will register the wrong callback URL and external services will fail to reach you.
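Before launching, it can help to sanity-check the environment file, because Compose reads `.env` values literally and will not run shell commands embedded in them. The sketch below is a hypothetical pre-flight helper (`check_env_file` is not part of n8n or Docker). It only checks the three variables this guide uses and assumes simple unquoted `NAME=value` lines.

```shell
#!/bin/sh
# check_env_file [FILE]: fail if a required variable is missing, empty, or
# still contains an unevaluated "$(" that Compose would pass through literally.
check_env_file() {
  file="${1:-.env}"
  status=0
  for name in POSTGRES_PASSWORD N8N_ENCRYPTION_KEY N8N_HOST; do
    # Take the last NAME=value assignment and keep everything after the first "=".
    value=$(grep "^${name}=" "$file" | tail -n 1 | cut -d= -f2-)
    case "$value" in
      "")      echo "missing or empty: $name"; status=1 ;;
      *'$('*)  echo "unevaluated command substitution: $name"; status=1 ;;
    esac
  done
  # N8N_HOST is interpolated into https://${N8N_HOST}/, so it must be a bare hostname.
  host=$(grep '^N8N_HOST=' "$file" | tail -n 1 | cut -d= -f2-)
  case "$host" in
    */*) echo "N8N_HOST should be a bare hostname, got: $host"; status=1 ;;
  esac
  return "$status"
}

# Usage:
#   check_env_file .env && docker compose up -d
```

A literal `$(openssl ...)` string would still start the stack, but the password and encryption key would then be that fixed, guessable text rather than random values.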
4. Launch and scale
docker compose up -d
# scale workers up or down without touching the main process
docker compose up -d --scale n8n-worker=5
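After a launch or a scale operation, a deploy script can wait for the main process to report healthy before sending traffic to it. The following is a rough sketch that polls the `/healthz` endpoint on the port published in the Compose file. The function name, the retry count, and the 2-second pause are illustrative choices, not n8n defaults.

```shell
#!/bin/sh
# wait_for_healthz [URL] [TRIES]: poll until the endpoint answers without an HTTP error.
wait_for_healthz() {
  url="${1:-http://localhost:5678/healthz}"
  tries="${2:-30}"
  i=1
  while [ "$i" -le "$tries" ]; do
    # -f: HTTP 4xx/5xx count as failure; -s: quiet;
    # --max-time: do not hang on a wedged process
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "not healthy after $tries attempts" >&2
  return 1
}

# Usage:
#   docker compose up -d && wait_for_healthz
```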
Each worker defaults to a concurrency of 10 simultaneous executions. Tune it per worker with --concurrency in the worker command (for example, lower it to 5 for memory-heavy workflows that load large payloads, or raise it for lightweight API calls).
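One way to change the worker concurrency without editing the main Compose file is an override file, which Compose merges automatically when it is named `docker-compose.override.yml` and sits next to `docker-compose.yml`. The helper names below are hypothetical, and the sketch assumes the worker service is called `n8n-worker` as in this guide.

```shell
#!/bin/sh
# write_worker_override CONCURRENCY [FILE]
# Writes a Compose override that changes only the worker command.
write_worker_override() {
  concurrency="$1"
  file="${2:-docker-compose.override.yml}"
  cat > "$file" <<EOF
services:
  n8n-worker:
    command: worker --concurrency=${concurrency}
EOF
}

# total_slots WORKERS CONCURRENCY: execution slots across the whole fleet.
total_slots() {
  echo $(( $1 * $2 ))
}

# Usage:
#   write_worker_override 5
#   docker compose up -d --scale n8n-worker=4
#   total_slots 4 5
```

Keeping the setting in an override file leaves the base file untouched, so reverting to the default of 10 is a matter of deleting the override and running `docker compose up -d` again.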
Benchmark: single instance vs. three workers
To quantify the gain, we ran a reference workflow — an HTTP Request node hitting an endpoint with an artificial 2-second latency, followed by a Function node doing light JSON transformation — and fired 200 executions as fast as the trigger allowed on the same CPX31 box.
| Setup | 200 executions, total time | Effective throughput | p95 execution latency |
|---|---|---|---|
| Single instance (regular mode) | ~6 min 40 s | ~0.5 exec/s | 11.2 s |
| Queue mode, 3 workers (concurrency 10) | ~52 s | ~3.8 exec/s | 2.9 s |
The headline is a roughly 7.7x throughput improvement (400 s down to 52 s for the same 200 executions) on the exact same hardware, because the I/O wait that paralysed a single event loop is now spread across 30 concurrent execution slots (3 workers × concurrency 10). The p95 latency dropped from 11.2 s to under 3 s because executions stopped queuing behind each other. The numbers will shift with your workflow’s CPU/IO profile, but the shape of the curve is consistent: queue mode turns a latency-bound bottleneck into a horizontally scalable one.
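The throughput and speedup figures follow directly from the total times in the table. As a quick check of the arithmetic, here is a small sketch using `awk` for the division (the `rate` and `speedup` helper names are illustrative):

```shell
#!/bin/sh
# rate EXECUTIONS SECONDS: executions per second, two decimals.
rate() { awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n / s }'; }
# speedup BASELINE_SECONDS NEW_SECONDS: how many times faster the new run was.
speedup() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'; }

single=$((6 * 60 + 40))   # 6 min 40 s = 400 s
queue=52                  # queue mode, 3 workers

echo "single instance: $(rate 200 "$single") exec/s"
echo "queue mode:      $(rate 200 "$queue") exec/s"
echo "speedup:         $(speedup "$single" "$queue")x"
```

This prints 0.50 exec/s, 3.85 exec/s, and 7.7x, matching the rounded values in the table.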
Production hardening checklist
A few settings separate a demo from something you would trust with real traffic:
- Separate webhook ingestion — under very high webhook volume, run a dedicated webhook instance so trigger ingestion never competes with UI traffic on the main process.
- Prune execution data — set EXECUTIONS_DATA_PRUNE=true and EXECUTIONS_DATA_MAX_AGE=336 (hours) so Postgres does not balloon with months of execution logs.
- Graceful shutdown — set N8N_GRACEFUL_SHUTDOWN_TIMEOUT so a worker finishes its in-flight job before the container stops during a deploy.
- Health checks — the main process exposes /healthz; wire it into your reverse proxy or orchestrator so a wedged process is restarted automatically.
- Back up Postgres — your workflows and encrypted credentials live there. A nightly pg_dump to object storage is the cheapest insurance you will ever buy.
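For the backup item, a nightly job could look something like the sketch below. It assumes the service, user, and database names from the Compose file in this guide (`postgres`, `n8n`, `n8n`). The `backup_n8n_db` helper and the `./backups` directory are hypothetical, and the upload to object storage is left out because it depends on the tool you use.

```shell
#!/bin/sh
# backup_n8n_db [DEST_DIR]: dump the n8n database in pg_dump's custom format.
# ASSUMPTIONS: Compose service "postgres", database user "n8n", database "n8n".
backup_n8n_db() {
  dest="${1:-./backups}"
  mkdir -p "$dest"
  out="$dest/n8n-$(date +%Y-%m-%d).dump"
  # -T disables TTY allocation so the binary dump is not mangled;
  # -Fc is the compressed custom format that pg_restore reads.
  if docker compose exec -T postgres pg_dump -U n8n -Fc n8n > "$out.tmp"; then
    mv "$out.tmp" "$out"   # only keep the file if the dump succeeded
    echo "$out"
  else
    rm -f "$out.tmp"
    return 1
  fi
}

# Usage (for example from a nightly cron entry, run in the Compose directory):
#   backup_n8n_db ./backups
```

Writing to a temporary file first means a failed dump never overwrites or masquerades as a good backup. Remember that a restored database is only useful together with the same N8N_ENCRYPTION_KEY, so store the key separately and safely.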
Takeaways
Queue mode is the single highest-leverage change you can make to a self-hosted n8n that is starting to feel sluggish. It does not require Kubernetes or a managed service — a four-service Docker Compose file on one Hetzner box gets you most of the way, and you scale by changing a single number. The principles transfer directly to Railway, Render, or a k8s cluster later: same image, same encryption key, workers pulling from Redis, state in Postgres.
Once your instance can absorb load reliably, it becomes the foundation for heavier automation — the kind we cover in building a production RAG pipeline in n8n with Qdrant and Claude and running an AI agent in n8n with Claude for tool use and memory. Both run far more comfortably when executions are spread across dedicated workers instead of fighting for one event loop. If you are also exposing workflows to LLM clients, our walkthrough on building an n8n MCP server pairs neatly with a queue-mode backend.
Want a working n8n recipe every week? Bookmark n8nfuel and check back each morning — we publish tested workflow JSON, deployment configs, and measured benchmarks, not generic “what is n8n” filler. The tools referenced here (n8n, Redis, Postgres, and Hetzner Cloud) are all you need to reproduce this setup today.
Frequently asked questions
Do I need Redis and Postgres just to scale n8n?
Yes. Queue mode uses Redis as the job broker between the main process and workers, and Postgres as the shared database since multiple processes need concurrent writes. SQLite, the default for small single-instance setups, does not support the concurrent access queue mode requires.
How many workers should I run?
Start with one worker per available vCPU, then tune based on your workflows. CPU-bound transformations benefit from more workers; I/O-bound workflows (waiting on slow APIs) benefit more from raising each worker’s --concurrency value. Watch RAM — each concurrent execution holds its data in memory.
Will my existing workflows break when I switch to queue mode?
No. Workflows themselves are unchanged. The critical requirement is that N8N_ENCRYPTION_KEY is identical on the main process and all workers, otherwise workers cannot decrypt stored credentials. Migrate your database from SQLite to Postgres first, then flip EXECUTIONS_MODE to queue.
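To confirm the key requirement on a running stack, you can compare the variable inside the main container with each worker replica. This is an illustrative sketch: `keys_match` is a hypothetical helper, it assumes the service names `n8n-main` and `n8n-worker` from this guide, and it assumes replica indexes run from 1 to the worker count, which may not hold after repeated scaling.

```shell
#!/bin/sh
# keys_match [WORKER_COUNT]: compare N8N_ENCRYPTION_KEY in the main container
# against each worker replica without printing the key itself.
keys_match() {
  workers="${1:-3}"
  main_key=$(docker compose exec -T n8n-main printenv N8N_ENCRYPTION_KEY)
  i=1
  while [ "$i" -le "$workers" ]; do
    # --index selects one replica of a scaled service
    worker_key=$(docker compose exec -T --index="$i" n8n-worker printenv N8N_ENCRYPTION_KEY)
    if [ -z "$worker_key" ] || [ "$worker_key" != "$main_key" ]; then
      echo "worker $i: MISMATCH"
      return 1
    fi
    echo "worker $i: ok"
    i=$((i + 1))
  done
}

# Usage:
#   keys_match 3
```

With a single `.env` file feeding every service a mismatch is unlikely, but the check becomes useful once workers run on other hosts with their own environment files.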
Is queue mode overkill for a small team?
If you run more than a handful of workflows with external API calls or have any execution with an SLA, the resilience alone — re-queuing jobs when a worker crashes instead of dropping them — justifies it. The Docker Compose footprint is small enough to run comfortably on an 8 GB box.