Architecture Optimization: Evaluating the Impact of Horizontal Scaling and Caching on System Throughput
Introduction
As system traffic grows, should you add more machines or introduce caching? To evaluate which approach is more cost-effective, I built a Scaling Benchmark Project. By constraining hardware resources, the project measures how much caching and horizontal scaling each improve the throughput of a monolithic application, and at what resource cost.
Goals
This project implements two kinds of API endpoints, CPU-intensive and DB-intensive, to simulate two different performance bottlenecks (slow computation vs. slow queries). Load is generated with k6 and observed through a monitoring stack of cAdvisor + Prometheus + Grafana, using a Grafana dashboard (ID: 14282) to track the CPU utilization of each container.
Initial baseline tests showed that the database connection pool configuration was the primary bottleneck, which could be further analyzed via postgres-exporter. However, to reduce variables and ensure stability of the simulated environment, the current implementation keeps the database connection settings at their defaults and focuses on analyzing baseline performance, caching, and horizontal scaling.
Server Specifications
Processor: 4 cores
RAM: 8 GB
Disk Space: 100 GB
Operating System: Ubuntu Server 22.04 LTS
It is strongly recommended to use a Linux system. I initially ran Docker on Windows, but due to the extra abstraction layer, Grafana was unable to correctly read container names.
Tech Stack
- Application Layer: Node.js + Express
- Database: PostgreSQL (preloaded with 10,000 user records)
- Cache Layer: Redis
- Load Testing: k6
- Monitoring: Prometheus + Grafana + cAdvisor + InfluxDB
Container Responsibilities
Refer to the docker-compose.yml file in the project.
app (server.js)
Uses depends_on and healthcheck to ensure postgres and redis are ready before startup.
Load Simulation Endpoints:
- CPU-intensive (/api/api-intensive)
  - Behavior: Executes high-computation tasks (e.g., complex calculations), no database or cache involved
  - Purpose: Simulate compute bottlenecks and verify the effect of horizontal scaling
- DB-intensive (/api/db-intensive)
  - Behavior: Randomly generate a Key (1-100) and determine if it exists in the cache (frequency adjustable based on load test duration). If not, perform a complex database operation.
  - Caching strategy:
    - Redis Check: Verify whether data exists in cache
    - Cache Miss: Execute a complex JOIN query and write back to Redis (TTL 60s)
    - Cache Hit: Return data directly from Redis
  - Purpose: Simulate a database computational bottleneck and verify the effectiveness of Redis Caching in protecting the database under high concurrency.
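The caching strategy above is the classic cache-aside pattern. Here is a minimal sketch of it, assuming in-memory stand-ins for Redis and PostgreSQL so it runs without either service. The names `createFakeCache`, `createFakeDb`, `handleDbIntensive`, and the `user:` key prefix are hypothetical and are not taken from the project's server.js.

```javascript
// Cache-aside lookup with a 60-second TTL.
// All helpers below are illustrative stand-ins, not the project's code.
const TTL_SECONDS = 60;

// Stand-in for a Redis client: async get/set with expiry.
function createFakeCache(now = () => Date.now()) {
  const store = new Map();
  return {
    async get(key) {
      const entry = store.get(key);
      if (!entry) return null;
      if (entry.expiresAt <= now()) {
        store.delete(key); // entry outlived its TTL
        return null;
      }
      return entry.value;
    },
    async set(key, value, ttlSeconds) {
      store.set(key, { value, expiresAt: now() + ttlSeconds * 1000 });
    },
  };
}

// Stand-in for the expensive JOIN query; counts calls so the
// effect of caching is visible.
function createFakeDb() {
  const db = {
    queries: 0,
    async complexJoin(id) {
      db.queries += 1;
      return { id, orders: id * 2 };
    },
  };
  return db;
}

// Random key in 1..100, matching the range described above.
function randomKey() {
  return Math.floor(Math.random() * 100) + 1;
}

async function handleDbIntensive({ cache, db, useRedis, key }) {
  const cacheKey = `user:${key}`;
  if (useRedis) {
    const cached = await cache.get(cacheKey); // Redis check
    if (cached !== null) {
      return { source: 'cache', data: JSON.parse(cached) }; // cache hit
    }
  }
  const data = await db.complexJoin(key); // cache miss: run the query
  if (useRedis) {
    await cache.set(cacheKey, JSON.stringify(data), TTL_SECONDS); // write back
  }
  return { source: 'db', data };
}
```

Calling `handleDbIntensive` twice with the same key and `useRedis: true` should hit the fake database only once. With `useRedis: false`, every call goes to the database, which corresponds to the baseline scenario.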
Storage & Cache
- PostgreSQL: Primary data store responsible for persistence and complex queries
- Redis: Cache layer to reduce database read pressure
Testing Tool
k6
- Traffic Distribution: Simulates realistic traffic with 50% CPU-intensive and 50% DB-intensive requests
- Data Flow: Test results are written to InfluxDB in real time for time-series analysis in Grafana
Test script path: load-tests/baseline-test.js
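The 50/50 traffic split comes down to one random draw per iteration. The sketch below shows only that selection logic as plain JavaScript, so it can be run and checked outside k6. `BASE_URL`, the port, the helper names, and the check label format are assumptions; the two paths are the ones listed earlier. In an actual k6 script, something like `planRequest()` would be called inside the exported default function, with the URL passed to `http.get` from `k6/http` and the label passed to `check` from `k6`.

```javascript
// Endpoint selection for one load-test iteration.
// BASE_URL and the helper names are hypothetical.
const BASE_URL = 'http://localhost:3000';

const ENDPOINTS = [
  { name: 'API', path: '/api/api-intensive' },
  { name: 'DB', path: '/api/db-intensive' },
];

// Map a uniform random number in [0, 1) to one of the two endpoints,
// giving each a 50% share of the traffic.
function pickEndpoint(rand = Math.random()) {
  return rand < 0.5 ? ENDPOINTS[0] : ENDPOINTS[1];
}

// Build the request URL and a per-endpoint latency check label.
function planRequest(rand = Math.random()) {
  const endpoint = pickEndpoint(rand);
  return {
    url: `${BASE_URL}${endpoint.path}`,
    checkName: `[${endpoint.name}] < 200ms`,
  };
}
```

Keeping the draw in a small pure function makes it easy to change the ratio later, for example to weight DB-intensive requests more heavily.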
Monitoring Stack
| Component | Target | Description |
|---|---|---|
| Prometheus | System-wide metrics | Scrapes and stores time-series metrics from exporters |
| cAdvisor | Container resources | Collects CPU, memory, network, and disk I/O usage for all containers |
| Postgres-exporter | DB runtime state | Exposes PostgreSQL internals (e.g., slow queries, connections) |
| InfluxDB | Load test metrics | Stores k6 metrics (RPS, latency, etc.) |
| Grafana | Visualization | Aggregates all data sources into dashboards |
Data Flow Diagram
    [ Load Testing Layer ]    [ Application & Database Layer ]
    +-------------+           +------------------+          +------------+
    |     k6      | --------> |     App (xN)     | -------> |  Postgres  |
    |   (Load)    |           |     + Redis      |          |    (DB)    |
    +-------------+           +------------------+          +------------+
           |                           |                          |
           | (Push)                    | (Stats)                  | (Exporter)
           v                           v                          v
    +-------------+           +------------------+      +-------------------+
    |  InfluxDB   |           |     cAdvisor     |      | Postgres-exporter |
    | (Load Data) |           | (Container Data) |      |   (DB Metrics)    |
    +-------------+           +------------------+      +-------------------+
           |                           |                          |
           |                           +------------+-------------+
           |                                        |
           |                                        | (Pull)
           |                                        v
           |                              +-------------------+
           |                              |    Prometheus     |
           |                              | (System Metrics)  |
           +------------------------------+-------------------+
                                          |
                                          | (Data Source)
                                          v
                              +-----------------------+
                              |        Grafana        |
                              |    (Visualization)    |
                              +-----------------------+
Test Mode Configuration
Test modes are switched by adjusting environment variables and scaling parameters in docker-compose.yml:
| Scenario | USE_REDIS | replicas | Core Objective |
|---|---|---|---|
| Baseline | false | 1 | Establish baseline and identify original bottlenecks |
| Cache Only | true | 1 | Evaluate Redis optimization for DB-intensive requests |
| Full Scale | true | 3 | Test combined effects of caching and multi-instance scaling |
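On the application side, the `USE_REDIS` switch only needs a small piece of configuration parsing. This is a sketch under stated assumptions: `loadConfig`, `REDIS_URL`, and its default value are hypothetical, and the replica count is assumed to be handled by Docker Compose rather than read by the app.

```javascript
// Read the cache toggle from the environment.
// USE_REDIS comes from the scenario table; REDIS_URL and its default
// are assumptions for illustration.
function loadConfig(env = process.env) {
  return {
    useRedis: String(env.USE_REDIS || 'false').toLowerCase() === 'true',
    redisUrl: env.REDIS_URL || 'redis://redis:6379',
  };
}

// The three scenarios from the table, expressed as data.
// `replicas` is listed for reference only; the app itself never reads it.
const SCENARIOS = {
  baseline: { USE_REDIS: 'false', replicas: 1 },
  cacheOnly: { USE_REDIS: 'true', replicas: 1 },
  fullScale: { USE_REDIS: 'true', replicas: 3 },
};

console.log(loadConfig(SCENARIOS.cacheOnly).useRedis); // true
```

Defaulting to `false` when the variable is missing means a bare startup behaves like the baseline scenario.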
Test Procedure
Start all services: postgres, redis, app, influxdb, prometheus, postgres-exporter, cadvisor, grafana.
- Start the system and monitoring dashboards
- Verify that Grafana is receiving metrics from all containers
- Run the k6 load test
- Analyze the data 🎉
Most observations are done directly in Grafana. If data is missing, trace from the source: first confirm cAdvisor metrics, then Prometheus ingestion, and finally Grafana dashboard compatibility. This project uses the popular dashboard ID 14282 for container CPU monitoring, but the choice can be adjusted as needed.
Recommended Metrics to Observe
This is not a professional benchmark 😅, but these are the metrics I found most useful:
| Category | Metric | Meaning |
|---|---|---|
| Throughput | Requests/sec | Number of requests processed per second |
| Latency | P50, P95, P99 | Response time percentiles |
| Error Rate | Error Rate | Proportion of failed requests |
| Resources | CPU, Memory | System resource consumption |
| Database | Query Time, Connections | Database load |
| Cache | Hit Rate | Cache efficiency |
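To make the throughput, latency, and error-rate rows concrete, here is a rough sketch of how they can be derived from raw request samples. It uses the nearest-rank definition of a percentile, which is one common choice; k6 may interpolate differently, so small differences from its report are expected. The sample shape (`{ ms, ok }`) and the function names are assumptions.

```javascript
// Nearest-rank percentile over response times in milliseconds.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Summarize a list of { ms, ok } request results collected over
// `durationSeconds` of load.
function summarize(results, durationSeconds) {
  const latencies = results.map((r) => r.ms);
  const failed = results.filter((r) => !r.ok).length;
  return {
    rps: results.length / durationSeconds,
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
    p99: percentile(latencies, 99),
    errorRate: failed / results.length,
    // Share of requests finishing under a 200 ms target.
    under200ms: latencies.filter((ms) => ms < 200).length / results.length,
  };
}
```

The percentiles matter more than the average here, because a small number of queued requests can hide behind a healthy-looking mean.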
If you only look at k6 reports (external observation) without correlating internal metrics from cAdvisor/Prometheus, it is difficult to tell whether latency increases are due to inefficient code, resource throttling, or request queueing.
A complete performance evaluation should cross-validate external metrics (k6) with internal system metrics:
| Observation Target | Tool | Key Metrics | Purpose |
|---|---|---|---|
| User experience | k6 | RPS, P95 Latency | Verify whether performance goals are met |
| App runtime | cAdvisor | CPU Usage / Throttling | Detect Docker resource limits |
| Database state | PG Exporter | Active Connections | Identify connection pool bottlenecks |
| Cache efficiency | Redis | Cache Hit Rate | Confirm DB load reduction |
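For the cache hit rate row, Redis exposes cumulative `keyspace_hits` and `keyspace_misses` counters in the output of `INFO stats`. The sketch below parses that text and computes the ratio. The function names and the idea of feeding it raw `INFO` output are assumptions, not the project's tooling.

```javascript
// Parse the "key:value" lines returned by Redis `INFO stats`.
function parseInfo(text) {
  const fields = {};
  for (const line of text.split(/\r?\n/)) {
    if (!line || line.startsWith('#')) continue; // skip blanks and section headers
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    fields[line.slice(0, idx)] = line.slice(idx + 1);
  }
  return fields;
}

// Hit rate = hits / (hits + misses); null when there were no lookups.
function cacheHitRate(infoText) {
  const fields = parseInfo(infoText);
  const hits = Number(fields.keyspace_hits || 0);
  const misses = Number(fields.keyspace_misses || 0);
  const total = hits + misses;
  return total === 0 ? null : hits / total;
}
```

Because the counters are cumulative, a per-test hit rate needs the difference between a snapshot taken before the load test and one taken after it.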
Test Results
This benchmark simulates 100 concurrent virtual users (VUs), targeting both API-intensive (pure computation) and DB-intensive (database access) paths.
| Test Phase | Throughput | P95 Latency | [API] requests < 200 ms | [DB] requests < 200 ms | Resource Status |
|---|---|---|---|---|---|
| 1. Baseline | 155.54 req/s | 1328.01 ms | 97.04% | 2.32% | DB CPU 45%, App CPU 40% |
| 2. Cache | 174.94 req/s | 832.00 ms | 11.52% | 3.96% | DB CPU ↓ to 4%, App CPU ~47% |
| 3. Scaling | 456.40 req/s | 384.34 ms | 63.73% | 39.61% | Load shared across 3 instances |
Performance Analysis
Test 1: Identifying the DB Bottleneck (Baseline)
- Pure computation APIs perform well (97.04% of requests meet the 200 ms target, referred to below as SLA compliance)
- DB-related requests have extremely low compliance (2.32%)
- P(95) latency reaches 1.3 seconds
- Conclusion: The primary bottleneck lies in database I/O or query efficiency
Test 2: Cache Introduction and Resource Contention (Cache)
- Improvements
  - After introducing Redis, Postgres CPU usage dropped to 4%
  - Confirms that caching effectively reduces database load
- Unexpected Degradation
  - SLA compliance for API-intensive endpoints: Dropped from 97% to 11.52%
- Root Cause Analysis
  - Bottleneck shifted from DB to App: With DB pressure removed, the App now bears Redis communication, logic checks, and string processing. Under high concurrency, these small overheads accumulate into a new bottleneck.
  - Event Loop Blocking: Node.js uses a single-threaded event loop. Frequent cache checks and branching logic in DB-intensive requests occupy the main thread, forcing API-intensive tasks to queue, resulting in severe queueing delay.
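The queueing effect described above can be reproduced in a few lines. This is a minimal sketch, assuming a busy loop as a stand-in for synchronous work on the main thread. It measures how late an unrelated zero-delay timer fires when that work runs first. The function names are hypothetical.

```javascript
// Synchronous spin that stands in for CPU-heavy work in a request handler.
function busyWait(ms) {
  const start = Date.now();
  while (Date.now() - start < ms) {
    // spin: nothing else on this event loop can run meanwhile
  }
}

// Resolves with how many milliseconds passed before a 0 ms timer fired
// when `blockMs` of synchronous work was scheduled right after it.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    busyWait(blockMs); // the timer cannot fire until this returns
  });
}

measureTimerDelay(50).then((delay) => {
  console.log(`0 ms timer fired after ${delay} ms`);
});
```

The timer callback plays the role of a second request: it is ready almost immediately but has to wait for the synchronous work to finish. Running more app instances gives each request stream its own event loop, which is why scaling out relieves this kind of delay.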
Test 3: Horizontal Scaling to Remove Blocking (Scaling)
- App scaled to 3 instances, relieving single-thread saturation
- Throughput increased to 456 req/s (+193% vs. baseline)
- DB-intensive SLA compliance: Increased from 2.32% to 39.61%
Only after removing the App-layer bottleneck could the benefits of caching be fully realized.
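As a quick check on the arithmetic, the relative changes can be recomputed from the throughput and P95 figures in the results table. The object layout and helper name below are just for illustration.

```javascript
// Throughput (req/s) and P95 latency (ms) copied from the results table.
const results = {
  baseline: { rps: 155.54, p95Ms: 1328.01 },
  cache: { rps: 174.94, p95Ms: 832.0 },
  scaling: { rps: 456.4, p95Ms: 384.34 },
};

// Relative change from `from` to `to`, in percent.
function percentChange(from, to) {
  return ((to - from) / from) * 100;
}

const report = {
  cacheVsBaseline: percentChange(results.baseline.rps, results.cache.rps),
  scalingVsBaseline: percentChange(results.baseline.rps, results.scaling.rps),
  scalingVsCache: percentChange(results.cache.rps, results.scaling.rps),
  perInstanceRps: results.scaling.rps / 3, // three app instances
};

console.log(`cache vs. baseline:   +${report.cacheVsBaseline.toFixed(1)}%`);   // +12.5%
console.log(`scaling vs. baseline: +${report.scalingVsBaseline.toFixed(1)}%`); // +193.4%
console.log(`scaling vs. cache:    +${report.scalingVsCache.toFixed(1)}%`);    // +160.9%
console.log(`per-instance rps:     ${report.perInstanceRps.toFixed(2)}`);
```

Per-instance throughput in the scaled run comes out at roughly 152 req/s, close to the single-instance baseline. That suggests the three instances are adding capacity roughly in proportion rather than each one getting faster.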
💡 Notes on Metrics Coverage
Although a full monitoring stack was deployed, this phase did not deeply integrate Postgres-exporter internal metrics (e.g., locks, buffer cache hit rate). The analysis primarily relies on k6 reports and basic container metrics from Grafana dashboard ID 14282.
Conclusion
This optimization exercise shows that performance tuning is rarely solved by a single technique; it is a balancing act.
Introducing caching quickly relieved the database (Postgres CPU fell from 45% to 4%), but the pressure then shifted to the application layer, exposing issues such as CPU saturation, threading limits, and connection management. Only the combination of caching and horizontal scaling moved both numbers: with three app instances, throughput rose from 174.94 to 456.40 req/s (about 2.6x the cache-only run) and P95 latency fell from 832 ms to 384 ms.
The main goal of this project was to practice making data-driven technical decisions, rather than stacking technologies at the first sign of performance issues and risking over-engineering. Identify the bottleneck first, then address it directly. For example:
- In CPU-bound scenarios, adding Redis provides little benefit—scaling out is more effective
- When the database is the bottleneck, caching often delivers far higher ROI than hardware upgrades
- Talking about architecture optimization without measurement is mostly guesswork
Hands-on testing makes it clear how complex real systems are. Beyond raw performance numbers, considerations like cache penetration, data consistency, and load balancer strategies also matter—details that are easy to overlook when only looking at diagrams or theory.
Looking back, the biggest takeaway was not “using Redis” or “scaling horizontally,” but establishing a problem-driven decision-making process. Every technical choice should map to a measurable, verifiable bottleneck; otherwise, even mature technologies can simply add unnecessary complexity.
In practice, whether you add caching or machines, system pressure never disappears—it just moves from the database to the application CPU, threads, or connection layer. Performance optimization is not about maximizing a single metric, but about finding an acceptable balance across all resources.
This data-backed process also makes it easier to distinguish necessary complexity from over-design:
- CPU-bound workloads gain little from caching
- When the DB is the bottleneck, caching yields far better ROI than hardware scaling
- Without measurement, architecture optimization is speculation
This approach shifts technical decisions away from intuition or trendy architectures, grounding them instead in real data and verifiable results.
Alvin
Software engineer who dislikes pointless busyness, enjoys solving problems with logic, and strives to find balance between the blind pursuit of achievements and a relaxed lifestyle.