Architecture Optimization: Evaluating the Impact of Horizontal Scaling and Caching on System Throughput

February 5, 2026

Evaluate the impact of horizontal scaling and Redis caching on system performance through hands-on testing. This article shares the complete testing process, pitfalls encountered, and real benchmark data comparisons to help you make data-driven decisions when facing scalability challenges.

Introduction

As system traffic grows, should you add more machines or introduce caching? To find out which approach is more cost-effective, I built a Scaling Benchmark Project. It runs a monolithic application under constrained hardware resources to see how its throughput can be improved efficiently and at low cost.

Goals

This project implements two types of APIs: CPU-intensive and DB-intensive, to simulate performance bottlenecks from different dimensions (slow computation vs. slow queries). The testing process uses k6 for load testing, combined with a monitoring stack of cAdvisor + Prometheus + Grafana, and leverages a Grafana dashboard (ID: 14282) to precisely observe CPU utilization of each container.

Initial baseline tests showed that the database connection pool configuration was the primary bottleneck, which could be further analyzed via postgres-exporter. However, to reduce variables and ensure stability of the simulated environment, the current implementation keeps the database connection settings at their defaults and focuses on analyzing baseline performance, caching, and horizontal scaling.

Server Specifications

Processor: 4 cores
RAM: 8 GB
Disk Space: 100 GB
Operating System: Ubuntu Server 22.04 LTS

It is strongly recommended to use a Linux system. I initially ran Docker on Windows, but due to the extra abstraction layer, Grafana was unable to correctly read container names.

Tech Stack

Application Layer: Node.js + Express
Database: PostgreSQL (preloaded with 10,000 user records)
Cache Layer: Redis
Load Testing: k6
Monitoring: Prometheus + Grafana + cAdvisor + InfluxDB

Container Responsibilities

Refer to the docker-compose.yml file in the project.

app (server.js)

Uses depends_on and healthcheck to ensure postgres and redis are ready before startup.

Load Simulation Endpoints:

CPU-intensive (/api/api-intensive)

Behavior: Executes high-computation tasks (e.g., complex calculations), no database or cache involved
Purpose: Simulate compute bottlenecks and verify the effect of horizontal scaling
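
As a rough illustration of what "high-computation" means here, the sketch below uses a hypothetical `countPrimesBelow` helper. The actual calculation in server.js may be different, and the limit in the commented route is an assumed value. The point is that the work is synchronous and touches neither the database nor the cache.

```javascript
// Hypothetical stand-in for the "complex calculations" of the CPU-intensive endpoint.
// Counts primes below `limit` by trial division: pure CPU work, no I/O.
function countPrimesBelow(limit) {
  let count = 0;
  for (let n = 2; n < limit; n++) {
    let isPrime = true;
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) {
        isPrime = false;
        break;
      }
    }
    if (isPrime) count++;
  }
  return count;
}

// In an Express app this could be wired up roughly like this
// (route path as written in this article, limit is an assumed value):
//   app.get('/api/api-intensive', (req, res) => {
//     res.json({ primes: countPrimesBelow(200000) });
//   });

console.log(countPrimesBelow(100)); // 25
```

Because the loop never yields, every other request on the same Node.js process waits until it finishes, which is what makes this kind of endpoint a good candidate for horizontal scaling.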

DB-intensive (/api/db-intensive)

Behavior: Generates a random key (1-100) and checks whether it exists in the cache (the frequency can be adjusted to suit the load test duration). On a cache miss, it performs a complex database operation.
Caching strategy:
- Redis Check: Verify whether data exists in cache
- Cache Miss: Execute a complex JOIN query and write back to Redis (TTL 60s)
- Cache Hit: Return data directly from Redis
Purpose: Simulate a database computational bottleneck and verify the effectiveness of Redis Caching in protecting the database under high concurrency.
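
The caching strategy above is the classic cache-aside pattern. Here is a minimal sketch of it, under loud assumptions: `FakeCache` is an in-memory stand-in for Redis, and `getWithCache`, `randomKey`, and the loader function are hypothetical names, not the project's actual code. Only the 1-100 key range and the 60-second TTL come from the description above.

```javascript
// In-memory stand-in for Redis with per-key expiry. Assumption: the real project
// uses a Redis client; this class only mimics async get/set with a TTL.
class FakeCache {
  constructor(now = () => Date.now()) {
    this.store = new Map();
    this.now = now; // injectable clock, handy for testing expiry
  }
  async get(key) {
    const entry = this.store.get(key);
    if (!entry) return null;
    if (entry.expiresAt <= this.now()) {
      this.store.delete(key); // expired entries behave like a miss
      return null;
    }
    return entry.value;
  }
  async set(key, value, ttlSeconds) {
    this.store.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
}

// Cache-aside read: check the cache, fall back to the loader on a miss,
// then write the result back with a TTL (60 s in this article).
async function getWithCache(cache, key, loadFromDb, ttlSeconds = 60) {
  const cached = await cache.get(key);
  if (cached !== null) {
    return { source: 'cache', data: JSON.parse(cached) };
  }
  const data = await loadFromDb(key);
  await cache.set(key, JSON.stringify(data), ttlSeconds);
  return { source: 'db', data };
}

// Random key in the range 1-100, as described above.
function randomKey(rand = Math.random) {
  return Math.floor(rand() * 100) + 1;
}
```

In a handler, `loadFromDb` would be the function that runs the JOIN query. Note that the JSON serialization on both paths is itself CPU work on the application side.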

Storage & Cache

PostgreSQL: Primary data store responsible for persistence and complex queries
Redis: Cache layer to reduce database read pressure

Testing Tool

k6

Traffic Distribution: Simulates realistic traffic with 50% CPU-intensive and 50% DB-intensive requests
Data Flow: Test results are written to InfluxDB in real time for time-series analysis in Grafana

Test script path: load-tests/baseline-test.js
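
The 50/50 split can be expressed as a small selection function. This is a sketch, not the contents of baseline-test.js: `pickEndpoint` is a hypothetical helper, and `BASE_URL` and the duration in the commented k6 wiring are placeholders. The logic is kept as plain JavaScript so it can be checked outside k6.

```javascript
// Endpoint paths as they appear in this article.
const CPU_PATH = '/api/api-intensive';
const DB_PATH = '/api/db-intensive';

// Pick an endpoint for one iteration: draws below cpuShare go to the CPU path.
function pickEndpoint(rand = Math.random(), cpuShare = 0.5) {
  return rand < cpuShare ? CPU_PATH : DB_PATH;
}

// Inside a k6 script this could be used roughly as follows
// (BASE_URL and duration are placeholders, not taken from the project):
//   import http from 'k6/http';
//   export const options = { vus: 100, duration: '1m' };
//   export default function () {
//     http.get(`${BASE_URL}${pickEndpoint()}`);
//   }

// Quick check that the split is roughly even over many draws.
let cpuDraws = 0;
const totalDraws = 10000;
for (let i = 0; i < totalDraws; i++) {
  if (pickEndpoint() === CPU_PATH) cpuDraws++;
}
console.log(`CPU share: ${((cpuDraws / totalDraws) * 100).toFixed(1)}%`);
```

Changing `cpuShare` is an easy way to try other traffic mixes without touching the rest of the script.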

Monitoring Stack

Component	Target	Description
Prometheus	System-wide metrics	Scrapes and stores time-series metrics from exporters
cAdvisor	Container resources	Collects CPU, memory, network, and disk I/O usage for all containers
Postgres-exporter	DB runtime state	Exposes PostgreSQL internals (e.g., slow queries, connections)
InfluxDB	Load test metrics	Stores k6 metrics (RPS, latency, etc.)
Grafana	Visualization	Aggregates all data sources into dashboards

Data Flow Diagram

       [ Load Testing Layer ]                     [ Application & Database Layer ]
      +-------------+           +------------------+          +------------+
      |     k6      | --------> |    App (xN)      | -------> |  Postgres  |
      |   (Load)    |           |    + Redis       |          |    (DB)    |
      +-------------+           +------------------+          +------------+
             |                           |                           |
    (Push)   |                           | (Stats)                   | (Exporter)
             v                           v                           v
      +-------------+           +------------------+          +-------------------+
      |  InfluxDB   |           |    cAdvisor      |          | Postgres-exporter |
      | (Load Data) |           | (Container Data) |          |  (DB Metrics)     |
      +-------------+           +------------------+          +-------------------+
             |                           |                           |
             |                           +------------+--------------+
             |                                        |
             |                                        | (Pull)
             |                                        v
             |                              +-------------------+
             |                              |    Prometheus     |
             |                              | (System Metrics)  |
             +------------------------------+-------------------+
                               |
                               | (Data Source)
                               v
                    +-----------------------+
                    |       Grafana         |
                    |   (Visualization)     |
                    +-----------------------+

Test Mode Configuration

Test modes are switched by adjusting environment variables and scaling parameters in docker-compose.yml:

Scenario	USE_REDIS	replicas	Core Objective
Baseline	false	1	Establish baseline and identify original bottlenecks
Cache Only	true	1	Evaluate Redis optimization for DB-intensive requests
Full Scale	true	3	Test combined effects of caching and multi-instance scaling
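
On the application side, a flag like USE_REDIS has to be parsed with some care, because environment variables always arrive as strings. The sketch below shows one way to do it. `isRedisEnabled` and the `scenarios` object are hypothetical names for illustration, and I am assuming the flag is read from `process.env`.

```javascript
// Parse the USE_REDIS flag. Environment variables are strings, so the value
// "false" must be compared explicitly (any non-empty string is truthy in JS).
function isRedisEnabled(env = process.env) {
  return String(env.USE_REDIS || '').toLowerCase() === 'true';
}

// The three scenarios from the table above, expressed as data.
const scenarios = {
  baseline: { USE_REDIS: 'false', replicas: 1 },
  cacheOnly: { USE_REDIS: 'true', replicas: 1 },
  fullScale: { USE_REDIS: 'true', replicas: 3 },
};

for (const [name, cfg] of Object.entries(scenarios)) {
  console.log(name, 'redis:', isRedisEnabled(cfg), 'replicas:', cfg.replicas);
}
```

A plain `if (process.env.USE_REDIS)` check would treat the Baseline setting `false` as enabled, which would quietly invalidate the baseline run.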

Test Procedure

Start all services: postgres, redis, app, influxdb, prometheus, postgres-exporter, cadvisor, grafana.

Start the system and monitoring dashboards
Verify that Grafana is receiving metrics from all containers
Run the k6 load test
Analyze the data 🎉

Most observations are done directly in Grafana. If data is missing, trace from the source: first confirm cAdvisor metrics, then Prometheus ingestion, and finally Grafana dashboard compatibility. This project uses the popular dashboard ID 14282 for container CPU monitoring, but the choice can be adjusted as needed.
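
When tracing missing data, it can help to ask Prometheus directly instead of going through Grafana. The sketch below builds a URL for the Prometheus HTTP API instant-query endpoint. The base URL is an assumption (adjust it to wherever your Prometheus container is exposed), and `buildQueryUrl` is a hypothetical helper. The metric `container_cpu_usage_seconds_total` and its `name` label come from cAdvisor.

```javascript
// Build an instant-query URL for the Prometheus HTTP API (/api/v1/query).
// baseUrl is an assumption: use the host and port your Prometheus exposes.
function buildQueryUrl(baseUrl, promql) {
  const queryUrl = new URL('/api/v1/query', baseUrl);
  queryUrl.searchParams.set('query', promql);
  return queryUrl.toString();
}

// Per-container CPU usage rate from cAdvisor.
// The name=~".+" matcher keeps only series that carry a container name.
const cpuByContainer =
  'sum by (name) (rate(container_cpu_usage_seconds_total{name=~".+"}[1m]))';

const cpuQueryUrl = buildQueryUrl('http://localhost:9090', cpuByContainer);
console.log(cpuQueryUrl);

// To actually run the query (Node 18+ ships a global fetch):
//   const body = await (await fetch(cpuQueryUrl)).json();
//   console.log(body.data.result.map((r) => r.metric.name));
```

If the result list is empty or the `name` label is missing, the problem is upstream of Grafana, which matches the container-name issue I hit when running Docker on Windows.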

Recommended Metrics to Observe

This is not a professional benchmark 😅, but these are the metrics I found most useful:

Category	Metric	Meaning
Throughput	Requests/sec	Number of requests processed per second
Latency	P50, P95, P99	Response time percentiles
Error Rate	Error Rate	Proportion of failed requests
Resources	CPU, Memory	System resource consumption
Database	Query Time, Connections	Database load
Cache	Hit Rate	Cache efficiency
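
To make the latency metrics concrete, here is a small sketch of how percentiles and the "under 200 ms" share can be computed from raw samples. It uses the nearest-rank method; other tools may interpolate between samples, so their numbers can differ slightly. The sample latencies are made up for illustration.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Share of requests that finished under a latency threshold (e.g. 200 ms).
function shareUnder(samples, thresholdMs) {
  const ok = samples.filter((ms) => ms < thresholdMs).length;
  return ok / samples.length;
}

// Made-up latencies in milliseconds, for illustration only.
const latencies = [120, 80, 150, 950, 110, 90, 1300, 100, 130, 140];
console.log('P50:', percentile(latencies, 50)); // 120
console.log('P95:', percentile(latencies, 95)); // 1300
console.log('under 200 ms:', shareUnder(latencies, 200)); // 0.8
```

The example also shows why averages mislead: two slow requests out of ten barely move P50 but completely determine P95.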

If you only look at k6 reports (external observation) without correlating internal metrics from cAdvisor/Prometheus, it is difficult to tell whether latency increases are due to inefficient code, resource throttling, or request queueing.

A complete performance evaluation should cross-validate external metrics (k6) with internal system metrics:

Observation Target	Tool	Key Metrics	Purpose
User experience	k6	RPS, P95 Latency	Verify whether performance goals are met
App runtime	cAdvisor	CPU Usage / Throttling	Detect Docker resource limits
Database state	PG Exporter	Active Connections	Identify connection pool bottlenecks
Cache efficiency	Redis	Cache Hit Rate	Confirm DB load reduction
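
For the cache hit rate, one option is to derive it from the `keyspace_hits` and `keyspace_misses` counters that Redis reports in the stats section of its INFO output. The sketch below parses that text format. The function names are hypothetical, the sample counters are made up, and fetching the INFO text from a live server is left out.

```javascript
// Parse "key:value" lines from the text returned by the Redis INFO command.
function parseRedisInfo(text) {
  const fields = {};
  for (const line of text.split(/\r?\n/)) {
    if (!line || line.startsWith('#')) continue; // skip blanks and section headers
    const idx = line.indexOf(':');
    if (idx > 0) fields[line.slice(0, idx)] = line.slice(idx + 1);
  }
  return fields;
}

// Hit rate = hits / (hits + misses), based on the keyspace_hits and
// keyspace_misses counters. Returns 0 when there were no lookups yet.
function cacheHitRate(infoText) {
  const info = parseRedisInfo(infoText);
  const hits = Number(info.keyspace_hits || 0);
  const misses = Number(info.keyspace_misses || 0);
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}

// Sample INFO text with made-up counter values.
const sampleInfo = '# Stats\r\nkeyspace_hits:900\r\nkeyspace_misses:100\r\n';
console.log(cacheHitRate(sampleInfo)); // 0.9
```

These counters are cumulative for the server, so for a per-run figure it makes sense to take a reading before and after the load test and work with the difference.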

Test Results

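This benchmark simulates 100 concurrent virtual users (VUs) hitting both the CPU-intensive path (pure computation, labeled [API] below) and the DB-intensive path (database access, labeled [DB] below). The [API] and [DB] columns show the share of requests on each path that completed in under 200 ms.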

Test Phase	Throughput	P(95) Latency	[API] < 200ms	[DB] < 200ms	Resource Status
1. Baseline	155.54 req/s	1328.01 ms	97.04%	2.32%	DB CPU 45%, App CPU 40%
2. Cache	174.94 req/s	832.00 ms	11.52%	3.96%	DB CPU ↓ to 4%, App CPU ~47%
3. Scaling	456.40 req/s	384.34 ms	63.73%	39.61%	Load shared across 3 instances

Performance Analysis

Test 1: Identifying the DB Bottleneck (Baseline)

Pure computation APIs perform well (97% SLA compliance)
DB-related requests have extremely low compliance (2.32%)
P(95) latency reaches 1.3 seconds
Conclusion: The primary bottleneck lies in database I/O or query efficiency

Test 2: Cache Introduction and Resource Contention (Cache)

Improvements

After introducing Redis, Postgres CPU usage dropped to 4%
Confirms that caching effectively reduces database load

Unexpected Degradation

SLA compliance for API-intensive endpoints: dropped from 97.04% to 11.52%

Root Cause Analysis

Bottleneck shifted from DB to App
With DB pressure removed, the App now bears Redis communication, logic checks, and string processing. Under high concurrency, these small overheads accumulate into a new bottleneck.
Event Loop Blocking
Node.js uses a single-threaded event loop. Frequent cache checks and branching logic in DB-intensive requests occupy the main thread, forcing API-intensive tasks to queue, resulting in severe queueing delay.
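
The queueing effect is easy to reproduce in isolation. The sketch below is a generic demonstration of the mechanism, not the project's code: it schedules a 0 ms timer and then keeps the thread busy with synchronous work. `blockFor` and `measureTimerDelay` are hypothetical helper names.

```javascript
// Busy-wait for roughly `ms` milliseconds without yielding to the event loop.
function blockFor(ms) {
  const start = Date.now();
  while (Date.now() - start < ms) {
    // spin: simulates synchronous CPU-bound work
  }
}

// Schedule a 0 ms timer, then block the thread. The timer callback cannot
// run until the synchronous work ends, so its observed delay is at least `ms`.
function measureTimerDelay(ms) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    blockFor(ms);
  });
}

measureTimerDelay(50).then((delay) => {
  console.log(`0 ms timer fired after ${delay} ms`);
});
```

A request waiting behind synchronous work in the same process sees the same kind of delay as this timer, regardless of how cheap the request itself is.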

Test 3: Horizontal Scaling to Remove Blocking (Scaling)

App scaled to 3 instances, relieving single-thread saturation
Throughput increased to 456 req/s (+193% vs. baseline)
DB-intensive SLA compliance: increased from 2.32% to 39.61%

Only after removing the App-layer bottleneck could the benefits of caching be fully realized.
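
The relative gains can be checked directly from the throughput column of the results table. The helper names below are hypothetical. The "scaling efficiency" figure is my own simple ratio (speedup divided by instance count), where 1.0 would mean perfectly linear scaling.

```javascript
// Throughput figures (req/s) from the results table.
const baselineRps = 155.54;
const cacheRps = 174.94;
const scalingRps = 456.40;

// Relative change in percent.
function percentChange(from, to) {
  return ((to - from) / from) * 100;
}

// Speedup divided by the number of instances: 1.0 would be perfectly linear.
function scalingEfficiency(before, after, instances) {
  return after / before / instances;
}

console.log(percentChange(baselineRps, cacheRps).toFixed(1));        // 12.5
console.log(percentChange(baselineRps, scalingRps).toFixed(1));      // 193.4
console.log(scalingEfficiency(cacheRps, scalingRps, 3).toFixed(2));  // 0.87
```

So caching alone added about 12.5% throughput, while going from one to three instances with caching enabled delivered roughly 87% of a perfect 3x speedup.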

💡 Notes on Metrics Coverage

Although a full monitoring stack was deployed, this phase did not deeply integrate Postgres-exporter internal metrics (e.g., locks, buffer cache hit rate). The analysis primarily relies on k6 reports and basic container metrics from Grafana dashboard ID 14282.

Conclusion

This optimization exercise clearly demonstrates that performance tuning is never solved by a single technique—it is always a balancing act.

Introducing caching can quickly eliminate database bottlenecks, but as traffic grows, pressure shifts to the application layer, exposing issues such as CPU saturation, threading limits, and connection management. In this test, it was the combination of caching and horizontal scaling that let throughput grow close to linearly (about 2.6x from one to three instances) while pulling P95 latency down from 832 ms to 384 ms.

The main goal of this project was to practice making data-driven technical decisions, rather than stacking technologies at the first sign of performance issues and risking over-engineering. Identify the bottleneck first, then address it directly. For example:

In CPU-bound scenarios, adding Redis provides little benefit—scaling out is more effective
When the database is the bottleneck, caching often delivers far higher ROI than hardware upgrades
Talking about architecture optimization without measurement is mostly guesswork

Hands-on testing makes it clear how complex real systems are. Beyond raw performance numbers, considerations like cache penetration, data consistency, and load balancer strategies also matter—details that are easy to overlook when only looking at diagrams or theory.

Looking back, the biggest takeaway was not “using Redis” or “scaling horizontally,” but establishing a problem-driven decision-making process. Every technical choice should map to a measurable, verifiable bottleneck; otherwise, even mature technologies can simply add unnecessary complexity.

In practice, whether you add caching or machines, system pressure never disappears—it just moves from the database to the application CPU, threads, or connection layer. Performance optimization is not about maximizing a single metric, but about finding an acceptable balance across all resources.

This approach shifts technical decisions away from intuition or trendy architectures, grounding them instead in real data and verifiable results.

Project

📦 scaling-benchmark

Alvin

Software engineer who dislikes pointless busyness, enjoys solving problems with logic, and strives to find balance between the blind pursuit of achievements and a relaxed lifestyle.