The Problem
In Q4 2025, a mid-sized Indonesian e-commerce client came to us with a growing infrastructure bill. Their cloud spend had ballooned to $4,200/month because the team provisioned servers reactively: always scaling up, never scaling right.
What We Found in the First Week
After getting monitoring access, the picture was clear:
- CPU utilization averaged 12% across 8 application servers. They were paying for capacity they never used.
- 3 servers had no traffic for 45+ days — forgotten staging environments still running in production accounts.
- Database queries were hitting an unindexed `orders` table with 2.3 million rows, so every checkout page load triggered a full table scan.
- Static assets (images, CSS, JS) were being served from the application server instead of a CDN.
- Backups were running during peak hours (11am-2pm), causing I/O spikes on the primary database.
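A check like the idle-server finding above can be automated. The following is a minimal sketch, assuming monitoring data has already been exported into simple records; the `ServerStats` type, the server names, and the dates are hypothetical, and only the 45-day window comes from the findings above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class ServerStats:
    """Hypothetical record built from exported monitoring data."""
    name: str
    last_request_on: Optional[date]  # None if no request was ever recorded


def find_idle_servers(servers: List[ServerStats], today: date, idle_days: int = 45) -> List[str]:
    """Return the names of servers that have served no traffic for at least idle_days."""
    cutoff = today - timedelta(days=idle_days)
    return [
        s.name
        for s in servers
        if s.last_request_on is None or s.last_request_on <= cutoff
    ]


# Illustrative data only; these are not the client's real servers.
today = date(2025, 11, 1)
fleet = [
    ServerStats("app-1", date(2025, 10, 31)),
    ServerStats("staging-1", date(2025, 8, 15)),
    ServerStats("staging-2", None),
]
idle = find_idle_servers(fleet, today)
print(idle)
```

Running a report like this on a schedule would help surface forgotten environments before they accumulate months of charges.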
The 30-Day Optimization Plan
Week 1: Quick Wins
Shut down the 3 idle servers — immediate $840/month saving. Moved static assets to Cloudflare R2 with CDN — reduced bandwidth costs by 70% and cut page load time by 1.4 seconds.
Week 2: Right-sizing
Replaced the 8 over-provisioned servers with 4 properly sized ones, based on load-testing data. Configured horizontal auto-scaling: the cluster now scales from 2 to 6 nodes based on actual traffic patterns.
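To illustrate how a 2-to-6-node scaling decision might be computed, here is a minimal sketch of a proportional target-tracking rule, similar in spirit to the formula used by the Kubernetes Horizontal Pod Autoscaler. The choice of CPU as the scaling metric and the 65% target are assumptions for illustration; only the node bounds come from the setup described above.

```python
import math

MIN_NODES = 2  # lower bound from the case study
MAX_NODES = 6  # upper bound from the case study


def desired_nodes(current_nodes: int, avg_cpu_percent: float,
                  target_cpu_percent: float = 65.0) -> int:
    """Scale the node count so that average CPU moves toward the target.

    The CPU metric and the 65% target are assumed values, not the client's
    actual configuration.
    """
    if current_nodes < 1:
        raise ValueError("current_nodes must be at least 1")
    raw = math.ceil(current_nodes * avg_cpu_percent / target_cpu_percent)
    # Clamp to the configured cluster bounds.
    return max(MIN_NODES, min(MAX_NODES, raw))


print(desired_nodes(4, 90.0))  # busy cluster: scale out
print(desired_nodes(4, 20.0))  # quiet cluster: scale in to the floor
```

A production autoscaler would typically add a cooldown period so that brief spikes do not cause the cluster to flap between sizes.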
Week 3: Database Optimization
Added a composite index on the `orders` table:
```sql
CREATE INDEX idx_orders_user_status_created
ON orders (user_id, status, created_at DESC);
```
Checkout page load time dropped from 4.2s to 0.8s. We also moved the backup jobs out of the 11am-2pm peak window to 2am.
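The effect of the composite index can be reproduced on a small scale. The sketch below uses an in-memory SQLite database as a stand-in (the client's actual database engine is not stated here), with an assumed table schema and an assumed checkout-style query, and compares the query plan before and after creating the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: only the indexed column names come from the case study.
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        status TEXT NOT NULL,
        created_at TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO orders (user_id, status, created_at) VALUES (?, ?, ?)",
    [
        (i % 100, "paid" if i % 2 else "pending", f"2025-10-{i % 28 + 1:02d}")
        for i in range(1000)
    ],
)

# Hypothetical checkout query: a user's most recent orders in a given status.
QUERY = """
    SELECT id FROM orders
    WHERE user_id = ? AND status = ?
    ORDER BY created_at DESC
    LIMIT 10
"""


def query_plan(connection: sqlite3.Connection) -> str:
    """Return SQLite's plan description for QUERY as a single string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42, "paid")).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


plan_before = query_plan(conn)
conn.execute(
    "CREATE INDEX idx_orders_user_status_created "
    "ON orders (user_id, status, created_at DESC)"
)
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists, the plan reports a scan of the whole table; afterwards it reports a search using `idx_orders_user_status_created`. The exact wording of the plan varies between SQLite versions, and other engines expose the same information through their own `EXPLAIN` output.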
Week 4: Monitoring and Alerting
Deployed Prometheus + Grafana dashboards. Set up cost anomaly alerts — any daily spend 20% above baseline triggers an immediate notification.
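The 20% alert rule can be expressed in a few lines. This is a sketch only: it assumes the baseline is the mean of a trailing window of daily spend figures, which is one plausible definition rather than the one actually configured, and the sample numbers are illustrative.

```python
from statistics import mean
from typing import Sequence

THRESHOLD = 0.20  # alert when spend is more than 20% above baseline


def is_cost_anomaly(todays_spend: float, recent_daily_spend: Sequence[float],
                    threshold: float = THRESHOLD) -> bool:
    """Return True when today's spend exceeds the baseline by more than the threshold.

    The baseline here is assumed to be the mean of the recent daily figures.
    """
    if not recent_daily_spend:
        raise ValueError("need at least one day of history to form a baseline")
    baseline = mean(recent_daily_spend)
    return todays_spend > baseline * (1 + threshold)


# Illustrative history: $1,680/month is roughly $56/day over 30 days.
history = [56.0] * 7
print(is_cost_anomaly(70.0, history))  # above 56 * 1.2 = 67.2
print(is_cost_anomaly(60.0, history))  # within the tolerated band
```

In practice the boolean result would feed whatever notification channel the team already uses.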
The Result
| Metric | Before | After |
|--------|--------|-------|
| Monthly spend | $4,200 | $1,680 |
| Avg CPU utilization | 12% | 68% |
| Checkout load time | 4.2s | 0.8s |
| Server count | 11 | 4 (auto-scaling) |
60% cost reduction. Better performance. Smaller attack surface.
This is exactly what our Proactive Management tier includes — systematic infrastructure review, not just keeping the lights on.