Gojek started as a 20-person call centre dispatching motorcycle couriers around Jakarta. Tokopedia launched in 2009 as a basic online marketplace with a small engineering team. Traveloka was a flight search tool before it was anything else. None of them wrote their first line of code imagining they'd be serving tens of millions of users within a few years.

The architecture decisions that let them scale — and the decisions that nearly broke them — are public knowledge. Their engineering teams have written extensively about what worked, what collapsed, and how they rebuilt on the fly. For founders building in Indonesia today, these aren't just interesting case studies. They're the closest thing you have to a local playbook.
Here's what actually happened, and what it means for you.
They all started as monoliths. Every single one.
This is the first lesson, and it's the one founders most often misunderstand.
Gojek, Tokopedia, Traveloka — all three built monolithic applications in their early years. One codebase. One deployment unit. Everything coupled together. And this was the right call. A monolith is fast to build, easy to reason about, and perfectly adequate when you're trying to find product-market fit and don't know what your system actually needs to do at scale yet.
Tokopedia started on a standard relational setup. Their first database was a free-edition RDBMS, which they migrated to PostgreSQL as the user base grew. Simple stack, low overhead, fast iteration. Gojek was running a relatively standard backend when they started handling tens of thousands of rides — not hundreds of microservices, not Kafka clusters, not multi-region infrastructure.
The move to distributed systems came later, and it came because the monolith hit real limits — not because someone read a Martin Fowler post and got excited.
The practical takeaway: if you're a Series A company trying to pre-emptively build a Gojek-style microservices architecture because you think you'll need it eventually, you're probably burning engineering time that would be better spent on the product. The transition is hard. Do it when the pain is real, not when you imagine it might be.
[→ Read: Monolith vs modular monolith vs microservices: the honest decision framework]

Gojek: service granularity based on access frequency, not function
When Gojek made the shift to microservices, they made a specific architectural decision that's worth examining closely. Most engineers split services along functional lines — payments is a service, rides is a service, notifications is a service. Gojek's CTO at the time articulated a different heuristic: they looked at microservices in terms of frequency of access rather than functions. Some parts of their systems were accessed two million times a day. High-frequency paths became services. Lower-frequency systems — like their content management — did not.
This matters because microservices introduce overhead. Every service boundary is a network call, a latency addition, a failure point, an operational burden. Splitting a service that handles moderate traffic buys you architectural purity and costs you operational complexity. It's often not a good trade.
The microservices-based architecture gave Gojek the ability to isolate faulty processes. They could shut something down or make it slower while continuing to serve customers — which is the actual value proposition. Fault isolation. Independent scaling. Not organisational tidiness.
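To make fault isolation concrete, here is a minimal circuit-breaker sketch in Go. It is not Gojek's Heimdall, just an illustration of the pattern: after a few consecutive failures the breaker trips and the hot path fails fast instead of queueing up behind a struggling dependency. The `callDownstream` function and the thresholds are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Breaker trips after maxFailures consecutive errors and stays open
// for cooldown, letting callers fail fast instead of piling requests
// onto a struggling dependency.
type Breaker struct {
	mu          sync.Mutex
	failures    int
	maxFailures int
	cooldown    time.Duration
	openedAt    time.Time
}

var ErrOpen = errors.New("circuit open: downstream isolated")

func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return ErrOpen // fail fast, keep the hot path responsive
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openedAt = time.Now() // (re)open the circuit
		}
		return err
	}
	b.failures = 0 // a success closes the circuit again
	return nil
}

func main() {
	b := &Breaker{maxFailures: 3, cooldown: 5 * time.Second}
	// callDownstream is a stand-in for a real dependency call.
	callDownstream := func() error { return errors.New("timeout") }
	for i := 0; i < 5; i++ {
		fmt.Println(b.Call(callDownstream))
	}
}
```

The point of the pattern is exactly what the Gojek quote describes: the failing dependency gets slower or gets cut off, and the rest of the system keeps serving customers.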
At the infrastructure level, Gojek used Google Cloud Platform for their microservices infrastructure, with PostgreSQL databases, Kafka for messaging, and edge proxies. Within one function in one service alone, they were generating billions of events a day, reaching 3TB to 4TB of data daily. That's the scale at which the infrastructure decisions became load-bearing.
For earlier-stage founders: the Gojek lesson isn't "use Kafka." It's "understand which parts of your system are accessed at high frequency, and design accordingly." Most startups have one or two hot paths. Know what they are before you architect anything.
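One low-effort way to find those hot paths is to measure them before re-architecting anything. The sketch below is a hedged illustration in Go, assuming a plain net/http monolith; the routes and the `/debug/hits` endpoint are made up, and in practice you would pull the same numbers from your existing metrics or access logs.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"sync/atomic"
)

// hitCounter records how often each route is hit, so the data (not
// intuition) decides which paths are genuinely high-frequency before
// anything gets split into its own service.
type hitCounter struct {
	mu   sync.Mutex
	hits map[string]*int64
}

func (c *hitCounter) wrap(route string, h http.HandlerFunc) http.HandlerFunc {
	c.mu.Lock()
	n := new(int64)
	c.hits[route] = n
	c.mu.Unlock()
	return func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt64(n, 1)
		h(w, r)
	}
}

func (c *hitCounter) report(w http.ResponseWriter, r *http.Request) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for route, n := range c.hits {
		fmt.Fprintf(w, "%s %d\n", route, atomic.LoadInt64(n))
	}
}

func main() {
	c := &hitCounter{hits: map[string]*int64{}}
	ok := func(w http.ResponseWriter, r *http.Request) { w.Write([]byte("ok")) }

	// Hypothetical routes: in practice, wrap whatever your monolith serves.
	http.HandleFunc("/orders", c.wrap("/orders", ok))
	http.HandleFunc("/cms/pages", c.wrap("/cms/pages", ok))
	http.HandleFunc("/debug/hits", c.report)

	http.ListenAndServe(":8080", nil)
}
```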
Tokopedia: the monolith-to-microservices trigger was team size, not traffic
Tokopedia's architecture shift is particularly instructive because the trigger wasn't purely technical.
Their development team quadrupled from 300 engineers to over 1,000 in multiple locations. That's when the monolith became a problem. "Tokopedia was growing so big that we needed to move from a monolith into a microservices model," their Security Engineering Lead explained. "With our previous setup, code was being managed on-premises in one place by one team, so everything had to go through them — including deployment. This traditional way of building applications worked for a few hundred developers, but growing to 1,000+ engineers made it impossible."
Read that again. The monolith worked fine for 300 engineers. It broke at 1,000.
This is the real microservices decision criterion that most architecture discussions skip. It's not "we're getting a lot of traffic." It's "we have too many engineers working on the same codebase and they're blocking each other." Traffic is one axis. Team topology is another. Both matter.
The outcome of Tokopedia's shift was measurable: in one day, they were making 200 to 300 pushes to production — something impossible in a coordinated monolith deployment model.
On the data side, Tokopedia went through a classic scaling progression. Free RDBMS → PostgreSQL → a need for distributed SQL as they hit single-node limits. Their technical architects noted that traditional databases have limitations in scaling because they store all the data in a single node. When you're serving 100 million active users across a marketplace with 11 million merchants, that single node becomes your most critical bottleneck.
Tokopedia scaled their Play feature from 55,000 to 1.5 million concurrent users in five weeks using Kubernetes for container orchestration. That's not engineering magic — it's the product of having already containerised services so horizontal scaling is a configuration change rather than an infrastructure rebuild.
Traveloka: rewiring a legacy service without killing the business
Traveloka's engineering blog is less high-profile than Gojek's, but it contains some of the most practically useful writing on system transformation in the Indonesian tech space.
Their issuance service — the system that handles flight and hotel ticket generation — was one of the oldest services in the platform. It was stateful, tightly coupled, and had become a bottleneck that was actively limiting the company's ability to run flash sales. Marketing wanted to do flash promotions. Engineering kept saying no, because the system couldn't handle the traffic spikes.
The team refactored the issuance service to be stateless, enabling them to scale instances as needed. As part of the refactor, they removed unused logic and adhered more closely to the microservice single-responsibility principle. They improved the job-picking process from once per minute to once every 0.5 seconds — a 120x increase in processing rate.
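As an illustration of that pattern (not Traveloka's actual code), here is a minimal Go worker loop where the polling interval is just configuration, so moving from once per minute to once every 0.5 seconds is a one-line change, and because the worker holds no local state, more instances can run behind the same queue. The `pickJob` function is a hypothetical stand-in for pulling work from a shared queue or database.

```go
package main

import (
	"fmt"
	"time"
)

// pickJob stands in for pulling the next issuance job from a shared
// queue or database; because the worker keeps no local state, any
// number of identical instances can run side by side.
func pickJob() (string, bool) {
	// Hypothetical: in a real system this would be a
	// SELECT ... FOR UPDATE SKIP LOCKED, or a message-queue consume.
	return "issue-ticket-123", true
}

func main() {
	// Once per minute vs once every 0.5 seconds is just this value.
	interval := 500 * time.Millisecond

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for range ticker.C {
		if job, ok := pickJob(); ok {
			fmt.Println("processing", job)
		}
	}
}
```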
Not long after the refactor was completed, Traveloka ran its first flash sale in years. Even when it received a spike of 10x traffic at one point, the system held up.
The lesson isn't the technical detail. It's the sequence. Traveloka didn't rebuild the issuance service proactively. They rebuilt it when a specific business capability was blocked. The pain was real, the business value was clear, and the engineering investment had a direct return. That's the right frame for legacy system rewrites — not "this code is messy" but "this code is preventing this specific thing we need to do."
[→ Read: How to rewrite your software system without stopping your business]

What all three got right: the principles underneath the stack choices
The specific technology decisions — Kafka vs RabbitMQ, Kubernetes vs bare metal, Postgres vs distributed SQL — matter less than the underlying principles these companies applied. The stack choices are contingent on context. The principles are more portable.
Build for failure, not uptime. Gojek's chaos engineering approach is a mindset before it's a toolset: their internal tool Loki randomly terminated VM instances and containers, exposing engineers to failures more frequently and incentivising them to build resilient services. Systems that get tested in failure conditions are systems that hold up in production incidents.
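A hedged sketch of that mindset in Go, and emphatically not Loki itself: wrap a handler with a small chaos middleware in staging so that injected latency and errors become routine events the team has to design around. The route and the probability are illustrative.

```go
package main

import (
	"math/rand"
	"net/http"
	"time"
)

// chaos wraps a handler and, with the given probability, either adds
// latency or returns an error instead of serving the request normally.
// Run it in non-production environments to make failure routine.
func chaos(probability float64, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if rand.Float64() < probability {
			if rand.Intn(2) == 0 {
				time.Sleep(2 * time.Second) // simulated slow dependency
			} else {
				http.Error(w, "injected failure", http.StatusServiceUnavailable)
				return
			}
		}
		next(w, r)
	}
}

func main() {
	ok := func(w http.ResponseWriter, r *http.Request) { w.Write([]byte("ok")) }
	http.HandleFunc("/rides", chaos(0.1, ok)) // hypothetical route, 10% chaos
	http.ListenAndServe(":8080", nil)
}
```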
Infrastructure as code from early on. Gojek's Project Olympus reduced the time to provision new infrastructure from weeks to minutes by treating infrastructure as code. When your ops team is filing tickets to spin up a new database cluster, you've already hit a scaling wall — it's just in your deployment pipeline rather than your application layer.
Data infrastructure is a product, not a side effect. Gojek's Data Engineering team operated as an internal B2B SaaS company, measuring success with business metrics like user adoption, retention, and the revenue or cost savings generated per feature. Tokopedia moved from ad-hoc analytics to BigQuery-based data warehousing when the data volume exceeded what a team of six could manage manually with PostgreSQL. The companies that scaled well treated data infrastructure with the same product thinking they applied to customer-facing features.
Team topology drives architecture as much as traffic does. The Tokopedia example makes this explicit. Conway's Law is real: your system's architecture tends to mirror your organisation's communication structure. If you're planning to scale your engineering team significantly, the architecture conversation should happen alongside the hiring conversation.
What founders often get wrong when drawing lessons from unicorns
Here's the counter-intuitive point that often gets missed in these case studies.
Gojek, Tokopedia, and Traveloka built complex distributed infrastructure because they genuinely needed it. They had hundreds of engineers, hundreds of millions of users, and businesses where a single hour of downtime translated to measurable revenue loss and reputational damage. The complexity was load-bearing.
Most startups don't have this problem. If you're running at Rp 10 billion ARR with a 15-person engineering team, a well-structured monolith or modular monolith will serve you better than 40 microservices that 5 engineers have to keep running. The operational overhead of distributed systems is real, and at small team sizes, it frequently slows development rather than enabling it.
The lesson from the Indonesian unicorns isn't the destination. It's the sequence. They started simple, scaled the architecture when the specific pain demanded it, made decisions based on what their business actually needed, and invested in foundational infrastructure — monitoring, observability, infrastructure as code — before the crisis, not during it.
That last point is the one most worth carrying into your next architecture conversation.
[→ Read: How to build a backend that scales from 100 to 10 million users]

FAQ
Q: Did Gojek, Tokopedia, and Traveloka build custom infrastructure from scratch or use cloud services?
A: A mix of both, which evolved over time. All three used major cloud providers — primarily Google Cloud Platform and AWS — for core infrastructure. But they also built significant internal tooling: Gojek built Heimdall for HTTP resilience, Loki for chaos testing, and Odin/Olympus for infrastructure provisioning. Tokopedia built custom data pipelines and fraud detection tooling. The principle is: use managed services where they exist and are adequate, build custom where the problem is genuinely specific to your scale or operational model. Don't build a custom message queue when Kafka exists.
Q: When should an Indonesian startup consider moving from a monolith to microservices?
A: Three signals that together suggest the time is right: your engineering team is consistently blocked on deployments because multiple teams are pushing to the same codebase; you have one or two specific services that need to scale independently at a cadence your current architecture can't support; and you have enough operational maturity — monitoring, CI/CD, alerting — to manage distributed systems without losing visibility. Moving to microservices before these conditions exist typically creates more problems than it solves.
Q: How did these companies handle database scaling as they grew?
A: The pattern across all three was sequential: start with a single relational database, add read replicas when reads outpace write capacity, add caching layers (Redis being the consistent choice) to offload hot reads, and eventually move toward distributed databases or database-per-service architectures when single-node limits become a genuine constraint. None of them built a complex distributed database layer from day one. The database scaling decisions followed the traffic reality, not the architectural ambition.
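For the caching step specifically, the usual pattern is cache-aside: check Redis, fall back to the primary database on a miss, then write the result back with a TTL so hot reads stop hitting the primary. A minimal Go sketch follows, using the go-redis client and database/sql; the key scheme, query, and TTL are illustrative rather than taken from any of these companies.

```go
package main

import (
	"context"
	"database/sql"
	"time"

	_ "github.com/lib/pq" // Postgres driver; an assumption, swap for your DB
	"github.com/redis/go-redis/v9"
)

// getProductName is a cache-aside read: try Redis first, fall back to
// the relational database on a miss, then populate the cache with a TTL
// so repeated hot reads stop hitting the primary.
func getProductName(ctx context.Context, rdb *redis.Client, db *sql.DB, id string) (string, error) {
	key := "product:name:" + id // illustrative key scheme

	if name, err := rdb.Get(ctx, key).Result(); err == nil {
		return name, nil // cache hit
	} else if err != redis.Nil {
		return "", err // real Redis error, not just a miss
	}

	var name string
	if err := db.QueryRowContext(ctx,
		"SELECT name FROM products WHERE id = $1", id).Scan(&name); err != nil {
		return "", err
	}

	// Populate the cache; best effort, a failure here shouldn't fail the read.
	_ = rdb.Set(ctx, key, name, 5*time.Minute).Err()
	return name, nil
}

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	db, _ := sql.Open("postgres", "postgres://localhost/app?sslmode=disable")
	_, _ = getProductName(ctx, rdb, db, "123")
}
```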
Q: What's the biggest architectural mistake Indonesian startups make when trying to scale?
A: Premature complexity, overwhelmingly. The second-most-common mistake we see is the opposite: ignoring architecture until a crisis forces the conversation, by which point the options are expensive and risky. The narrow path between these is knowing which parts of your system are under genuine stress, investing there deliberately, and leaving the rest alone. The Gojek approach to service granularity — based on access frequency — is a useful heuristic for making that distinction.
Q: Should a funded Indonesian startup hire a senior architect or work with an engineering partner?
A: It depends on the stage and the internal capability. A Seed-stage company rarely has the hiring leverage or the budget for a principal engineer who's run infrastructure at unicorn scale. A Series A or B company with an engineering team that's hitting real architectural pain — crashes under load, deployment bottlenecks, a legacy system that's becoming a liability — is the right moment to either bring in a Staff Engineer with the relevant experience or engage a technical partner who's built at that scale before. What you can't do is assume the problem will resolve itself. Architecture debt compounds quietly until it doesn't.
The companies covered in this post didn't get their architecture right because they were smarter than everyone else. They got it right because they made decisions in the right sequence, were willing to rebuild things that no longer fit their scale, and built foundational infrastructure before it became critical. That sequence — more than any specific technology choice — is what's actually worth learning from.
If your system is already showing the early signs of strain, the time to look at the architecture is before the incident that forces the conversation.
External Documentation:
- [Gojek Engineering Blog — Data Infrastructure at GO-JEK] — primary source for Gojek data infrastructure details.
- [Google Cloud — Tokopedia Case Study] — primary source for Tokopedia scaling and Redis use case.