SpectreDev | High-Performance Systems Engineering

// PUBLISHED 04.05.26

// TIME 8 MINS

// TAGS

#VENDOR SELECTION #CTO GUIDE #INDONESIA MARKET

// AUTHOR

Spectre Command

A vendor promises 99.9% uptime. You sign the contract. Six months later, your system goes down during a GoPay settlement window at 9 PM on a Friday, and you lose two hours of transactions. You raise the issue. The vendor confirms the outage was 112 minutes. They calculate the SLA credit: around Rp 800,000 against a Rp 120 million monthly contract.

That's the SLA uptime guarantee working exactly as written — and completely failing you as a business.

This is the gap most non-technical founders don't see until they're in it. An SLA uptime guarantee in a vendor contract is not the same as a commitment to reliability. It's a legal instrument. It protects the vendor more than it protects you. And if your CTO doesn't know how to rewrite that section before signing, you're exposed.

Here's what actually matters — and what to demand instead.

The Difference Between SLA, SLO, and SLI (And Why Most Vendors Only Give You One)

These three terms get used interchangeably in vendor conversations. They're not the same thing.

An SLI (Service Level Indicator) is the raw measurement. Response time, error rate, throughput, availability — a number you can observe. It's the metric itself.

An SLO (Service Level Objective) is the internal target your vendor sets for that metric. "We aim for 99.95% availability measured monthly, excluding scheduled maintenance." It's what they're trying to achieve. The SLO lives inside their engineering culture. You may never see it.

An SLA (Service Level Agreement) is the contractual commitment — and usually the weakest of the three. SLAs are written by lawyers, not engineers. They include exclusions, carve-outs, and remedies (usually credits) that are calibrated to be legally defensible rather than operationally meaningful.

The mistake most buyers make is treating the SLA as the reliability commitment. It isn't. The SLO is the reliability commitment. The SLA is the consequence structure when the SLO is missed.

A vendor who can't show you their SLOs — the internal targets that drive their engineering decisions — is a vendor who hasn't thought seriously about reliability. That should concern you.

What "99.9% Uptime" Actually Means in Practice

The maths on uptime percentages is worth running once so you understand what you're agreeing to.

99.9% uptime allows for 8.7 hours of downtime per year, or roughly 43 minutes per month. That sounds fine until your system goes down at 2 PM on a weekday and stays down for 40 minutes while your support team fields calls and your operations team manually processes orders.

99.95% cuts that to about 21 minutes per month. 99.99% ("four nines") allows 4.3 minutes per month. The difference between 99.9% and 99.99% isn't a rounding error — it's an order of magnitude in allowed downtime, and it reflects entirely different engineering investments.
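Here is a minimal Python sketch that turns an availability target into a downtime budget, so the arithmetic is concrete. It assumes a 30-day month and a 365-day year. Real contracts may define the measurement window differently (calendar month, rolling 30 days), and that shifts the numbers slightly.

```python
# Minimal sketch: availability target -> allowed downtime.
# Assumption: a 30-day month (43,200 minutes) and a 365-day year.
MINUTES_PER_MONTH = 30 * 24 * 60
HOURS_PER_YEAR = 365 * 24


def downtime_budget(availability_pct: float) -> tuple[float, float]:
    """Return (allowed minutes per month, allowed hours per year)."""
    unavailable_fraction = 1 - availability_pct / 100
    return (unavailable_fraction * MINUTES_PER_MONTH,
            unavailable_fraction * HOURS_PER_YEAR)


def monthly_availability(downtime_minutes: float) -> float:
    """Availability % for a 30-day month with the given total downtime."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_MONTH)


for target in (99.9, 99.95, 99.99):
    per_month, per_year = downtime_budget(target)
    print(f"{target}%: {per_month:.1f} min/month, {per_year:.2f} h/year")
```

Apply the same arithmetic to the opening scenario. `monthly_availability(112)` comes out at roughly 99.74%, so a single 112-minute outage uses up about two and a half months' worth of a 99.9% downtime budget.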

Most Indonesian software vendors offer 99.9% as standard. Some offer 99.95%. Very few offer 99.99%, and when they do, it's usually for specific services (like an API gateway), not the full stack.

The counter-intuitive point here: chasing four nines for a system that doesn't need it is expensive and often counterproductive. A seed-stage startup probably doesn't need 99.99% uptime. An established fintech processing real-money transactions probably does. What your CTO needs to negotiate is not the highest number available — it's the right number for your actual business risk.

The Exclusions That Nullify Most Uptime Guarantees

Uptime SLAs almost always include exclusions. These are the clauses that let a vendor claim they met their commitments even when your system was down. The most common ones:

Scheduled maintenance windows. Vendors can typically schedule downtime with advance notice (usually 48–72 hours) and exclude it from uptime calculations. Some contracts allow surprisingly long or frequent maintenance windows. Read this carefully.

Third-party service failures. If your system goes down because AWS had an outage, or because Midtrans (a common Indonesian payment gateway) had an incident, most SLAs exclude this. Your vendor's uptime was fine; you just had no system. This exclusion is often legitimate, but you need to know it exists.

Incidents caused by client actions. If your team pushes a bad configuration or an untested deployment causes the outage, most vendors exclude that. Again, sometimes fair. But "client-caused" can be interpreted broadly by a vendor trying to avoid a credit.

Force majeure. Broad clauses covering anything from natural disasters to internet infrastructure failures. Fine in principle, but watch for contracts where this clause is so wide it covers almost any external event.
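The exclusions above are easiest to understand with numbers. The sketch below uses a made-up month of outages and assumes excluded minutes are simply not counted as downtime. Some contracts instead remove excluded time from the measurement period, which gives a similar result. The cause labels are hypothetical.

```python
from dataclasses import dataclass

# Assumption: a 30-day measurement month.
MINUTES_PER_MONTH = 30 * 24 * 60


@dataclass
class Outage:
    minutes: int
    cause: str  # hypothetical labels: "vendor", "maintenance", "third_party", "client"


def availability_pct(outages: list[Outage], excluded_causes=frozenset()) -> float:
    """Availability % where outages with an excluded cause count as zero downtime."""
    counted = sum(o.minutes for o in outages if o.cause not in excluded_causes)
    return 100 * (1 - counted / MINUTES_PER_MONTH)


month = [
    Outage(30, "vendor"),
    Outage(90, "maintenance"),   # announced in advance, so excluded
    Outage(45, "third_party"),   # e.g. an upstream payment gateway incident
]

experienced = availability_pct(month)
reported = availability_pct(month, excluded_causes={"maintenance", "third_party"})
print(f"What your users saw:       {experienced:.2f}%")
print(f"What the SLA report shows: {reported:.2f}%")
```

In this example the vendor reports about 99.93% and meets a 99.9% SLA. Your users experienced about 99.62%.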

The test: ask your vendor to walk you through the last three times they invoked an exclusion clause with a client. If they can't recall any, then either their reliability has never been tested by a serious incident or they aren't being open with you. Neither is confidence-inspiring.

What to Actually Demand: A CTO's Non-Negotiables

These are the contract provisions worth pushing for, beyond the headline uptime number.

Composite SLOs, not just availability. Availability tells you the system was up. It doesn't tell you it was usable. Push for SLOs on latency (e.g. p95 response time under 500ms), error rate (e.g. less than 0.1% 5xx errors), and throughput under load. A system that's "available" but returning errors 15% of the time is not serving your users.
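You can check a composite SLO against ordinary request logs. This sketch assumes each request is recorded with a latency and an HTTP status code. It uses the nearest-rank method for p95 and reuses the example thresholds above: p95 under 500 ms and a 5xx rate under 0.1%. The data structure and function names are illustrative, not the API of any particular monitoring tool.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    status: int  # HTTP status code


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = math.ceil(95 * len(ordered) / 100)
    return ordered[rank - 1]


def composite_slo(requests: list[Request],
                  p95_limit_ms: float = 500.0,
                  max_5xx_rate: float = 0.001) -> dict:
    """Evaluate latency and error-rate objectives over one window of requests."""
    latency = p95([r.latency_ms for r in requests])
    errors = sum(1 for r in requests if 500 <= r.status <= 599)
    error_rate = errors / len(requests)
    return {
        "p95_ms": latency,
        "error_rate": error_rate,
        "latency_ok": latency < p95_limit_ms,
        "errors_ok": error_rate < max_5xx_rate,
    }


# A window where the service answers quickly (so it looks "up")
# but 15% of requests fail with a 503.
window = [Request(120.0, 200)] * 85 + [Request(80.0, 503)] * 15
print(composite_slo(window))
```

A health-endpoint availability check would report this window as fully up. Only the error-rate objective catches the problem.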

Measurement methodology. How is uptime calculated? Synthetic monitoring that pings a health endpoint every five minutes will miss short but painful outages. Real user monitoring or application performance monitoring tools give you a truer picture. Ask what tool the vendor uses, who has access to the dashboard, and whether you can integrate it with your own observability stack.
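A small simulation shows why probe frequency matters. The sketch assumes a synthetic check fires at fixed intervals starting at time zero. An outage counts as detected only if at least one check lands inside it. The outage timings are invented for illustration.

```python
def detected(outage_start_s: int, outage_duration_s: int, interval_s: int) -> bool:
    """True if a probe at 0, interval, 2*interval, ... lands inside the outage."""
    # First probe at or after the outage start (integer ceiling division).
    first_probe = -(-outage_start_s // interval_s) * interval_s
    return first_probe < outage_start_s + outage_duration_s


# (start in seconds, duration in seconds); hypothetical short outages
outages = [(310, 240), (1_210, 90), (1_500, 120)]

for interval in (300, 60):
    caught = sum(detected(start, duration, interval) for start, duration in outages)
    print(f"{interval}s checks caught {caught} of {len(outages)} outages")
```

Whether a five-minute check sees an outage depends largely on when the outage starts relative to the probe schedule. Even 60-second checks can miss an outage shorter than the interval, which is why real user monitoring is a useful complement.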

Mean Time to Acknowledge (MTTA) and Mean Time to Resolve (MTTR). Uptime SLAs measure outcomes. MTTA and MTTR measure the vendor's incident response behaviour. "We'll acknowledge a P1 incident within 15 minutes and resolve within 4 hours" is a more meaningful operational commitment than a monthly availability percentage.
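MTTA and MTTR are simple averages over incident timestamps, but the contract needs to pin down exactly how they are measured. This sketch assumes both clocks start when the incident is opened, and the incident data is hypothetical. It also checks each incident against the example P1 commitment (acknowledge within 15 minutes, resolve within 4 hours), because an average can look healthy while a single incident badly breaches the commitment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

ACK_TARGET = timedelta(minutes=15)
RESOLVE_TARGET = timedelta(hours=4)


@dataclass
class Incident:
    opened: datetime
    acknowledged: datetime
    resolved: datetime


def _mean(deltas: list[timedelta]) -> timedelta:
    return sum(deltas, timedelta()) / len(deltas)


def mtta(incidents: list[Incident]) -> timedelta:
    return _mean([i.acknowledged - i.opened for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    return _mean([i.resolved - i.opened for i in incidents])


def breaches(incidents: list[Incident]) -> list[Incident]:
    """Incidents that missed either the acknowledge or the resolve target."""
    return [i for i in incidents
            if i.acknowledged - i.opened > ACK_TARGET
            or i.resolved - i.opened > RESOLVE_TARGET]


def incident(start: datetime, ack_min: int, resolve_min: int) -> Incident:
    return Incident(start, start + timedelta(minutes=ack_min),
                    start + timedelta(minutes=resolve_min))


p1_incidents = [
    incident(datetime(2026, 3, 6, 21, 0), ack_min=12, resolve_min=120),
    incident(datetime(2026, 3, 14, 10, 30), ack_min=6, resolve_min=50),
    incident(datetime(2026, 3, 27, 23, 15), ack_min=5, resolve_min=300),
]

print("MTTA:", mtta(p1_incidents))
print("MTTR:", mttr(p1_incidents))
print("Incidents breaching P1 targets:", len(breaches(p1_incidents)))
```

Here the averages (MTTA under 8 minutes, MTTR about 2.6 hours) sit inside the targets. Even so, the third incident took five hours to resolve, so ask for per-incident commitments as well as averages.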

Incident communication protocol. Who calls you when something goes down at 11 PM? Through what channel? How often do you get updates? A vendor who doesn't have a defined escalation path for critical incidents is improvising their incident response — which is exactly when you don't want improvisation.

Credit structure that creates actual incentive. Standard SLA credits (typically 5–10% of monthly fees per percentage point by which availability falls short of the target) are rarely painful enough to change vendor behaviour. Push for escalating credits tied to MTTR, or for a termination right if SLAs are missed more than twice in a rolling six-month period. Vendors who are confident in their reliability won't object to this.
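You can put a number on the difference between a flat credit and an escalating one. The tiers below are hypothetical, not a market standard. They are keyed to the longest time-to-resolve in the month, and the fee reuses the Rp 120 million figure from the opening example. The second function sketches the rolling six-month termination trigger described above.

```python
MONTHLY_FEE_RP = 120_000_000
FLAT_CREDIT_PCT = 10

# Hypothetical escalating tiers: (longest resolve time in hours, credit % of fee),
# ordered from most to least severe so the first match wins.
ESCALATING_TIERS = [(4.0, 30), (2.0, 20), (1.0, 10)]


def escalating_credit_pct(longest_resolve_hours: float) -> int:
    for min_hours, pct in ESCALATING_TIERS:
        if longest_resolve_hours >= min_hours:
            return pct
    return 0


def credit_rp(pct: int) -> int:
    return MONTHLY_FEE_RP * pct // 100


def termination_right(months_met: list[bool], window: int = 6,
                      max_misses: int = 2) -> bool:
    """True if any rolling `window`-month span has more than `max_misses` SLA misses."""
    for start in range(max(1, len(months_met) - window + 1)):
        if months_met[start:start + window].count(False) > max_misses:
            return True
    return False


print("Flat credit (Rp):", credit_rp(FLAT_CREDIT_PCT))
print("Escalating credit, 5-hour outage (Rp):", credit_rp(escalating_credit_pct(5)))
print("Termination right:", termination_right([True, False, True, False, True, False, True]))
```

For a five-hour outage, the escalating schedule in this sketch triples the credit compared with the flat 10%. That difference is the kind that changes vendor behaviour.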

A Real Scenario: What Good SLA Negotiation Looks Like

A Series A fintech startup in Surabaya was preparing to sign with a managed infrastructure vendor. The vendor's standard contract offered 99.9% monthly availability with standard exclusions and a 10% monthly fee credit for breaches.

Their CTO pushed back on three things. First, the measurement methodology: the vendor was using a five-minute ping check. The vendor agreed to switch to a real-user monitoring baseline, backed by checks at 60-second intervals. Second, the credit structure: they negotiated an escalating credit of up to 30% of monthly fees for sustained outages over 4 hours, and a right to terminate with 30 days' notice if availability fell below 99.5% in any two consecutive months. Third, they added an explicit MTTA commitment of 10 minutes for P1 incidents, with documented escalation contacts at the vendor.
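The termination clause in this scenario works differently from a rolling-window rule: it fires only on consecutive bad months. Here is a minimal sketch, assuming the 99.5% floor and two-month run from the negotiated terms, with made-up monthly figures.

```python
def consecutive_breach(monthly_availability: list[float],
                       floor_pct: float = 99.5,
                       run_length: int = 2) -> bool:
    """True once `run_length` consecutive months fall below `floor_pct`."""
    run = 0
    for pct in monthly_availability:
        run = run + 1 if pct < floor_pct else 0
        if run >= run_length:
            return True
    return False


alternating = [99.92, 99.41, 99.86, 99.38, 99.90]  # two bad months, never adjacent
back_to_back = [99.92, 99.41, 99.37, 99.95]        # two adjacent bad months

print(consecutive_breach(alternating))
print(consecutive_breach(back_to_back))
```

The alternating pattern never triggers this clause, even though it contains two months below 99.5%. Whether that matters depends on which failure pattern worries you more.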

The negotiation took an extra two weeks. The vendor accepted all three points with minor modifications. That CTO told us later: "The vendor didn't push back much. I think they were just not used to a client who actually read the contract."

Most vendors will move on SLA terms if you ask. The problem is most buyers don't ask.

FAQ

Q: What's a reasonable uptime SLA to expect from a software development vendor in Indonesia?

A: For most application workloads, 99.9% monthly availability is the baseline. For production systems handling financial transactions or high user volume, push for 99.95%. Four nines (99.99%) is achievable but requires significant infrastructure investment and should only be demanded where business risk genuinely justifies it. More important than the number is how it's measured and what the remedies look like.

Q: What's the difference between an SLA and an SLO — which one should I focus on?

A: The SLA is what's in your contract. The SLO is the internal target your vendor is actually engineering toward. Ask to see both. If a vendor doesn't have documented SLOs for the services they're building you, that's a signal their reliability practice is not mature. The SLO tells you what they're trying to achieve; the SLA tells you what happens when they don't.

Q: Can I negotiate SLA terms with Indonesian software development vendors?

A: Yes, more often than founders expect. Most vendors have a standard contract template that their sales team presents as non-negotiable. It usually isn't. MTTA/MTTR commitments, measurement methodology, credit escalation structures, and termination rights are all negotiable if you ask explicitly and tie the negotiation to your technical requirements.

Q: What should an incident response protocol look like in a vendor contract?

A: At minimum: defined incident severity levels (P1/P2/P3) with clear criteria, MTTA commitment per level, update frequency during active incidents, post-incident report requirement (usually within 48–72 hours of resolution), and named escalation contacts on the vendor side. Anything less than this is an informal arrangement that won't hold up under pressure.
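Writing those minimum elements down as structured data makes gaps obvious during contract review. This is a hypothetical sketch, and every value is a placeholder to replace with what you negotiate: the severity criteria, acknowledgement windows, update intervals, and contact roles. The P1 acknowledgement window echoes the 15-minute example earlier, and the report deadline uses the upper end of the 48 to 72 hour range.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SeverityLevel:
    name: str
    criteria: str
    ack_within: timedelta     # MTTA commitment for this level
    update_every: timedelta   # status update cadence during an active incident


# Placeholder values for illustration only.
SEVERITY_LEVELS = {
    "P1": SeverityLevel("P1", "Production down or payments failing",
                        timedelta(minutes=15), timedelta(minutes=30)),
    "P2": SeverityLevel("P2", "Major feature degraded, workaround exists",
                        timedelta(hours=1), timedelta(hours=2)),
    "P3": SeverityLevel("P3", "Minor defect, limited user impact",
                        timedelta(hours=8), timedelta(days=1)),
}

POST_INCIDENT_REPORT_WITHIN = timedelta(hours=72)

# Named contacts on the vendor side, in escalation order (hypothetical roles).
VENDOR_ESCALATION = ["On-call engineer", "Engineering manager", "CTO"]


def ack_breached(level: str, time_to_ack: timedelta) -> bool:
    """True if acknowledgement took longer than the level's commitment."""
    return time_to_ack > SEVERITY_LEVELS[level].ack_within


def report_overdue(time_since_resolution: timedelta, report_delivered: bool) -> bool:
    """True if the post-incident report is missing past its deadline."""
    return (not report_delivered
            and time_since_resolution > POST_INCIDENT_REPORT_WITHIN)
```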

Q: What happens in Indonesia if a vendor consistently misses their SLA?

A: Credits are the standard contractual remedy, and they are rarely sufficient on their own. If you've negotiated termination rights tied to repeated SLA misses, that's your most meaningful leverage. The Indonesian Civil Code (Kitab Undang-Undang Hukum Perdata) supports breach of contract claims, but litigation is slow and rarely the right tool for a startup. The better protection is negotiating strong remedies and termination rights upfront, before you need them.

SLAs are not reliability. They're the fallback when reliability fails. A vendor with a genuinely strong reliability culture won't hide their SLOs, won't resist MTTA commitments, and won't object to termination rights — because they expect to meet them. The vendors who fight hardest to keep their SLA vague are telling you something important about what they expect their performance to look like.

If you're evaluating vendors and need a broader framework for the full selection process, [→ Read: How to choose a software development company in Indonesia without getting burned] covers the complete picture. And once you're past vendor selection and into scaling, [→ Read: How to build a backend that scales from 100 to 10 million users] is where the operational conversation continues.

External Documentation:

[KOMINFO on digital service obligations] — Indonesian digital infrastructure and regulatory compliance guidelines.
[Google SRE Book on SLOs] — authoritative reference for site reliability engineering metrics.