Reliability and scaling

Orb's API is architected for high availability and low latency, and is designed with the following safety measures:

  1. For customers with enterprise agreements, dedicated capacity can be provisioned. This allows Orb to provide very strict SLAs, and further isolates other accounts from potential load volatility.
  2. The API separates workloads (e.g. ingestion) into their own clusters so that critical actions are given precedence and can always proceed.
  3. Orb uses API firewalls to detect anomalous traffic, so that clients that are not behaving as intended cannot cause broader side effects in multi-tenant environments.

The following table provides some recommendations on how to approach API failures depending on the category of workload:

| Category | Reliability framework |
| --- | --- |
| Event ingestion | For information about event ingestion throughput, see the guide on high throughput ingestion. For a production integration, Orb highly recommends that you do not pass debug=True to the ingestion endpoint. 5XX errors are exceedingly rare, but any that occur should be retried with exponential backoff to ensure event delivery. Note that Orb's reporting grace period allows you to safely send events many hours late, depending on your account's configuration. |
| Read-only queries | Read-only queries (such as invoice preview or customer costs) may be used either in an end-user-facing call site, such as your application's usage dashboard, or in an asynchronous service that continuously exports data. Orb provides an uptime SLA on these queries for enterprise agreements. If you prefer to denormalize this data into your own datastore for local access, consider a service that refreshes these values in the background. If you're not on a provisioned Orb cluster, you may see rare timeouts on these read queries; these should be retried. |
| Critical write actions | Actions like creating a subscription or a customer are mission critical and are most often called in the critical path of your application. Orb takes any errors on these endpoints extremely seriously and provides strict uptime SLAs for enterprise customers. We recommend that you carefully consider whether to allow users to proceed in the rare event of an error on these endpoints, and think through the implications for data consistency in your data model. |
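
As an illustration of the retry guidance in the table above, here is a minimal Python sketch of retrying 5XX responses with capped exponential backoff and jitter. The function name `send_with_retry`, its parameters, and the `send` callable (any zero-argument function that performs the request and returns a status code and body) are assumptions made for this example and are not part of Orb's API or SDKs. The delay values are illustrative, not recommendations from Orb.

```python
import random
import time
from typing import Callable, Tuple


def send_with_retry(
    send: Callable[[], Tuple[int, str]],  # hypothetical: performs one request
    max_retries: int = 5,                 # illustrative default
    base: float = 0.5,                    # first backoff ceiling, in seconds
    cap: float = 30.0,                    # upper bound on any single delay
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> Tuple[int, str]:
    """Call send() until it returns a non-5XX status or retries run out."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status < 500:
            # Success or a client error (4XX): retrying will not help.
            return status, body
        if attempt == max_retries:
            break
        # The delay ceiling doubles on each attempt, up to `cap`.
        ceiling = min(cap, base * 2 ** attempt)
        # Scale by a random factor in [0, 1) so that many clients
        # do not retry in lockstep.
        sleep(ceiling * jitter())
    raise RuntimeError(f"request still failing after {max_retries} retries")
```

To use it, wrap the HTTP call made by your client of choice in a zero-argument function that returns `(status_code, body)` and pass that function as `send`. The `sleep` and `jitter` parameters exist only so the behavior can be tested without real waiting. Timeouts on read-only queries could be handled the same way by catching your HTTP client's timeout exception inside `send` and mapping it to a retryable status.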