When working with the Data Exports product, there are a handful of best practices that we recommend following for the simplest and most straightforward integration of the product and usage of the data.

Identifiers

The IDs on all associated resources point to the same identifiers in Orb’s system. For example, the invoice_id field in the Daily Line Item Revenue resource points to the row in the Invoice Metadata resource with that ID. This ID also points to the same invoice in Orb’s system (e.g. through querying in the API).

Data updates

Updates in the data export resources are tracked with the updated_at column on each row. This timestamp is used internally by Orb’s system to track when data needs to be exported to data destinations. This timestamp is not an indicator of when the underlying resource was updated in Orb’s system.

Deletion markers

Orb exports deletion markers to indicate that a resource has been deleted. In general, the majority of Orb’s resources will have a marker, deleted_at, which will be set to the time of deletion, or null if the resource has not been deleted. Certain resources, are append-only and will never be deleted, these do not have a deleted_at field present.

The following resources do not have a deletion marker:

  • Credit Ledger Entry
  • Daily Line Item Revenue
  • Invoice
  • Invoice Line Item Billing

Webhooks

To help gain better visibility into the operational status of data export process, Orb sends webhooks about transfer successes and failures. These webhooks are documented in Orb’s webhooks documentation and can be used to instruct downstream processes.

For example, you may need to run a downstream job on the raw Orb data to transform and join it against other data sources in your system. This can easily be implemented by listening for the data_exports.transfer_success webhook.

On the flip side, if you want to be notified when transfers fail, the corresponding data_exports.transfer_error webhook can be used.

Revert and void timestamps

Both the Daily Line Item Revenue and Credit Ledger Entry resources contain a reverted_at column which indicates when the underlying entry was reverted. This can happen when changes occur in the past (e.g. a subscription cancellation is backdated) that undo operations that have been recorded. In general, we recommend filtering out rows that were reverted when trying to understand the current state. These columns can be used to construct point-in-time queries to understand the state at a given time, by filtering to rows that were active at the time in question.

The Invoice Line Item Billing column contains a similar column for voided data called voided_at. This column can be used to help understand data that has been voided. This can happen when an invoice was originally sent but was later voided. As with reverted data, we recommend filtering out voided data for most queries, but your exact reporting needs may differ.

Append-only destinations

Orb currently supports S3 as a destination to send Orb’s data to. Since S3 is immutable, changes to data are sent as new records instead of updating the existing records because there is no concept of an “update” in S3, only a create. For customers using S3 as a data destination, this means that you will need to deduplicate the data on your end to avoid showing duplicate information.

This can be achieved by fetching the latest entry for a given id based on the updated_at timestamp.

Below is a sample snippet of SQL to achieve that:

Deduping data from S3
WITH ordered AS (
  SELECT *
       , ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS row_number

  FROM <resource>
)

SELECT *
FROM ordered
WHERE row_number = 1;