Integrate with cloud storage

Orb can automatically pull in events from an Amazon S3 or Google Cloud Storage (GCS) bucket that is provisioned for event export.

This option is particularly helpful if you run a regular UNLOAD from a data warehouse, which can write event rows directly to S3 in the required format. The following instructions cover the S3 integration; the GCS integration is similar.

Event shape

The S3 integration requires events to follow the shape of the ingestion API format. Orb provides several features to make this more ergonomic:

  • The S3 sync client supports both .jsonl (newline-delimited JSON) and .csv formats.
  • Any top-level field in the event row that is not a recognized field is automatically added to the properties dictionary.
  • Orb can map fields in your events to API field names as part of the sync, so you don't need to remap them in your pipeline (e.g. a field called id can map to Orb's idempotency_key).
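
To make the last two behaviors concrete, here is a minimal Python sketch of what the field mapping and the properties fallback amount to. It is an illustration only, not Orb's implementation: the function name normalize_event, the field_mapping argument, and the RECOGNIZED_FIELDS set are assumptions made for this example (the set is inferred from the samples on this page, so check the ingestion API reference for the authoritative list).

```python
import json

# Assumption: recognized top-level fields, inferred from the examples on this
# page. The ingestion API reference is the source of truth for the real list.
RECOGNIZED_FIELDS = {
    "idempotency_key",
    "event_name",
    "external_customer_id",
    "customer_id",
    "timestamp",
    "properties",
}


def normalize_event(row, field_mapping=None):
    """Rename mapped fields, then fold unrecognized fields into properties."""
    field_mapping = field_mapping or {}
    # Step 1: apply the mapping, e.g. {"id": "idempotency_key"}.
    renamed = {field_mapping.get(key, key): value for key, value in row.items()}

    # Step 2: keep recognized fields at the top level and move the rest
    # into the properties dictionary.
    event = {}
    properties = dict(renamed.get("properties") or {})
    for key, value in renamed.items():
        if key == "properties":
            continue
        if key in RECOGNIZED_FIELDS:
            event[key] = value
        else:
            properties[key] = value
    event["properties"] = properties
    return event


row = {
    "id": "g7BerX9nQJaohBee2n4gfG",
    "event_name": "file_processed",
    "external_customer_id": "mVtTLHyd92vJprC3",
    "timestamp": "2023-04-03T09:56:50.902462Z",
    "billable_calls": 13,
    "api_calls": 200,
}
event = normalize_event(row, field_mapping={"id": "idempotency_key"})
print(json.dumps(event, indent=2))
```

In this sketch, billable_calls and api_calls end up under properties, and id is renamed to idempotency_key before the event is assembled.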

JSONL example

{"idempotency_key": "g7BerX9nQJaohBee2n4gfG", "event_name": "file_processed", "external_customer_id": "mVtTLHyd92vJprC3", "timestamp": "2023-04-03T09:56:50.902462Z", "billable_calls": 13, "api_calls": 200}
{"idempotency_key": "4QiDyKyDpovp9V9Qa9QZaE", "event_name": "file_processed", "external_customer_id": "mVtTLHyd92vJprC3", "timestamp": "2023-04-03T09:56:50.902507Z", "billable_calls": 13, "api_calls": 200}
{"idempotency_key": "2BUUUkoAdgVvedAkW56JkL", "event_name": "file_processed", "external_customer_id": "mVtTLHyd92vJprC3", "timestamp": "2023-04-03T09:56:50.902529Z", "billable_calls": 13, "api_calls": 200}

CSV example

idempotency_key,event_name,external_customer_id,timestamp,billable_calls,api_calls
VScNAmR7n9n8g9A23eLhHk,file_processed,mVtTLHyd92vJprC3,2023-04-03T09:55:04.320106Z,13,200
LiyFeBbcGcvJvUcoYh7gEM,file_processed,mVtTLHyd92vJprC3,2023-04-03T09:55:04.320134Z,13,200
JZbZBzT93ApSrdAaJ23Hwu,file_processed,mVtTLHyd92vJprC3,2023-04-03T09:55:04.320151Z,13,200
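
If you generate these files from your own pipeline rather than a warehouse UNLOAD, the following sketch shows one way to serialize the same rows into either format using only the Python standard library. The helper names to_jsonl and to_csv are hypothetical, and uploading the resulting text to your events export bucket is left out.

```python
import csv
import io
import json

events = [
    {
        "idempotency_key": "VScNAmR7n9n8g9A23eLhHk",
        "event_name": "file_processed",
        "external_customer_id": "mVtTLHyd92vJprC3",
        "timestamp": "2023-04-03T09:55:04.320106Z",
        "billable_calls": 13,
        "api_calls": 200,
    },
    {
        "idempotency_key": "LiyFeBbcGcvJvUcoYh7gEM",
        "event_name": "file_processed",
        "external_customer_id": "mVtTLHyd92vJprC3",
        "timestamp": "2023-04-03T09:55:04.320134Z",
        "billable_calls": 13,
        "api_calls": 200,
    },
]


def to_jsonl(rows):
    """Serialize one JSON object per line (newline-delimited JSON)."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def to_csv(rows):
    """Serialize rows as CSV with a header taken from the first row's keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


jsonl_text = to_jsonl(events)
csv_text = to_csv(events)
print(jsonl_text)
print(csv_text)
```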

S3 sync setup

At a high level, the S3 sync requires the ARNs of two S3 buckets: an events export bucket and a dead-letter queue bucket. You must grant a specific Orb role access to your events export bucket with the following bucket policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::970006758186:role/<ORB_PROVISIONED_ROLE>"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<EVENTS_EXPORT_BUCKET>"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::970006758186:role/<ORB_PROVISIONED_ROLE>"
      },
      "Action": [
        "s3:GetObject",
        "s3:GetObjectAcl"
      ],
      "Resource": "arn:aws:s3:::<EVENTS_EXPORT_BUCKET>/*"
    }
  ]
}
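
If you manage bucket policies from code rather than the AWS Console, a sketch along the following lines can fill in the placeholders and attach the policy. The function names, the bucket name my-events-export, and the role name example-orb-role are hypothetical; the role name in particular must be the one Orb gives you. The apply step assumes the third-party boto3 package and AWS credentials that are allowed to set bucket policies.

```python
import json

ORB_ACCOUNT_ID = "970006758186"  # taken from the policy shown on this page


def build_export_bucket_policy(bucket_name, orb_role_name):
    """Return the events export bucket policy as a Python dict."""
    principal = {"AWS": f"arn:aws:iam::{ORB_ACCOUNT_ID}:role/{orb_role_name}"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Object reads apply to the keys inside the bucket.
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetObject", "s3:GetObjectAcl"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


def apply_bucket_policy(bucket_name, policy):
    """Attach the policy to the bucket (requires boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily so the builder works without it

    boto3.client("s3").put_bucket_policy(
        Bucket=bucket_name, Policy=json.dumps(policy)
    )


# Hypothetical names, for illustration only.
policy = build_export_bucket_policy("my-events-export", "example-orb-role")
print(json.dumps(policy, indent=2))
# apply_bucket_policy("my-events-export", policy)  # uncomment to apply
```

Note that put_bucket_policy replaces any policy already attached to the bucket, so merge these statements into your existing policy first if the bucket has one.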

Once Orb has the ARN of the events export bucket, Orb adds permissions to a provisioned SQS queue so that the S3 bucket can write to it. Orb uses S3 Event Notifications to listen for new files added to the bucket, and requires the s3:ObjectCreated:* event type, which you can configure in the AWS Console. The SQS queue needs to be in the same region as the bucket, so Orb provides the ARN of the provisioned queue once the bucket region is known.
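
As an alternative to the AWS Console, the event notification can also be set from code. The sketch below builds the notification configuration for the s3:ObjectCreated:* event type and checks that the queue's region matches the bucket's. The function names, the bucket name, and the queue ARN are hypothetical placeholders (use the SQS ARN that Orb provides), and the apply step assumes the third-party boto3 package with suitable AWS credentials.

```python
def queue_region(queue_arn):
    """Extract the region from an SQS ARN (arn:aws:sqs:<region>:<account>:<name>)."""
    return queue_arn.split(":")[3]


def build_notification_config(queue_arn):
    """Send every object-created event in the bucket to the given SQS queue."""
    return {
        "QueueConfigurations": [
            {"QueueArn": queue_arn, "Events": ["s3:ObjectCreated:*"]}
        ]
    }


def apply_notification_config(bucket_name, config):
    """Set the bucket's notification configuration (requires boto3)."""
    import boto3  # third-party; imported lazily so the builder works without it

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket_name, NotificationConfiguration=config
    )


# Hypothetical values, for illustration only.
bucket_region = "us-east-1"
orb_queue_arn = "arn:aws:sqs:us-east-1:000000000000:example-orb-queue"

if queue_region(orb_queue_arn) != bucket_region:
    raise ValueError("The SQS queue and the bucket must be in the same region")

config = build_notification_config(orb_queue_arn)
print(config)
# apply_notification_config("my-events-export", config)  # uncomment to apply
```

Be aware that put_bucket_notification_configuration overwrites the bucket's existing notification configuration, so include any notifications you already rely on in the same call.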

Dead-letter queue

When Orb encounters a file that cannot be parsed, or an event that fails validation, it logs the failure to the dead-letter queue bucket so that the rest of the event pipeline is not blocked.

Grant an Orb role read and write permissions on this bucket by adding a bucket policy, following the outline in this support article:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::970006758186:role/<ORB_PROVISIONED_ROLE>"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<EVENTS_DLQ_BUCKET>"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::970006758186:role/<ORB_PROVISIONED_ROLE>"
      },
      "Action": [
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Resource": [
        "arn:aws:s3:::<EVENTS_DLQ_BUCKET>/*"
      ]
    }
  ]
}

Customer Requirements for S3 Sync Setup

To complete the S3 sync setup, securely provide the following information:

Required information

  • ARN of events export bucket: The Amazon Resource Name (ARN) of the S3 bucket that will be used to export events.
  • Region of events export bucket: The AWS region where the events export bucket is located.
  • ARN of dead-letter queue bucket: The ARN of the S3 bucket that will be used as the dead-letter queue for failed events.
  • AWS Account ID: The AWS account ID associated with your bucket(s).
  • Event shape: The shape of the events you will be sending.

Important notes

  • Our team will provide the IAM role that Orb uses to access the events export bucket and the dead-letter queue bucket.
  • Ensure that the permissions and policies outlined in the S3 sync setup instructions are in place, so that Orb can read from the events export bucket and write to the dead-letter queue bucket.