
How to consolidate cross-Region S3 data into OpenSearch


You might have data in Amazon Simple Storage Service (Amazon S3) buckets in different AWS Regions that you want available in a single Amazon OpenSearch Service domain or collection. Consolidating data across Regions provides unified analytics and search, reduces operational complexity, and streamlines your search infrastructure. We're happy to announce that Amazon OpenSearch Ingestion pipelines can now read from S3 buckets in different Regions to ingest and consolidate data into a single OpenSearch Service domain or collection.

To consolidate this data across AWS Regions, you previously had to build your own solution. Now Amazon OpenSearch Ingestion can help you accomplish this. In this post, I'll show you how to use the new cross-Region support to ingest data from S3 buckets across multiple AWS Regions into a single OpenSearch Service domain or collection.

Amazon OpenSearch Ingestion (OSI) is a feature-rich data ingestion pipeline that you can use for many different purposes: observability, analytics, and zero-ETL search. Many customers use OpenSearch Ingestion to ingest data from Amazon S3 into OpenSearch Service domains and Amazon OpenSearch Serverless collections. Until now, you could only ingest from a single AWS Region at a time. Now that you can use OpenSearch Ingestion for cross-Region S3 ingestion, I'll show you how to use it in two scenarios: batch processing using S3 scan, and streaming ingestion using Amazon Simple Queue Service (Amazon SQS) queues for AWS vended logs like Amazon Virtual Private Cloud (Amazon VPC) Flow Logs and AWS CloudTrail.

Prerequisites

Complete the following prerequisite steps:

  1. Deploy an OpenSearch Service domain or OpenSearch Serverless collection in the Region where you want to perform your search or analytics.
  2. You need S3 buckets in at least two different Regions. You can use existing buckets or create new ones. You can put one in the same AWS Region as your OpenSearch Service domain or collection, or use two completely different Regions.
  3. Add objects with data to your S3 buckets. The data can be in JSON, ND-JSON, Parquet, CSV, or plaintext formats.
  4. Configure the AWS Identity and Access Management (IAM) permissions needed for OSI. For instructions, see Amazon S3 as a source.
  5. For cross-Region ingestion, you must now also include the s3:GetBucketLocation permission. This gives the pipeline the ability to determine which AWS Region each bucket is located in; a sketch of the relevant policy statement follows this list.
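
As a minimal sketch (not the complete OSI policy), the S3 portion of your pipeline role's permissions might look like the following, assuming the two demo bucket names used later in this post:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCrossRegionS3Read",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket1",
        "arn:aws:s3:::amzn-s3-demo-bucket1/*",
        "arn:aws:s3:::amzn-s3-demo-bucket2",
        "arn:aws:s3:::amzn-s3-demo-bucket2/*"
      ]
    }
  ]
}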

After you complete these steps, you can set up your Amazon OpenSearch Ingestion pipelines for either batch or streaming scenarios. In the following sections, I'll give you guidance on when to choose each approach, and I outline the steps for creating your pipeline.

Batch scenarios

You can use the OpenSearch Ingestion S3 scan capability to read batch data from S3. You might find this approach useful when your data is written to S3 on a schedule. To perform a cross-Region S3 scan, you only need to specify the buckets that you're reading from when you create the OpenSearch Ingestion pipeline.

The following diagram shows the design for an OpenSearch Ingestion pipeline in us-west-2 reading from S3 buckets in us-east-1 and eu-west-1 and writing that data into an OpenSearch Service domain in us-west-2.

Next, you'll create an OpenSearch Ingestion pipeline. You must create this pipeline in the same Region as your OpenSearch Service domain or collection.

version: "2"
s3-scan-cross-region:
  source:
    s3:
      compression: automatic
      codec:
        json:
      scan:
        buckets:
          - bucket:
              name: amzn-s3-demo-bucket1
          - bucket:
              name: amzn-s3-demo-bucket2
      aws:
        region: us-west-2

  sink:
    - opensearch:
        hosts: [ "https://search-mydomain-abcdefghijklmn.us-west-2.es.amazonaws.com" ]
        index: s3_scan_cross_region
        aws:
          region: us-west-2

The preceding pipeline configuration uses the JSON codec. You might want to configure a different codec if your data isn't a large JSON object, as shown in the sketch that follows.
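
For example, if your objects are ND-JSON or plaintext lines rather than one large JSON object, a minimal sketch of the source block using the newline codec (the same codec the streaming example later in this post uses) looks like this:

source:
  s3:
    compression: automatic
    codec:
      newline:
    # the scan and aws settings (and the sink) are unchanged from the preceding pipeline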

You can now query your OpenSearch Service domain or collection to see the data that you ingested.
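
For example, here's a quick check from OpenSearch Dashboards Dev Tools; the index name comes from the sink of the preceding pipeline, and match_all is just the simplest possible query:

GET s3_scan_cross_region/_search
{
  "query": {
    "match_all": {}
  }
}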

Streaming scenarios: AWS vended logs

Like many of our customers, you might want to ingest S3 data from different AWS Regions into OpenSearch Service. A common reason is to consolidate AWS vended logs, such as VPC Flow Logs, CloudTrail data, and load balancer logs. For these scenarios, you can configure OpenSearch Ingestion pipelines to read from an Amazon SQS queue to stream data into your OpenSearch Service domain or collection.

These AWS vended logs write to Amazon S3 in the same AWS Region as the service emitting them. For example, VPC Flow Logs will be in the same AWS Region as your Amazon VPC. You can use OpenSearch Ingestion to consolidate these logs into one AWS Region. In the VPC Flow Logs example, you can consolidate your VPC Flow Logs from multiple AWS Regions into a single OpenSearch Service domain or collection to analyze network patterns across your different Amazon VPCs.

The following diagram outlines the overall setup. It shows an example of sending AWS vended logs from us-east-1 and eu-west-1 to an OpenSearch Service domain in us-west-2. You can change the AWS Regions depending on your specific needs.

  1. Configure your vended logs to write log events to Amazon S3 buckets in their respective AWS Regions. Using VPC Flow Logs as our example, you can configure VPC Flow Logs for your VPCs.
  2. Create an Amazon SQS queue in the same AWS Region as your OpenSearch Service domain.
  3. Amazon S3 doesn't send notifications to cross-Region Amazon SQS queues, so you'll use intermediate Amazon Simple Notification Service (Amazon SNS) topics to consolidate the notifications from multiple Regions into one queue. For each S3 bucket, create an SNS topic.
  4. Configure S3 Event Notifications to publish to SNS. Do this for each S3 bucket and its matching SNS topic; a sketch of this configuration follows the list.
  5. SNS can send cross-Region notifications to SQS. Create a subscription from each SNS topic that you created in step 3 to the single SQS queue you created in step 2.
  6. Configure your pipeline role to read from SQS and from the associated S3 buckets.
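
As a sketch of step 4, the S3 Event Notification configuration for one bucket might look like the following (the topic name and ID are placeholders for illustration). Repeat the equivalent for each bucket and its Region's topic, and remember that each topic's access policy must allow s3.amazonaws.com to publish to it, and the queue policy must allow the topics to send messages:

{
  "TopicConfigurations": [
    {
      "Id": "object-created-to-sns",
      "TopicArn": "arn:aws:sns:us-east-1:123456789012:amzn-s3-demo-bucket1-events",
      "Events": ["s3:ObjectCreated:*"]
    }
  ]
}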

Now create an OpenSearch Ingestion pipeline in the same AWS Region as your OpenSearch Service domain.

version: "2"
s3-sqs-cross-region:
  source:
    s3:
      notification_type: sqs
      codec:
        newline:
      sqs:
        queue_url: https://sqs.us-west-2.amazonaws.com/123456789012/amzn-s3-demo-all-regions
      aws:
        region: us-west-2

  sink:
    - opensearch:
        hosts: [ "https://search-mydomain-abcdefghijklmn.us-west-2.es.amazonaws.com" ]
        index: s3_sqs_cross_region
        aws:
          region: us-west-2

The preceding pipeline configuration uses the newline codec, which suits newline-delimited data such as VPC Flow Logs. You might want to configure a different codec if your data is in another format.

Next, add objects with data to your S3 buckets. When you upload data, S3 sends notifications to SNS and then on to the SQS queue.

You can now query your OpenSearch Service domain or collection to see the data that you ingested.

Here's what makes this possible and what's different. The SQS queue receives the event notifications for the buckets. Before the cross-Region feature of OpenSearch Ingestion, the pipeline could see these events, but it couldn't access the S3 buckets even when the permissions were granted. Now, the pipeline determines the AWS Region that the bucket is in and obtains an AWS Security Token Service (AWS STS) token for that Region. Using the STS token from the same Region as the S3 bucket allows the pipeline to access and read the data.
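
For reference, here's a trimmed, illustrative sketch of the S3 event notification for a new object (the key and size values are made up, and depending on your SNS subscription settings the event may arrive wrapped in an SNS envelope). The awsRegion and bucket fields identify where the object lives, and per the prerequisites the pipeline confirms the bucket's Region with s3:GetBucketLocation:

{
  "Records": [
    {
      "eventVersion": "2.1",
      "eventSource": "aws:s3",
      "awsRegion": "us-east-1",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": {
          "name": "amzn-s3-demo-bucket1",
          "arn": "arn:aws:s3:::amzn-s3-demo-bucket1"
        },
        "object": {
          "key": "AWSLogs/123456789012/vpcflowlogs/us-east-1/2026/05/09/example.log.gz",
          "size": 1024
        }
      }
    }
  ]
}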

Using the AWS console

When you create the pipeline using the OpenSearch Ingestion console, you have the option to select a blueprint for your use case. These blueprints help you create pipelines for various vended log types just by selecting your SQS queue and OpenSearch domain. The blueprint handles the data type mappings for you by including the appropriate processors. You can use these blueprints as a starting point and modify the processors for your specific requirements.

Clean up resources

When you're done testing this out, delete the resources that you created, as follows.

If you set up a batch pipeline:

  • Delete the OpenSearch Ingestion pipeline.

If you set up a streaming pipeline:

  • Delete the OpenSearch Ingestion pipeline.
  • Remove the S3 Event Notifications from your buckets, then delete the SNS topics and their subscriptions.
  • Delete the SQS queue.

For both pipelines, also delete the common resources, such as the OpenSearch Service domain or collection and any S3 buckets and objects that you created for testing.

Conclusion

In this post, I showed you how you can use Amazon OpenSearch Ingestion to ingest data from Amazon S3 buckets in different AWS Regions, and that this works for both batch scan and streaming scenarios. The feature offers you an easy way to consolidate your data from other Regions into one OpenSearch Service domain or collection.

To get started with the cross-Region S3 source, refer to the OpenSearch Ingestion documentation or try creating a pipeline from one of our blueprints using the OpenSearch Ingestion console. You can read about the codecs that OpenSearch Ingestion offers for parsing your S3 objects. You can also learn about the various processors that OpenSearch Ingestion offers, so you can transform and enrich your data to meet your needs.

You can also use OpenSearch Ingestion for cross-Region and cross-account ingestion. To do this, you must grant cross-account permissions on your S3 bucket and make some modifications to your pipeline configuration; a sketch of such a bucket policy follows. Combining what I showed you in this post with the existing cross-account features greatly expands your ingestion options.
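
Those cross-account grants are standard S3 bucket policy statements. Here's a minimal sketch, assuming a hypothetical pipeline role named pipeline-role in a pipeline-owning account 111122223333:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPipelineRoleRead",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:role/pipeline-role"
      },
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket2",
        "arn:aws:s3:::amzn-s3-demo-bucket2/*"
      ]
    }
  ]
}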

If you're ready to take your streaming ingestion analytics to the next level, you can check out how to generate metrics from logs and even how to send those derived metrics to Amazon Managed Service for Prometheus.

Have you tried out the cross-Region capabilities of OpenSearch Ingestion? Share your use cases and questions in the comments.


About the author

David is a senior software engineer working on observability in OpenSearch at Amazon Web Services. He's a maintainer on the Data Prepper project.
