Friday, July 3, 2026

Accelerating log analytics at scale with AWS Glue and Apache Iceberg materialized views


Managing high-volume application logs at scale presents challenges, from slow query performance and difficulty running complex aggregations to sustaining real-time analytics on streaming data. Apache Iceberg materialized views with AWS Glue, Amazon Data Firehose, and AWS Lambda address these challenges by accelerating log analytics through precomputed query results.

In this post, you learn how to build an application log pipeline for production use with Amazon CloudWatch Logs, AWS Lambda, Amazon Data Firehose, AWS Glue, and Apache Iceberg materialized tables. You then use materialized views to speed up query performance. This solution helps you achieve faster query response times on large-scale log data without requiring you to manage continuous data lake refresh.

Solution overview

This solution accelerates log analytics by precomputing query results through Apache Iceberg materialized views. By querying pre-aggregated results instead of scanning raw log data for every request, you can reduce query response times. For example, queries that previously took minutes scanning terabytes of raw data might return in seconds from the compact materialized view. Results update automatically as new logs arrive, helping you handle high-volume log streams while maintaining fast analytics performance.

Architecture overview

The architecture consists of AWS services working together to create a data pipeline:

  • Amazon CloudWatch Logs receives application logs and system events, then routes them to downstream destinations using CloudWatch Logs subscription filters. CloudWatch Logs has a built-in retry mechanism. If the destination service returns a retryable error, CloudWatch Logs automatically retries delivery for up to 24 hours.
  • AWS Lambda serves as the transformation layer, parsing log messages, enriching data, and preparing records for storage.
  • Amazon Data Firehose buffers incoming data and handles the technical requirements of writing to Apache Iceberg tables (an open source data table format), including batch optimization, schema validation, and automatic retry logic for failed writes.
  • Apache Iceberg tables stored in Amazon Simple Storage Service (Amazon S3) provide ACID transaction support, schema evolution capabilities, and efficient query performance. Materialized views are managed tables in the AWS Glue Data Catalog that store precomputed query results in Apache Iceberg format.
  • AWS Glue runs a one-time job during stack creation to provision the Iceberg database, base table, and materialized view structure in the Data Catalog. A second, scheduled Glue job refreshes the materialized view by recomputing aggregations from the base table on a configurable interval, so downstream queries through Amazon Athena return up-to-date, pre-aggregated results without scanning raw data.
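
The transformation step in the pipeline above can be sketched as follows. CloudWatch Logs delivers subscription data to Lambda as a base64-encoded, gzip-compressed JSON document under `event["awslogs"]["data"]`; the sketch decodes it and turns each log event into a Firehose-style record. It assumes that each log message is itself a JSON object. The function names and the `log_group` enrichment field are illustrative and are not taken from the sample repository.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the payload that CloudWatch Logs sends to a Lambda subscriber."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def to_firehose_records(payload):
    """Turn log events into Firehose-style records of the form {"Data": bytes}."""
    records = []
    # CONTROL_MESSAGE payloads are connectivity checks, not log data.
    if payload.get("messageType") != "DATA_MESSAGE":
        return records
    for log_event in payload.get("logEvents", []):
        try:
            row = json.loads(log_event["message"])  # assumption: JSON messages
        except json.JSONDecodeError:
            continue  # skip messages that are not valid JSON
        if not isinstance(row, dict):
            continue  # skip JSON values that are not objects
        row["log_group"] = payload.get("logGroup")  # hypothetical enrichment
        records.append({"Data": json.dumps(row).encode("utf-8")})
    return records


def lambda_handler(event, context):
    payload = decode_subscription_event(event)
    records = to_firehose_records(payload)
    # Forwarding the records to Firehose is omitted from this sketch.
    return {"recordCount": len(records)}
```

Keeping the decoding and record-building logic in plain functions makes it possible to unit test the transformation without deploying the Lambda function.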

This architecture is designed to support automatic scaling, serverless infrastructure, error handling that routes failed records to Amazon S3 for analysis and replay, capture of failed Lambda invocations for automated retry, and real-time monitoring through Amazon CloudWatch metrics.

Prerequisites

Before you deploy the solution, review the following prerequisites.

  • An AWS account with the permissions necessary to run an AWS CloudFormation template, run AWS Glue jobs, and run queries in Amazon Athena to verify the Iceberg table data.
  • Basic familiarity with Boto3 to understand the Python code, and a foundational understanding of Apache Iceberg concepts.

Solution deployment

The following deployment steps guide you through implementing this solution in your AWS account.

Step 1: Deploy the AWS CloudFormation pipeline stack

You can deploy this solution using an AWS CloudFormation stack. The template handles creating Amazon S3 buckets, uploading AWS Glue and Lambda scripts, provisioning IAM roles, configuring the Firehose delivery stream, and running the Glue job to create the Iceberg database, base table, and materialized view.

Launch the stack in the AWS CloudFormation console. Review the parameters marked REQUIRED and adjust the toggle options (CreateScriptBucket, EnableLakeFormation, CreateSubscriptionLogGroup) based on your environment. Other parameters include preconfigured defaults that you should review for your environment. Choose the CloudFormation stack to deploy resources using the AWS CloudFormation console.

Pipeline stack required parameters view in the AWS CloudFormation console.

Additional pipeline stack required parameters in the AWS CloudFormation console.

Step 2: Test the end-to-end pipeline

Send sample log events matching the Iceberg table schema (for example, id, customer_name, quantity, and order_date) to the CloudWatch log group. The subscription filter triggers the Lambda function, which forwards records to Firehose for delivery into the Iceberg table.

git clone https://github.com/aws-samples/sample-log-analytics-iceberg-mv.git
cd sample-log-analytics-iceberg-mv
python3 scripts/send_test_logs.py

Terminal output showing the test log event script sending sample records to the CloudWatch log group

Execution of test events.
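
For orientation, the sketch below shows one way a test script could build and send such events. It is not the repository's `send_test_logs.py`. The log group and stream names, the sample values, and the helper names are hypothetical, and the sketch assumes that Boto3 is installed, credentials are configured, and the log stream already exists.

```python
import json
import time


def build_log_events(orders, now_ms=None):
    """Build CloudWatch Logs events whose messages are JSON order records."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    # Timestamps increase by 1 ms so the batch stays in chronological order.
    return [
        {"timestamp": now_ms + offset, "message": json.dumps(order)}
        for offset, order in enumerate(orders)
    ]


def send_log_events(log_group, log_stream, events):
    """Send events with the CloudWatch Logs PutLogEvents API."""
    import boto3  # imported here so the builder works without Boto3 installed

    logs = boto3.client("logs")
    return logs.put_log_events(
        logGroupName=log_group, logStreamName=log_stream, logEvents=events
    )


# Sample values are made up; field names follow the schema described above.
sample_orders = [
    {"id": 1, "customer_name": "Alice", "quantity": 3, "order_date": "2026-01-15"},
    {"id": 2, "customer_name": "Bob", "quantity": 1, "order_date": "2026-01-15"},
]
events = build_log_events(sample_orders)
# Uncomment to send (placeholder names):
# send_log_events("/hypothetical/app/orders", "test-stream", events)
```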

Verify data delivery and refresh the materialized view

Allow approximately 30 seconds (learn more in Buffer data for dynamic partitioning) for the Firehose buffer to flush. After the buffer flushes, run the following query in Amazon Athena to verify that data has been successfully delivered to the base table.

Query result using Amazon Athena.
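
The verification query itself is not reproduced in the text. As a rough sketch, a row count against the base table could be submitted to Athena with Boto3 as shown below. The database name, table name, and S3 output location are placeholders, not the names that the stack creates, and the helper names are illustrative.

```python
import time


def build_count_query(database, table):
    """Return a simple row-count query for the given table."""
    return f'SELECT COUNT(*) AS row_count FROM "{database}"."{table}"'


def run_athena_query(sql, output_location, poll_seconds=2):
    """Start an Athena query and wait until it reaches a terminal state."""
    import boto3  # imported lazily; requires Boto3 and AWS credentials

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)


sql = build_count_query("example_db", "example_base_table")  # placeholder names
# Uncomment to run (placeholder bucket):
# query_id, state = run_athena_query(sql, "s3://example-bucket/athena-results/")
```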

Automated materialized view refresh

In this example, the AWS CloudFormation stack provisions a Glue job configured to run the materialized view (MV) refresh once daily at midnight UTC, meaning the MV reflects data up to the previous day. You can adjust the trigger’s cron schedule to match common MV refresh requirements such as hourly, every 15 minutes, or on demand.
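
AWS Glue triggers use six-field cron expressions of the form `cron(Minutes Hours Day-of-month Month Day-of-week Year)`, evaluated in UTC. The sketch below lists expressions for the schedules mentioned above and includes a small shape check. The dictionary keys and the helper are illustrative and are not part of the stack.

```python
# Cron expressions for common refresh intervals (all times in UTC).
GLUE_SCHEDULES = {
    "daily_midnight_utc": "cron(0 0 * * ? *)",
    "hourly": "cron(0 * * * ? *)",
    "every_15_minutes": "cron(0/15 * * * ? *)",
}


def has_glue_cron_shape(expression):
    """Check only the outer shape: cron(...) wrapping six space-separated fields."""
    if not (expression.startswith("cron(") and expression.endswith(")")):
        return False
    return len(expression[5:-1].split()) == 6
```

The check does not validate field values; it only catches the common mistake of pasting a five-field Unix cron string into a Glue trigger.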

The Glue job performs a full recomputation of the aggregations from the base Iceberg table and writes the results to the MV. Downstream consumers querying through Athena read from this pre-aggregated view, which delivers faster performance. This is especially important in real production scenarios where the base table contains millions of records and numerous columns. Computing aggregations directly from raw data at query time would degrade downstream application performance.

Job schedule view in the AWS Glue console.

In a production environment, the base Iceberg table stores every individual order event, potentially millions of rows with dozens of columns, growing daily. When dashboards or downstream applications need aggregated insights like daily revenue per customer or monthly order counts by region, querying the base table directly forces Athena to scan terabytes of raw data on every request. This results in slow response times and high costs at scale. The materialized view solves this by precomputing these business-level aggregations once during the scheduled refresh and storing the results in a compact, purpose-built table with far fewer rows and columns. A dashboard query that would otherwise scan millions of raw records instead reads from a pre-aggregated table, which reduces query response time. The base table remains your source of truth for granular, row-level lookups, while the materialized view serves as the performance layer for repeated analytical queries with embedded business logic.

Materialized view query result using Amazon Athena.
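
To make the idea concrete, the following plain-Python sketch mimics what a full-recomputation refresh produces: it collapses row-level order events into one row per customer per day. In the actual solution this work is done by the Glue job against the Iceberg table. The field names follow the sample schema, and the particular aggregates (order count and summed `quantity`) are assumptions chosen for illustration.

```python
from collections import defaultdict


def recompute_daily_summary(rows):
    """Aggregate raw order rows into one summary row per (customer, day)."""
    totals = defaultdict(lambda: {"order_count": 0, "total_quantity": 0})
    for row in rows:
        key = (row["customer_name"], row["order_date"])
        totals[key]["order_count"] += 1
        totals[key]["total_quantity"] += row["quantity"]
    # Sort by key so repeated refreshes produce rows in a stable order.
    return [
        {"customer_name": customer, "order_date": day, **aggregates}
        for (customer, day), aggregates in sorted(totals.items())
    ]


# Made-up rows standing in for the base table.
raw_rows = [
    {"id": 1, "customer_name": "Alice", "quantity": 3, "order_date": "2026-01-15"},
    {"id": 2, "customer_name": "Alice", "quantity": 2, "order_date": "2026-01-15"},
    {"id": 3, "customer_name": "Bob", "quantity": 1, "order_date": "2026-01-15"},
]
summary = recompute_daily_summary(raw_rows)
```

A consumer that needs per-customer daily totals reads the two rows in `summary` rather than re-aggregating every entry in `raw_rows`, which is the same trade the materialized view makes at much larger scale.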

Alternative: Amazon S3 Tables

This solution can also be implemented using Amazon S3 Tables, which provides a fully managed Apache Iceberg experience with native support for materialized views. In this post, we use the Glue-based approach to demonstrate the underlying mechanics and provide full flexibility to customize refresh logic for your specific requirements. To learn more, see Getting started with S3 Tables.

Clean up

To avoid incurring future charges, delete the resources you created as part of this exercise if you are not planning to use them further. Delete the stacks created in the previous steps, then empty and delete the Amazon S3 buckets.

Conclusion

This solution shows how to build a scalable application log data pipeline that delivers log events from Amazon CloudWatch Logs to Apache Iceberg tables using AWS Lambda and Amazon Data Firehose. This architecture uses fully managed AWS services to minimize operational overhead while providing high availability and consistent performance.

Key strengths include serverless infrastructure designed to support automatic scaling, error handling designed to route failed records to Amazon S3 for troubleshooting and replay, and analytics capabilities through Apache Iceberg’s ACID transactions and query performance optimizations. As you move this solution into production, we recommend that you implement data quality checks in Lambda and configure encryption at rest and in transit for your data. You can also establish data retention policies and explore partitioning strategies for better query performance.

You now have a log analytics pipeline built for production use that scales with your workload.

Additional resources


About the author

Shinu Tharol

Shinu is a Technical Account Manager at AWS, delivering technical guidance and strategic support to enterprise customers. His expertise includes cloud operations, artificial intelligence, data analytics, and cloud cost optimization, enabling customers to maximize their AWS investments while maintaining operational excellence.
