15.2 C
Canberra
Thursday, April 16, 2026

Getting began with Apache Iceberg write assist in Amazon Redshift – Half 2


In Getting began with Apache Iceberg write assist in Amazon Redshift – half 1, you realized the best way to create Apache Iceberg tables and write knowledge immediately from Amazon Redshift to your knowledge lake. You arrange exterior schemas, created tables in each Amazon Easy Storage Service (Amazon S3) and S3 Tables, and carried out INSERT operations whereas sustaining ACID (Atomicity, Consistency, Isolation, Sturdiness) compliance.

Amazon Redshift now helps DELETE, UPDATE, and MERGE operations for Apache Iceberg tables saved in Amazon S3 and Amazon S3 desk buckets. With these operations, you possibly can modify knowledge on the row stage, implement upsert patterns, and handle the info lifecycle whereas sustaining transactional consistency utilizing acquainted SQL syntax. You may run complicated transformations in Amazon Redshift and write outcomes to Apache Iceberg tables that different analytics engines like Amazon EMR or Amazon Athena can instantly question.

On this put up, you’re employed with buyer and orders datasets that have been created and used within the beforehand talked about put up to show these capabilities in a knowledge synchronization state of affairs.

Answer overview

This resolution demonstrates DELETE, UPDATE, and MERGE operations for Apache Iceberg tables in Amazon Redshift utilizing a typical knowledge synchronization sample: sustaining buyer data and orders knowledge throughout staging and manufacturing tables. The workflow consists of three key operations:

  • DELETE – Take away buyer data based mostly on opt-out requests
  • UPDATE – Modify present buyer info
  • MERGE – Synchronize order knowledge between staging and manufacturing tables utilizing upsert patterns
Figure : solution overview

Determine 1: resolution overview

The answer makes use of a staging desk (orders_stg) saved in an S3 desk bucket for incoming knowledge and reference tables (customer_opt_out) in Amazon Redshift for managing knowledge lifecycle operations. With this structure, you possibly can course of adjustments effectively whereas sustaining ACID compliance throughout each storage varieties.

Stipulations

For this walkthrough, you need to have accomplished the setup steps from Getting began with Apache Iceberg write assist in Amazon Redshift – half 1, together with:

  • Create an Amazon Redshift knowledge warehouse (provisioned or Serverless)
  • Arrange the required IAM function (RedshifticebergRole) with applicable permissions
  • Create an Amazon S3 bucket and S3 Desk bucket
  • Configure AWS Glue Knowledge Catalog database and organising entry
  • Arrange AWS Lake Formation permissions
  • Create the buyer Apache Iceberg desk in Amazon S3 normal buckets with pattern buyer knowledge
  • Create the orders Apache Iceberg desk in Amazon S3 Desk buckets with pattern order knowledge
  • Amazon Redshift knowledge warehouse on p200 model or larger

Knowledge preparation

On this part, you arrange the pattern knowledge wanted to show MERGE, UPDATE, and DELETE operations. To organize your knowledge, full the next steps:

  1. Log in to Amazon Redshift utilizing Question Editor V2 with the Federated consumer possibility.
  2. Create the orders_stg and customer_opt_out tables with pattern knowledge:
CREATE TABLE "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders_stg
(
customer_id BIGINT,
order_id BIGINT,
Total_order_amt DECIMAL(10,2),
Total_order_tax_amt REAL,
tax_pct DOUBLE PRECISION,
order_date DATE,
order_created_at_tz TIMESTAMPTZ,
is_active_ind BOOLEAN
)
USING ICEBERG;
INSERT INTO "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders_stg
(order_date, order_id, customer_id, total_order_amt, total_order_tax_amt, tax_pct, order_created_at_tz, is_active_ind)
VALUES
('2024-11-11', 1016, 10, 167.45, 13.40, 0.08, '2024-11-11 06:55:00-06:00', true),
('2024-11-12', 1017, 15, 34.99, 2.80, 0.08, '2024-11-12 23:30:30-06:00', true),
('2024-11-09', 1014, 9, 500.60, 56.80, 0.09, '2024-11-09 16:20:55-06:00', true),
('2024-11-10', 1015, 5, 329.85, 33.51, 0.08, '2024-11-10 11:45:30-06:00', true);
choose * from "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders_stg;

Figure 2: orders_stg result set

Determine 2: orders_stg consequence set

CREATE TABLE dev.public.customer_opt_out
(
customer_id bigint,
customer_name varchar,
opt_out_ind char(1),
cust_rec_upd_ind char(1)
);
INSERT INTO dev.public.customer_opt_out VALUES
(9, 'Customer9 Martinez', 'Y', 'N'),
(12, 'Customer12 Thomas', 'Y', 'N'),
(13, 'Customer13 Albon', 'N', 'Y'),
(14, 'Customer14 Oscar', 'N', 'Y');
choose * from dev.public.customer_opt_out;

Figure 3: customer_opt_out result set

Determine 3: customer_opt_out consequence set

Now you can use the orders_stg and customer_opt_out tables to show knowledge manipulation operations on the orders and buyer tables created within the prerequisite part.

MERGE

MERGE conditionally inserts, updates, or deletes rows in a goal desk based mostly on the outcomes of a be a part of with a supply desk. You need to use MERGE to synchronize two tables by inserting, updating, or deleting rows in a single desk based mostly on variations discovered within the different desk.

To carry out a MERGE operation:

  1. Confirm that the present knowledge within the orders desk for order IDs 1014, 1015, 1016, and 1017.You loaded this pattern knowledge in Half 1:
choose * from "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders
the place order_id in (1014,1015,1016,1017);

Figure 4: orders data for existing orders for orders in orders_stg

Determine 4: orders knowledge for present orders for orders in orders_stg

The orders desk accommodates present rows for order IDs 1014 and 1015.

  1. Run the next MERGE operation utilizing order_id as the important thing column to match rows between the orders and orders_stg tables:
MERGE INTO "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders
USING "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders_stg
ON orders.order_id = orders_stg.order_id
WHEN MATCHED THEN UPDATE 
SET
customer_id         = orders_stg.customer_id,
total_order_amt     = orders_stg.total_order_amt,
total_order_tax_amt = orders_stg.total_order_tax_amt,
tax_pct             = orders_stg.tax_pct,
order_date          = orders_stg.order_date,
order_created_at_tz = orders_stg.order_created_at_tz,
is_active_ind       = orders_stg.is_active_ind
WHEN NOT MATCHED THEN INSERT
VALUES 
(orders_stg.customer_id,orders_stg.order_id,orders_stg.total_order_amt,orders_stg.total_order_tax_amt,orders_stg.tax_pct,orders_stg.order_date,orders_stg.order_created_at_tz,orders_stg.is_active_ind);

The operation updates present rows (1014 and 1015) and inserts new rows for order IDs that don’t exist within the orders desk (1016 and 1017).

  1. Confirm the up to date knowledge within the orders desk:
choose * from "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orderswhere order_id in (1014,1015,1016,1017);

Figure 5: merged data on orders from orders_stg

Determine 5: merged knowledge on orders from orders_stg

The MERGE operation performs the next adjustments:

  • Updates present rows – Order IDs 1014 and 1015 have up to date total_order_amt and total_order_tax_amt values from the orders_stg desk
  • Inserts new rows – Order IDs 1016 and 1017 are inserted as a result of they don’t exist within the orders desk

This demonstrates the upsert sample, the place MERGE conditionally updates or inserts rows based mostly on the matching key column.

UPDATE

UPDATE modifies present rows in a desk based mostly on specified circumstances or values from one other desk.

Replace the buyer Apache Iceberg desk utilizing knowledge from the customer_opt_out Amazon Redshift native desk. The UPDATE operation makes use of the cust_rec_upd_ind column as a filter, updating solely rows the place the worth is ‘Y’.

To carry out an UPDATE operation:

  1. Confirm the present customer_name values for buyer IDs 13 and 14 in customer_opt_out and buyer (loaded this pattern knowledge in Half 1) tables:
choose * from dev.public.customer_opt_out
the place cust_rec_upd_ind = 'Y';

Figure 6: verify existing customer data for customers from customer_opt_out

Determine 6: confirm present buyer knowledge for purchasers from customer_opt_out

choose customer_id,customer_name from dev.demo_iceberg.buyer
the place customer_id in(13,14);

Figure 7: verify existing customer name for customers from customer_opt_out

Determine 7: confirm present buyer title for purchasers from customer_opt_out

  1. Run the next UPDATE operation to change buyer names based mostly on the cust_rec_upd_ind from customer_opt_out:
UPDATE dev.demo_iceberg.customerSET customer_name = customer_opt_out.customer_name
FROM dev.public.customer_opt_out
WHERE customer_opt_out.cust_rec_upd_ind = 'Y'and buyer.customer_id = customer_opt_out.customer_id;

  1. Confirm the adjustments for buyer IDs 13 and 14:
choose customer_id,customer_name from dev.demo_iceberg.buyer the place customer_id in(13,14) order by 1;

Figure 8: updated customer names in customer table

Determine 8: up to date buyer names in buyer desk

The UPDATE operation modifies the customer_name values based mostly on the be a part of situation with the customer_opt_out desk. Buyer IDs 13 and 14 now have up to date names (Customer13 Albon and Customer14 Oscar).

DELETE

DELETE removes rows from a desk based mostly on specified circumstances. With no WHERE clause, DELETE removes all of the rows from desk.

Delete rows from the buyer Apache Iceberg desk utilizing knowledge from the customer_opt_out Amazon Redshift native desk. The DELETE operation makes use of the opt_out_ind column as a filter, eradicating solely rows the place the worth is ‘Y’.

To carry out a DELETE operation:

  1. Confirm the opt-out indicator knowledge within the customer_opt_out desk:
choose * from dev.public.customer_opt_out
the place opt_out_ind = 'Y';

Figure 9: verify customer records for opt out

Determine 9: confirm buyer data for decide out

  1. Confirm the present buyer knowledge for buyer IDs 9 and 12:
choose * from dev.demo_iceberg.customerwhere customer_id in(9,12);

Figure 0: verify existing customers data in customer table for opt out

Determine 10: confirm present clients knowledge in buyer desk for decide out

  1. Evaluation the question execution plan:
EXPLAINDELETE FROM demo_iceberg.customerUSING public.customer_opt_out
WHERE buyer.customer_id = customer_opt_out.customer_id
AND customer_opt_out.opt_out_ind = 'Y';

Figure 1: query plan for the DELETE queryThe execution plan shows Amazon S3 scans for Apache Iceberg format tables, indicating that Amazon Redshift removes rows directly from the Amazon S3 bucket.

Determine 11: question plan for the DELETE question. The execution plan exhibits Amazon S3 scans for Apache Iceberg format tables, indicating that Amazon Redshift removes rows immediately from the Amazon S3 bucket.

  1. Run the next DELETE operation:
DELETE FROM demo_iceberg.buyer
USING public.customer_opt_out
WHERE buyer.customer_id = customer_opt_out.customer_id
AND customer_opt_out.opt_out_ind = 'Y';

  1. Confirm that the rows have been eliminated:
choose * from dev.demo_iceberg.buyer the place customer_id in(9,12);

Figure 2: result set from customer table for opt out customer after delete

Determine 12: consequence set from buyer desk for decide out buyer after delete

The question returns no rows, confirming that buyer IDs 9 and 12 have been efficiently deleted from the buyer desk.

Finest practices

After performing a number of UPDATE or DELETE operations, think about operating desk upkeep to optimize learn efficiency:

  • For AWS Glue tables – Use AWS Glue desk optimizers. For extra info, see Desk optimizers within the AWS Glue Developer Information.
  • For S3 Tables – Use S3 Tables upkeep operations. For extra info, see S3 Tables upkeep within the Amazon S3 Consumer Information.

Desk upkeep merges and compacts deletion information generated by Merge-on-Learn operations, bettering question efficiency for subsequent reads.

Conclusion

You need to use Amazon Redshift assist for DELETE, UPDATE, and MERGE operations on Apache Iceberg tables to construct knowledge architectures that mix warehouse efficiency with knowledge lake scalability. You may modify knowledge on the row stage whereas sustaining ACID compliance, giving you an identical flexibility with Apache Iceberg tables as you’ve got with native Amazon Redshift tables.

Get began:


In regards to the authors

Sanket Hase

Sanket Hase

Sanket is an Engineering Supervisor with the Amazon Redshift group, main question execution groups within the areas of knowledge lake analytics, hardware-software co-design, and vectorized question execution.

Raghu Kuppala

Raghu Kuppala

Raghu is an Analytics Specialist Options Architect skilled working within the databases, knowledge warehousing, and analytics house. Exterior of labor, he enjoys attempting totally different cuisines and spending time along with his household and associates.

Ritesh Sinha

Ritesh is an Analytics Specialist Options Architect based mostly out of San Francisco. He has helped clients construct scalable knowledge warehousing and massive knowledge options for over 16 years. He likes to design and construct environment friendly end-to-end options on AWS. In his spare time, he loves studying, strolling, and doing yoga.

Sundeep Kumar

Sundeep Kumar

Sundeep is a Sr. Specialist Options Architect at Amazon Net Companies (AWS), serving to clients construct knowledge lake and analytics platforms and options. When not constructing and designing knowledge lakes, Sundeep enjoys listening to music and enjoying guitar.

Related Articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

[td_block_social_counter facebook="tagdiv" twitter="tagdivofficial" youtube="tagdiv" style="style8 td-social-boxed td-social-font-icons" tdc_css="eyJhbGwiOnsibWFyZ2luLWJvdHRvbSI6IjM4IiwiZGlzcGxheSI6IiJ9LCJwb3J0cmFpdCI6eyJtYXJnaW4tYm90dG9tIjoiMzAiLCJkaXNwbGF5IjoiIn0sInBvcnRyYWl0X21heF93aWR0aCI6MTAxOCwicG9ydHJhaXRfbWluX3dpZHRoIjo3Njh9" custom_title="Stay Connected" block_template_id="td_block_template_8" f_header_font_family="712" f_header_font_transform="uppercase" f_header_font_weight="500" f_header_font_size="17" border_color="#dd3333"]
- Advertisement -spot_img

Latest Articles