
Best practices for migrating from Apache Airflow 2.x to Apache Airflow 3.x on Amazon MWAA


Apache Airflow 3.x on Amazon MWAA introduces architectural improvements such as API-based task execution, which provides enhanced security and isolation. Other major updates include a redesigned UI for a better user experience, scheduler-based backfills for improved performance, and support for Python 3.12. Unlike in-place minor Airflow version upgrades in Amazon MWAA, upgrading from Airflow 2 to Airflow 3 requires careful planning and execution through a migration approach due to fundamental breaking changes.

This migration presents an opportunity to embrace next-generation workflow orchestration capabilities while maintaining business continuity. However, it is more than a simple upgrade. Organizations migrating to Airflow 3.x on Amazon MWAA must understand key breaking changes, including the removal of direct metadata database access from workers, the deprecation of SubDAGs, changes to default scheduling behavior, and library dependency updates. This post provides best practices and a streamlined approach to successfully navigate this critical migration, with minimal disruption to your mission-critical data pipelines while maximizing the improved capabilities of Airflow 3.

Understanding the migration process

The journey from Airflow 2.x to 3.x on Amazon MWAA introduces several fundamental changes that organizations must understand before beginning their migration. These changes affect core workflow operations and require careful planning to achieve a smooth transition.

You should be aware of the following breaking changes:

  • Removal of direct database access – A critical change in Airflow 3 is the removal of direct metadata database access from worker nodes. Tasks and custom operators must now communicate through the REST API instead of direct database connections. This architectural change affects code that previously accessed the metadata database directly through SQLAlchemy connections, requiring refactoring of existing DAGs and custom operators.
  • SubDAG deprecation – Airflow 3 removes the SubDAG construct in favor of TaskGroups, Assets, and Data Aware Scheduling. Organizations must refactor existing SubDAGs to one of the previously mentioned constructs (see the sketch after this list).
  • Scheduling behavior changes – Two notable changes to default scheduling options require an impact assessment:
    • The default values for catchup_by_default and create_cron_data_intervals changed to False. This change affects DAGs that don't explicitly set these options.
    • Airflow 3 removes several context variables, such as execution_date, tomorrow_ds, yesterday_ds, prev_ds, and next_ds. You must replace these variables with currently supported context variables.
  • Library and dependency changes – A significant number of libraries change in Airflow 3.x, requiring DAG code refactoring. Many previously included provider packages might need to be explicitly added to the requirements.txt file.
  • REST API changes – The REST API path changes from /api/v1 to /api/v2, affecting external integrations. For more information about using the Airflow REST API, see Creating a web server session token and calling the Apache Airflow REST API.
  • Authentication system – Although Airflow 3.0.1 and later versions default to SimpleAuthManager instead of Flask-AppBuilder, Amazon MWAA will continue using Flask-AppBuilder for Airflow 3.x. This means customers on Amazon MWAA will not see any authentication changes.
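
The following is a minimal sketch of refactoring a SubDAG into a TaskGroup using the Airflow 3 task SDK. The DAG and task names are illustrative assumptions, and the airflow.sdk exports are shown as documented for Airflow 3; adjust the imports to match the release you target.

    # Minimal sketch: replacing a SubDAG with a TaskGroup (illustrative names).
    from airflow.sdk import TaskGroup, dag, task  # Airflow 3 task SDK namespace


    @dag(schedule=None, catchup=False)
    def orders_pipeline():
        @task
        def extract() -> list[str]:
            return ["order-1", "order-2"]

        @task
        def clean(orders: list[str]) -> list[str]:
            return [o.upper() for o in orders]

        @task
        def enrich(orders: list[str]) -> list[str]:
            return [f"{o}-enriched" for o in orders]

        raw = extract()

        # A TaskGroup replaces the old SubDagOperator: related tasks are grouped
        # in the UI and scheduler without spinning up a separate child DAG.
        with TaskGroup(group_id="transform_group"):
            enrich(clean(raw))


    orders_pipeline()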

The migration requires creating a new environment rather than performing an in-place upgrade. Although this approach demands more planning and resources, it provides the advantage of keeping your existing environment as a fallback option during the transition, facilitating business continuity throughout the migration process.

Pre-migration planning and assessment

A successful migration depends on thorough planning and assessment of your current environment. This section establishes the foundation for a smooth transition by identifying dependencies, configurations, and potential compatibility issues. Evaluate your environment and code against the previously mentioned breaking changes to ensure a successful migration.

Environment assessment

Begin by conducting a complete inventory of your current Amazon MWAA environment. Document all DAGs, custom operators, plugins, and dependencies, including their specific versions and configurations. Make sure that your current environment is on version 2.10.x, because this provides the best compatibility path for upgrading to Amazon MWAA with Airflow 3.x.

Identify the structure of the Amazon Simple Storage Service (Amazon S3) bucket containing your DAG code, requirements file, startup script, and plugins. You will replicate this structure in a new bucket for the new environment. Creating separate buckets for each environment avoids conflicts and allows continued development without affecting current pipelines.

Configuration documentation

Document all custom Amazon MWAA environment variables, Airflow connections, and environment configurations. Review AWS Identity and Access Management (IAM) resources, because your new environment's execution role will need identical policies. IAM users or roles accessing the Airflow UI require the CreateWebLoginToken permission for the new environment.

Pipeline dependencies

Understanding pipeline dependencies is critical for a successful phased migration. Identify interdependencies through Datasets (now Assets), SubDAGs, TriggerDagRun operators, or external API interactions. Develop your migration plan around these dependencies so related DAGs can migrate at the same time. The sketch that follows shows a minimal Asset-based dependency between two DAGs.
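
This is a minimal, illustrative sketch of an Asset-based dependency in Airflow 3, assuming the airflow.sdk namespace; the S3 URI, DAG names, and dates are placeholders, not values from the pipelines discussed in this post.

    # Minimal sketch: data-aware scheduling with Assets (illustrative names and URI).
    from datetime import datetime

    from airflow.sdk import Asset, dag, task

    orders_asset = Asset("s3://example-bucket/orders/latest.parquet")  # assumed example URI


    @dag(schedule="@daily", start_date=datetime(2025, 1, 1), catchup=False)
    def producer():
        @task(outlets=[orders_asset])
        def publish_orders():
            # Write the file elsewhere; the outlet marks the Asset as updated.
            pass

        publish_orders()


    @dag(schedule=[orders_asset], start_date=datetime(2025, 1, 1), catchup=False)
    def consumer():  # runs whenever the Asset is updated
        @task
        def process_orders():
            pass

        process_orders()


    producer()
    consumer()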

Consider DAG scheduling frequency when planning migration waves. DAGs with longer intervals between runs provide larger migration windows and a lower risk of duplicate execution compared with frequently running DAGs.

Testing strategy

Create your testing strategy by defining a systematic approach to identifying compatibility issues. Use the ruff linter with the AIR30 ruleset to automatically identify code requiring updates:

ruff check --preview --select AIR30

Then, review and update your environment's requirements.txt file to make sure package versions comply with the updated constraints file. Additionally, commonly used operators previously included in the airflow-core package now reside in a separate package and must be added to your requirements file; the snippet after this paragraph illustrates what such entries might look like.
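
As an illustration only, a requirements.txt for Airflow 3.x might contain entries like the following. The constraint URL is shown with placeholders (replace them with your target Airflow and Python versions), and the provider packages listed are assumptions; the exact packages and pinned versions you need depend on your DAGs and the constraints file.

    # Illustrative requirements.txt sketch; substitute real versions from your constraints file.
    --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
    apache-airflow-providers-standard
    apache-airflow-providers-amazon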

Test your DAGs using the Amazon MWAA Docker images for Airflow 3.x. These images make it possible to create and test your requirements file, and to confirm that the scheduler successfully parses your DAGs.

Migration strategy and best practices

A methodical migration approach minimizes risk while providing clear validation checkpoints. The recommended strategy employs a phased blue/green deployment model that provides reliable migrations and immediate rollback capabilities.

Phased migration approach

The following migration phases can help you define your migration plan:

  • Phase 1: Discovery, assessment, and planning – In this phase, complete your environment inventory, dependency mapping, and breaking change analysis. With the gathered information, develop the detailed migration plan. This plan should include steps for updating code, updating your requirements file, creating a test environment, testing, creating the blue/green environment (discussed later in this post), and the migration steps. Planning must also include training, the monitoring strategy, rollback conditions, and the rollback plan.
  • Phase 2: Pilot migration – The pilot migration phase serves to validate your detailed migration plan in a controlled environment with a small scope of impact. Focus the pilot on two or three non-critical DAGs with varying characteristics, such as different schedules and dependencies. Migrate the selected DAGs using the migration plan defined in the previous phase. Use this phase to validate your plan and monitoring tools, and adjust both based on actual outcomes. During the pilot, establish baseline migration metrics to help predict the performance of the full migration.
  • Phase 3: Wave-based production migration – After a successful pilot, you're ready to begin the full wave-based migration for the remaining DAGs. Group the remaining DAGs into logical waves based on business criticality (least critical first), technical complexity, interdependencies (migrate dependent DAGs together), and scheduling frequency (less frequent DAGs provide larger migration windows). After you define the waves, work with stakeholders to develop the wave schedule. Include sufficient validation periods between waves to confirm a wave is successful before starting the next one. This time also reduces the scope of impact in the event of a migration issue, and provides sufficient time to perform a rollback.
  • Phase 4: Post-migration review and decommissioning – After all waves are complete, conduct a post-migration review to identify lessons learned, optimization opportunities, and any other unresolved items. This is also a good time to sign off on system stability. The final step is decommissioning the original Airflow 2.x environment. After stability is determined, based on business requirements and input, decommission the original (blue) environment.

Blue/green deployment strategy

Implement a blue/green deployment strategy for a safe, reversible migration. With this strategy, you will have two Amazon MWAA environments running during the migration, and you manage which DAGs operate in which environment.

The blue environment (current Airflow 2.x) maintains production workloads during the transition. You can implement a freeze window for DAG changes before migration to avoid last-minute code conflicts. This environment serves as the immediate rollback environment if an issue is identified in the new (green) environment.

The green environment (new Airflow 3.x) receives migrated DAGs in controlled waves. It mirrors the networking, IAM roles, and security configurations of the blue environment. Configure this environment with the same options as the blue environment, and create identical monitoring mechanisms so both environments can be monitored simultaneously. To avoid duplicate DAG runs, make sure a DAG only runs in a single environment. This involves pausing the DAG in the blue environment before activating the DAG in the green environment.

Maintain the blue environment in warm standby mode during the entire migration. Document specific rollback steps for each migration wave, and test your rollback procedure for at least one non-critical DAG. Additionally, define clear criteria for triggering a rollback (such as specific failure rates or SLA violations).

Step-by-step migration process

This section provides detailed steps for conducting the migration.

Pre-migration assessment and preparation

Before initiating the migration process, conduct a thorough assessment of your current environment and develop the migration plan:

  • Make sure that your current Amazon MWAA environment is on version 2.10.x
  • Create a detailed inventory of your DAGs, custom operators, and plugins, along with their dependencies and versions
  • Review your current requirements.txt file to understand package requirements
  • Document all environment variables, connections, and configuration settings
  • Review the Apache Airflow 3.x release notes to understand breaking changes
  • Determine your migration success criteria, rollback conditions, and rollback plan
  • Identify a small number of DAGs suitable for the pilot migration
  • Develop a plan to train, or familiarize, Amazon MWAA users on Airflow 3

Compatibility checks

Identifying compatibility issues is critical to a successful migration. This step helps developers focus on the specific code that is incompatible with Airflow 3.

Use the ruff linter with the AIR30 ruleset to automatically identify code requiring updates:

ruff check --preview --select AIR30

Additionally, review your code for instances of direct metadata database access.

DAG code updates

Based on your findings during compatibility testing, update the affected DAG code for Airflow 3.x. The ruff DAG check utility can automatically fix common changes. Use the following command to run the utility in fix mode:

ruff check dag/ --select AIR301 --fix --preview

Common changes include:

  • Replace direct metadata database access with API calls (see the sketch after this list):
    # Before (Airflow 2.x) - Direct DB access
    from airflow.settings import Session
    from airflow.models.taskinstance import TaskInstance
    session = Session()
    result = session.query(TaskInstance)

    For Apache Airflow v3.x, use the supported interfaces of the Amazon MWAA SDK instead of querying the database directly.
  • Update core construct imports with the new Airflow SDK namespace:
    # Before (Airflow 2.x)
    from airflow.decorators import dag, task

    # After (Airflow 3.x)
    from airflow.sdk import dag, task

  • Replace deprecated context variables with their modern equivalents:
    # Before (Airflow 2.x)
    def my_task(execution_date, **context):
        # Using execution_date

    # After (Airflow 3.x)
    def my_task(logical_date, **context):
        # Using logical_date
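
As one possible replacement for the direct database query shown above, the following sketch reads task instance state through the Airflow REST API (the /api/v2 path in Airflow 3) instead of SQLAlchemy. The host, bearer token, and DAG identifiers are illustrative assumptions; on Amazon MWAA you would authenticate using a web server session token as described in the Amazon MWAA documentation.

    # Minimal sketch: querying task instance state via the Airflow 3 REST API
    # instead of a direct metadata database query. Host, token, and IDs are
    # illustrative assumptions, not values from this post.
    import requests

    AIRFLOW_HOST = "https://your-environment-host"  # assumed placeholder
    TOKEN = "YOUR_SESSION_TOKEN"                    # assumed placeholder

    response = requests.get(
        f"{AIRFLOW_HOST}/api/v2/dags/example_dag/dagRuns/example_run/taskInstances",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    for ti in response.json().get("task_instances", []):
        print(ti["task_id"], ti["state"])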

Next, evaluate the impact of the two scheduling-related default changes. catchup_by_default is now False, meaning missed DAG runs will no longer automatically backfill. If backfill is required, update the DAG definition with catchup=True. If your DAGs require backfill, you must consider the combined impact of this migration and backfilling: because you're migrating a DAG to a clean environment with no history, enabling catchup will create DAG runs for all runs beginning with the specified start_date. Consider updating the start_date to avoid unnecessary runs. The sketch that follows shows these settings in a DAG definition.
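
The following is a minimal sketch of a DAG definition that opts back into catchup and moves start_date forward so the new environment does not backfill a long history; the DAG name, schedule, and date are illustrative assumptions.

    # Minimal sketch: explicit catchup and an adjusted start_date for migration
    # to a clean environment (name, schedule, and date are illustrative).
    from datetime import datetime

    from airflow.sdk import dag, task


    @dag(
        schedule="@daily",
        start_date=datetime(2025, 10, 1),  # moved forward to limit backfilled runs
        catchup=True,                      # opt back in; Airflow 3 defaults to False
    )
    def daily_report():
        @task
        def build_report():
            pass

        build_report()


    daily_report()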

create_cron_data_intervals is also now False. With this change, cron expressions are evaluated as a CronTriggerTimetable construct, which triggers runs at the cron expression's exact times rather than at the end of a data interval.

Finally, evaluate the usage of deprecated context variables for manually triggered and Asset-triggered DAGs, then update your code with suitable replacements.

Updating requirements and testing

In addition to possible package version changes, several core Airflow operators previously included in the airflow-core package moved to the apache-airflow-providers-standard package. These changes must be incorporated into your requirements.txt file. Specifying, or pinning, package versions in your requirements file is a best practice and is recommended for this migration.

To update your requirements file, complete the following steps:

  1. Download and configure the Amazon MWAA Docker images. For more details, refer to the GitHub repo.
  2. Copy the current environment's requirements.txt file to a new file.
  3. If needed, add the apache-airflow-providers-standard package to the new requirements file.
  4. Download the appropriate Airflow constraints file for your target Airflow version to your working directory. A constraints file is available for each Airflow version and Python version combination. The URL takes the following form:
    https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt
  5. Create your versioned requirements file using your unversioned file and the constraints file. For guidance on creating a requirements file, see Creating a requirements.txt file. Make sure that there are no dependency conflicts before moving forward.
  6. Verify your requirements file using the Docker image. Run the following command inside the running container:
    ./run.sh test-requirements

    Address any installation errors by updating package versions.

As a best practice, we recommend packaging your dependencies into a ZIP file for deployment in Amazon MWAA. This ensures the exact same packages are installed on all Airflow nodes. Refer to Installing Python dependencies using PyPi.org Requirements File Format for detailed information about packaging dependencies.

Creating a new Amazon MWAA 3.x environment

Because Amazon MWAA requires a migration approach for major version upgrades, you must create a new environment for your blue/green deployment. This post uses the AWS Command Line Interface (AWS CLI) as an example; you can also use infrastructure as code (IaC).

  1. Create a new S3 bucket using the same structure as the current S3 bucket.
  2. Upload the updated requirements file and any plugin packages to the new S3 bucket.
  3. Generate a template for your new environment configuration:
    aws mwaa create-environment --generate-cli-skeleton > new-mwaa3-env.json

  4. Modify the generated JSON file:
    1. Copy configurations from your existing environment.
    2. Update the environment name.
    3. Set the AirflowVersion parameter to the target 3.x version.
    4. Update the S3 bucket properties with the new S3 bucket name.
    5. Review and update other configuration parameters as needed.

    Configure the new environment with the same networking settings, security groups, and IAM roles as your existing environment. Refer to the Amazon MWAA User Guide for these configurations.

  5. Create your new environment:
    aws mwaa create-environment --cli-input-json file://new-mwaa3-env.json

Metadata migration

Your new environment requires the same variables, connections, roles, and pool configurations. Use this section as a guide for migrating this information. If you're using AWS Secrets Manager as your secrets backend, you don't need to migrate any connections. Depending on your environment's size, you can migrate this metadata using the Airflow UI or the Apache Airflow REST API (see the sketch after the following steps).

  1. Update any custom pool information in the new environment using the Airflow UI.
  2. For environments using the metadata database as a secrets backend, migrate all connections to the new environment.
  3. Migrate all variables to the new environment.
  4. Migrate any custom Airflow roles to the new environment.
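
As an illustration of the REST API option, the following sketch copies Airflow variables from the old environment (Airflow 2.x, /api/v1) to the new one (Airflow 3.x, /api/v2). Hosts, tokens, and the bearer authentication shown are assumptions; on Amazon MWAA you would authenticate with web server session tokens, and secret-backed variables should be handled through your secrets backend instead.

    # Minimal sketch: copying Airflow variables between environments via the REST API.
    # Hosts, tokens, and auth are illustrative assumptions, not values from this post.
    import requests

    OLD_HOST, OLD_TOKEN = "https://old-airflow2-host", "OLD_SESSION_TOKEN"
    NEW_HOST, NEW_TOKEN = "https://new-airflow3-host", "NEW_SESSION_TOKEN"

    # Read variables from the Airflow 2.x environment (v1 API path).
    resp = requests.get(
        f"{OLD_HOST}/api/v1/variables",
        headers={"Authorization": f"Bearer {OLD_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()

    # Recreate each variable in the Airflow 3.x environment (v2 API path).
    for var in resp.json().get("variables", []):
        requests.post(
            f"{NEW_HOST}/api/v2/variables",
            headers={"Authorization": f"Bearer {NEW_TOKEN}"},
            json={"key": var["key"], "value": var["value"]},
            timeout=30,
        ).raise_for_status()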

Migration execution and validation

Plan and execute the transition from your old environment to the new one:

  1. Schedule the migration during a period of low workflow activity to minimize disruption.
  2. Implement a freeze window for DAG changes before and during the migration.
  3. Execute the migration in phases:
    1. Pause DAGs in the old environment. For a small number of DAGs, you can use the Airflow UI. For larger groups, consider using the REST API (see the sketch after these steps).
    2. Verify that all running tasks have completed in the Airflow UI.
    3. Redirect DAG triggers and external integrations to the new environment.
    4. Copy the updated DAGs to the new environment's S3 bucket.
    5. Enable DAGs in the new environment. For a small number of DAGs, you can use the Airflow UI. For larger groups, consider using the REST API.
  4. Monitor the new environment closely during the initial operation period:
    1. Watch for failed tasks or scheduling issues.
    2. Check for missing variables or connections.
    3. Verify that external system integrations are functioning correctly.
    4. Monitor Amazon CloudWatch metrics to confirm the environment is performing as expected.
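
For larger groups of DAGs, a short script against the REST API can pause them in bulk in the blue environment (a mirror-image script can enable them in the green environment). The host, token, and DAG ID prefix below are illustrative assumptions.

    # Minimal sketch: bulk-pausing DAGs in the Airflow 2.x (blue) environment via
    # the REST API. Host, token, and the DAG ID filter are illustrative assumptions.
    import requests

    HOST, TOKEN = "https://old-airflow2-host", "OLD_SESSION_TOKEN"
    HEADERS = {"Authorization": f"Bearer {TOKEN}"}

    # List DAGs, then pause the ones scheduled for this migration wave
    # (here identified by a simple DAG ID prefix).
    dags = requests.get(f"{HOST}/api/v1/dags?limit=100", headers=HEADERS, timeout=30)
    dags.raise_for_status()

    for dag in dags.json().get("dags", []):
        if dag["dag_id"].startswith("wave1_"):
            # PATCH is_paused=True so the scheduler stops creating new runs.
            requests.patch(
                f"{HOST}/api/v1/dags/{dag['dag_id']}",
                headers=HEADERS,
                json={"is_paused": True},
                timeout=30,
            ).raise_for_status()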

Post-migration validation

After the migration, thoroughly validate the new environment:

  • Verify that all DAGs are being scheduled correctly according to their defined schedules
  • Check that task history and logs are accessible and complete
  • Test critical workflows end-to-end to confirm they execute successfully
  • Validate that connections to external systems are functioning properly
  • Monitor CloudWatch metrics for performance validation

Cleanup and documentation

When the migration is complete and the new environment is stable, complete the following steps:

  1. Document the changes made during the migration process.
  2. Update runbooks and operational procedures to reflect the new environment.
  3. After a sufficient stability period, defined by stakeholders, decommission the old environment:
    aws mwaa delete-environment --name old-mwaa2-env

  4. Archive backup data according to your organization's retention policies.

Conclusion

The journey from Airflow 2.x to 3.x on Amazon MWAA is an opportunity to embrace next-generation workflow orchestration capabilities while maintaining the reliability of your workflow operations. By following these best practices and maintaining a methodical approach, you can successfully navigate this transition while minimizing risks and disruptions to your business operations.

A successful migration requires thorough preparation, systematic testing, and clear documentation throughout the process. Although the migration approach requires more initial effort, it provides the safety and control needed for such a significant upgrade.


About the authors

Anurag Srivastava

Anurag works as a Senior Technical Account Manager at AWS, specializing in Amazon MWAA. He is passionate about helping customers build scalable data pipelines and workflow automation solutions on AWS.

Kamen Sharlandjiev

Kamen is a Sr. Big Data and ETL Solutions Architect and an Amazon MWAA and AWS Glue ETL expert. He's on a mission to make life easier for customers who are facing complex data integration and orchestration challenges. His secret weapon? Fully managed AWS services that can get the job done with minimal effort. Follow Kamen on LinkedIn to keep up to date with the latest Amazon MWAA and AWS Glue features and news!

Ankit Sahu

Ankit brings over 18 years of expertise in building innovative digital products and services. His diverse experience spans product strategy, go-to-market execution, and digital transformation initiatives. Currently, Ankit serves as Senior Product Manager at Amazon Web Services (AWS), where he leads the Amazon MWAA service.

Jeetendra Vaidya

Jeetendra is a Senior Solutions Architect at AWS, bringing his expertise to the AI/ML, serverless, and data analytics domains. He is passionate about assisting customers in architecting secure, scalable, reliable, and cost-effective solutions.

Mike Ellis

Mike is a Senior Technical Account Manager at AWS and an Amazon MWAA specialist. In addition to assisting customers with Amazon MWAA, he contributes to the Airflow open source project.

Venu Thangalapally

Venu is a Senior Solutions Architect at AWS, based in Chicago, with deep expertise in cloud architecture, data and analytics, containers, and application modernization. He partners with financial services industry customers to translate business goals into secure, scalable, and compliant cloud solutions that deliver measurable value. Venu is passionate about using technology to drive innovation and operational excellence. Outside of work, he enjoys spending time with his family, reading, and taking long walks.
