Thursday, June 4, 2026

Schedule notebook runs in Amazon SageMaker Unified Studio


If you build notebooks for recurring tasks such as daily customer analysis, weekly report generation, or data quality checks in Amazon SageMaker Unified Studio, you’ve likely wanted to run them automatically on a schedule. Until now, there wasn’t a native way to do that. Teams had to manage orchestration separately, even though the interactive notebook experience was already in place. Now, notebook scheduling is available, so you can configure your production workloads to run automatically with minimal manual intervention.

In this post, we walk you through the new scheduling and orchestration capabilities for notebooks in Amazon SageMaker Unified Studio. You’ll learn how to:

  • Trigger on-demand background runs, such as a model retraining job, without waiting at your desk.
  • Create recurring schedules for tasks such as nightly data freshness checks or weekly business reviews.
  • Parameterize notebooks so a single template can generate reports across different AWS Regions or customer segments.
  • Orchestrate multi-notebook workflows where one notebook’s output feeds into the next, for example, an extract, transform, and load (ETL) pipeline followed by a summary dashboard refresh.
  • Debug failed runs with AI-assisted troubleshooting.

Sample use case overview

In this walkthrough, you’ll take on the role of a logistics analyst who monitors shipping performance across carriers. The notebook loads shipping data from the ShippingLogs.csv dataset, identifies late deliveries, and generates a performance summary. You want to run this notebook every morning without manual intervention, reuse it across different carriers, and know when something goes wrong.

You’ll start by running a notebook in the background and viewing the results. Next, you’ll create a recurring schedule for daily runs, then parameterize the notebook to generate reports for different carriers. You will also orchestrate the notebook in a multi-step workflow and debug a failed run using AI-assisted troubleshooting.

Prerequisites

Before you begin, you need:

  • An Amazon SageMaker Unified Studio project with Notebooks enabled. See Set up IAM-based domains for permission requirements.
  • A sample dataset. We use the ShippingLogs.csv dataset, which contains shipping data including estimated and actual delivery times, carriers, and origins. You can download it from the Workshop Studio (the file is named ShippingLogs.csv on the linked page).

Setting up the notebook

Start by creating a new notebook in your SageMaker Unified Studio project. If you haven’t already, add the ShippingLogs.csv file under the Shared tab in the Data panel.

SageMaker Unified Studio Notebook Files panel showing the Shared tab with the ShippingLogs.csv dataset uploaded

In the first cell, we load and explore the dataset. To reference the file in code, select the file in the Shared tab and copy the Amazon Simple Storage Service (Amazon S3) URI shown in the file details. Alternatively, you can reference it with this code:

import pandas as pd
from sagemaker_studio import Project

# Initialize the project
proj = Project()

# Get the S3 root path
s3_root = proj.s3.root

df = pd.read_csv(s3_root + '/ShippingLogs.csv')
df.head()

The dataset contains columns including Carrier, ActualShippingDays, ExpectedShippingDays, ShippingOrigin, ShippingPriority, and OnTimeDelivery. Add a second cell to analyze shipping performance for a single carrier:

import matplotlib.pyplot as plt

# Copy the filtered rows so that adding a column does not modify a view of df
carrier_data = df[df['Carrier'] == 'GlobalFreight'].copy()
# Flag late deliveries
carrier_data['is_late'] = carrier_data['ActualShippingDays'] > carrier_data['ExpectedShippingDays']
late_pct = carrier_data['is_late'].mean() * 100
# Visualize actual vs expected shipping days
plt.figure(figsize=(12, 4))
plt.hist(carrier_data['ActualShippingDays'] - carrier_data['ExpectedShippingDays'], bins=20, edgecolor="black")
plt.axvline(x=0, color="red", linestyle="--", label="On time")
plt.title(f'Shipping Delay Distribution - GlobalFreight ({late_pct:.1f}% late)')
plt.xlabel('Days Over Expected')
plt.ylabel('Number of Shipments')
plt.legend()
plt.show()
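
If you want to sanity check the late-delivery calculation before pointing it at the real dataset, a small standard-library sketch like the following may help. The sample rows are made up for illustration and are not taken from ShippingLogs.csv, and late_percentage is a hypothetical helper name; only the column names come from the dataset description above.

```python
import csv
import io

# Made-up sample rows for illustration only; they are not from ShippingLogs.csv.
SAMPLE_CSV = """Carrier,ActualShippingDays,ExpectedShippingDays
GlobalFreight,5,4
GlobalFreight,3,3
GlobalFreight,2,4
GlobalFreight,7,5
MicroCarrier,6,2
"""


def late_percentage(rows, carrier):
    """Return the percentage of a carrier's shipments that took longer than expected."""
    selected = [row for row in rows if row["Carrier"] == carrier]
    if not selected:
        raise ValueError(f"No rows found for carrier {carrier!r}")
    late = sum(
        1
        for row in selected
        if int(row["ActualShippingDays"]) > int(row["ExpectedShippingDays"])
    )
    return late / len(selected) * 100


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(f"{late_percentage(rows, 'GlobalFreight'):.1f}% late")
```

A shipment that arrives exactly on the expected day is not counted as late, which matches the strict greater-than comparison in the notebook cell.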

With the notebook working interactively, you’re ready to automate it.

Running a notebook asynchronously

To trigger an asynchronous run, open your notebook. In the notebook header, choose the menu on the Run all button, and then choose Run in background.

Notebook header with the Run all menu expanded, showing the Run in background option

This captures a snapshot of the notebook in its current state and starts a run on separate, dedicated compute. You can continue working on other tasks or close the browser entirely. Your interactive session isn’t affected.

You will see a notification at the bottom of your screen confirming that the run started. To check the status of your run, choose View Run in the notification. This opens a view showing each background and scheduled run with its status, duration, and a link to view the full output.

Run history view showing background and scheduled runs with status, duration, and output links

You can open the run details at any point to view results as cells run. The run details include three tabs:

  • Output: The notebook in read-only mode with cell results rendered, including dataframe outputs, visualizations, and print statements.
  • Parameters: The parameter values used for this run.
  • Logs: Run logs for debugging.

Run details view showing the Output, Parameters, and Logs tabs with rendered cell output

You can also access past runs by selecting the View Runs option in the notebook header.

Notebook header with the View Runs option highlighted

Stopping an in-progress run

If you need to cancel a run, open the run and choose Stop. The run terminates, and its status updates to reflect the cancellation.

Run detail view with the Stop button selected to terminate an in-progress run

What to know about background runs

Compute: Each background run uses its own dedicated compute, separate from your interactive session. Your interactive work isn’t interrupted.

Packages: The packages that you install through the notebook’s package manager will be available in your background runs. When you use !pip install in code cells, the asynchronous run installs those packages as well.

Local files: Background runs can’t access files stored locally in your notebook environment. Reference data from your project’s shared storage (Amazon S3) or connected data sources instead.

Startup time: Expect a few minutes of startup time while compute is provisioned and your environment is prepared.
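
Because background runs can’t read local files, it can be useful to fail fast when a notebook is handed a local path. The following is a minimal sketch of such a guard; is_s3_uri and require_s3_uri are hypothetical helper names, and example-bucket is a placeholder rather than a real project bucket.

```python
from urllib.parse import urlparse


def is_s3_uri(path):
    """Return True when path looks like s3://bucket/key."""
    parsed = urlparse(path)
    has_bucket = bool(parsed.netloc)
    has_key = bool(parsed.path.strip("/"))
    return parsed.scheme == "s3" and has_bucket and has_key


def require_s3_uri(path):
    """Raise early if a data path would not be reachable from a background run."""
    if not is_s3_uri(path):
        raise ValueError(
            f"{path!r} is not an S3 URI; background runs cannot read local files"
        )
    return path


# Placeholder bucket name, used only to show the expected shape of the URI.
print(is_s3_uri("s3://example-bucket/shared/ShippingLogs.csv"))
print(is_s3_uri("./ShippingLogs.csv"))
```

Calling require_s3_uri at the top of the first cell surfaces the problem in the run logs right away instead of deep inside a read call.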

Creating a recurring schedule

Now that you’ve confirmed asynchronous runs work correctly, you can automate the notebook on a schedule. Choose the schedule icon in the notebook header to open the schedule creation form.

Schedule creation form opened from the notebook header schedule icon

Configure the following settings:

  • Schedule name: Enter a descriptive name, such as Daily Shipping Report.
  • Schedule type: Choose Recurring for repeated runs or One-time for a single future run.
  • Frequency: Define how often the notebook runs using a rate (for example, every one day) or a cron expression. Set the time zone and the start and end dates for the schedule. For example, set the schedule to run every day at 7:00 AM UTC starting tomorrow.
  • Flexible time window (optional): The number of minutes after the scheduled start time within which the run can be invoked. For example, with a 5-minute window, the notebook runs within 5 minutes of the start time.
  • Advanced settings:
    • Compute Instance: Keep the current settings or override with a different instance type for the asynchronous run to use.
    • Timeout: Set a maximum run duration to help prevent notebooks from running indefinitely. If left blank, it defaults to 60 minutes.

Choose Create.
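
To reason about when a run may actually start, the frequency and flexible time window settings can be modeled with plain datetime arithmetic. This is only a sketch of the timing described above, not how the service computes it; daily_starts and invocation_window are hypothetical helpers, and the start date is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone


def daily_starts(first_start, days):
    """Return the scheduled start time for each of the next `days` daily runs."""
    return [first_start + timedelta(days=offset) for offset in range(days)]


def invocation_window(scheduled_start, flexible_minutes=0):
    """Return (earliest, latest) times at which one scheduled run may be invoked."""
    if flexible_minutes < 0:
        raise ValueError("flexible_minutes must not be negative")
    return scheduled_start, scheduled_start + timedelta(minutes=flexible_minutes)


# Example: a daily run at 7:00 AM UTC with a 5-minute flexible window.
first = datetime(2026, 6, 5, 7, 0, tzinfo=timezone.utc)
for start in daily_starts(first, 3):
    earliest, latest = invocation_window(start, flexible_minutes=5)
    print(earliest.isoformat(), "->", latest.isoformat())
```

If a downstream job depends on the report, plan around the latest possible start plus startup time and the timeout, not around the scheduled start alone.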

Configured schedule form with name, recurring type, daily frequency, and advanced settings populated

The schedule appears in the Schedules tab of the activity panel. SageMaker Unified Studio creates an Amazon EventBridge Scheduler schedule for each schedule you configure.

Schedules tab in the activity panel listing the newly created Daily Shipping Report schedule

Viewing schedule run history

To view past runs for a schedule, choose the schedule name in the Schedules activity panel. This opens the schedule details view, where you can see the list of runs triggered by that schedule, the duration of each run, and a link to open the notebook output for an individual run.

Schedule details view showing the list of past runs with status, duration, and output links

Modifying and deleting schedules

To modify a schedule, choose Edit next to it in the Schedules panel. You can change the frequency, instance type, timeout, and other configuration fields. To pause or resume a schedule, choose Pause or Resume from the same menu. To remove a schedule, choose Delete from that menu. Deleting a schedule stops future runs but preserves historical run outputs in Amazon S3 for auditing purposes.

Schedules panel with the Edit, Pause, Resume, and Delete options for a schedule

Parameterizing notebooks

With parameters, you can reuse a single notebook across different inputs without duplicating code. For example, you can run the same shipping performance report for each carrier by passing a different carrier name to each run.

Defining parameters

Open the Parameters activity panel and choose Add. Set the parameter name to carrier and the default value to GlobalFreight.

Parameters activity panel with the carrier parameter and GlobalFreight default value configured

Using parameters in code

In your notebook, replace the second cell with the following code. This retrieves the carrier parameter value using the SageMaker Unified Studio Python SDK instead of the hardcoded value:

import sagemaker_studio
import matplotlib.pyplot as plt

# Read the carrier parameter for this run
carrier = sagemaker_studio.nbutils.parameters.get("carrier")

carrier_data = df[df['Carrier'] == carrier].copy()
carrier_data['is_late'] = carrier_data['ActualShippingDays'] > carrier_data['ExpectedShippingDays']
late_pct = carrier_data['is_late'].mean() * 100

plt.figure(figsize=(12, 4))
plt.hist(carrier_data['ActualShippingDays'] - carrier_data['ExpectedShippingDays'], bins=20, edgecolor="black")
plt.axvline(x=0, color="red", linestyle="--", label="On time")
plt.title(f'Shipping Delay Distribution - {carrier} ({late_pct:.1f}% late)')
plt.xlabel('Days Over Expected')
plt.ylabel('Number of Shipments')
plt.legend()
plt.show()
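
A mistyped parameter value would filter the dataframe down to zero rows and produce an empty chart rather than an error. One way to guard against that is to validate the value before using it. The sketch below assumes nothing about the SDK: resolve_carrier is a hypothetical helper, and in the notebook the set of known carriers could come from the distinct values of the Carrier column.

```python
def resolve_carrier(requested, known_carriers, default="GlobalFreight"):
    """Pick the carrier to report on, failing fast on unknown names."""
    # Fall back to the default when the parameter is missing or empty.
    value = (requested or default).strip()
    if value not in known_carriers:
        raise ValueError(
            f"Unknown carrier {value!r}; expected one of {sorted(known_carriers)}"
        )
    return value


# Carrier names used elsewhere in this walkthrough.
known = {"GlobalFreight", "MicroCarrier", "Shipper"}
print(resolve_carrier("MicroCarrier", known))
print(resolve_carrier(None, known))
```

With this in place, a scheduled run with a bad parameter shows up as Failed in the run history instead of silently succeeding with no data.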

Creating schedules with different parameter values

Now create three schedules for the same notebook, each targeting a different carrier:

  • “daily-shipping-gf” with carrier = GlobalFreight.
  • “daily-shipping-mc” with carrier = MicroCarrier.
  • “daily-shipping-shipper” with carrier = Shipper.

When you view a historical run, a separate Parameters tab in the run output displays the parameter values that were active for that run.

You can also override parameter values when triggering an on-demand background run. Choose the menu on the Run all button, then choose Run with settings. You can keep the defaults or provide custom values for that run.

Orchestrating with Workflows

To combine notebooks into a multi-step pipeline, such as running a data calculation notebook before the shipping log notebook, you can use the Notebook Operator in the Workflows tool to orchestrate them.

To do this, choose the Add to workflows button under the options menu of the notebook header.

Notebook header options menu with the Add to workflows button highlighted

This takes you to the Workflows tool and adds a new Notebook Operator task with prefilled properties from your notebook. When configuring the Operator task:

  • Select the target notebook from the notebook menu.
  • Use the Parameters widget to pass notebook parameters into the run of the notebook.
  • Specify optional arguments such as the compute instance and timeout configuration for the run.

Workflows canvas with a Notebook Operator task configured with notebook, parameters, and compute settings

Workflows also supports polling for the status of a run of a particular notebook using the Notebook Sensor. In Workflows, you can add a new Sensor task by hovering over the edge of the existing Operator task, where a plus (+) button is displayed.

Workflows canvas showing the plus button on the edge of an Operator task for adding a Sensor

You can then search for and add the Notebook Sensor to the canvas.

Task picker dialog with Notebook Sensor selected for adding to the workflow canvas

When configuring the Sensor task, specify the notebook run ID in the text field. The Operator’s form field contains Jinja templating to retrieve the notebook run. If the Sensor is used within the same workflow as the Operator, you can copy this template into the Sensor to poll the notebook run. Select the target notebook from the notebook menu.

Notebook Sensor configuration panel with the notebook run ID field populated using Jinja templating

Within Workflows, you can configure notebook runs to emit outputs and use those outputs as inputs for subsequent notebook runs.

Building on the earlier shipping log notebook example, we will pass the carrier parameter from an upstream notebook’s output. Your shipping-logs-analysis notebook should already be set up.

Because the notebook depends on the carrier parameter, you can specify it in the Parameters panel.

Parameters panel for the shipping-logs-analysis Operator with the carrier parameter dependency configured

Now, define a second notebook, calculate-best-carrier, which performs a calculation to determine the best carrier to use for shipping:

import pandas as pd
from sagemaker_studio import Project

# Initialize the project
proj = Project()

# Get the S3 root path
s3_root = proj.s3.root

df = pd.read_csv(s3_root + '/ShippingLogs.csv')
df.head()

# Count total and late shipments per carrier
carrier_stats = df.groupby('Carrier').agg(
    total=('OrderID', 'count'),
    late=('OnTimeDelivery', lambda x: (x == 'Late').sum())
).reset_index()
carrier_stats['late_pct'] = carrier_stats['late'] / carrier_stats['total'] * 100

# The best carrier is the one with the lowest late percentage
best = carrier_stats.sort_values('late_pct', ascending=True).iloc[0]
best_carrier = best['Carrier']

print("Late % by carrier:")
print(carrier_stats.to_string(index=False))
print(f"\nBest carrier: {best_carrier} ({best['late_pct']:.1f}% late)")
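
The selection logic can be checked in isolation with a few hand-written rows. The following standard-library sketch mirrors the grouping above under two assumptions: the sample rows are invented, and any OnTimeDelivery value other than Late (here the assumed label On Time) counts as on time. Ties are broken alphabetically, which may differ from the ordering that the pandas sort produces.

```python
from collections import Counter

# Invented rows for illustration; the "On Time" label is an assumption.
SAMPLE_ROWS = [
    {"Carrier": "GlobalFreight", "OnTimeDelivery": "Late"},
    {"Carrier": "GlobalFreight", "OnTimeDelivery": "On Time"},
    {"Carrier": "MicroCarrier", "OnTimeDelivery": "Late"},
    {"Carrier": "MicroCarrier", "OnTimeDelivery": "Late"},
    {"Carrier": "Shipper", "OnTimeDelivery": "On Time"},
    {"Carrier": "Shipper", "OnTimeDelivery": "On Time"},
    {"Carrier": "Shipper", "OnTimeDelivery": "Late"},
]


def best_carrier_from_rows(rows):
    """Return (best_carrier, late_pct_by_carrier) for the given shipment rows."""
    if not rows:
        raise ValueError("rows must not be empty")
    totals = Counter()
    late = Counter()
    for row in rows:
        totals[row["Carrier"]] += 1
        if row["OnTimeDelivery"] == "Late":
            late[row["Carrier"]] += 1
    late_pct = {name: late[name] / totals[name] * 100 for name in totals}
    # Lowest late percentage wins; the name is a deterministic tie-breaker.
    best = min(late_pct, key=lambda name: (late_pct[name], name))
    return best, late_pct


best_name, pct = best_carrier_from_rows(SAMPLE_ROWS)
print(best_name, f"{pct[best_name]:.1f}% late")
```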

To configure the calculate-best-carrier notebook’s outputs, open the Variables panel. A new selector at the bottom of this panel lets you select variables to mark as outputs.

Variables panel with the selector at the bottom for marking notebook variables as outputs

We want this notebook to emit the best_carrier variable.

Variables panel showing best_carrier marked as an output variable for the calculate-best-carrier notebook

Now, use the Add to workflows button as previously demonstrated to quickly add this notebook to a workflow. Chain a second Notebook Operator that points to the shipping-logs-analysis notebook. Because we specified a parameter dependency on carrier for this notebook, it’s available as an option in the Parameters widget menu.

Parameters widget menu of a Notebook Operator showing carrier as a configurable parameter dependency

Once they’re chained, the notebook tasks detect the outputs set in upstream notebook runs. These outputs can be selected as keys within the Parameters widget of the Operator to pass into the run. This can be done recursively for an arbitrary number of Operator tasks. Here, we select the emitted best_carrier output from the calculate-best-carrier notebook.
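
The data flow between chained tasks can be pictured with a few plain Python functions. This is a conceptual model only and does not use or represent the Workflows API: run_chain, the step dictionaries, and both stand-in functions are hypothetical, and the upstream result is hard-coded for illustration.

```python
def calculate_best_carrier():
    """Stand-in for the upstream notebook; emits best_carrier as an output."""
    return {"best_carrier": "Shipper"}  # hard-coded placeholder value


def shipping_logs_analysis(carrier):
    """Stand-in for the downstream notebook, which takes a carrier parameter."""
    return {"report_title": f"Shipping Delay Distribution - {carrier}"}


def run_chain(steps):
    """Run steps in order, mapping earlier outputs onto later parameters."""
    outputs = {}
    for step in steps:
        # Each entry maps a parameter name to the upstream output key it reads.
        params = {
            param: outputs[key] for param, key in step["parameter_sources"].items()
        }
        outputs.update(step["run"](**params) or {})
    return outputs


steps = [
    {"run": calculate_best_carrier, "parameter_sources": {}},
    {"run": shipping_logs_analysis, "parameter_sources": {"carrier": "best_carrier"}},
]
print(run_chain(steps))
```

The mapping from carrier to best_carrier plays the role of the selection made in the Parameters widget: the downstream parameter keeps its own name, and only its value comes from upstream.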

Parameters widget displaying best_carrier as a selectable upstream output to pass into the next Operator

You can now choose the Save button at the top left of the visual canvas and then the Run button to start the workflow. When the workflow completes, the specified notebook outputs are available in the Task Output panel, and the notebook run result can be viewed in the Notebooks tool.

Task Output panel showing the emitted notebook outputs after a successful workflow run

Notebook run result rendered in the Notebooks tool after the chained workflow completes

Similarly, the Notebook Sensor also emits the notebook outputs from a particular notebook’s run, which can be used within other tasks. This is useful when you want to retrieve outputs from a notebook run in another workflow.

Debugging a failed run with AI assistance

When viewing your past runs, you notice that a run from earlier today has a Failed status. Choose the failed run to open the notebook output in read-only mode.

In this example, suppose you incorrectly referred to the column name ActualShippingDays as DeliveryDays. The run would fail with a KeyError: 'DeliveryDays' in the cell that computes late deliveries.

At the top of the failed run output, choose Troubleshoot with AI. This opens the notebook with the Agent chat panel open.

Failed run output with the Troubleshoot with AI button highlighted at the top of the page

The data agent analyzes the cell outputs, identifies the cell that errored, explains the root cause, and suggests a fix. In this case, it identifies that the column DeliveryDays doesn’t exist in the dataframe and suggests updating the code reference. You can review the change, then verify the fix by choosing Run in background from the Run all menu to trigger a test run before the next scheduled run.
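
Column-name mistakes like this one are easier to diagnose when the notebook checks its inputs up front. The sketch below shows one possible guard; require_columns is a hypothetical helper, and in the notebook you could pass df.columns as the first argument.

```python
def require_columns(available, required):
    """Raise a descriptive KeyError if any required column is missing."""
    missing = [name for name in required if name not in available]
    if missing:
        raise KeyError(
            f"Missing columns {missing}; available columns: {sorted(available)}"
        )


columns = ["Carrier", "ActualShippingDays", "ExpectedShippingDays"]

# Passes silently because both columns exist.
require_columns(columns, ["ActualShippingDays", "ExpectedShippingDays"])

# Reproduces the mistake from this example with a clearer message.
try:
    require_columns(columns, ["DeliveryDays"])
except KeyError as error:
    print(error)
```

The error then names both the missing column and the columns that are present, which shortens the path to the fix whether you read the logs yourself or hand them to the agent.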

Note: You can also use the Data Agent to create schedules and start notebook runs using natural language, without navigating the interface.

Cleaning up

To avoid incurring future charges, delete the resources that you created in this walkthrough:

  • Delete any schedules that you created from the Schedules panel in your notebook.
  • Delete test notebooks if you don’t need them.
  • Navigate to the Workflows page and delete any workflows that you created during this walkthrough.
  • Your project’s Amazon S3 storage retains historical run outputs until you manually remove them.

Conclusion

In this post, we showed how to automate notebooks in Amazon SageMaker Unified Studio using background runs, schedules, parameterization, workflow orchestration, and AI-assisted debugging. Using a shipping logistics dataset, we demonstrated how a single notebook can be parameterized to generate performance reports for different carriers on independent schedules, all without duplicating code or managing extensive infrastructure.

To get started, open a notebook in your SageMaker Unified Studio project, choose the menu on the Run all button in the notebook header, and choose Run in background. For more advanced use cases, explore workflows in Amazon SageMaker Unified Studio to build multi-step data pipelines, or review the Amazon SageMaker Unified Studio User Guide for more configuration options.

Learn more:

If you have feedback or questions, reach out on AWS re:Post for Amazon SageMaker Unified Studio.


About the authors

Shivani Mehendarge

Shivani is a Software Development Engineer at Amazon Web Services, where she builds scalable infrastructure that helps data teams run and automate their workloads in Amazon SageMaker Unified Studio. She is passionate about solving complex distributed systems challenges and building reliable cloud services.

Regan Perk

Regan is a Senior Software Development Engineer on the Amazon SageMaker Unified Studio team. She designs, implements, and maintains features that enable customers to manage schedules and workflows in SageMaker Unified Studio.

Qazi Ashikin

Qazi is a Software Development Engineer at Amazon Web Services, where he works on developing features that allow customers to orchestrate workflows and schedules in SageMaker Unified Studio. He also works on AWS Glue Studio, where he builds agentic systems and maintains services that enable data analytics.
