The Amazon Redshift integration with AWS Lambda lets you create Amazon Redshift Lambda user-defined functions (UDFs). This functionality delivers flexibility, enhanced integrations, and security for functions defined in Lambda that can be run through SQL queries. Amazon Redshift Lambda UDFs offer many advantages:
- Enhanced integration – You can connect to external services or APIs from within your UDF logic, enabling richer data enrichment and operational workflows.
- Multiple Python runtimes – Lambda UDFs benefit from Lambda function support for multiple Python runtimes, so you can choose the runtime that fits your use case. In addition, new versions and security patches are available within a month of their official release.
- Independent scaling – Lambda UDFs use Lambda compute resources, so heavy compute or memory-intensive tasks don’t impact query performance or resource concurrency within Amazon Redshift.
- Isolation and security – You can isolate custom code execution in a separate service boundary. This simplifies maintenance, monitoring, budgeting, and permission management.
Because Lambda UDFs provide these significant advantages in integration, flexibility, scalability, and security, we will be ending support for Python UDFs in Amazon Redshift. We recommend that you migrate your existing Python UDFs to Lambda UDFs by June 30, 2026.
- October 30, 2025 – Creation of new Python UDFs will no longer be supported (existing functions can still be invoked)
- June 30, 2026 – Execution of existing Python UDFs will be suspended
In this post, we walk you through how to migrate your existing Python UDFs to Lambda UDFs, set up monitoring and cost evaluations, and review key considerations for a smooth transition.
Solution overview
You can create UDFs for tasks such as tokenization, encryption and decryption, or data science functionality like the Levenshtein distance calculation. For this post, we provide examples for customers who have Python UDFs in place, demonstrating how to replace them with Lambda UDFs.
The Levenshtein function, also known as the Levenshtein distance or edit distance, is a string metric used to measure the difference between two sequences of characters. Although this functionality was previously implemented with Python UDFs using a Python library in Amazon Redshift, Lambda provides a more efficient and scalable solution. This post demonstrates how to migrate from Python UDFs to Lambda UDFs for calculating Levenshtein distances.
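To make the metric concrete, the following is a minimal pure-Python sketch of the standard dynamic-programming computation of edit distance. It is an illustration of the metric only, not the UDF used in this post, and the names `levenshtein_table` and `levenshtein` are hypothetical.

```python
def levenshtein_table(source: str, target: str) -> list[list[int]]:
    """Build the full edit-distance table for two strings.

    table[i][j] is the number of single-character insertions, deletions,
    or substitutions needed to turn source[:i] into target[:j].
    """
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # delete all i characters of the source prefix
    for j in range(cols):
        table[0][j] = j  # insert all j characters of the target prefix
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,                 # deletion
                table[i][j - 1] + 1,                 # insertion
                table[i - 1][j - 1] + substitution,  # match or substitution
            )
    return table


def levenshtein(source: str, target: str) -> int:
    """Return the edit distance, which is the bottom-right cell of the table."""
    return levenshtein_table(source, target)[-1][-1]
```

For example, `levenshtein("kitten", "sitting")` returns 3: substitute k with s, substitute e with i, and insert g.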
Prerequisites
You must have the following:
Prepare the data
To set up our use case, complete the following steps:
- On the Amazon Redshift console, choose Query editor v2 under Explorer in the navigation pane.
- Connect to your Redshift data warehouse.
- Create a table and load data. The following query loads 30,000,000 rows into the customer table:
Identify existing Python UDFs
Run the following script to list existing Python UDFs:
The following is our existing Python UDF definition for Levenshtein distance:
Convert the Python UDF to a Lambda UDF
You can simplify converting your Python UDF to a Lambda UDF using Amazon Q Developer, a generative AI-powered assistant. It handles code transformation, packaging, and integration logic, accelerating migration and improving scalability. Integrated with popular developer tools like VS Code, JetBrains, and others, Amazon Q streamlines workflows so teams can modernize analytics using serverless architectures with minimal effort.
Amazon Q Developer code suggestions are based on large language models (LLMs) trained on billions of lines of code, including open source and Amazon code. Always review a code suggestion before accepting it; you might need to edit it to make sure that it does exactly what you intended.
Create a Lambda function
Complete the following steps to create a Lambda function:
- On the Lambda console, choose Functions in the navigation pane.
- Choose Create function.
- Choose Author from scratch.
- For Function name, enter a custom name (for example, levenshtein_distance_func).
- For Runtime, choose your code environment. (The examples in this post are compatible with Python 3.12.)
- For Architecture, select your system architecture. (The examples in this post are compatible with x86_64.)
- For Execution role, select Create a new role with basic Lambda permissions.
- Choose Create function.
- Choose Code and add the following code:
- Choose Configuration and update Timeout to 1 minute.
You can adjust memory to optimize performance. To learn more, see Optimizing Levenshtein User-Defined Function in Amazon Redshift.
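For reference, a minimal sketch of a handler that could back such a function follows. It assumes the UDF takes two string arguments, that Amazon Redshift delivers each batch of rows in the event's `arguments` field as a list of per-row lists, and that a NULL input should produce a NULL result. The helper name `edit_distance` is hypothetical, and the code is illustrative rather than the exact function used in this post.

```python
import json


def edit_distance(source, target):
    """Levenshtein distance, keeping only two rows of the DP table in memory."""
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]
        for j, target_char in enumerate(target, start=1):
            cost = 0 if source_char == target_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def lambda_handler(event, context):
    """Process one batch of rows sent by an Amazon Redshift Lambda UDF call."""
    try:
        rows = event["arguments"]  # assumed shape: [[string_1, string_2], ...]
        results = [
            None if a is None or b is None else edit_distance(a, b)
            for a, b in rows
        ]
        response = {
            "success": True,
            "num_records": len(results),
            "results": results,  # one result per input row, in the same order
        }
    except Exception as exc:
        # Report the failure in the response so the query gets a clear message.
        response = {"success": False, "error_msg": f"{type(exc).__name__}: {exc}"}
    return json.dumps(response)
```

Calling `lambda_handler({"arguments": [["kitten", "sitting"]]}, None)` locally returns a JSON string whose `results` field is `[3]`, which makes the function easy to unit test before wiring it to Amazon Redshift.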
Create an Amazon Redshift IAM role
To allow your Amazon Redshift cluster to invoke the Lambda function, you must set up proper IAM permissions. Complete the following steps:
- Identify the IAM role associated with your Amazon Redshift cluster. If you don’t have one, create a new IAM role for Amazon Redshift.
- Add the following IAM policy to this role, providing your AWS Region and AWS account number:
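A policy of this kind grants the `lambda:InvokeFunction` action on the function's ARN. The sketch below builds such a policy document in Python so the Region and account number are filled in consistently. The helper name `build_invoke_policy` is hypothetical, and the account number shown is a placeholder, not a real account.

```python
import json


def build_invoke_policy(region: str, account_id: str, function_name: str) -> dict:
    """Return an IAM policy document that allows invoking one Lambda function."""
    function_arn = f"arn:aws:lambda:{region}:{account_id}:function:{function_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": function_arn,
            }
        ],
    }


# Placeholder Region and account number; substitute your own values.
policy = build_invoke_policy("us-east-1", "123456789012", "levenshtein_distance_func")
policy_json = json.dumps(policy, indent=2)  # text to paste into the IAM console
```

Scoping `Resource` to a single function ARN, rather than a wildcard, keeps the role limited to the UDF it needs to call.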
Create a Lambda UDF
Run the following script to create your Lambda UDF:
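As a hedged sketch of what such a statement can look like, the Python helper below renders a `CREATE EXTERNAL FUNCTION` statement that points a UDF at a Lambda function. The argument types, the `INT` return type, the `STABLE` volatility, the helper name `render_create_lambda_udf`, and the role ARN are all assumptions for illustration; adjust them to match your own function and role.

```python
def render_create_lambda_udf(udf_name: str, lambda_name: str, iam_role_arn: str) -> str:
    """Render a CREATE EXTERNAL FUNCTION statement for a two-string scalar UDF.

    The inputs are interpolated directly, so pass only trusted identifiers.
    """
    return (
        f"CREATE OR REPLACE EXTERNAL FUNCTION {udf_name}(VARCHAR, VARCHAR)\n"
        "RETURNS INT\n"
        "STABLE\n"  # assumed: the same inputs always give the same distance
        f"LAMBDA '{lambda_name}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )


# Placeholder role ARN; replace it with the role attached to your cluster.
create_udf_sql = render_create_lambda_udf(
    "fn_lambda_levenshtein_distance",
    "levenshtein_distance_func",
    "arn:aws:iam::123456789012:role/example-redshift-lambda-role",
)
```

The rendered text in `create_udf_sql` can be pasted into Query editor v2 and run as is once the placeholders are replaced.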
Test the solution
To test the solution, run the following script using the Python UDF:
The following table shows our output.
Run the same script using the Lambda UDF:
The results of both UDFs match.
Replace the Python UDF with the Lambda UDF
You can use the following steps in preproduction for testing:
- Revoke access for the Python UDF:
- Grant access to the Lambda UDF:
- After full testing of the Lambda UDF has been performed, you can drop the Python UDF.
- Rename the Lambda UDF fn_lambda_levenshtein_distance to fn_levenshtein_distance so the end-user and application code doesn’t need to change:
- Validate with the following query:
Cost evaluation
To evaluate the cost of the Lambda UDF, complete the following steps:
- Run the following script to create a table using a SELECT query, which uses the Lambda UDF:
You can inspect the query logs using CloudWatch Logs Insights.
- On the CloudWatch console, choose Logs in the navigation pane, then choose Logs Insights.
- Filter by the Lambda UDF and use the following query to identify the number of Lambda invocations.
- Use the following query to find the cost of the Lambda UDF for the specific duration you selected:
For this example, we used the us-east-1 Region using ARM-based instances. For more details on Lambda pricing by Region and the Free Tier limit, see AWS Lambda pricing.
- Choose Summarize results.
The cost of this Lambda UDF invocation was $0.02329 for 30 million rows.
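The arithmetic behind such an estimate can be sketched as follows: Lambda charges for compute in GB-seconds (billed duration multiplied by configured memory) plus a per-request fee. The default prices below are assumptions based on published Arm pricing for us-east-1 and can change, so check the AWS Lambda pricing page before relying on them. The function name `estimate_lambda_cost` is hypothetical, and the Free Tier is ignored.

```python
def estimate_lambda_cost(
    invocations: int,
    total_billed_ms: float,
    memory_mb: int,
    price_per_gb_second: float = 0.0000133334,  # assumed Arm price, us-east-1
    price_per_million_requests: float = 0.20,   # assumed request price
) -> float:
    """Estimate Lambda cost in USD from invocation count and billed duration."""
    gb_seconds = (total_billed_ms / 1000.0) * (memory_mb / 1024.0)
    compute_cost = gb_seconds * price_per_gb_second
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return compute_cost + request_cost


# Made-up inputs: 1,000 invocations billed for 100 ms each at 128 MB.
example_cost = estimate_lambda_cost(
    invocations=1_000, total_billed_ms=100_000, memory_mb=128
)
```

The invocation count and total billed duration are the two values the log queries in the preceding steps are meant to surface. The inputs here are invented for illustration and are not the figures behind the cost reported above.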
Monitor Lambda UDFs
Monitoring Lambda UDFs involves tracking both the Lambda function’s performance and the impact on Redshift query execution. Because UDFs execute externally, a dual approach is necessary.
CloudWatch metrics and logs for Lambda functions
CloudWatch provides comprehensive monitoring for Lambda functions, such as the following key metrics:
- Invocations – Tracks the number of times the Lambda function is called, indicating UDF usage frequency
- Duration – Measures execution time, helping identify performance bottlenecks
- Errors – Counts failed invocations, which is critical for detecting issues in UDF logic
- Throttles – Indicates when Lambda limits invocations due to concurrency caps, which can delay query results
- Logs – CloudWatch Logs capture detailed execution output, including errors and custom log messages, aiding in debugging
- Alarms – Configures alarms for high error rates (for example, Errors > 0) or excessive duration (for example, Duration > 1 second) to receive proactive notifications
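The two example alarm conditions above can be expressed as CloudWatch alarm definitions. The sketch below only builds the parameter dictionaries, so it runs without AWS credentials. The helper name `lambda_alarm_params`, the alarm names, the 5-minute period, and the choice of the `Maximum` statistic for duration are assumptions for illustration.

```python
def lambda_alarm_params(function_name: str) -> list[dict]:
    """Build keyword arguments for two CloudWatch alarms on a Lambda function."""
    common = {
        "Namespace": "AWS/Lambda",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Period": 300,                  # evaluate over 5-minute windows (assumed)
        "EvaluationPeriods": 1,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no invocations means no alarm
    }
    return [
        {   # Errors > 0: any failed invocation in the window
            **common,
            "AlarmName": f"{function_name}-errors",
            "MetricName": "Errors",
            "Statistic": "Sum",
            "Threshold": 0,
        },
        {   # Duration > 1 second: the Duration metric is in milliseconds
            **common,
            "AlarmName": f"{function_name}-duration",
            "MetricName": "Duration",
            "Statistic": "Maximum",
            "Threshold": 1000,
        },
    ]


alarms = lambda_alarm_params("levenshtein_distance_func")
```

With boto3 installed and credentials configured, each dictionary could be passed to `boto3.client("cloudwatch").put_metric_alarm(**params)`. Add `AlarmActions` with an SNS topic ARN if you want notifications.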
Redshift query performance
Within Amazon Redshift, system views provide comprehensive insights into Lambda UDF performance and errors:
- SYS_QUERY_HISTORY – Identifies queries that have called your Lambda UDFs by filtering with the UDF name in the query_text column. This helps track usage patterns and execution frequency.
- SYS_QUERY_DETAIL – Provides granular execution metrics for queries involving Lambda UDFs, helping identify performance bottlenecks at the step level.
- Performance aggregation – Generates summary reports of Lambda UDF performance metrics, including execution count, average duration, and maximum duration to track performance trends over time.
The following table summarizes the monitoring tools available.
Monitoring Tool | Purpose | Key Metrics/Views |
CloudWatch Metrics | Monitor Lambda function performance | Invocations, Duration, Errors, Throttles |
CloudWatch Logs | Debug Lambda execution issues | Error messages, custom logs |
SYS_QUERY_HISTORY | Track Lambda UDF usage patterns | Query execution times, status, user information, query text |
SYS_QUERY_DETAIL | Analyze Lambda UDF performance | Step-level execution details, resource usage, query plan information |
Performance Summary Reports | Track UDF performance trends | Execution count, average/maximum duration, total elapsed time |
Monitoring approach for Lambda UDFs in Amazon Redshift
For analyzing individual queries, you can use the following code to track how your Lambda UDFs are being used across your organization:
This helps you do the following:
- Identify frequent users
- Track execution patterns
- Monitor usage trends
- Detect unauthorized access
You can also create comprehensive monitoring by using query history to monitor performance metrics at the user level:
Additionally, you can generate weekly performance reports using the following aggregation query:
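One way such a weekly report could be written is sketched below as a Python helper that renders the SQL text. It assumes the UDF is identified by a substring match on `query_text` in `SYS_QUERY_HISTORY` and that `elapsed_time` is reported in microseconds. The helper name `render_weekly_udf_report` and the output column names are hypothetical.

```python
def render_weekly_udf_report(udf_name: str) -> str:
    """Render a weekly aggregation query over SYS_QUERY_HISTORY for one UDF."""
    # Double any single quotes so the name is safe inside the SQL literal.
    # Note that "_" in an ILIKE pattern matches any single character, so the
    # filter is slightly broader than an exact name match.
    pattern = udf_name.replace("'", "''")
    return f"""
SELECT DATE_TRUNC('week', start_time) AS week_start,
       COUNT(*) AS execution_count,
       AVG(elapsed_time) / 1000000.0 AS avg_duration_seconds,
       MAX(elapsed_time) / 1000000.0 AS max_duration_seconds,
       SUM(elapsed_time) / 1000000.0 AS total_elapsed_seconds
FROM sys_query_history
WHERE query_text ILIKE '%{pattern}%'
GROUP BY 1
ORDER BY 1;
""".strip()


weekly_report_sql = render_weekly_udf_report("fn_levenshtein_distance")
```

Adding `user_id` to both the select list and the `GROUP BY` clause would turn the same query into a per-user breakdown.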
Considerations
To maximize the benefits of Lambda UDFs, consider the following aspects to optimize performance, provide reliability, secure data, and manage costs. If you have Python UDFs that don’t use Python libraries, consider whether they’re candidates to convert to SQL UDFs.
The following are key performance considerations:
- Batching – Amazon Redshift batches multiple rows into a single Lambda invocation to reduce call frequency, improving efficiency. Make sure that the Lambda function handles batched inputs efficiently. For more details, see Accessing external components using Amazon Redshift Lambda UDFs.
- Parallel invocations – Redshift cluster slices invoke Lambda functions in parallel, improving performance for large datasets. Design functions to support concurrent executions.
- Cold starts – Lambda functions might experience cold start delays, particularly if they are used infrequently. Languages like Python or Node.js typically have faster startup times than Java, reducing latency.
- Function optimization – Optimize Lambda code for quick execution, minimizing resource usage and latency. For example, avoid unnecessary computations or external API calls.
Consider the following error handling methods:
- Robust Lambda logic – Implement comprehensive error handling in the Lambda function to manage exceptions gracefully. Return clear error messages in the JSON response, as specified in the Amazon Redshift-Lambda interface. For more details, see Scalar Lambda UDFs.
- Error propagation – Lambda errors can cause Redshift query failures. Monitor SYS_QUERY_HISTORY for query-level issues and CloudWatch Logs for detailed Lambda errors.
- JSON interface – The Lambda function must return a JSON object with success, error_msg, num_records, and results fields. Use proper formatting to avoid query disruptions.
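A small check like the following can be run in unit tests to catch malformed responses before they reach a query. It is a sketch under assumptions: the rules it enforces (a boolean `success`, one result per input row, an `error_msg` on failure) are this example's reading of the interface described above, and `validate_udf_response` is a hypothetical helper, not part of any AWS library.

```python
import json


def validate_udf_response(payload: str, expected_rows: int) -> list[str]:
    """Return a list of problems found in a Lambda UDF response (empty if none)."""
    try:
        body = json.loads(payload)
    except (TypeError, ValueError) as exc:
        return [f"response is not valid JSON: {exc}"]
    if not isinstance(body, dict):
        return ["response must be a JSON object"]

    problems = []
    if not isinstance(body.get("success"), bool):
        problems.append("'success' must be true or false")
    elif body["success"]:
        results = body.get("results")
        if not isinstance(results, list):
            problems.append("'results' must be a list when 'success' is true")
        elif len(results) != expected_rows:
            problems.append(f"expected {expected_rows} results, got {len(results)}")
        if "num_records" in body and body["num_records"] != expected_rows:
            problems.append("'num_records' does not match the batch size")
    elif not body.get("error_msg"):
        problems.append("'error_msg' should explain the failure when 'success' is false")
    return problems
```

For a handler under test, call it with the handler's return value and the number of rows in the test event, and assert that the returned list is empty.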
Clean up
Complete the following steps to clean up your resources:
- Delete the Redshift provisioned or serverless endpoint.
- Delete the Lambda function.
- Delete the IAM roles you created.
Conclusion
Lambda UDFs unlock a new level of flexibility, performance, and maintainability for extending Amazon Redshift. By decoupling custom logic from the warehouse engine, teams can scale independently, adopt modern runtimes, and integrate external systems.
If you’re currently using Python UDFs in Amazon Redshift, it’s time to explore the benefits of migrating to Lambda UDFs. With the generative AI capabilities of Amazon Q Developer, you can automate much of this transformation and accelerate your modernization journey. To learn more, refer to the Lambda UDF examples GitHub repo and Data Tokenization with Amazon Redshift and Protegrity.
About the authors
Raks Khare is a Senior Analytics Specialist Solutions Architect at AWS based out of Pennsylvania. He helps customers across varying industries and regions architect data analytics solutions at scale on the AWS platform. Outside of work, he likes exploring new travel and food destinations and spending quality time with his family.
Ritesh Kumar Sinha is an Analytics Specialist Solutions Architect based out of San Francisco. He has helped customers build scalable data warehousing and big data solutions for over 16 years. He loves to design and build efficient end-to-end solutions on AWS. In his spare time, he loves reading, walking, and doing yoga.
Yanzhu Ji is a Product Manager in the Amazon Redshift team. She has experience in product vision and strategy in industry-leading data products and platforms. She has outstanding skill in building substantial software products using web development, system design, database, and distributed programming techniques. In her personal life, Yanzhu likes painting, photography, and playing tennis.
Harshida Patel is an Analytics Specialist Principal Solutions Architect with AWS.