Amazon Redshift is a quick, scalable, safe, and totally managed cloud knowledge warehouse that allows you to analyze your knowledge at scale. Amazon Redshift Serverless allows you to entry and analyze knowledge with out the same old configurations of a provisioned knowledge warehouse. Assets are routinely provisioned and knowledge warehouse capability is intelligently scaled to ship quick efficiency for even essentially the most demanding and unpredictable workloads. If you happen to desire to handle your Amazon Redshift sources manually, you’ll be able to create provisioned clusters in your knowledge querying wants. For extra data, seek advice from Amazon Redshift clusters.
Amazon Redshift offers efficiency metrics and knowledge so you’ll be able to monitor the well being and efficiency of your provisioned clusters, serverless workgroups, and databases. The efficiency knowledge you should utilize on the Amazon Redshift console falls into two classes:
- Amazon CloudWatch metrics – Helps you monitor the bodily facets of your cluster or serverless, equivalent to useful resource utilization, latency, and throughput.
- Question and cargo efficiency knowledge – Helps you monitor database exercise, examine and diagnose question efficiency issues.
Amazon Redshift has launched a brand new function referred to as the Question profiler. The Question profiler is a graphical software that helps customers analyze the elements and efficiency of a question. This function is a part of the Amazon Redshift console and offers a visible and graphical illustration of the question’s run order, execution plan, and numerous statistics. The Question profiler makes it simpler for customers to grasp and troubleshoot their queries.
On this submit, we cowl two widespread use instances for troubleshooting question efficiency. We present you step-by-step how you can analyze and troubleshoot long-running queries utilizing the Question profiler.
Overview
For Amazon Redshift Serverless, the Question profiler may be accessed by going to the Serverless console. Select Question and database monitoring, choose a question, after which navigate to the Question plan tab. If a question plan is offered, you’ll observe an inventory of kid queries. Select a question to view it in Question profiler.
For Amazon Redshift provisioned, the Question profiler may be accessed by going to the provisioned clusters dashboard. Select Question and masses, and select a question. Navigate to the Question plan tab. If a question plan is offered, you’ll observe an inventory of kid queries. Select a question to view it in Question profiler.
Stipulations
- You should utilize the next pattern AWS Identification and Entry Administration (IAM) coverage to configure your IAM consumer or function with minimal privileges to entry Question profiler from the AWS console. In case your IAM consumer or function already has entry to Question and masses part of Redshift provisioned cluster dashboard or Question and database monitoring part of Redshift serverless dashboard, then no further permissions are wanted:
- You’ll be able to select to make use of Question profiler in your account with an current Amazon Redshift knowledge warehouse and queries. Nevertheless, if you need to implement this demo in your current Amazon Redshift knowledge warehouse, obtain Redshift question editor v2 pocket book, Redshift Question profiler demo, and seek advice from the Information Loading part later on this submit.
- It’s essential to hook up with the cluster utilizing database credentials and grant the
sys:operatororsys:monitorfunction to the database consumer to view queries run by customers.
Information loading
Amazon Redshift Question Editor v2 comes with pattern knowledge that may be loaded right into a pattern database and corresponding schema. To check Question profiler in opposition to the pattern knowledge, load the tpcds pattern knowledge and run queries.
- To load the tpcds pattern knowledge, launch Redshift question editor v2 and develop the database
sample_data_dev. - Select the icon related to the
tpcds. - The question editor v2 then masses the info right into a schema tpcds within the database sample_data_dev.
The next screenshot reveals these steps.
- Confirm the info by operating the next pattern question, as proven within the following screenshot.
Use instances
On this submit, we describe two widespread makes use of instances round question efficiency and how you can use Question profiler to troubleshoot the efficiency points:
- Nested loop joins – This be part of kind is the slowest of the potential be part of varieties. Nested loop joins are the cross-joins and not using a be part of situation that outcome within the Cartesian product of two tables.
- Suboptimal knowledge distribution – If knowledge distribution is suboptimal, you would possibly discover a big broadcast or redistribution of information throughout compute nodes when two massive tables are joined collectively.
Use case 1: Nested loop joins
To troubleshoot efficiency points with nest loop joins utilizing Question profiler, observe these steps:
- Import pocket book downloaded beforehand in conditions part of the weblog into Redshift question editor v2.
- Set the context of database to
sample_data_devin Question Editor v2, as proven within the following screenshot.
- Run
cell #3from demo pocket book to diagnose a question efficiency concern associated to nested loop joins.
- Run
cell #5to seize the question id from the SYS_QUERY_HISTORY system view filtering based mostly on the question label you set within the previous step.
- On the Amazon Redshift console, within the navigation pane, choose Question and masses and select the cluster title the place the question was initially executed, as proven within the following screenshot.

- This can open the brand new Question profiler. Underneath the Question historical past part, select
Hook up with database.After profitable connection to the database, you’ll observe the Standing exhibiting asLinkedand displaying the question historical past, as proven within the following screenshot.
- Yow will discover your queries both by Question ID or Course of ID. Enter the Question ID captured within the previous step to filter the long-running question for additional evaluation and select the corresponding Question ID, as proven within the following screenshot.

- Underneath the Question plan part, select
Little one question 1, as proven within the following screenshot. If there are a number of little one queries, you’ll have to examine every one for efficiency points.
This can open the question plan in a tree view together with further metrics on the facet panel. This lets you rapidly analyze the question streams, segments and steps. For extra details about streams, segments, and steps, seek advice from Question planning and execution workflow within the Amazon Redshift Database Developer Information. - Activate View streams and, within the Streams facet panel, examine and determine which stream has the very best execution time. On this case, Streams ID 5 is the place the question spends nearly all of time, as proven within the following screenshot

- Within the Streams facet panel, below ID, choose 5 to deal with Stream 5 for additional evaluation. Stream 5 reveals a step of Nestloop, as proven within the following screenshot.

- Select the Nestloop step to additional analyze. The facet panel will change with step particulars and extra metrics in regards to the nested loop be part of.
- By taking a look at Step particulars – nestloop, we will examine the Enter rows and evaluate that with the Output rows, as proven within the following screenshot. On this case, because of the cross-joining with the
Store_returnsdesk, 287,514 enter rows explodes to 950,233,770 rows, thus inflicting our question to run slower.
- Repair the question by introducing a be part of situation between the
store_salesandstore_returns. Runcell #7from Question editor v2 demo pocket book.The re-written question runs in simply 307 milliseconds.
Use case 2: Suboptimal knowledge distribution
- To reveal suboptimal knowledge distribution, change the distribution type of tables
web_salesandweb_returnsto EVEN by operatingcell #10of Question editor v2 demo pocket book.
- Run
cell #12. The question takes 409 milliseconds to run, as proven by the elapsed time within the following screenshot of the Question editor v2.
- Observe steps 3–10 from use case 1 to find the
query_idand to open the Question profiler view for the previous question. - On the Question profiler web page for the previous question, activate View streams. Within the Streams facet panel, examine and determine which stream has the very best execution time. On this case, Stream ID 6 is the place the question spends a majority of the time, as proven within the following screenshot.

- Underneath ID, choose 6 from the Streams facet panel for additional evaluation.

Stream 6 reveals a step of hash be part of, which includes a hash be part of of two tables which might be each redistributed. This may be inferred from Hash Proper Be a part of DS_DIST_BOTH below Clarify plan node data within the following screenshot. Often, these redistributions happen as a result of the tables aren’t joined on their distribution keys, or they don’t have the proper distribution type. Within the case of huge tables, these redistributions can result in important efficiency degradation and, therefore, you will need to determine and repair such steps to optimize question efficiency.
- Repair this suboptimal knowledge distribution sample by selecting the suitable distribution keys on the tables concerned:
web_salesandweb_returns. To vary the distribution kinds, runcell #14of demo pocket book to change desk instructions.
- After the previous instructions end operating, run
cell #16to re-execute the choose question. As proven within the Question Editor within the following screenshot, now the identical question completed in 244 milliseconds after updating the distribution type to key for tablesweb_salesandweb_returns.
- Within the Question profiler view, activate View streams and spot that Streams 5 now took essentially the most time. It took 8 milliseconds to complete, as in comparison with 13 milliseconds within the previous step.

- Within the Streams facet panel, below ID, choose 5 to drill down additional, then select the Hashjoin As the next screenshot reveals, after altering the distribution type to key for each
web_salesandweb_returntables, not one of the tables should be redistributed on the question runtime, leading to optimized efficiency.
Issues
Contemplate the next particulars whereas utilizing Question profiler:
- Question profiler shows data returned by the SYS_QUERY_HISTORY, SYS_QUERY_EXPLAIN, SYS_QUERY_DETAIL, and SYS_CHILD_QUERY_TEXT views.
- Question profiler solely shows question data for queries which have lately run on the database. If a question completes utilizing a prepopulated resultset cache, Question profiler gained’t have details about it as a result of Amazon Redshift doesn’t generate a question plan for such queries.
- Queries run by Question profiler to return the question data run on the identical knowledge warehouse because the user-defined queries.
Clear Up
To keep away from surprising prices, full the next motion to delete the sources you created:
Drop all of the tables within the sample_data_dev below tpcds schema.
Conclusion
On this submit, we mentioned how you can use Amazon Redshift Question profiler to watch and troubleshoot long-running queries. We demonstrated a step-by-step strategy to research question efficiency by analyzing the question execution plan and statistics and figuring out the basis reason for question slowness. Do this function in your setting and share your suggestions with us.
In regards to the Authors
Raks Khare is a Senior Analytics Specialist Options Architect at AWS based mostly out of Pennsylvania. He helps prospects throughout various industries and areas architect knowledge analytics options at scale on the AWS platform. Exterior of labor, he likes exploring new journey and meals locations and spending high quality time along with his household.
Blessing Bamiduro is a part of the Amazon Redshift Product Administration staff. She works with prospects to assist discover using Amazon Redshift ML of their knowledge warehouse. In her spare time, Blessing loves travels and adventures.
Ekta Ahuja is an Amazon Redshift Specialist Options Architect at AWS. She is enthusiastic about serving to prospects construct scalable and strong knowledge and analytics options. Earlier than AWS, she labored in a number of totally different knowledge engineering and analytics roles. Exterior of labor, she enjoys panorama pictures, touring, and board video games.



