Simplify your question efficiency diagnostics in Amazon Redshift with Question profiler

October 23, 2024

42

Amazon Redshift is a quick, scalable, safe, and totally managed cloud knowledge warehouse that allows you to analyze your knowledge at scale. Amazon Redshift Serverless allows you to entry and analyze knowledge with out the same old configurations of a provisioned knowledge warehouse. Assets are routinely provisioned and knowledge warehouse capability is intelligently scaled to ship quick efficiency for even essentially the most demanding and unpredictable workloads. If you happen to desire to handle your Amazon Redshift sources manually, you’ll be able to create provisioned clusters in your knowledge querying wants. For extra data, seek advice from Amazon Redshift clusters.

Amazon Redshift offers efficiency metrics and knowledge so you’ll be able to monitor the well being and efficiency of your provisioned clusters, serverless workgroups, and databases. The efficiency knowledge you should utilize on the Amazon Redshift console falls into two classes:

Amazon CloudWatch metrics – Helps you monitor the bodily facets of your cluster or serverless, equivalent to useful resource utilization, latency, and throughput.
Question and cargo efficiency knowledge – Helps you monitor database exercise, examine and diagnose question efficiency issues.

Amazon Redshift has launched a brand new function referred to as the Question profiler. The Question profiler is a graphical software that helps customers analyze the elements and efficiency of a question. This function is a part of the Amazon Redshift console and offers a visible and graphical illustration of the question’s run order, execution plan, and numerous statistics. The Question profiler makes it simpler for customers to grasp and troubleshoot their queries.

On this submit, we cowl two widespread use instances for troubleshooting question efficiency. We present you step-by-step how you can analyze and troubleshoot long-running queries utilizing the Question profiler.

Overview

For Amazon Redshift Serverless, the Question profiler may be accessed by going to the Serverless console. Select Question and database monitoring, choose a question, after which navigate to the Question plan tab. If a question plan is offered, you’ll observe an inventory of kid queries. Select a question to view it in Question profiler.

For Amazon Redshift provisioned, the Question profiler may be accessed by going to the provisioned clusters dashboard. Select Question and masses, and select a question. Navigate to the Question plan tab. If a question plan is offered, you’ll observe an inventory of kid queries. Select a question to view it in Question profiler.

Stipulations

You should utilize the next pattern AWS Identification and Entry Administration (IAM) coverage to configure your IAM consumer or function with minimal privileges to entry Question profiler from the AWS console. In case your IAM consumer or function already has entry to Question and masses part of Redshift provisioned cluster dashboard or Question and database monitoring part of Redshift serverless dashboard, then no further permissions are wanted:

{
    "Model": "2012-10-17",
    "Assertion": [
        {
            "Effect": "Allow",
            "Action": [
                "redshift:DescribeClusters",
                "redshift-serverless:ListNamespaces",
                "redshift-serverless:ListWorkgroups",
                "redshift-data:ExecuteStatement",
                "redshift-data:DescribeStatement",
                "redshift-data:GetStatementResult"
            ],
            "Useful resource": [
                "arn:aws:redshift-serverless:",
                "arn:aws:redshift-serverless:",
                "arn:aws:redshift:"
            ]
        }
    ]
}

You’ll be able to select to make use of Question profiler in your account with an current Amazon Redshift knowledge warehouse and queries. Nevertheless, if you need to implement this demo in your current Amazon Redshift knowledge warehouse, obtain Redshift question editor v2 pocket book, Redshift Question profiler demo, and seek advice from the Information Loading part later on this submit.
It’s essential to hook up with the cluster utilizing database credentials and grant the sys:operator or sys:monitor function to the database consumer to view queries run by customers.

Information loading

Amazon Redshift Question Editor v2 comes with pattern knowledge that may be loaded right into a pattern database and corresponding schema. To check Question profiler in opposition to the pattern knowledge, load the tpcds pattern knowledge and run queries.

To load the tpcds pattern knowledge, launch Redshift question editor v2 and develop the database sample_data_dev.
Select the icon related to the tpcds.
The question editor v2 then masses the info right into a schema tpcds within the database sample_data_dev.

The next screenshot reveals these steps.

Confirm the info by operating the next pattern question, as proven within the following screenshot.

choose depend(*) from sample_data_dev.tpcds.buyer;

Use instances

On this submit, we describe two widespread makes use of instances round question efficiency and how you can use Question profiler to troubleshoot the efficiency points:

Nested loop joins – This be part of kind is the slowest of the potential be part of varieties. Nested loop joins are the cross-joins and not using a be part of situation that outcome within the Cartesian product of two tables.
Suboptimal knowledge distribution – If knowledge distribution is suboptimal, you would possibly discover a big broadcast or redistribution of information throughout compute nodes when two massive tables are joined collectively.

Use case 1: Nested loop joins

To troubleshoot efficiency points with nest loop joins utilizing Question profiler, observe these steps:

Import pocket book downloaded beforehand in conditions part of the weblog into Redshift question editor v2.
Set the context of database to sample_data_dev in Question Editor v2, as proven within the following screenshot.
Run cell #3 from demo pocket book to diagnose a question efficiency concern associated to nested loop joins.

The question takes round 12 seconds to run, as proven within the Question Editor v2 outcomes panel within the following screenshot.

Run cell #5 to seize the question id from the SYS_QUERY_HISTORY system view filtering based mostly on the question label you set within the previous step.
On the Amazon Redshift console, within the navigation pane, choose Question and masses and select the cluster title the place the question was initially executed, as proven within the following screenshot.
This can open the brand new Question profiler. Underneath the Question historical past part, select Hook up with database.After profitable connection to the database, you’ll observe the Standing exhibiting as Linked and displaying the question historical past, as proven within the following screenshot.
Yow will discover your queries both by Question ID or Course of ID. Enter the Question ID captured within the previous step to filter the long-running question for additional evaluation and select the corresponding Question ID, as proven within the following screenshot.
Underneath the Question plan part, select Little one question 1, as proven within the following screenshot. If there are a number of little one queries, you’ll have to examine every one for efficiency points.
This can open the question plan in a tree view together with further metrics on the facet panel. This lets you rapidly analyze the question streams, segments and steps. For extra details about streams, segments, and steps, seek advice from Question planning and execution workflow within the Amazon Redshift Database Developer Information.
Activate View streams and, within the Streams facet panel, examine and determine which stream has the very best execution time. On this case, Streams ID 5 is the place the question spends nearly all of time, as proven within the following screenshot
Within the Streams facet panel, below ID, choose 5 to deal with Stream 5 for additional evaluation. Stream 5 reveals a step of Nestloop, as proven within the following screenshot.
Select the Nestloop step to additional analyze. The facet panel will change with step particulars and extra metrics in regards to the nested loop be part of.
By taking a look at Step particulars – nestloop, we will examine the Enter rows and evaluate that with the Output rows, as proven within the following screenshot. On this case, because of the cross-joining with the Store_returns desk, 287,514 enter rows explodes to 950,233,770 rows, thus inflicting our question to run slower.
Repair the question by introducing a be part of situation between the store_sales and store_returns. Run cell #7 from Question editor v2 demo pocket book.The re-written question runs in simply 307 milliseconds.

Use case 2: Suboptimal knowledge distribution

To reveal suboptimal knowledge distribution, change the distribution type of tables web_sales and web_returns to EVEN by operating cell #10 of Question editor v2 demo pocket book.

Run cell #12. The question takes 409 milliseconds to run, as proven by the elapsed time within the following screenshot of the Question editor v2.
Observe steps 3–10 from use case 1 to find the query_id and to open the Question profiler view for the previous question.
On the Question profiler web page for the previous question, activate View streams. Within the Streams facet panel, examine and determine which stream has the very best execution time. On this case, Stream ID 6 is the place the question spends a majority of the time, as proven within the following screenshot.
Underneath ID, choose 6 from the Streams facet panel for additional evaluation.

Stream 6 reveals a step of hash be part of, which includes a hash be part of of two tables which might be each redistributed. This may be inferred from Hash Proper Be a part of DS_DIST_BOTH below Clarify plan node data within the following screenshot. Often, these redistributions happen as a result of the tables aren’t joined on their distribution keys, or they don’t have the proper distribution type. Within the case of huge tables, these redistributions can result in important efficiency degradation and, therefore, you will need to determine and repair such steps to optimize question efficiency.

Repair this suboptimal knowledge distribution sample by selecting the suitable distribution keys on the tables concerned: web_sales and web_returns. To vary the distribution kinds, run cell #14 of demo pocket book to change desk instructions.
After the previous instructions end operating, run cell #16 to re-execute the choose question. As proven within the Question Editor within the following screenshot, now the identical question completed in 244 milliseconds after updating the distribution type to key for tables web_sales and web_returns.
Within the Question profiler view, activate View streams and spot that Streams 5 now took essentially the most time. It took 8 milliseconds to complete, as in comparison with 13 milliseconds within the previous step.
Within the Streams facet panel, below ID, choose 5 to drill down additional, then select the Hashjoin As the next screenshot reveals, after altering the distribution type to key for each web_sales and web_return tables, not one of the tables should be redistributed on the question runtime, leading to optimized efficiency.

Issues

Contemplate the next particulars whereas utilizing Question profiler:

Question profiler shows data returned by the SYS_QUERY_HISTORY, SYS_QUERY_EXPLAIN, SYS_QUERY_DETAIL, and SYS_CHILD_QUERY_TEXT views.
Question profiler solely shows question data for queries which have lately run on the database. If a question completes utilizing a prepopulated resultset cache, Question profiler gained’t have details about it as a result of Amazon Redshift doesn’t generate a question plan for such queries.
Queries run by Question profiler to return the question data run on the identical knowledge warehouse because the user-defined queries.

Clear Up

To keep away from surprising prices, full the next motion to delete the sources you created:

Drop all of the tables within the sample_data_dev below tpcds schema.

Conclusion

On this submit, we mentioned how you can use Amazon Redshift Question profiler to watch and troubleshoot long-running queries. We demonstrated a step-by-step strategy to research question efficiency by analyzing the question execution plan and statistics and figuring out the basis reason for question slowness. Do this function in your setting and share your suggestions with us.

In regards to the Authors

Raks Khare is a Senior Analytics Specialist Options Architect at AWS based mostly out of Pennsylvania. He helps prospects throughout various industries and areas architect knowledge analytics options at scale on the AWS platform. Exterior of labor, he likes exploring new journey and meals locations and spending high quality time along with his household.

Blessing Bamiduro is a part of the Amazon Redshift Product Administration staff. She works with prospects to assist discover using Amazon Redshift ML of their knowledge warehouse. In her spare time, Blessing loves travels and adventures.

Ekta Ahuja is an Amazon Redshift Specialist Options Architect at AWS. She is enthusiastic about serving to prospects construct scalable and strong knowledge and analytics options. Earlier than AWS, she labored in a number of totally different knowledge engineering and analytics roles. Exterior of labor, she enjoys panorama pictures, touring, and board video games.

Simplify your question efficiency diagnostics in Amazon Redshift with Question profiler

Overview

Stipulations

Information loading

Use instances

Use case 1: Nested loop joins

Use case 2: Suboptimal knowledge distribution

Issues

Clear Up

Conclusion

In regards to the Authors

Related Articles

From Manufacturing facility Flooring to CyberSecurity: How Cisco Helped Me Construct My Dream Profession

Why CISOs Should Add Intent to the Equation

BONUS: Drone Life Information 019 – Heavy Carry drones, Dizi drone launch, Percepto approval for BVLOS flights, Airspace safety with drones

LEAVE A REPLY Cancel reply

Latest Articles

From Manufacturing facility Flooring to CyberSecurity: How Cisco Helped Me Construct My Dream Profession

Why CISOs Should Add Intent to the Equation

BONUS: Drone Life Information 019 – Heavy Carry drones, Dizi drone launch, Percepto approval for BVLOS flights, Airspace safety with drones

Scientists create ultra-low loss optical gadget that traps mild on a chip

ID@Xbox Developer Acceleration Program Celebrates Video games from Throughout the African Diaspora

ABOUT US