Partly two of the SQL Scripting announcement weblog collection, we’ll look at the executive activity we mentioned in half one—find out how to apply a case-insensitive rule to each STRING column in a desk. We’ll stroll by means of that instance step-by-step, clarify the options used, and increase it past a single desk to cowl a complete schema.
You may as well observe alongside on this pocket book.
Altering the collation of all textual content fields in all tables in a schema
Databricks helps a variety of language-aware, case-insensitive, and accent-insensitive collations. It is easy to make use of this characteristic for brand spanking new tables and columns. However what if in case you have an current system utilizing higher() or decrease() in predicates in every single place and need to choose up the efficiency enhancements related to a local case-insensitive collation whereas simplifying your queries? That may require some programming; now you are able to do all of it in SQL.
Let’s use the next take a look at schema:
The order is predicated on the ASCII codepoints, the place all uppercase letters precede all lowercase letters. Are you able to repair this with out including higher() or decrease()?
Dynamic SQL statements and setting variables
Our first step is to inform the desk to alter its default collation for newly added columns. You may feed your native variables with parameter markers, which the pocket book will robotically detect and add widgets. You may as well use EXECUTE IMMEDIATE to run a dynamically composed ALTER TABLE assertion.
Each SQL script consists of a BEGIN .. END (compound) assertion. Native variables are outlined first inside a compound assertion, adopted by the logic.
That is all only a set of linear statements. Thus far, you may write all this with SQL Session variables with out the compound assertion. You additionally haven’t achieved a lot. In spite of everything, you wished to alter the collation for current columns. To do that, it’s good to:
- Uncover all current string columns within the desk
- Change the collation for every column
Briefly, it’s good to loop over the INFORMATION_SCHEMA.COLUMNS desk.
Loops
SQL Scripting provides 4 methods of looping and methods to manage loop iterations.
- LOOP … END LOOP;
This can be a “perpetually” loop.
This loop will proceed till an exception or an specific ITERATE or LEAVE command breaks out of the loop.
We’ll focus on exception dealing with later and level to the ITERATE and LEAVE documentation explaining find out how to management loops. - WHILE predicate DO … END WHILE;
This loop shall be entered and re-entered so long as the predicate expression evaluates to true or the loop is damaged out by an exception, ITERATE or LEAVE. - REPEAT … UNTIL predicate END REPEAT;
Not like WHILE, this loop is entered not less than as soon as and re-executes till the predicate expression evaluates to false or the loop is damaged by an exception, LEAVE, or ITERATE command. - FOR question DO …. END FOR;
This loop executes as soon as per row the question returns until it’s left early with an exception, LEAVE, or ITERATE assertion.
Now, apply the FOR loop to our collation script. The question will get the column names of all string columns of the desk. The loop physique alters every column collation in flip:
Let’s confirm that the desk has been correctly up to date:
Thus far, so good. Our code is functionally full, however it’s best to inform Delta to research the columns you modified to profit from file skipping. You do not need to do that per column. However collect all of them collectively and do the work provided that there was, in reality, a string column for which the collation was altered. Choices, choices ….
Conditional logic
SQL Scripting provides 3 ways to carry out conditional execution of SQL statements.
- If-then-else logic. The syntax for that is simple:
IF predicate THEN … ELSEIF predicate THEN … ELSE …. END IF;
Naturally, you possibly can have any variety of non-obligatory ELSEIF blocks, and the ultimate ELSE can be non-obligatory. - A easy CASE assertion
This assertion is the SQL Scripting model of the straightforward case expression.
CASE expression WHEN possibility THEN … ELSE … END CASE;
A single execution of an expression is in comparison with a number of choices, and the primary match decides which set of SQL statements must be executed. If none match, the non-obligatory ELSE block shall be executed. - A searched CASE assertion
This assertion is the SQL Scripting model of the searched case expression.
CASE WHEN predicate THEN …. ELSE … END CASE;
The THEN block is executed for the primary of any predicates that consider to true. If none match, the non-obligatory ELSE block is executed.
For our collation script, a easy IF THEN END IF will suffice. You additionally want to gather the set of columns to use ANALYZE to and a few higher-order perform magic to provide the column record:
Nesting
What you have got written to date works for particular person tables. What if you wish to function on all tables in a schema? SQL Scripting is absolutely composable. You may nest compound statements, conditional statements, and loops inside different SQL scripting statements.
So what you’ll do right here is twofold:
- Add an outer FOR loop to search out all tables inside a schema utilizing INFORMATION_SCHEMA.TABLES. As a part of this, it’s good to exchange the references to the desk identify variable with references to the outcomes of the FOR loop question.
- Add a nested compound to maneuver the column record variable down into the outer FOR loop. You can not declare a variable instantly within the FOR loop physique; it doesn’t add a brand new scope. That is primarily a choice associated to coding type, however you should have a extra critical cause for a brand new scope..
This error is smart. You have got a number of methods to proceed:
- Filter out unsupported desk varieties, akin to views, within the data schema question. The issue is that there are quite a few desk varieties, and new ones are often added.
- Deal with views. That is an ideal thought. Let’s name that your homework project.
- Tolerating the error situation
Exception dealing with
A key functionality of SQL Scripting is the power to intercept and deal with exceptions. Situation handlers are outlined within the declaration part of a compound assertion, they usually apply to any assertion inside that compound, together with nested statements. You may deal with particular error circumstances by identify, particular SQLSTATEs dealing with a number of error circumstances, or all error circumstances. Throughout the physique of the situation handler, you should use the GET DIAGNOSTICS assertion to retrieve details about the exception being dealt with and execute any SQL scripting you deem acceptable, akin to recording the error in a log or operating an alternate logic to the one which failed. You may then SIGNAL a brand new error situation, RESIGNAL the unique situation, or just exit the compound assertion the place the handler is outlined and proceed with the next assertion.
In our script, you need to skip any assertion for which the ALTER TABLE DEFAULT COLLATION assertion didn’t apply and log the item’s identify.
Above, you have got developed an administrative script purely in SQL. You may as well write ELT scripts and switch them into Jobs. SQL Scripting is a very highly effective software it’s best to exploit.
What to do subsequent
Whether or not you might be an current Databricks person or migrating from one other product, SQL Scripting is a functionality it’s best to use. SQL Scripting follows the ANSI commonplace and is absolutely suitable with OSS Apache Spark™. SQL Scripting is described intimately in SQL Scripting | Databricks Documentation.
You may as well use this pocket book to see for your self.
