
As the volume of data continues to grow, the need for qualified data professionals grows as well. Specifically, there is a growing need for professionals who are fluent in SQL beyond a beginner level.
So, Nathan Rosidi, founder of StrataScratch, and I collaborated to review the 10 most important and relevant intermediate to advanced SQL concepts.
With that said, here we go!
More in Data ScienceWhat Is Bootstrapping Statistics?
If you’ve ever wanted to query a query, that’s when common table expressions (CTEs) come into play. CTEs essentially create a temporary table.
Using CTEs is a great way to modularize and break down your code just like you would break an essay down into several paragraphs.
Consider the following query with a subquery in the where
clause:
This may not seem too difficult to understand, but what if there were many subqueries or subqueries within subqueries? This is where CTEs come into play.
Now, the where
clause is clearly filtering for names in Toronto. CTEs are useful both because you can break down your code into smaller chunks and because you can assign a variable name to each CTE (i.e., toronto_ppl
and avg_female_salary
).
CTEs also allow you to do more advanced techniques like creating recursive tables.
A recursive CTE is a CTE that references itself, just like a recursive function in Python. Recursive CTEs are especially useful when querying hierarchical data like organization charts, file systems, a graph of links between web pages, and so on.
A recursive CTE has three parts:
Here’s an example of a recursive CTE that gets the manager ID for each staff ID:
Knowing how to write temporary functions is important for several reasons:
Consider the following example:
Instead, you can leverage a temporary function to capture the CASE
clause.
With a temporary function, the query itself is much simpler and more readable. You can also reuse the seniority function. For more on temporary functions, check this out as well.
You’ll most likely see many questions that require the use of case when
statements, and that’s simply because it’s such a versatile concept. It allows you to write complex conditional statements if you want to allocate a certain value or class depending on other variables.
Less commonly known, however, is that it also allows you to pivot data. For example, if you have a month column, and you want to create an individual column for each month, you can use case when
statements to pivot the data.
Example Question: Write an SQL query to reformat the table so that there is a revenue column for each month.
Except
and not in
operate almost identically. They’re both used to compare the rows between two queries/tables. That being said, there are subtle nuances between the two that you should know.
First, except
filters out duplicates and returns distinct rows, unlike not in
.
Further, except
expects the same number of columns in both queries/tables, whereas not in
compares a single column from each query/table.
An SQL self-join joins a table with itself. You might think that such an action serves no purpose, but you’d be surprised at how common this is. In many real-life settings, data is stored in one large table rather than many smaller tables. In such cases, self-joins may be required to solve unique problems.
Let’s look at an example.
Example Question: Given the Employee table below, write a SQL query that finds employees who earn more than their managers.
For the above table, Joe is the only employee who earns more than his manager.
Master Data ScienceNlogn and Other Big O Notations Explained
Ranking rows and values is a common application. Here are a few examples in which companies frequently use ranking:
In SQL, there are several ways that you can assign a rank to a row, which we’ll dive into with an example. Consider the following query and results:ROW_NUMBER()
returns a unique number for each row starting at one. When there are ties (e.g., Bob versus Carrie), ROW_NUMBER()
arbitrarily assigns a number if a second criterion is not defined.RANK()
returns a unique number for each row starting at one, except for when there are ties, then RANK()
will assign the same number. A gap will follow a duplicate rank.DENSE_RANK()
is similar to RANK()
except that there are no gaps after a duplicate rank. Notice that with DENSE_RANK()
, Daniel is ranked third as opposed to fourth with RANK()
.
Another common application is comparing values from different periods. For example, what was the delta between this month and last month’s sales? Or what was the delta between this month and the same one last year?
When comparing values from different periods to calculate deltas, this is when LEAD()
and LAG()
come into play.
Here are some examples:
If you knew about ROW_NUMBER()
and LAG()
/LEAD()
, this probably won’t be much of a surprise to you. But if you didn’t, this is probably one of the most useful window functions, especially when you want to visualize growth.
Using a window function with SUM()
, we can calculate a running total. See the example below:
You should definitely expect some sort of SQL questions that involve date-time data. For example, you may be required to group data by months or convert a variable format from DD-MM-YYYY to simply the month.
Some functions you should know are the following:
Example Question: Given a Weather table, write a SQL query to find all date IDs with higher temperatures compared to previous dates.
More in Data ScienceSorting Algorithms: Slowest to Fastest
And that’s all! I hope that this helps you in your interview prep — I’m sure that if you know these 10 concepts inside-out, you’ll do great when it comes to most SQL problems out there. I wish you the best in your learning endeavors!
Built In’s expert contributor network publishes thoughtful, solutions-oriented stories written by innovative tech professionals. It is the tech industry’s definitive destination for sharing compelling, first-person accounts of problem-solving on the road to innovation.