top of page

10 SQL Concepts That I Found Useful to Analyse Data

Updated: Aug 1

SQL is an incredibly powerful tool for data analysis, and mastering its advanced concepts can greatly enhance your ability to derive insights from data.


sql queries


Here are 10 SQL concepts that I have found particularly useful in my data analysis journey:


1. Common Table Expressions (CTEs)

Common Table Expressions (CTEs) simplify complex queries and improve readability. Defined using the `WITH` clause, CTEs create temporary result sets that can be referenced within the main query.

WITH Sales_CTE AS (
    SELECT ProductID, SUM(Quantity) AS TotalQuantity
    FROM Sales
    GROUP BY ProductID
)
SELECT * FROM Sales_CTE;

CTEs help break down complicated queries into more manageable parts.


2. Recursive CTEs

Recursive CTEs extend the functionality of CTEs, allowing for recursive operations such as hierarchical data processing. They repeatedly execute a recursive query until a termination condition is met.


WITH RECURSIVE EmployeeHierarchy AS (
    SELECT EmployeeID, ManagerID, 1 AS Level
    FROM Employees
    WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.EmployeeID, e.ManagerID, eh.Level + 1
    FROM Employees e
    INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT * FROM EmployeeHierarchy;

Recursive CTEs are useful for building structures like organizational hierarchies.


3. Temporary Functions

Temporary functions, or user-defined functions (UDFs), encapsulate logic into reusable components, simplifying queries and promoting code reuse.


CREATE FUNCTION CalculateDiscount(price DECIMAL, discountRate DECIMAL)
RETURNS DECIMAL AS
BEGIN
    RETURN price - (price * discountRate);
END;

Using UDFs ensures consistent calculations across your queries.


4. Pivoting Data with CASE WHEN

Pivoting data transforms rows into columns, making it easier to analyze specific metrics. This is achieved using `CASE WHEN` statements.


SELECT 
    ProductID,
    SUM(CASE WHEN Year = 2021 THEN Quantity ELSE 0 END) AS Quantity_2021,
    SUM(CASE WHEN Year = 2022 THEN Quantity ELSE 0 END) AS Quantity_2022
FROM Sales
GROUP BY ProductID;

Pivoting data is particularly useful for comparing metrics across different time periods.


5. EXCEPT versus NOT IN

Both `EXCEPT` and `NOT IN` filter out unwanted rows, but they differ in performance. `EXCEPT` returns distinct rows from the first query that are not present in the second query.


SELECT EmployeeID FROM Employees
EXCEPT
SELECT EmployeeID FROM RetiredEmployees;

`NOT IN` checks for the absence of values in a subquery but can be less efficient for large datasets.


SELECT EmployeeID FROM Employees
WHERE EmployeeID NOT IN (SELECT EmployeeID FROM RetiredEmployees);

Choosing the right method can optimize query performance.


6. Self Joins

Self joins allow you to join a table with itself to compare rows within the same table. This technique is useful for hierarchical data or comparing records.


SELECT e1.EmployeeID, e1.Name, e2.Name AS ManagerName
FROM Employees e1
INNER JOIN Employees e2 ON e1.ManagerID = e2.EmployeeID;

Self joins enable complex comparisons within a single table.


7. Rank versus Dense Rank versus Row Number

These window functions assign ranks to rows based on specific criteria:

- `ROW_NUMBER()` assigns a unique number to each row.

- `RANK()` assigns the same rank to identical values, leaving gaps in the ranking sequence.

- `DENSE_RANK()` assigns the same rank to identical values without gaps.


SELECT Name, Salary,
       ROW_NUMBER() OVER (ORDER BY Salary DESC) AS RowNum,
       RANK() OVER (ORDER BY Salary DESC) AS Rank,
       DENSE_RANK() OVER (ORDER BY Salary DESC) AS DenseRank
FROM Employees;

Understanding these differences helps in ranking data accurately.


8. Calculating Delta Values

Delta values measure the change between consecutive rows, useful for tracking trends over time.


SELECT 
    Date, 
    Sales,
    LAG(Sales, 1) OVER (ORDER BY Date) AS PreviousSales,
    Sales - LAG(Sales, 1) OVER (ORDER BY Date) AS SalesDelta
FROM DailySales;

Calculating delta values helps identify changes and trends in data.


9. Calculating Running Totals

Running totals provide cumulative sums across rows, aiding in tracking aggregated metrics over time.


SELECT 
    Date, 
    Sales,
    SUM(Sales) OVER (ORDER BY Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal
FROM DailySales;

Running totals are useful for cumulative analysis.


10. Date-Time Manipulation

Date-time manipulation is crucial for time-series analysis. SQL offers functions to handle date and time operations.


SELECT 
    OrderID, 
    OrderDate,
    DATEADD(day, 7, OrderDate) AS WeekLater,
    DATEDIFF(day, OrderDate, GETDATE()) AS DaysSinceOrder
FROM Orders;

Manipulating dates and times enables precise time-based analyses.


Mastering these SQL concepts can greatly enhance your data analysis skills, making you more efficient and effective in extracting insights from your data.

Kommentarer


Kommentering har blitt slått av.
bottom of page