SQL & Databases

In this part 4 of my DuckDB series I dive into the Melbourne Pedestrian dataset starting from scratch with some light preparations of the data, to then leverage the speed and ease of use of Python and DuckDB to get visualizations and insight:

  • Use of Relational API & JupySQL & Pandas

  • Typical EDA workflow based on DuckDB:

    • import, clean up, explore, visualize, analyse

    • mostly on time series

  • Plotly Express & Components for fine tuned visualizations

    • figure, facets, template, line & box plots, histogram

  • All code in a Github repo via a notebook

Continuing on DuckDB with this part 3 notebook, I show how DuckDB is offering a better SQL experience for data specialists compared to most standard SQL solutions out there by applying some best practices:

  • Better select: column(), exclude(), trailing comma, etc.

  • Scalar function chaining

  • Better insert: by name, insert or replace, positional joins

  • Recursive queries: hierarchical traversal

  • Macros

  • All code in a Github repo via a notebook

This time I finished 2 notebooks covering the basics of this increasingly popular Analytical DBMS: DuckDB.

  • part 1: What is DuckDB, Why use it, and When not to.

    Installing it, setting it up for Python use, connections parameters, extensions

  • part 2: Focus on the Python and the Relational API:

    Basic SQL queries with Python API & Relational Objects, Expression API, chaining methods, in-memory Vs persistent, writing to disk, prepared statement, user-defined functions, converting to Pandas, Polars DFs & PyArrow table.

In this 3rd part I practice and explore basic Time and Dates queries, and some basic business aspects to keep in mind, still using SQLAlchemy's Core approach and Python:

  • age, interval, extract, cast, to_char, CTEs & subqueries

  • case for columns with custom categories

  • insert of an external .csv file to PostgreSQL

  • All code in a Github repo via a notebook

Continuing after the previous practice project, we go to more advanced SQL techniques, still using SQLAlchemy's Core approach and Python:

  • Filtering with subqueries

  • good use of CTEs

  • Window Function: row_number()

  • All code in a Github repo via a notebook

Hands-on practice querying a 16,000+ entries DVD rental PostGreSQL database by using SQLAlchemy's Core approach and Python:

  • Multiple tables join, distinct, union, extract

  • agg_array for dates, generate_series, aliasing for subqueries

  • correlated subqueries, CTE

  • All code in a Github repo via a notebook

Hands-on practice creating a simple Library database and querying it through 15 tasks by using SQLAlchemy's Core approach and Python:

  • Use of foundational API: Core

  • SELECT, INSERT, JOINs, WHERE, SQL functions

  • Multiple JOINs & OUTER JOINs

  • All code in an integrated notebook

Introductory hands-on practice developing in Python the setup necessary to use SQLAlchemy for querying SQL statements:

  • foundational APIs: Core & ORM

  • SQLAlchemy database Metadata query language

  • Example of website user table

  • Feature: Table Reflection