SQL & Databases
In this part 4 of my DuckDB series I dive into the Melbourne Pedestrian dataset starting from scratch with some light preparations of the data, to then leverage the speed and ease of use of Python and DuckDB to get visualizations and insight:
Use of Relational API & JupySQL & Pandas
Typical EDA workflow based on DuckDB:
import, clean up, explore, visualize, analyse
mostly on time series
Plotly Express & Components for fine tuned visualizations
figure, facets, template, line & box plots, histogram
All code in a Github repo via a notebook


Continuing on DuckDB with this part 3 notebook, I show how DuckDB is offering a better SQL experience for data specialists compared to most standard SQL solutions out there by applying some best practices:
Better select: column(), exclude(), trailing comma, etc.
Scalar function chaining
Better insert: by name, insert or replace, positional joins
Recursive queries: hierarchical traversal
Macros
All code in a Github repo via a notebook


This time I finished 2 notebooks covering the basics of this increasingly popular Analytical DBMS: DuckDB.
part 1: What is DuckDB, Why use it, and When not to.
Installing it, setting it up for Python use, connections parameters, extensions
part 2: Focus on the Python and the Relational API:
Basic SQL queries with Python API & Relational Objects, Expression API, chaining methods, in-memory Vs persistent, writing to disk, prepared statement, user-defined functions, converting to Pandas, Polars DFs & PyArrow table.


In this 3rd part I practice and explore basic Time and Dates queries, and some basic business aspects to keep in mind, still using SQLAlchemy's Core approach and Python:
age, interval, extract, cast, to_char, CTEs & subqueries
case for columns with custom categories
insert of an external .csv file to PostgreSQL
All code in a Github repo via a notebook
Continuing after the previous practice project, we go to more advanced SQL techniques, still using SQLAlchemy's Core approach and Python:
Filtering with subqueries
good use of CTEs
Window Function: row_number()
All code in a Github repo via a notebook
Hands-on practice querying a 16,000+ entries DVD rental PostGreSQL database by using SQLAlchemy's Core approach and Python:
Multiple tables join, distinct, union, extract
agg_array for dates, generate_series, aliasing for subqueries
correlated subqueries, CTE
All code in a Github repo via a notebook
Hands-on practice creating a simple Library database and querying it through 15 tasks by using SQLAlchemy's Core approach and Python:
Use of foundational API: Core
SELECT, INSERT, JOINs, WHERE, SQL functions
Multiple JOINs & OUTER JOINs
All code in an integrated notebook
Introductory hands-on practice developing in Python the setup necessary to use SQLAlchemy for querying SQL statements:
foundational APIs: Core & ORM
SQLAlchemy database Metadata query language
Example of website user table
Feature: Table Reflection