black and white bed linen

Time & Dates Queries on a Business PostGreSQL Database with SQLAlchemy Core

PostGreSQL + SQLAlchemy Core Hands-on Practice

Tech Stack

  • Python 3,

  • PostGreSQL, SQLAlchemy (Core),

  • VsCode, Jupyter, GitHub

Project

  • Execution of basic time and dates based SQL queries and best practices demonstrating practical professional usage.

Benefits

Querying time and dates is essential for analyzing data, particularly from PostgreSQL and similar databases.

Goals

Enhancing my SQL and SQLAlchemy skills.

I spent 11 hours divided this way and in this order:

  • 6 hrs:

    • coding the queries

  • 4 hrs:

    • debugging and learning

  • 1 hrs:

    • reviewing code, reviewing the notebook outline, cleaning up

Time and Date Queries in SQLAlchemy: Part 3

Welcome to the final installment of my three-part series on SQLAlchemy. In this post, we dive deep into time and date queries—a crucial aspect of data analysis. Whether you’re creating time series visualizations or generating insightful line plots, time is almost always a variable in your datasets. Consider a typical analysis: tracking daily sales figures to identify trends. Without the proper handling of time and date fields, the entire analysis could lead to misleading insights.

You can find the 2 previous part 1 and part 2 articles for more SQLALchemy demonstrations.
If you want to the complete notebook with all the code on which this article is based, click here.

Domain Knowledge Meets Data Analysis

In this series, we’ve been using an old but gold 2007 DVD rental database. Although outdated, the data still serves as a perfect playground to hone our skills—reminding us that having even a basic business acumen or domain knowledge is invaluable.

For example, knowing that a typical DVD rental lasts 7 days can help you differentiate between a recorded rental duration and an ongoing rental. When a rental is still in progress, its duration is calculated from the rental start date to today. This is essential for dynamic reporting and real-time analysis.

Here’s a snippet from the notebook that illustrates calculating the ongoing rental duration:

This code snippet shows how calculating the age of a rental from today can help you distinguish between completed and ongoing rentals.

Leveraging extract() and age() Functions

Two functions that stand out when working with time and date data in SQLAlchemy are extract() and age().

  • extract() allows you to pull out specific components (year, month, week, day, etc.) from a date or timestamp. This is particularly useful when you want to group your data by these time components.

  • age() computes the difference between two dates. It’s invaluable for creating intervals, such as determining how long ago an event occurred or how much time remains in an ongoing rental.

I already showed it in the code snipped above but here's another example where I use age() to compute the time difference between today and a rental date, which is essential for time-based analysis:

The Role of cast() in Time Queries

The cast() function in SQLAlchemy is crucial when you need to convert data types. It’s particularly useful when working with timestamps, as it lets you strip off the time component and work solely with dates.

However, caution is needed: casting a timestamp to a date can inadvertently affect functions like count(). For instance, if you perform a group_by on a timestamp without casting, each unique timestamp (even those on the same day) will be treated as distinct, often returning a count of 1 for each group.

Here’s an example demonstrating the issue and its resolution:

Without Proper Casting (Returns 1 Count Per Group):

Correct Approach (Cast to Date to Group by Day):

In this example, casting ensures that all timestamps on the same day are grouped together, giving you an accurate count of rentals per day.

Understanding Function Scope and group_by

When writing queries that involve both aggregate functions and non-aggregated columns, it’s essential to manage the scope of each function carefully. Every column in your SELECT that isn’t an aggregate must be included in your GROUP BY clause. Failing to do so results in errors, as the database won’t know how to handle the mix of aggregated and raw data.

A notable challenge arises when using expressions like adding an interval to a date. For example, in the code below I properly selected that interval code because it using a date column (pppd) from CTE that I created ahead (four_time_col):

Here, the expression (four_time_col.c.pppd + text("INTERVAL '7 days'")) is where it should be. A mistake would have been to use that line in the CTE. That would have messed up the grouping of dates and so the scope of the function count.

Key Tip: Always plan your query by determining which expressions are aggregated and which are not. Then, ensure that every non-aggregated column (or expression) is explicitly listed in your GROUP BY clause. This proactive approach avoids unexpected errors during query execution.

A Data Specialist’s Toolkit

In practical scenarios, data specialists often need to transfer data between systems. This task isn’t overly complex—it mainly requires familiarity with the source and target database schemas and an awareness of any dialect differences. SQLAlchemy greatly simplifies this process by providing a uniform interface regardless of the underlying SQL dialect.

For instance, in the notebook I demonstrated how to insert processed data into a PostgreSQL table after fetching and transforming data from another source. Although the operation was straightforward, it underscored an important point: understanding the schema and minor syntax differences between databases is key. SQLAlchemy abstracts much of the complexity, letting you focus on the logic of your data transformation rather than on dialect-specific quirks.

Wrapping Up

Time and date queries are central to many data analysis tasks. In this notebook series, I’ve explored various techniques—using functions like extract() and age(), understanding the necessity of casting timestamps to dates, and managing the scope of functions with proper group_by usage. These fundamentals not only enhance the accuracy of your analyses but also contribute to cleaner, more maintainable code.

I enjoyed delving into this concrete aspect of time and date querying in SQLAlchemy, and I hope the insights shared here prove useful in your own projects. Whether you’re analyzing rental durations or tracking website activity, mastering these techniques is essential for any data analyst.

Happy querying!

💡 If you want to explore my full SQLAlchemy journey, check out my GitHub repository.