Efficiently query the DATE in DATETIME

The DATETIME data type is often misunderstood and used inefficiently. This article focuses on the date component of DATETIME, how it is handled internally and how it can be used effectively for querying. The DATETIME type is internally stored as two separate 4-byte integers: one of those integers stores the date portion, and the other the time. When the date portion has a value of 0, the date is 1900-01-01. Because the date is internally stored as an INT, casting and converting directly between the types is natural:

Note: Because we are focusing on the date portion, we can ignore the time components for the purposes of this article. All the times will be midnight, but I won’t reprint them below. Additionally, the techniques demonstrated within can be applied to SMALLDATETIME and DATETIME2 (SQL 2008).

SELECT CONVERT(DATETIME, 0)
--Result: 1900-01-01

--Store the date to a DATETIME variable first
DECLARE @Date DATETIME
SET @Date = '1900-01-01'
SELECT CAST(@Date AS INT)
--Result: 0

You can use cast and convert for either direction, there is no restriction. Note in the second example, I stored the date in a variable first, because there is no way to natively pass a DATETIME type into the query window; SQL Server must always convert it from a string. By taking the intermediary step of storing it into a variable, we ensure that the Database Engine truly understands the value as a DATETIME and not a VARCHAR. If you had attempted to convert the VARCHAR value directly, you would receive the following error:

SELECT CAST('1900-01-01' AS INT)
--Msg 245, Level 16, State 1, Line 1
--Conversion failed when converting the varchar value '1900-01-01' to data type int.

The trick works for any value valid in the DATETIME range (January 1, 1753, through December 31, 9999):

SELECT CONVERT(DATETIME, 2958463)
--Result: 9999-12-31

Try to go too far, however, and you’ll get an error:

SELECT CONVERT(DATETIME, 2958464)
--Msg 8115, Level 16, State 2, Line 1
--Arithmetic overflow error converting expression to data type datetime.

Manipulating Dates

Fortunately Microsoft has provided some very useful functions for manipulating the DATETIME data type, negating our need to perform complicated math.

DATEADD

According to MSDN, DATEADD returns a specified date with the specified number interval (signed integer) added to a specified datepart of that date. The function prototype looks something like this:

DATEADD (datepart, number, date)

Because any date can be referenced as a simple integer, adding dates becomes trivial with the DATEADD function. Let’s compare adding days with both as a DATETIME and as an INT:

SELECT DATEADD(day, 1, '1900-1-1')
--Result: 1900-01-02

And compare that with adding 1 date to 0

SELECT DATEADD(day, 1, 0)
--Result: 1900-01-02

Notice the result is the same. We can also add any other datepart. Want the start of 2009?

SELECT DATEADD(year, 109, 0)
--Result: 2009-01-01

DATEDIFF

DATEDIFF acts in an opposite capacity of DATEADD: it calculates the date or time difference between DATETIME values. The prototype looks like this:

DATEDIFF (datepart, startdate , enddate)

And an actual example:

SELECT DATEDIFF(year, '1900-01-01', '2009-01-01')
--Result: 109

As before, we can certainly specify any of the dates as a INT. Since we know that 1900 has a corresponding integer value of 0, let’s try that:

SELECT DATEDIFF(year, 0, '2009-01-01')
--Result: 109

We can take it a step further and compare two INT values (again, as long as they fall within the valid DATETIME range). Let’s find how many days are between 1900-01-01 and 9999-12-31:

SELECT DATEDIFF(year, 0, 2958463)
--Result: 8099

The important aspect to note is that DATEDIFF “returns the number of date and time boundaries crossed between two specified dates.” (MSDN) In plain language, that means SQL Server never rounds the time period, but rather calculates the time periods completed. For example, let’s check the number of years between Leap Day in 2000 and Christmas 2008:

SELECT DATEDIFF(year, '2000-02-29', '2008-12-25')
--Result: 8

Although the total time is nearly 8.9 years, only 8 actual year boundaries have been crossed. This principle is the basis for calculating beginnings of time periods. When we combine DATEDIFF with DATEADD, the combination will predictably give us the start of any time period. Let’s say we wanted to calculate the number of years since 1900-01-01:

--Assume GETDATE() returns 2009-02-12 22:23:00.000
SELECT DATEDIFF(year, 0, GETDATE())
--Result: 109

We could easily take this result and place it in a DATEADD and see that we get 1900 + 109 = 2009

SELECT DATEADD(year, 109, '1900-01-01')
--Result: 2009-01-01

Or, as before, let’s use the integer value for 1900-01-01:

SELECT DATEADD(year, 109, 0)
--Result: 2009-01-01

Since DATEDIFF returns an INT and DATEADD adds an INT to a DATETIME, we can combine the two and return a natural DATETIME in a single query:

--Assume GETDATE() returns 2009-02-12 22:23:00.000
SELECT DATEADD(year, DATEDIFF(year, 0, GETDATE()), 0)
--Result: 2009-01-01

To analyze what is actually occurring, let’s start with the DATEDIFF in the middle <DATEDIFF(year, 0, GETDATE()>. This calculates the number of year boundaries crossed between 1900 and 2009, which is 109. The outer portion <DATEADD(year, 109, 0)> adds that result of 109 to 1900 and gives us the start of the year 2009. This technique allows us to find the start of any date or time period based on a relative starting point. That starting point is commonly GETDATE(), but can be any valid DATETIME. If you want the beginning of the month, swap “year” for “month”. The same can be used for day, week, or any other datepart. This can help us greatly when querying a DATETIME field to fall in a range of values (i.e. “Select all rows created this year.”) What if we wanted the start of next year (2010)?

--Assume GETDATE() returns 2009-02-12 22:23:00.000
SELECT DATEADD(year, DATEDIFF(year, 0, GETDATE()) + 1, 0)
--Result: 2010-01-01

Notice the only change was a “+ 1″ next to the DATEDIFF calculation. Because we’re calculating the number of years since 1900 (109), we can adjust for years forward by adding, or years back by subtracting. By adding only 1, we jump forward a single year.

Query efficiency

Using DATEADD and DATEDIFF is much faster than string manipulation with CAST and CONVERT. Additionally, it allows us to check the bounds of a DATETIME column without using the YEAR, MONTH, or DAY function. Why is this good? Any modification applied to a column in the WHERE condition, negates the ability for SQL Server to utilize an index on that column. This is because the index contains only the actual values, not some calculation of them. Think of it this way: If you looked in the index of a book and wanted to know the pages of topics whose third letter is an “e” is, you would have to scan the entire article list, because the index is based on the first letter, not third. This is similar to how a YEAR function might affect an index on a DATETIME column. To get around this, rather than altering the column value in the condition, we alter our lookup condition. For example, instead of YEAR(DateColumn) = YEAR(GETDATE()), we can specify DateColumn >= @BeginningOfThisYear AND < @BeginningOfNextYear. Or, more specifically:

SELECT *
FROM TableName
WHERE DateColumn >= DATEADD(year, DATEDIFF(year, 0, GETDATE()), 0)
	AND DateColumn < DATEADD(year, DATEDIFF(year, 0, GETDATE()) + 1, 0)

In English, it reads something like this: “Select everything from TableName where DateColumn is at least the start of the year and also before next year”. Or, even simpler, “Show me where DateColumn is the current year”. This has a similar effect of YEAR(DateColumn), but will allow an index on DateColumn to be used. Obviously, this will not guarantee the index will be used – if one exists at all – but, it at least leaves the option available to the query optimizer.

Conclusion

It is very common to check for records falling in a specified range such as current year or last month, but many people resort to nasty functions forcing a table scan. With the above knowledge and tools, developers can more efficiently leverage the DATETIME data type and speed up slow-running queries.

3 thoughts on “Efficiently query the DATE in DATETIME

  1. It’s a shame that strict rules like that are ever implemented. When used properly, it’s a wonderful tool – especially with the extra date/time types introduced in SQL Server 2008.

    I saw a similar situation where the timestamps were stored in Julian dates.

  2. I love the datetime datatype, but alas we are banned from using it at work, we are forced to use the unix timestamp instead… Which I guess is faster… somewhat, but less human readable and needs to be converted almost every time it is used…

Leave a Reply

Your email address will not be published. Required fields are marked *

* Copy This Password *

* Type Or Paste Password Here *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>