Flatten Heirarchies in SQL Server with Common Table Expressions

In: SQL Server


12 May 2009

Common Table Expressions were a new feature added to SQL Server 2005 and provide an efficient way to recursively query relationships stored in a normalized table. We’re going to build on that essential functionality to flatten a typical corporate structure so that all children, grand children, great grand children, etc. roll up into a single, flattened parent, regardless of depth. To graphically visualize this, take a look at the actual relationship we’ll be querying against:

Actual employee structure to be flattened

Actual employee structure to be flattened

This structure will act as our example for the Development Department. The Department Head (Employee 0) has asked for a report of all employees within each team of the development group. Notice that different groups have varying levels of depth. The Database team only has a Manager with two direct reports. The UI team is a single person. The Middle Tier group is much larger, with a Manager having two direct reports, and one of those employees having several employees beneath him. The requested report should group all employees, flattening the relationship to a single Manager, but excluding himself (since that would mean every employee would simply roll up to him). This “Desired Relationship” is represented below:

The desired result of the query

The desired result of the query

To recreate this scenario, I’m providing some code for both the setup and query. If you’re not familiar with Common Table Expressions, I would suggest that you familiarize yourself with the SQL Server BOL entry. The main concept you should understand to be able to adapt this to your own data is that for a CTE to act recursively, you need both an anchor and recursive query. I’ll explain their parts later.

First, let’s instantiate the tables and populate them with data:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
USE tempdb
go
 
IF OBJECT_ID('dbo.Employees') IS not null
	DROP TABLE dbo.Employees
go
 
CREATE TABLE dbo.Employees
(
	  EmployeeID INT PRIMARY KEY
	, EmployeeName VARCHAR(50) not null
	, ManagerEmployeeID INT null
)
go
 
INSERT INTO dbo.Employees (EmployeeID, EmployeeName, ManagerEmployeeID)
SELECT 0, 'Employee 0', null
UNION all
SELECT 1, 'Employee 1', 0
UNION all
SELECT 2, 'Employee 2', 0
UNION all
SELECT 3, 'Employee 3', 0
UNION all
SELECT 4, 'Employee 1.1', 1
UNION all
SELECT 5, 'Employee 1.2', 1
UNION all
SELECT 6, 'Employee 3.1', 3
UNION all
SELECT 7, 'Employee 3.2', 3
UNION all
SELECT 8, 'Employee 3.1.1', 6
UNION all
SELECT 9, 'Employee 3.1.1.1', 8
go

This script will simply create the example employees table and populate it with data to relationally represent what is displayed in the first picture. Here are what the various lines accomplish:

  • 1-2: Switch the database context to tempdb. Since the contents of tempdb are cleared upon service restart, we’re simply ensuring that this will be cleaned up eventually.
  • 4-6: Check if dbo.Employees exists. If so, drop it.
  • 8-14: Create a table to hold our example data.
  • 16-36: Manually populate the example data. Note that we’re using a UNION ALL between the selects, so only a single INSERT occurs.

As for the query to manipulate this data, here is the actual CTE:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- Use a CTE to flatten an organization structure to each department
WITH AllMyChildren (ManagerEmployeeID, EmployeeID, GroupManagerID, EmployeeName, DEPTH)
AS
(
	-- The anchor statement. This selects each department head
	-- The important part is to alias the original EmployeeID as the GroupManagerID
	SELECT ManagerEmployeeID, EmployeeID, EmployeeID [GroupManagerID], EmployeeName, 0 [DEPTH]
	FROM dbo.Employees WHERE EmployeeID in
		-- We want all employees 1 level below below a certain level, so we'll utilize a subquery to find them
		(
		-- Get EmployeeIDs that are children to their common parent
		SELECT EmployeeID FROM dbo.Employees WHERE ManagerEmployeeID = 0
		)
	UNION ALL
	-- The recursive statement which finds all employees under each department
	SELECT pc.ManagerEmployeeID, pc.EmployeeID, amc.GroupManagerID, pc.EmployeeName, DEPTH + 1
	FROM dbo.Employees pc
	INNER join AllMyChildren amc ON pc.ManagerEmployeeID = amc.EmployeeID
)
-- The CTE is primed, but we still need to execute it with the statement below
SELECT EmployeeName, GroupManagerID, DEPTH 
FROM AllMyChildren
ORDER BY GroupManagerID, DEPTH
-- Use the MAXRECURSION query hint to avoid default recursion limit of 100
OPTION (MAXRECURSION 0)

I’ve commented liberally, but I’ll go through this line-by-line to explain it in more detail.

  • 2-4: Define the CTE (AllMyChildren) and the columns that it will output (ManagerEmployeeID, EmployeeID, GroupManagerID, EmployeeName, Depth)
  • 7-8: This is the anchor statement. SELECT the fields we eventually want outputted. The key to this is that EmployeeID is being included a second time, but aliased as GroupManagerID.
  • 10-13: This is a subquery in the WHERE condition of the anchor statement. Because we want to discard some levels (only the topmost level in our example), we need to tell the anchor to fetch all children whose parent EmployeeID is 0. Adjust this sub-query to affect at what level the teams should roll up to.
  • 14-18: UNION ALL the anchor to the recursive portion of the CTE. The recursive part of the query is similar to the anchor in that it must SELECT similar columns. Notice that it again selects from dbo.Employees, but it also performs an INNER JOIN against the CTE, linking the EmployeeID and ManagerEmployeeID. This is what causes to recursion to occur. Additionally, we increment Depth by 1. The inclusion of the level is completely optional in the CTE, but may be helpful for reporting.
  • 19: Close the CTE block. At this point, it is defined, but we have yet to execute it.
  • 21-23: Perform a SELECT against the CTE to start it.
  • 25: By default, recursion in Common Table Expressions is limited to 100 iterations. If you anticipate having more than 100 records, you’ll need to specify MAXRECURSION 0. When testing, you can limit it to any number up to 32,767 to prevent wild-running queries. Here – since this is tested and working – we specify “0″ for unlimited iterations.

And this is the result of the above query:

The tabular results of the query

The tabular results of the query


Notice that the GroupManagerID is the same for each team, regardless of depth. Also, the Depth corresponds to how far down the tree each employee exist. I’ve conveniently coded the EmployeeName in dot-notation to make the comparison much simpler.

  • Share/Bookmark

1 Response to Flatten Heirarchies in SQL Server with Common Table Expressions

Avatar

xr280xr

April 12th, 2010 at 10:55 am

Excellently documented! Thanks for the example.

Comment Form

  • Yushe » Blog Archive » Conectar SQL Server con MySql: [...] Disclaimer: El árticulo arriba no es mio, es solo una traducción parcial del árticulo aqui [...]
  • Nick: Here is an extensive how to on sharing folders with Windows 7. http://www.ilertech.com/2010/09/shar [...]
  • dushyant kayarkar: I attempted to update my password on Ubuntu today and encountered a strange error. I thought I’d d [...]
  • Prakash - Savvysoft Technologies: USE master GO -- To use named parameters: Add linked server in the source (Local machine - eg: Mac [...]
  • Mike: Doh! The inprocess flag was killing me. Thanks. [...]


This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 United States.