Apr 222010
 

This us more a note to self than anything else, as I trawled the internet trying to find a definitive answer to this.  Modified preorder tree traversal (MPTT) is an efficient way to store and query hierarchical data in a database.  This article on the MySQL site explains why this is the case with a neat comparison between MPTT and a more “old school” approach which relies on a parent-child recursive relationship in the database (known as the Adjacency List Model).  I wont repeat what it says but suffice to say I’ve been experimenting with MPTT for a while and like the way it does things.

I use cakephp a lot and I have recently created a hierarchy of some 120,000 items, all neatly mapped with parent-child relationships but no MPTT data.  Cake’s inbuilt mechanism for converting adjacency list model to a modified preorder tree traversal (MPTT) model hierarchy was far to slow: on my PC and with this much data it was taking about a second per row, which adds up to an awfully long time to process the whole tree!

Given the need for this type of conversion I found it surprisingly difficult to find a script that the the Agency list to MPTT conversion, so I thought I’d document what I found here, and add to it if I find any further.

So, using the CakePHP naming conventions here is a script to convert Agency List to MPTT using MySQL and PHP.  This is much faster than the native CakePHP model->recover() method (which rebuilds your hierarchy) which is important on big datasets obviously.  This script is based on the function in this excellent article on MPTT.

//Make sure it doesn't time out
        set_time_limit (0);

	$con = mysql_connect("localhost","user","password");
	if (!$con){die('Could not connect: ' . mysql_error());}
	mysql_select_db("your_db", $con);

	function rebuild_tree($parent_id, $left) {
	   // the right value of this node is the left value + 1
	   $right = $left+1;   

	   // get all children of this node
	   $result = mysql_query('SELECT id FROM categories '.
	                          'WHERE parent_id="'.$parent_id.'";');   

	   while ($row = mysql_fetch_array($result)) {
	       // recursive execution of this function for each
	       // child of this node
	       // $right is the current right value, which is
	       // incremented by the rebuild_tree function
	       $right = rebuild_tree($row['id'], $right);
	   }   

	   // we've got the left value, and now that we've processed
	   // the children of this node we also know the right value
	   mysql_query('UPDATE categories SET lft='.$left.', rght='.
	                $right.' WHERE id="'.$parent_id.'";');   

	   // return the right value of this node + 1
	   return $right+1;
	}   

rebuild_tree('1',1);

mysql_close($con);

Using pure traditional SQL you can use the following script apparently, but I didn’t have time to make it work in MySQL and the above processed things quickly enough. The source of the script below is here http://data.bangtech.com/sql/nested_set_treeview.htm

-- Tree holds the adjacency model
CREATE TABLE Tree
(emp CHAR(10) NOT NULL,
 boss CHAR(10));

INSERT INTO Tree
SELECT emp, boss FROM Personnel;

-- Stack starts empty, will holds the nested set model
CREATE TABLE Stack
(stack_top INTEGER NOT NULL,
 emp CHAR(10) NOT NULL,
 lft INTEGER,
 rgt INTEGER);

BEGIN ATOMIC
DECLARE counter INTEGER;
DECLARE max_counter INTEGER;
DECLARE current_top INTEGER;

SET counter = 2;
SET max_counter = 2 * (SELECT COUNT(*) FROM Tree);
SET current_top = 1;

INSERT INTO Stack
SELECT 1, emp, 1, NULL
 FROM Tree
 WHERE boss IS NULL;

DELETE FROM Tree
 WHERE boss IS NULL;

WHILE counter <= (max_counter - 2)
LOOP IF EXISTS (SELECT *
 FROM Stack AS S1, Tree AS T1
 WHERE S1.emp = T1.boss
 AND S1.stack_top = current_top)
 THEN
 BEGIN -- push when top has subordinates, set lft value
 INSERT INTO Stack
 SELECT (current_top + 1), MIN(T1.emp), counter, NULL
 FROM Stack AS S1, Tree AS T1
 WHERE S1.emp = T1.boss
 AND S1.stack_top = current_top;

 DELETE FROM Tree
 WHERE emp = (SELECT emp
 FROM Stack
 WHERE stack_top = current_top + 1);

 SET counter = counter + 1;
 SET current_top = current_top + 1;
 END
 ELSE
 BEGIN -- pop the stack and set rgt value
 UPDATE Stack
 SET rgt = counter,
 stack_top = -stack_top -- pops the stack
 WHERE stack_top = current_top
 SET counter = counter + 1;
 SET current_top = current_top - 1;
 END IF;
 END LOOP;
END;