Thursday, 4 January 2007

PHP array of objects, passed by reference; giving me a headache!

With all the Squiz sniffs committed, I started running PHP_CodeSniffer over itself to check for any coding standard errors, and to give it a test run on some real code after such a large amount of dev. I wasn't expecting any problems; all the unit tests were passing and I hadn't done any reworking of the core.

What I got was an exception thrown almost immediately, and on a file I'd checked plenty of times previously with older versions of PHP_CodeSniffer. Before I explain the problem, let me explain how PHP_CodeSniffer creates objects.

The main PHP_CodeSniffer object is created, goes through a coding standard directory (eg. PEAR) and creates an object for each sniff it finds in there. It then places that sniff object into an array, indexed by the type of tokens that the sniff wants to listen for. What you end up with is basically a big array of objects, stored in a private member variable called $_listeners.

When PHP_CodeSniffer starts processing a file, it creates a PHP_CodeSniffer_File object and passes it the $_listeners array. The PHP_CodeSniffer_File object then stores that array in its own private member var (also called $_listeners) and uses those listeners to process tokens as they are found. Some sniffs actually ask the PHP_CodeSniffer_File object to modify the $_listeners array for the current file.

When PHP_CodeSniffer has finished processing a file, it destroys the PHP_CodeSniffer_File object and creates a new instance of the class for the next file in the processing queue. It passes another copy of the $_listeners array to the new object so it can process its file.

And therein lies the problem; the passing of the $_listeners array from PHP_CodeSniffer to the PHP_CodeSniffer_File object.

Through debugging, it was pretty easy to work out that the problem was with object references. The $_listeners array passed to the PHP_CodeSniffer_File object should have been the same each time, but an MD5 of the serialized array showed that the array was different each time. There was a really simple reason for this; arrays containing objects are always passed by reference in PHP5.

Easy fix; clone the array and pass in the clone each time. Unfortunately, you can't clone an array of objects in PHP. So what's the solution? Iterate over the array and clone each object as they are found.

Forget that! If I'm going to be creating new objects through cloning each time, I may as well just recreate the array each time. It ends up being less work in the long run due to the way the array is structured and the fact that the same object can appear in the array more than once.

It's probably worth noting at this point that the array actually contains pointers to the objects and not the objects themselves, which is why the memory requirements of this array are not out of control and is also why PHP can't clone the array; it cant clone the reference to the object, only the object itself.

So I rewrote the way the $_listeners array gets to the PHP_CodeSniffer_File object, and after a fair bit of tweaking the unit tests were all passing again and PHP_CodeSniffer was running nicely over itself.

So what did I change? The PHP_CodeSniffer object now just recurses the coding standard directory and locates valid sniff files. It then includes the class and places the class name into a single-dimensional array, $_listeners. It pass a copy of the this array to the PHP_CodeSniffer_File object, which then loops over the array and recreates the old $_listeners array structure for itself to use. Once the file has finished processing, the object, and the $_listeners array, are destroyed.

But why were the unit tests passing? When the unit tests run, PHP_CodeSniffer only creates and adds a single sniff object to the $_listeners array, and only checks a single file with that sniff. This is to ensure that each sniff's test file is only recording problems with the particular sniff it belongs to and not generating other random errors from the rest of the standard. Because of this, the sniffs were never reused and never caused any problems.

So PHP_CodeSniffer now has a lot more object creation happening, but the speed and memory usage seems to be comparable to the old version. The changes aren't in CVS yet, but they will be once I've finished checking the rest of PHP_CodeSniffer for other obscure errors.