Showing posts with label Performance. Show all posts

Wednesday, 29 July 2009

An easy way to leak memory using DirectoryIterator

I've been trying to track down a memory leak in the PHP_CodeSniffer unit tests after being told that the PEAR-wide test suite was running out of memory during my run. My own testing showed that the unit tests started at 11MB of memory used and ballooned out to about 56MB, even with no error messages being generated.

I already do a bit of cleanup to save memory (unset()ing some member vars mostly) and I tried cleaning just about everything else out, but memory usage didn't drop. I don't have any circular references, so I had nothing obvious to try.

After a bit of playing, I found the problem: a DirectoryIterator.

My code looked a bit like this:

$di = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir));
foreach ($di as $file) {
    $basename = basename($file, '.php');
    if (substr($basename, -5) !== 'Sniff') {
        continue;
    }
}

The $file variable is a DirectoryIterator item. I was passing it into the basename() function and relying on the fact that PHP would cast it to a string (the file name).

I changed the code to grab the file name first and then pass it into the basename() function, like this:

$di = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir));
foreach ($di as $file) {
    $fileName = $file->getFilename();
    $basename = basename($fileName, '.php');
    if (substr($basename, -5) !== 'Sniff') {
        continue;
    }
}

So now the tests only use 16MB of memory; a healthy saving of 40MB. There is probably a bit more that can be saved by calling this code less during testing, but this was a good (and easy) result regardless.
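For anyone who wants to try the fixed loop on their own code, it can be wrapped in a small helper. This is only a sketch: findSniffFiles() is a made-up name and is not part of PHP_CodeSniffer, and the SKIP_DOTS flag is an addition that the original loop did not use.

```php
<?php
/**
 * Collect sniff files under $dir without string-casting the iterator items.
 * findSniffFiles() is a hypothetical helper, not part of PHP_CodeSniffer.
 */
function findSniffFiles(string $dir): array
{
    $found = [];
    $di    = new RecursiveIteratorIterator(
        // SKIP_DOTS (an addition to the original loop) ignores "." and "..".
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($di as $file) {
        // Ask for the name explicitly rather than passing the object itself.
        $fileName = $file->getFilename();
        $basename = basename($fileName, '.php');
        if (substr($basename, -5) !== 'Sniff') {
            continue;
        }

        $found[] = $file->getPathname();
    }

    sort($found);
    return $found;
}
```

Calling memory_get_usage() before and after findSniffFiles($dir) gives a rough idea of what the scan costs. The figures will depend on the PHP version, so the saving may not match the one described above.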

Thursday, 17 January 2008

MySource4 caching and keywords - I think I'm in love!

Sometimes things just fall into place.

When designing the sub-systems for MySource4, even when designing seemingly basic copies of MySource Matrix functionality, I often find myself surprised by the way we've managed to make slight design changes (often forced by the MySource4 architecture) but come out with systems that are significantly better than they are in MySource Matrix. Sometimes, design decisions made months ago suddenly start to pay off in ways I don't expect. The caching and keyword systems are a perfect example; they are a match made in heaven and I'm deeply in love with both of them!

Caching was to be a fairly simple copy of the MySource Matrix caching system. One of the strengths of MySource4 is the ability to replace core systems (like caching) with a system that is better designed for a site's specific needs. Knowing this, we started thinking about a basic caching system that Matrix users could migrate to easily, with plans to write other caching systems and allow users to pick the one they wanted. What we ended up with is something I am very proud of. Not only because of the feature set, but also because of the way it fits so nicely into my MySource4 vision. This is why we designed MySource4 like we did, and it is really starting to pay off.

The MySource4 caching system is made possible by a simple but critical change to the way content is printed. In MySource Matrix, each asset type and design area grabs attribute values from the database and prints them when required. When designing the keyword system (very early on) we made a decision to always print keywords rather than real values. So instead of a menu printing asset names and links, it would print keywords that would later be replaced by the keyword system. This provides two immediate benefits: assets and design areas take less time to print, and keywords can be batch replaced in a page. Both of those benefits are performance-based, but that decision has now also enabled us to build a fantastic caching system.

By utilising the MySource4 channels architecture, the caching system is able to hook into the frontend painting code and cache a copy of the page after static keywords have been replaced and before dynamic keywords have been replaced. Dynamic keywords are things like the contents of a custom form or the name of the current user, and should not be cached. The next time around, the caching system can serve up a copy of the page with just the dynamic keywords left to be replaced. Note that this is a full page cache, unlike Matrix which constructs the page each time using small cache blocks. That means that the MySource4 caching system is already faster than the MySource Matrix caching system and we haven't even got to the bonus features!
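The two-phase replacement described above can be sketched in a few lines of PHP. This is not the MySource4 implementation: the %keyword% syntax, the function names and the in-memory array used as the cache are all assumptions made for illustration.

```php
<?php
/**
 * Replace %keyword% placeholders that have a value in $values and leave
 * the rest untouched. The %...% syntax is an assumption for this sketch.
 */
function replaceKeywords(string $content, array $values): string
{
    return preg_replace_callback(
        '/%([a-z_]+)%/',
        function (array $m) use ($values) {
            return array_key_exists($m[1], $values) ? $values[$m[1]] : $m[0];
        },
        $content
    );
}

/**
 * Serve a page: static keywords are resolved once and cached,
 * dynamic keywords are resolved on every request.
 */
function servePage(string $url, string $template, array $static, array $dynamic, array &$cache): string
{
    if (isset($cache[$url]) === false) {
        // Cache miss: replace the static keywords and keep the result.
        $cache[$url] = replaceKeywords($template, $static);
    }

    // Hit or miss, only the dynamic keywords are left to replace.
    return replaceKeywords($cache[$url], $dynamic);
}

$cache    = [];
$template = '<h1>%page_name%</h1><p>Hello %current_user%</p>';
echo servePage('/contact', $template, ['page_name' => 'Contact Us'], ['current_user' => 'Greg'], $cache), "\n";
```

After the first request, the cached copy for /contact still contains %current_user%, so a second request with a different user only has to replace that one keyword.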

One of the biggest problems with the Matrix caching system is that we can never really know which assets are cached on which URLs. For example, we know that the name of the "Contact Us" page has changed but we have no idea which asset listings to clear the cache of or which pages have the old page name in the menu. We can't know this in Matrix because we don't have a consistent way of printing data. That is solved in MySource4 because the keyword system forces us to print keywords rather than real data. The result is that the MySource4 caching system knows exactly which URLs the name of the "Contact Us" page appears on and it can clear them for you.

That is a very powerful feature. You can now go to any asset and see exactly what parts of the asset have been cached and which URLs they are cached on. You can make an informed decision about clearing the cache; do you really want to clear 400 URLs because the name of the "Home Page" has changed?
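One way to picture this is an index from each static keyword to the URLs it was cached on. The sketch below assumes plain arrays for the cache and the index, and the keyword names are invented; the real system's storage and naming are not described here.

```php
<?php
/**
 * Remember which static keywords were printed on a cached URL.
 * $index maps keyword => [url => true]; the keyword names are invented.
 */
function recordKeywords(array &$index, string $url, array $keywords): void
{
    foreach ($keywords as $keyword) {
        $index[$keyword][$url] = true;
    }
}

/** List the URLs whose cached copy contains $keyword. */
function urlsForKeyword(array $index, string $keyword): array
{
    return array_keys($index[$keyword] ?? []);
}

/** Clear every cached page that used $keyword; returns the number cleared. */
function clearKeyword(array &$cache, array &$index, string $keyword): int
{
    $urls = urlsForKeyword($index, $keyword);
    foreach ($urls as $url) {
        unset($cache[$url]);
        // The page is gone, so forget every keyword recorded against it.
        foreach (array_keys($index) as $other) {
            unset($index[$other][$url]);
        }
    }

    return count($urls);
}

$cache = ['/home' => '...', '/about' => '...', '/news' => '...'];
$index = [];
recordKeywords($index, '/home', ['contact_us_name', 'home_name']);
recordKeywords($index, '/about', ['contact_us_name']);
recordKeywords($index, '/news', ['home_name']);

// Preview what would be cleared, then clear it.
print_r(urlsForKeyword($index, 'contact_us_name'));
echo clearKeyword($cache, $index, 'contact_us_name'), " URLs cleared\n";
```

Because urlsForKeyword() can be called without clearing anything, the count can be shown to the user first, which is the informed decision described above.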

We also have some other statistics that we are trying to decide how to use best. We know how many times a page's cache has been served and how many times it was generated (cache hits and misses). We also know how long each page generation took (on average), the static keywords used on the page and the dynamic keywords used.

We can obviously calculate how much time was saved by the caching of a page by looking at the hits and misses. That is a nice figure to have available, but it's not really going to help you in any way. So we thought about looking through the cache data and showing you how much performance improvement can be gained by removing a dynamic keyword. For example, we could calculate the time saving gained by removing the name of the current user from the design. Those sorts of figures can help you modify your content and designs to improve performance.
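As a rough illustration of the arithmetic, the two figures could be estimated like this. The function names and the numbers are made up, and the per-keyword replacement cost is a hypothetical statistic that is not in the list above.

```php
<?php
/**
 * Rough time saved by caching one page, in seconds.
 * Assumes serving a cached copy costs nothing, so each hit saves one generation.
 */
function timeSavedByCache(int $hits, float $avgGenerationSeconds): float
{
    return $hits * $avgGenerationSeconds;
}

/**
 * Rough time that removing one dynamic keyword would save, in seconds.
 * $avgKeywordSeconds (the average cost of replacing that keyword once) is a
 * hypothetical statistic; dynamic keywords are replaced on hits and misses.
 */
function timeSavedByRemovingKeyword(int $hits, int $misses, float $avgKeywordSeconds): float
{
    return ($hits + $misses) * $avgKeywordSeconds;
}

// Made-up figures, purely for illustration.
printf("%.1f seconds saved by caching\n", timeSavedByCache(9500, 0.25));
printf("%.1f seconds saved by dropping the keyword\n", timeSavedByRemovingKeyword(9500, 500, 0.002));
```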

I'd love to hear any suggestions for other ways these statistics can be used. Please leave a comment if you have some ideas.

Friday, 5 October 2007

Implicit comparisons in PHP

Tobias Schlitt recently wrote a blog entry about the speed of PHP comparisons. In it, he mentions that using the === operator to compare types as well as values is faster than using the == operator to compare values only. This is something that has been known for a while and we've made it part of our coding standard for MySource4.

Yes, the performance improvement you get is minor, but even the smallest changes add up for a site with hundreds of thousands of page views a day. Think about how many comparisons would be done during a single page load and you'll see that even small improvements like this one start to become significant.

Something else he mentions is that explicitly comparing the return values of functions is faster than using an implicit comparison. So, for example, this code:

if (isset($foo) === true) { ... }

is slightly faster than this code:

if (isset($foo)) { ... }

I'm not sure that the tests performed are accurate, but we don't use implicit comparisons in MySource4 either. Not because of performance but because the code is easier to read. Again, this is part of our coding standard and is enforced by the Squiz standard in PHP_CodeSniffer.
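If you want to check the numbers on your own setup, a micro-benchmark along these lines is one way to do it. This is a sketch rather than a rigorous test: timeIt() is a made-up helper, the iteration count is arbitrary, and the cost of calling the closure is likely to dwarf the cost of the comparison itself, so treat any difference with suspicion.

```php
<?php
/**
 * Time $iterations runs of $fn and return the elapsed seconds.
 * timeIt() is a hypothetical helper written for this sketch.
 */
function timeIt(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }

    return microtime(true) - $start;
}

$foo = 'bar';

// Explicit comparison, as required by the Squiz standard.
$explicit = timeIt(function () use ($foo) {
    if (isset($foo) === true) {
        return 1;
    }

    return 0;
}, 1000000);

// Implicit comparison.
$implicit = timeIt(function () use ($foo) {
    if (isset($foo)) {
        return 1;
    }

    return 0;
}, 1000000);

printf("explicit: %.4fs, implicit: %.4fs\n", $explicit, $implicit);
```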

If you would also like to incorporate these simple changes into your coding standard, PHP_CodeSniffer is a great tool to help you enforce them and show you where you need to make modifications to your code. You can either use the included Squiz standard or incorporate the specific sniff, Squiz/Sniffs/Operators/ComparisonOperatorUsageSniff, into your own custom standard.