Thursday 17 January 2008

MySource4 caching and keywords - I think I'm in love!

Sometimes things just fall into place.

When designing the sub-systems for MySource4, even when designing seemingly basic copies of MySource Matrix functionality, I often find myself surprised by the way we've managed to make slight design changes (often forced by the MySource4 architecture) but come out with systems that are significantly better than they are in MySource Matrix. Sometimes, design decisions made months ago suddenly start to pay off in ways I don't expect. The caching and keyword systems are a perfect example; they are a match made in heaven and I'm deeply in love with both of them!

Caching was to be a fairly simple copy of the MySource Matrix caching system. One of the strengths of MySource4 is the ability to replace core systems (like caching) with a system that is better designed for a site's specific needs. Knowing this, we started thinking about a basic caching system that Matrix users could migrate to easily, with plans to write other caching systems and allow users to pick the one they wanted. What we ended up with is something I am very proud of, not only because of the feature set but also because of the way it fits so nicely into my MySource4 vision. This is why we designed MySource4 like we did, and it is really starting to pay off.

The MySource4 caching system is made possible by a simple but critical change to the way content is printed. In MySource Matrix, each asset type and design area grabs attribute values from the database and prints them when required. When designing the keyword system (very early on) we made a decision to always print keywords rather than real values. So instead of a menu printing asset names and links, it prints keywords that are later replaced by the keyword system. This provides two immediate benefits: assets and design areas take less time to print, and all the keywords in a page can be replaced in a single batch. Both of those are performance benefits, but that decision has now also enabled us to make a fantastic caching system.
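To make the idea concrete, here is a minimal sketch of printing keywords first and replacing them in one batch afterwards. It is not MySource4 code: the `%attribute:asset_id%` keyword syntax, the `FAKE_DB` lookup table and the function names are all assumptions made up for illustration.

```python
import re

# Hypothetical keyword syntax: %attribute:asset_id% (not MySource4's real format).
KEYWORD_RE = re.compile(r"%(\w+):(\d+)%")

# Stand-in for the database: (asset_id, attribute) -> value.
FAKE_DB = {
    (7, "asset_name"): "Home Page",
    (7, "asset_url"): "/",
    (12, "asset_name"): "Contact Us",
    (12, "asset_url"): "/contact",
}


def print_menu(asset_ids):
    """Print a menu using keywords only; no database access happens here."""
    return "\n".join(
        f'<a href="%asset_url:{i}%">%asset_name:{i}%</a>' for i in asset_ids
    )


def fetch_values(keys):
    """Resolve every requested keyword in one batched lookup."""
    return {key: FAKE_DB[key] for key in keys}


def replace_keywords(page):
    # Collect the distinct keywords on the page, then resolve them together.
    keys = {(int(asset_id), attr) for attr, asset_id in KEYWORD_RE.findall(page)}
    values = fetch_values(keys)
    return KEYWORD_RE.sub(
        lambda m: values[(int(m.group(2)), m.group(1))], page
    )


menu = print_menu([7, 12])
html = replace_keywords(menu)
```

The menu code never touches the data store; only `replace_keywords` does, and it does so once per page rather than once per printed value.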

By utilising the MySource4 channels architecture, the caching system is able to hook into the frontend painting code and cache a copy of the page after static keywords have been replaced and before dynamic keywords have been replaced. Dynamic keywords are things like the contents of a custom form or the name of the current user, and should not be cached. The next time around, the caching system can serve up a copy of the page with just the dynamic keywords left to be replaced. Note that this is a full page cache, unlike Matrix, which constructs the page each time using small cache blocks. That means that the MySource4 caching system is already faster than the MySource Matrix caching system and we haven't even got to the bonus features!
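The ordering described above (replace static keywords, cache, then replace dynamic keywords on every request) could be sketched roughly like this. The keyword syntax, the `STATIC_VALUES` table and every function name are hypothetical, and the real system hooks in through channels rather than a plain function call.

```python
import re

KEYWORD_RE = re.compile(r"%(\w+)%")  # hypothetical keyword syntax

# Assumed example data: static values are safe to cache, dynamic ones are not.
STATIC_VALUES = {
    "page_title": "Contact Us",
    "menu": "<nav>Home | Contact Us</nav>",
}
DYNAMIC_KEYWORDS = {"current_user_name"}

page_cache = {}            # url -> page with only dynamic keywords remaining
counters = {"builds": 0}   # how many times a page was painted from scratch


def build_page(url):
    """Stand-in for the expensive frontend painting step."""
    counters["builds"] += 1
    return "<h1>%page_title%</h1>%menu%<p>Hello %current_user_name%</p>"


def replace_static(page):
    # Dynamic keywords are left untouched so they survive into the cache.
    def sub(match):
        name = match.group(1)
        if name in DYNAMIC_KEYWORDS:
            return match.group(0)
        return STATIC_VALUES[name]
    return KEYWORD_RE.sub(sub, page)


def replace_dynamic(page, context):
    return KEYWORD_RE.sub(lambda m: context[m.group(1)], page)


def serve(url, context):
    if url not in page_cache:
        # Cache miss: paint the page and cache it after static replacement.
        page_cache[url] = replace_static(build_page(url))
    # Hit or miss, dynamic keywords are replaced per request.
    return replace_dynamic(page_cache[url], context)


first = serve("/contact", {"current_user_name": "Alice"})
second = serve("/contact", {"current_user_name": "Bob"})
```

Both requests get their own user name, but the page is only painted once; the cached copy still contains the unreplaced `%current_user_name%` keyword.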

One of the biggest problems with the Matrix caching system is that we can never really know which assets are cached on which URLs. For example, we know that the name of the "Contact Us" page has changed but we have no idea which asset listings to clear the cache of or which pages have the old page name in the menu. We can't know this in Matrix because we don't have a consistent way of printing data. That is solved in MySource4 because the keyword system forces us to print keywords rather than real data. The result is that the MySource4 caching system knows exactly which URLs the name of the "Contact Us" page appears on and it can clear them for you.

That is a very powerful feature. You can now go to any asset and see exactly which parts of the asset have been cached and which URLs they are cached on. You can make an informed decision about clearing the cache: do you really want to clear 400 URLs because the name of the "Home Page" has changed?
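One way to picture the bookkeeping behind this is a reverse index from each keyword to the URLs it was cached on. The sketch below is only an illustration under that assumption; the `TrackedCache` class and its method names are invented here and are not the MySource4 API.

```python
from collections import defaultdict


class TrackedCache:
    """Page cache that remembers which keywords were used on which URLs."""

    def __init__(self):
        self.pages = {}                           # url -> cached page
        self.keywords_by_url = {}                 # url -> set of keywords
        self.urls_by_keyword = defaultdict(set)   # keyword -> set of urls

    def store(self, url, page, keywords_used):
        self.evict(url)  # drop any stale entry for this URL first
        self.pages[url] = page
        self.keywords_by_url[url] = set(keywords_used)
        for keyword in keywords_used:
            self.urls_by_keyword[keyword].add(url)

    def evict(self, url):
        self.pages.pop(url, None)
        for keyword in self.keywords_by_url.pop(url, set()):
            self.urls_by_keyword[keyword].discard(url)

    def urls_for(self, asset_id, attribute):
        """List the URLs an asset attribute is currently cached on."""
        return sorted(self.urls_by_keyword.get((asset_id, attribute), set()))

    def clear_for(self, asset_id, attribute):
        """Clear every cached URL that printed this asset attribute."""
        urls = self.urls_for(asset_id, attribute)
        for url in urls:
            self.evict(url)
        return urls


cache = TrackedCache()
cache.store("/", "home page html", {(7, "name"), (12, "name")})
cache.store("/contact", "contact page html", {(12, "name")})
cache.store("/about", "about page html", {(7, "name")})

affected = cache.urls_for(12, "name")   # look before you clear
cleared = cache.clear_for(12, "name")
```

Because `urls_for` is separate from `clear_for`, the list of affected URLs can be shown first, which is what makes the "do I really want to clear 400 URLs?" decision possible.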

We also have some other statistics that we are trying to decide how to use best. We know how many times a page's cache has been served and how many times it was generated (cache hits and misses). We also know how long each page generation took (on average), the static keywords used on the page and the dynamic keywords used.

We can obviously calculate how much time was saved by caching a page by looking at the hits and misses. That is a nice figure to have available, but it's not really going to help you in any way. So we thought about looking through the cache data and showing you how much performance improvement can be gained by removing a dynamic keyword. For example, we could calculate the time saving gained by removing the name of the current user from the design. Those sorts of figures can help you modify your content and designs to improve performance.
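As a rough sketch of both calculations, the code below works from the statistics listed earlier. It goes beyond what has been described in two ways, and both are assumptions: it supposes the average time to serve a cached page is recorded, and that an average replacement cost per dynamic keyword is available. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PageStats:
    hits: int                      # times the cached page was served
    misses: int                    # times the page had to be generated
    avg_generation_secs: float     # average time for a full generation
    avg_cached_serve_secs: float   # assumed: average time to serve from cache
    # Assumed: average replacement cost per dynamic keyword, in seconds.
    dynamic_keyword_secs: dict = field(default_factory=dict)


def time_saved(stats):
    """Time saved so far: each hit avoided one full page generation."""
    return stats.hits * (stats.avg_generation_secs - stats.avg_cached_serve_secs)


def saving_if_removed(stats, keyword):
    """Estimated saving over the same traffic if a dynamic keyword were removed.

    Dynamic keywords are replaced on every request, hit or miss.
    """
    requests = stats.hits + stats.misses
    return requests * stats.dynamic_keyword_secs.get(keyword, 0.0)


stats = PageStats(
    hits=900,
    misses=100,
    avg_generation_secs=0.5,
    avg_cached_serve_secs=0.25,
    dynamic_keyword_secs={"current_user_name": 0.125},
)

saved = time_saved(stats)
user_name_saving = saving_if_removed(stats, "current_user_name")
```

The second figure is the more actionable one, since it points at a specific keyword that could be taken out of a design.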

I'd love to hear any suggestions for other ways these statistics can be used. Please leave a comment if you have some ideas.