Saturday 26 May 2007

PHP_CodeSniffer gets better newline support

I've just finished committing some fairly large changes to PHP_CodeSniffer to fix problems people are having with newlines.

Some users have reported to me that PHP_CodeSniffer incorrectly throws errors when it finds a Windows (\r\n) newline instead of a Unix (\n) one. Pattern sniffs, that check for the correct format of control structures, throw an error for every pattern tested because they are always looking for Unix newlines. When they encounter a Windows newline, the carriage return (\r) is seen as an invalid character and an error is thrown.

The problem comes from hard-coding \n characters into PHP_CodeSniffer and its sniffs. The reason this was done in the first place is because the PEAR coding standards, and those they I use myself, require you to use Unix newlines in your files.

For PHP_CodeSniffer to become more attractive to developers on other platforms, the hard-coded Unix newlines had to be removed so the existing set of sniffs could be included into new coding standards without any problems. The job is all but done now, but it was more complex than I expected.

The easy part was determining what newline character each file used. PHP_CodeSniffer already had a class that represents each file being checked, so it was pretty easy to add a check in there to look at the first line of the file and see what newline character it was using. The end result is that PHP_CodeSniffer_File now has a new public member var called eolChar that contains either a Unix (\n) newline, a Windows (\r\n) newline or a Mac (\r) newline.

The next bit was very time consuming. I had to replace all hard-coded Unix newline comparisons with $phpcsFile->eolChar. The main problem I had was that the comment parser built into PHP_CodeSniffer uses Unix newline comparisons a fair bit, but never has access to $phpcsFile. After a lot of mucking around, it's all working.

The hardest bit was changing the AbstractPattern sniff to allow a new special pattern type, EOL, that can be used instead of hard-coding \n into a pattern. There isn't anything really complicated about the code that was added, it's just that almost every change you make to that AbstractPattern sniff seems to cause a number of unit test failures. It's a great class, but it really hangs together through some sort of black magic, and I dread changing it.

It took the best part of a day to make and test all the changes, but it was worth it. The end result is that PHP_CodeSniffer will now throw the same errors and warnings for a file no matter what newline character it is using. The only exception would be a sniff specifically designed to check for a certain newline character, which is what I will now need to add to the PEAR standard to ensure users are warned that their newline characters are incorrect.

These changes are only available in CVS at the moment, but will be released in version 0.7.0 when they've had some time to settle.