Tuesday, 12 February 2008

When PHP string comparisons go wrong

I'm a little ashamed to admit that I didn't know about this before, but I've just never come across it. When PHP compares two strings with ==, it converts them to numbers if they both appear to be numeric. The === operator compares types as well, so it skips that conversion. The comparison operator docs state:

If you compare an integer with a string, the string is converted to a number. If you compare two numerical strings, they are compared as integers. These rules also apply to the switch statement.
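
The switch statement part of that quote is easy to overlook. Here is a minimal sketch of it, where pick_label() is a hypothetical helper written only for this illustration, and the output comments are what the quoted rules imply rather than output taken from the original post:

```php
<?php
// Hypothetical helper: reports which case label a switch picks for $input.
// switch uses the same loose (==) comparison as the quoted rules describe.
function pick_label($input)
{
    switch ($input) {
        case '10':
            return 'matched 10';
        case 'ten':
            return 'matched ten';
        default:
            return 'no match';
    }
}

echo pick_label('1e1'), "\n";   // matched 10 (1e1 and 10 are both numeric strings)
echo pick_label('10.0'), "\n";  // matched 10
echo pick_label('ten'), "\n";   // matched ten (plain string comparison)
echo pick_label('TEN'), "\n";   // no match
```

Note that the string '1e1' never appears as a case label, yet it still lands in the '10' branch.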

Keeping that in mind, take a look at this code:
if ('000E00080001' == '000E0008') {
    echo 'equal';
}
This statement evaluates to TRUE, but I wasn't sure why.

Even reading the comparison operator docs doesn't make it clear, but it does link off to a very important piece of information about strings. In particular, the section on string conversion to numbers states:
The string will evaluate as a float if it contains any of the characters '.', 'e', or 'E'. Otherwise, it will evaluate as an integer.

And the truth shall set you free!

The letter "E" in my two strings is treated as the start of an exponent, so each string is read as a number in scientific notation. In both cases, the number is "0" followed by an exponent. When PHP compares those two strings, it compares them as "zero times ten to the power of ..." because they are both considered numbers. So in this case, it is comparing zero with zero and the comparison is TRUE.
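
A quick way to check this is to ask PHP what it makes of each string. This is only a sketch: the variable names are made up for the example, and the output comments are the values expected from the conversion rule quoted above.

```php
<?php
// The two strings from the example, both in "mantissa E exponent" form.
$long  = '000E00080001';
$short = '000E0008';

// PHP accepts both as numeric strings.
var_dump(is_numeric($long));   // bool(true)
var_dump(is_numeric($short));  // bool(true)

// A zero mantissa stays zero whatever the exponent is.
var_dump((float) $long);       // float(0)
var_dump((float) $short);      // float(0)

// So the loose comparison ends up comparing 0.0 with 0.0.
var_dump($long == $short);     // bool(true)
```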

This also works with decimal points. This evaluates to TRUE as well:
if ('0000.' == '0000.0') {
    echo 'equal';
}
And just to confirm what is going on, adding a character that is not considered part of a number results in the expected behavior. This evaluates to FALSE because PHP no longer considers these values numbers:
if ('000E000F0001' == '000E000F') {
    echo 'equal';
}
And, if the initial portion of our number is not zero, the exponent modifies the value. We are no longer comparing "zero times ten to the power of ..." (which is always zero) in the following example; we are comparing "one times ten to the power of ...", and the two exponents differ, so it evaluates to FALSE:
if ('001E00080001' == '001E0008') {
    echo 'equal';
}
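
To make the effect of a non-zero leading portion concrete, the sketch below compares a few exponent-style strings with their plain decimal spellings. The extra values ('1e3', '100000000') are illustrative picks, not taken from the original code.

```php
<?php
// With a non-zero mantissa the exponent scales the value: 1 x 10^8.
var_dump((float) '001E0008');                // float(100000000)

// Different spellings of the same number compare equal with ==.
var_dump('001E0008' == '100000000');         // bool(true)
var_dump('1e3' == '1000');                   // bool(true)

// Different exponents give different numbers, so this is not equal.
var_dump('001E00080001' == '001E0008');      // bool(false)
```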
So what is the solution? You can either use strcmp() to compare your strings (it returns 0 when they are identical), or use the === operator to compare types as well. When PHP compares types, it will not try to convert strings to numbers.
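
Side by side, the options look roughly like this. It is a minimal sketch that reuses the strings from the first example; $a and $b are just illustrative names.

```php
<?php
$a = '000E00080001';
$b = '000E0008';

// Loose comparison: both strings are numeric, so they are compared as numbers.
var_dump($a == $b);              // bool(true)

// Strict comparison: same type, so the strings are compared as strings.
var_dump($a === $b);             // bool(false)

// strcmp() returns 0 only when the two strings are identical.
var_dump(strcmp($a, $b) === 0);  // bool(false)
var_dump(strcmp($a, $a) === 0);  // bool(true)
```

Remember to test the strcmp() result against 0 explicitly, since a bare if (strcmp($a, $b)) is true when the strings differ.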

Notch one up for the === operator, which we use exclusively in MySource4. We even go as far as banning type-insensitive and implicit comparisons using PHP_CodeSniffer. If you want to do the same, you can get PHP_CodeSniffer from PEAR and use the included Squiz standard.

2 comments:

Luke Wright said...

A further appendix to this, which was an independent point of curiosity lately, was how octal ('0') and hexadecimal prefixes ('0x') affect string comparison of strings containing integers.

It appears the octal '0' prefix is just treated as another digit as per the PHP 'numeric strings' spec and is chopped off, and therefore the remaining digits are treated as a decimal integer.

So, ('010' == '8') returns false because PHP is trying to compare integer 10 with integer 8. (Incidentally, this means that ('010' == '10') returns true for the 'numerical strings' reason given in your original post - which may be undesirable in itself.)

In the case of hex '0x', this does see the string being treated as an integer expressed in hex. So, ('0x10' == '16') returns true.
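
The leading-zero behaviour described above can be reproduced with a short sketch like this one. The hexadecimal case is deliberately left out: it appears that PHP 7 and later no longer treat '0x' strings as numeric, so that result should be treated as version-dependent and checked on the PHP build in use.

```php
<?php
// A leading zero does not mean octal inside a numeric string.
var_dump((int) '010');      // int(10), not int(8)

// Compared as numbers: 10 == 10 and 10 != 8.
var_dump('010' == '10');    // bool(true)
var_dump('010' == '8');     // bool(false)

// Compared as strings: the extra '0' makes them different.
var_dump('010' === '10');   // bool(false)
```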

Perhaps more reason to discourage '==' in favour of '===' (my preference out of the alternatives) when there's any chance the comparison might involve a numeric string - which could be any string data, really.

(Using interactive mode of PHP 5.2.5/Win32, for reference.)

José said...

I got burned by this today in a different way.

I was checking for an IP being within a certain IP range by representing both the IP and the IP range like strings of 0s and 1s, and then comparing the relevant most significant bits.

But strings like '110000000000000000000010' don't translate too well to ints, because on a 32-bit build PHP_INT_MAX == 2,147,483,647 < 110,000,000,000,000,000,000,010.


So both 11000000 00000000 00000010 and 11000000 00000000 10101101 (192.0.2 and 192.0.173) were overflowing to 2147483647 and being considered equal.


Here's the code, btw, in case it comes in useful to someone:

// Code taken from a comment in this thread:
// http://gregsherwood.blogspot.com/2008/02/when-php-string-comparions-go-wrong.html
// Consider this released under the zlib license.

// $ip in dotted format
function ip2bin($ip){
    $octets = explode(".", $ip);
    foreach($octets as $k => $v){
        $octets[$k] = str_pad(decbin($v), 8, "0", STR_PAD_LEFT);
    }
    return implode('', $octets);
}

// $ip in dotted format
// $prefix in dotted format
// $mask_len like in the number after the slash in CIDR notation
function ip_in_range($ip, $prefix, $mask_len){
    $ip = ip2bin($ip);
    $prefix = ip2bin($prefix);
    
    $ip = substr($ip , 0, $mask_len);
    $prefix = substr($prefix, 0, $mask_len);
    
    // Watch out! Two numerical strings are converted to integers when you use ==.
    // This is trouble for long integers. Using === skips this behaviour.
    return ($ip === $prefix);
}

// $ipaddr in dotted format
function is_private_ip_addr($ipaddr){
    // return filter_var($ipaddr, FILTER_VALIDATE_IP, FILTER_FLAG_NO_RES_RANGE|FILTER_FLAG_NO_PRIV_RANGE);
    // ^-- Heh! Fails on 127.0.0.1 !

    // IPv6 not supported
    if( ip_in_range($ipaddr, "127.0.0.0", 8) ) return true; // Loopback
    if( ip_in_range($ipaddr, "10.0.0.0", 8) ) return true; // Private addresses (Class A range)
    if( ip_in_range($ipaddr, "172.16.0.0", 12) ) return true; // Private addresses (Class B range)
    if( ip_in_range($ipaddr, "192.168.0.0", 16) ) return true; // Private addresses (Class C range)
    if( ip_in_range($ipaddr, "169.254.0.0", 16) ) return true; // Link-local addresses
    if( ip_in_range($ipaddr, "192.0.2.0", 24) ) return true; // "TEST-NET" (documentation and examples)
    if( ip_in_range($ipaddr, "224.0.0.0", 4) ) return true; // Multicast
    if( ip_in_range($ipaddr, "240.0.0.0", 4) ) return true; // Reserved for future use
    
    return false;
}
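
For comparison, the same range check can be done without strings at all. This is only an alternative sketch, not the code above: ip_in_cidr() is a hypothetical name, and it assumes IPv4 addresses and a 64-bit PHP build (so that ip2long() results and the 32-bit mask fit in a native integer).

```php
<?php
// Alternative sketch (hypothetical helper, IPv4 only, assumes 64-bit PHP).
// $ip and $prefix in dotted format, $mask_len as in CIDR notation (0 to 32).
function ip_in_cidr($ip, $prefix, $mask_len)
{
    $ip_long     = ip2long($ip);
    $prefix_long = ip2long($prefix);
    $mask_len    = (int) $mask_len;

    // ip2long() returns false for anything that is not a valid IPv4 address.
    if ($ip_long === false || $prefix_long === false) {
        return false;
    }
    if ($mask_len < 0 || $mask_len > 32) {
        return false;
    }

    // Build a mask with the top $mask_len bits set, limited to 32 bits.
    $mask = (-1 << (32 - $mask_len)) & 0xFFFFFFFF;

    // Integers are compared here, so no numeric-string conversion is involved.
    return ($ip_long & $mask) === ($prefix_long & $mask);
}

var_dump(ip_in_cidr('192.0.2.77', '192.0.2.0', 24));   // bool(true)
var_dump(ip_in_cidr('192.0.173.1', '192.0.2.0', 24));  // bool(false)
```

The string version above has the advantage of not depending on integer width, which is worth keeping in mind on 32-bit builds.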