PHP Sadness

Empty T_ENCAPSED_AND_WHITESPACE tokens

Which of these lines contains a syntax error?

print "$array[$a]";     # 1: variable index
print "$array['a']";    # 2: string index
print "$array[a]";      # 3: unquoted index (looks like an undefined constant)
print "array['a']";     # 4: embedded single quotes, no variables
print "{$array['a']}";  # 5: index with "complex curly" syntax

I'll give you a hint; here's the syntax error:

Parse error: syntax error, unexpected '' (T_ENCAPSED_AND_WHITESPACE), expecting ...

The syntax error is in line #2, and it seems to be related to how PHP tokenizes that kind of expression. Here's a program that shows how PHP tokenizes that line:

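<?php
// Source to inspect; single-quoted so $array is not interpolated here.
$str = '<?php print "$array[\'a\']";';

foreach (token_get_all($str) as $t) {
  if (is_array($t)) {
    // Named tokens arrive as array(token id, text, line number).
    $n = token_name($t[0]);
    $v = $t[1];
  } else {
    // Single-character tokens arrive as plain strings.
    $n = "(literal)";
    $v = $t;
  }
  printf("%26s: %s\n", $n, var_export($v, true));
}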

It produces this:

                T_OPEN_TAG: '<?php '
                   T_PRINT: 'print'
              T_WHITESPACE: ' '
                 (literal): '"'
                T_VARIABLE: '$array'
                 (literal): '['
 T_ENCAPSED_AND_WHITESPACE: ''
 T_ENCAPSED_AND_WHITESPACE: '\'a\']'
                 (literal): '"'
                 (literal): ';'

The token that the parse error complains about, a seemingly empty ('') T_ENCAPSED_AND_WHITESPACE, appears right after the opening square bracket. I'm not sure why the tokenizer generates it, especially since none of the other example lines produce anything like it. Just as mysteriously, the closing square bracket is swallowed into the T_ENCAPSED_AND_WHITESPACE token that follows ('\'a\']') instead of being emitted as its own literal.
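To check the claim that only line #2 produces an empty token, a small scan over all five example lines can help. This is a minimal sketch: find_empty_tokens() is a hypothetical helper written for this article, not a PHP built-in, and the exact token stream may differ between PHP versions.

```php
<?php
// Hypothetical helper (not part of PHP): returns the names of all tokens
// whose text is the empty string in a token_get_all()-style token list.
function find_empty_tokens(array $tokens)
{
    $empty = [];
    foreach ($tokens as $t) {
        // Named tokens are arrays of [id, text, line]; literals are plain strings.
        if (is_array($t) && $t[1] === '') {
            $empty[] = token_name($t[0]);
        }
    }
    return $empty;
}

// The five example lines, single-quoted so nothing is interpolated here.
$lines = [
    1 => 'print "$array[$a]";',
    2 => 'print "$array[\'a\']";',
    3 => 'print "$array[a]";',
    4 => 'print "array[\'a\']";',
    5 => 'print "{$array[\'a\']}";',
];

foreach ($lines as $n => $code) {
    $found = find_empty_tokens(token_get_all('<?php ' . $code));
    printf("line #%d: %s\n", $n, $found ? implode(', ', $found) : '(no empty tokens)');
}
```

The helper only inspects the token list it is given, so it can also be pointed at any other snippet that triggers the same parse error.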

By contrast, here's line #5 ("complex curly" syntax) with some annotations:

                T_OPEN_TAG: '<?php '
                   T_PRINT: 'print'
              T_WHITESPACE: ' '
                 (literal): '"'
              T_CURLY_OPEN: '{'
                T_VARIABLE: '$array'
                 (literal): '['        # no empty token here
T_CONSTANT_ENCAPSED_STRING: '\'a\''
                 (literal): ']'        # correct discovery of closing square bracket
                 (literal): '}'
                 (literal): '"'
                 (literal): ';'

And, for comparison, here's what the tokenizer does when it doesn't think it's parsing a variable in a string (line #4):

                T_OPEN_TAG: '<?php '
                   T_PRINT: 'print'
              T_WHITESPACE: ' '
T_CONSTANT_ENCAPSED_STRING: '"array[\'a\']"'
                 (literal): ';'
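To confirm which of the five lines the parser actually rejects without executing any of them, one option is to pass the TOKEN_PARSE flag to token_get_all(). The sketch below assumes PHP 7.0 or later, where that flag exists and a syntax error surfaces as a ParseError exception; parses_ok() is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper: true if $code (given without an opening tag) parses.
// Assumes PHP 7.0+, where TOKEN_PARSE makes token_get_all() throw ParseError
// on invalid syntax. The code is only parsed, never executed.
function parses_ok($code)
{
    try {
        token_get_all('<?php ' . $code, TOKEN_PARSE);
        return true;
    } catch (ParseError $e) {
        return false;
    }
}

// The five example lines, single-quoted so nothing is interpolated here.
$lines = [
    1 => 'print "$array[$a]";',
    2 => 'print "$array[\'a\']";',
    3 => 'print "$array[a]";',
    4 => 'print "array[\'a\']";',
    5 => 'print "{$array[\'a\']}";',
];

foreach ($lines as $n => $code) {
    printf("line #%d: %s\n", $n, parses_ok($code) ? 'parses' : 'syntax error');
}
```

Only line #2 should be reported as a syntax error. Lines #3 and #5 are the two spellings of the same lookup that the parser accepts.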

Significance: Fast Debugging

Being able to debug issues in your application quickly matters. When every second of downtime costs your company money, bad error messages can mean thousands of dollars in unnecessary losses and hours of wasted developer time. Languages that aim to be used in large applications need to ensure that developers can quickly discern the cause of an issue.