max_input_vars in PHP

(2014)

I ran into a very interesting bug about 4 months into a former job.

My team had inherited a large legacy system using PHP 4, and it was becoming increasingly clear that a small number of customer orders were failing for an unidentified reason. The accountants had discovered products in a "this should never happen" limbo status and for weeks nobody had even developed a hunch, much less been able to reproduce the ghost bug. The original developer was brought back in to spend 2 full days looking for it, then a senior developer on my team spent 2 full days looking for it.

This system was hairy!

Finally, our database consultant discovered a clue while developing visualizations for an unrelated task. There was something funny about very large orders with more than 200 items. Most orders contained only 2-3 items, but occasionally, we would receive a large order. It seemed that in orders with more than 200 items, some items would be processed incorrectly. The bug was rare because large orders were infrequent, but it was also serious, because it affected the most expensive items in those orders. The bug had a single tell-tale sign, a "-1" value in a particular database field, but that value also occurred throughout the codebase. We couldn't just grep for "-1".

Based on the DBA's discovery, we were able to estimate that the problem had originated about one month before my team was hired. Since the original developer had not used source control, we couldn't review the code changes during that month. In fact, we had zero information about which files had changed recently because the file modification dates had been overwritten during a server migration. This point would prove important later.

Further investigation showed the cutoff was never more than 200, but sometimes it was less -- either 198 or 199. This led to our first testable hypothesis. The system was chatty and made lots of GET requests. It also implemented a bizarre server-side queue, written in multi-threaded C, to prevent multiple simultaneous executions of the same PHP script. Maybe one of those GET requests was being truncated? GET has a maximum length, after all.

The maximum length theory turned out to be wrong, but it led us in the right direction. After a few more dead ends, I finally narrowed it down to a screen that was making an Ajax POST to a PHP script.

I watched it in Chrome Inspector:

350 items * 5 fields per item = 1750 total fields

Everything looked good. The server didn't complain or log any errors.

Then I did a var_dump on the server side.

The $_POST had only 1000 fields! WTF???

1000 / 5 = 200

200, plus or minus a few because the POST had some optional checkboxes.
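The arithmetic can be sketched in a few lines of PHP. This is only an illustration under stated assumptions: the function name is hypothetical, it assumes PHP keeps the first 1000 fields and drops the rest, and it assumes each ticked checkbox adds one extra field to its item (the post does not say exactly how the checkboxes were laid out).

```php
<?php
// Hypothetical helper: given the number of fields each item submits,
// count how many items fit completely within the field limit.
function complete_items_within_limit(array $fieldsPerItem, int $limit): int
{
    $used = 0;
    $complete = 0;
    foreach ($fieldsPerItem as $fields) {
        if ($used + $fields > $limit) {
            break; // this item would be cut off part-way through
        }
        $used += $fields;
        $complete++;
    }
    return $complete;
}

// 350 items with 5 fields each, no checkboxes ticked.
$plain = array_fill(0, 350, 5);

// Assumption: the first 10 items each have one ticked checkbox (6 fields).
$withCheckboxes = array_merge(array_fill(0, 10, 6), array_fill(0, 340, 5));

echo complete_items_within_limit($plain, 1000), "\n";          // 200
echo complete_items_within_limit($withCheckboxes, 1000), "\n"; // 198
```

Under those assumptions, a handful of ticked checkboxes is enough to pull the cutoff from 200 down to 198 or 199.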

After perhaps 50 developer-hours, the hardest part was done. Now we could replicate the bug.

Now we knew where the error was coming from, but we still didn't know why.

Why would PHP truncate when there are more than 1000 items in $_POST?

Some more googling led to max_input_vars, a php.ini setting that defaults to 1000. It turns out that the server had been moved to a new machine with a different php.ini file. Nobody had documented or mentioned the need to set max_input_vars during the migration (or, who knows, maybe the bug had existed before the migration too!).
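In hindsight, a small guard in the receiving script might have surfaced the problem much sooner. The following is a minimal sketch, not what we actually deployed: both function names are hypothetical, and counting leaf values only approximates how PHP itself counts input variables. It flags any request whose field count reaches the configured limit, since that is exactly what a truncated POST looks like.

```php
<?php
// Hypothetical helper: count every leaf value in a (possibly nested)
// request array such as $_POST.
function count_input_vars(array $input): int
{
    $count = 0;
    array_walk_recursive($input, function ($value) use (&$count) {
        $count++;
    });
    return $count;
}

// Hypothetical helper: a request that has reached the limit may have
// been truncated, so treat it as suspect.
function input_looks_truncated(array $input, int $limit): bool
{
    return $limit > 0 && count_input_vars($input) >= $limit;
}

// Usage: read the limit from the running configuration and log loudly
// instead of processing a possibly incomplete order.
$limit = (int) ini_get('max_input_vars');
if (input_looks_truncated($_POST, $limit)) {
    error_log("POST reached max_input_vars ($limit); input may be truncated");
}
```

The fix itself was a one-line change in php.ini, something like max_input_vars = 100000. The exact value is illustrative; it needs to be comfortably above the largest form the system can submit.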

Every system has its own charms, but so far this has been among my favorite bug-hunting adventures. Between programmers, accountants, and other system users, approximately 50-100 hours went into locating and dealing with a bug that was ultimately fixed by adding a couple of zeroes to an .ini file.