max_input_vars Is Maybe The Most Annoying Bug I Have Tracked Down

WARNING: This article features ANCIENT code! I'm keeping it online because it's interesting to see what I was thinking 10+ years ago. But you DEFINITELY should not be using this code. Anything you're reading about on this page has changed significantly since this was written.

Ran into a very interesting bug recently.

In an Order Management System, it was becoming increasingly apparent that certain orders failed to be completely processed, leaving them in a "this should never happen" incomplete state. For a while there seemed to be no rhyme or reason to it - just absolutely no pattern whatsoever. Because it was a gigantic system without much logging, this was ridiculously hard to track down.

The guy who wrote the system spent 2 days looking for it. Then a senior developer spent 2 days looking for it. Finally, a database consultant discovered the first clue while developing a visualization with Tableau for an unrelated task. All the errors were on the high end of very large orders. For example, if an order had 50 items, everything worked great. But in the unusual case that an order had, say, 350 items, the first 200 would be perfectly fine, and the last 150 experienced the bug. It took ages to discover because large orders are very rare, and the bug's signature, a "-1" value in a particular database field, also occurred in certain legitimate circumstances.

A bit more investigation showed the cutoff wasn't precisely 200 every time - sometimes it was 198, 199 or 200. This caused us to hypothesize that somewhere the system might be passing data using an HTTP GET request, where browsers and servers cap the URL length in practice. Maybe it was getting truncated somewhere. The system was convoluted, including a step where the request was being bounced through a server-side queue (coded in multi-threaded C) to prevent multiple simultaneous executions of the same PHP script.

ANYWAY.. Using the database records, we estimated that the bug had been introduced in mid-July, about a month before I was hired. Unfortunately, there was no source control at the time, and all the file-modification times had been wiped out by a server migration, so we couldn't even look for files that had been modified around mid-July.

Remember, this application is big.

After a few more dead ends, I finally narrowed it down to a screen that was using an AJAX POST to a PHP script. I watched it in Chrome Inspector: 350 items * 5 fields per item makes 1750 total - everything looked good. The server didn't complain or log any errors.

Then I var_dumped the $_POST super-global on the server side:

Only 1000 fields!

WTF

And that was it! 1000 / 5 = 200, plus or minus a few because the POST had some optional checkboxes.

Okay.. so..

Why would PHP truncate when there are more than 1000 items in the $_POST?

Some more googling led to max_input_vars, a php.ini setting described in the PHP documentation that caps how many input variables PHP will accept per request. The default is 1000. It turns out that the server had been moved to a new machine with a different php.ini file. Nobody had documented or mentioned the need to set max_input_vars during the migration (or - who knows, maybe the bug had existed before the migration too!).
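In hindsight, a small guard in the receiving script could have flagged this much sooner. As far as I know, PHP reports the truncation only as a warning, which is easy to miss when warnings aren't being logged. Below is a minimal sketch of such a guard, not what we actually deployed. The function names countInputLeaves and inputLooksTruncated are made up for illustration, and the check is a heuristic: it assumes each submitted field ends up as one leaf value in $_POST.

```php
<?php
// Hypothetical helper: count the leaf values in a (possibly nested) input array.
// Assumption: each submitted field becomes one leaf value in $_POST.
function countInputLeaves(array $input): int
{
    $count = 0;
    array_walk_recursive($input, function () use (&$count) {
        $count++;
    });
    return $count;
}

// Hypothetical helper: true when the input has reached the configured limit,
// which suggests (but does not prove) that PHP dropped the remaining fields.
function inputLooksTruncated(array $input, int $limit): bool
{
    return $limit > 0 && countInputLeaves($input) >= $limit;
}

// Read the active limit and log loudly if the POST appears to have hit it.
$limit = (int) ini_get('max_input_vars');
if (inputLooksTruncated($_POST, $limit)) {
    error_log("POST may be truncated: reached max_input_vars ($limit)");
}
```

A form that legitimately submits exactly as many fields as the limit would trigger a false alarm, but for a bug this quiet I'd take the occasional false alarm.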

Every system has its own charms. So far, this has been my favorite bug-hunting adventure at this job. Between programmers, accountants, and other system users, approximately 50-100 hours went into locating and dealing with a bug that was ultimately fixed by adding a couple of zeroes to an .ini file.
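For reference, the change looks roughly like the fragment below. The value is illustrative (the default of 1000 with a couple of zeroes added), not necessarily the exact number we used.

```ini
; php.ini
; Illustrative value: raise the cap on input variables accepted per request.
max_input_vars = 100000
```

Note that max_input_vars can't be changed with ini_set() at runtime, since the input has already been parsed by the time the script runs. It has to be set in php.ini or another per-directory configuration mechanism, and the web server or PHP process typically needs a restart to pick it up.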