Bitten by PHP’s Lack of Namespaces (or: Runkit, How Great We Could Have Been)
The client’s install of a closed-source CMS was taking over a minute to render some category archive views. Every other view loaded fine, which ruled out a general database problem.
Without access to the source, it came down to a lot of guessing.
I enabled MySQL’s slow query log, then the general query log, but no query took more than one second to execute. To double-check, I ran the relevant archive-view queries from the log by hand: still no problem.
The next step was to hack away at some of the views to make sure nothing was wonky. Nothing was. As long as the CMS’s main() was running, the slowdown happened. Disabling main() let the page load, just with no content besides the template. Okay... that’s useless. All the other hacks I’ve written for this software just regex the hell out of what main() returns; they don’t actually touch it. Without knowing what was going on inside main(), though, I was still lost.
After much searching, I found a de-zended version of the code online. Normally that would explain everything, but this code is a mix of HTML, PHP, and SQL, with tragic variable names, no error handling, and so on: roughly 3,000 lines of spaghetti for main() alone. Fun stuff!
After a valiant effort to clean up the de-zender’s formatting, I ended up with something passable enough to dig into. It turns out main() contains 30 to 40 function declarations inside its body. Those functions only get declared, into the global namespace, when main() runs, and main() then calls the appropriate one based on the current view.
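For anyone who hasn’t run into this behavior, here is a minimal PHP sketch of it. The names main(), view_archive() and view_single() are hypothetical stand-ins, not the CMS’s real functions.

```php
<?php
// Hypothetical stand-in for the CMS's main(); the real function names are unknown.
function main(string $view): string
{
    // These inner functions do not exist until execution reaches this point.
    // Once declared, they are ordinary global functions.
    function view_archive(): string
    {
        return 'archive view';
    }

    function view_single(): string
    {
        return 'single view';
    }

    // Dispatch to the matching view function by name (a PHP "variable function").
    $handler = 'view_' . $view;
    return $handler();
}

$declaredBefore = function_exists('view_archive'); // false: main() hasn't run yet
echo main('archive'), "\n";                         // prints "archive view"
$declaredAfter = function_exists('view_archive');  // true: now a global function
```

Calling main() a second time in the same request would stop with a fatal "Cannot redeclare" error, because the inner functions already exist. The same collision happens if you declare your own function with one of those names before main() runs.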
I could have used Runkit to override the specific view function causing the trouble, but plain PHP can’t redefine a function once it has been declared. Even if it could, the function was ghetto-namespaced inside main(), which declares it fresh every time it runs and would clash with any version I declared, basically negating my override unless I could override and rewrite the whole main() function.
Oh well!
In the end, I dug through the view’s code, and it was doing the standard routine:
@opendir
readdir
strstr each file name to see if a certain file exists
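As a rough illustration, the three steps above look something like this in PHP. This is a reconstruction, not the CMS’s code; file_in_dir_slow() and its $dir and $needle parameters are hypothetical names.

```php
<?php
// Reconstruction of the pattern described above, NOT the CMS's actual code.
// The function name and its parameters are hypothetical.
function file_in_dir_slow(string $dir, string $needle): bool
{
    $handle = @opendir($dir);   // '@' suppresses the warning; returns false on failure
    if ($handle === false) {
        return false;
    }

    $found = false;
    // Reads every entry in the directory on every call, even after a match.
    while (($name = readdir($handle)) !== false) {
        // strstr() is a substring search, so "entry-42" also matches "entry-42-thumb.jpg".
        if (strstr($name, $needle) !== false) {
            $found = true;
        }
    }
    closedir($handle);

    return $found;
}
```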
That works okay up to a point, but there were 11,000 files in the directory it searched, and it ran the loop up to 100 times per archive entry, with up to 100 entries per page. That’s 11,000 × 100 × 100 = 110,000,000 file-name checks per page view, all because the programmers couldn’t be bothered to either regex the directory listing as a whole or do up to 100 file-existence tests.
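Both fixes mentioned above are short in PHP. Here is a minimal sketch of each, assuming the view only needs to know which matching files exist. dir_listing(), files_matching() and existing_files() are hypothetical names, and caching the listing for the rest of the request is an assumption added for this sketch, not something the CMS does.

```php
<?php
// Hypothetical helpers; the CMS's real file-naming scheme is unknown.

// Read the directory once per request and reuse the names for every lookup.
// (Files added later in the same request won't be seen; fine for one page render.)
function dir_listing(string $dir): array
{
    static $cache = [];
    if (!isset($cache[$dir])) {
        $names = @scandir($dir);
        $cache[$dir] = ($names === false) ? [] : $names;
    }
    return $cache[$dir];
}

// Option 1: one regex pass over the cached listing instead of a readdir() loop.
function files_matching(string $dir, string $needle): array
{
    $pattern = '/' . preg_quote($needle, '/') . '/';
    return array_values(preg_grep($pattern, dir_listing($dir)));
}

// Option 2: when the exact file names are known, check each one directly.
function existing_files(string $dir, array $names): array
{
    $found = [];
    foreach ($names as $name) {
        if (file_exists($dir . '/' . $name)) {
            $found[] = $name;
        }
    }
    return $found;
}
```

The regex route still compares against every name, but it does so in memory inside a single preg_grep() call instead of re-reading the directory each time. The file_exists() route skips scanning entirely, at the cost of one filesystem check per candidate name.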
So how did it get solved? I removed half the files, since they were old and would never be seen in an archive anyway, and blogged this so that in two years, when the problem hits again, I’ll remember how to solve it.