Profiling: that critical 3% (Part I)
An oft-quoted bit of software-development sagacity is
“Premature optimization is the root of all evil.”
As is so often the case with quotes—especially those on the Internet—this one has a slightly different meaning in context. The snippet above invites developers to overlook the word “premature” and interpret the received wisdom as “you don’t ever need to optimize.”
Instead, Knuth’s full quote actually tells you how much of your code is likely to be affected by performance issues that matter (highlighted below).
“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”
An Optimization Opportunity in Quino
In other articles, I’d mentioned that we’d upgraded several solutions to Quino 2 in order to test that the API was solid enough for a more general release. One of these products is both quite large and has a test suite of almost 1500 tests. The product involves a lot of data-import and manipulation and the tests include several scenarios where Quino is used very intensively to load, process and save data.
These tests used to run in a certain amount of time, but started taking about 25% longer after the upgrade to Quino 2.
Measuring Execution Speed
Before doing anything else—making educated guesses as to what the problem could be, for example—we measure. At Encodo, we use JetBrains DotTrace to collect performance profiles.
There is no hidden secret: the standard procedure is to take a measurement before and after the change and to compare them. However, so much had changed from Quino 1.13 to Quino 2—e.g. namespaces and type names had changed—that while DotTrace was able to show some matches, the comparisons were not as useful as usual.
A comparison between codebases that hadn’t changed so much is much easier, but I didn’t have that luxury.
Tracking the Problem
Even excluding the less-than-optimal comparison, it was an odd profile. Ordinarily, one or two issues stick out right away, but the slowness seemed to suffuse the entire test run. Since the direct profiling comparison was difficult, I downloaded test-speed measurements as CSV from TeamCity for the product where we noticed the issue.
How much slower, you might ask? The test that I looked at most closely took almost 4 minutes (236,187ms) in the stable version, but took 5:41 in the latest build.
This test was definitely one of the largest and longest tests, so it was particularly impacted. Most other tests that imported and manipulated data ranged anywhere from 10% to 30% slower.
When I looked for hot-spots, the profile unsurprisingly showed me that database access took up the most time. The issue was more subtle: while database-access still used the most time, it was using a smaller percentage of the total time. Hot-spot analysis wasn’t going to help this time. Sorting by absolute times and using call counts in the tracing profiles yielded better clues.
The tests were slower when saving and also when loading data. But I knew that the ORM code itself had barely changed at all. And, since the product was using Quino so heavily, the stack traces ran quite deep. After a lot of digging, I noticed that creating the
ExpressionContext to hold an object while evaluating expressions locally seemed to be taking longer than before. This was my first, real clue.
Once I was on the trail, I found that when evaluating calls (getting objects) that used local evaluation, it was also always slower.
Don’t Get Distracted
To be clear, we’re not optimizing because we feel that the framework is too slow but because we’ve determined that the framework is now slower than it used to be and we don’t know why.
Even after we’ve finished restoring the previous performance (or maybe even making it a little better), we might still be able to easily optimize further, based on other information that we gleaned during our investigation.
But we want to make sure that we don’t get distracted and start trying to FIX ALL THE THINGS instead of just focusing on one task at a time. While it’s somewhat disturbing that we seem to be created 20 million
ExpressionContext objects in a 4-minute test, that is also how we’ve always done it, and no-one has complained about the speed up until now.
Sure, if we could reduce that number to only 2 million, we might be even faster, but the point is that that we used to be faster on the exact same number of calls—so fix that first.
A Likely Culprit: Scope
I found a likely candidate in the
Scope class, which implements the
IScope interface. This type is used throughout Quino, but the two use-cases that affect performance are:
- As a base for the
ExpressionContext, which holds the named values and objects to be used when evaluating the value of an
IExpression. These expressions are used everywhere in the data driver.
- As a base for the poor-man’s IOC used in Stage 2 of application execution.
The former usage has existed unchanged for years; its implementation is unlikely to be the cause of the slowdown. The latter usage is new and I recall having made a change to the semantics of which objects are returned by the
Scope in order to make it work there as well.
How could this happen?
You may already be thinking: smooth move, moron. You changed the behavior of a class that is used everywhere for a tacked-on use case. That’s definitely a valid accusation to make.
In my defense, my instinct is to reuse code wherever possible. If I already have a class that holds a list of objects and gives me back the object that matches a requested type, then I will use that. If I discover that the object that I get back isn’t as predictable as I’d like, then I improve the predictability of the API until I’ve got what I want. If the improvement comes at no extra cost, then it’s a win-win situation. However, this time I paid for the extra functionality with degraded performance.
Where I really went wrong was that I’d made two assumptions:
- I assumed that all other usages were also interested in improved predictability.
- I assumed that all other usages were not performance-critical. When I wrote the code you’ll see below, I distinctly remember thinking: it’s not fast, but it’ll do and I’ll make it faster if it becomes a problem. Little did I know how difficult it would be to find the problem.
Preventing future slippage
Avoid changing a type shared by different systems without considering all stakeholder requirements.
I think a few words on process here are important. Can we improve the development process so that this doesn’t happen again? One obvious answer would be to avoid changing a type shared by different systems without considering all stakeholder requirements. That’s a pretty tall order, though. Including this in the process will most likely lead to less refactoring and improvement out of fear of breaking something.
We discussed above how completely reasonable assumptions and design decisions led to the performance degradation. So we can’t be sure it won’t happen again. What we would like, though, is to be notified quickly when there is performance degradation, so that it appears as a test failure.
Notify quickly when there is performance degradation
Our requirements are captured by tests. If all of the tests pass, then the requirements are satisfied. Performance is a non-functional requirement. Where we could improve Quino is to include high-level performance tests that would sound the alarm the next time something like this happens.
Enough theory: in part II, we’ll describe the problem in detail and take a crack at improving the speed. See you there.