An Idiot with a Hammer. Or, How I Learned Not to Rely on Cached Data

I’ve spent a good deal of time this week undoing performance improvements because they clash with acceptable practices at work. It’s brought this saying to mind:

“To an idiot with a hammer, everything looks like a nail.”

I was an idiot with a hammer.

Over a year ago, I found this article about ASP Arraysets: Developing a VBScript Class for an Extremely Lightweight Recordset Alternative

By itself, it’s very convenient: it adds recordset-like accessibility with GetRows()-like performance. I started using it immediately.

After a while, I started expanding that code… if the data’s just in an array, and you can Join() arrays into strings… and you can store strings in dictionaries (using an encoded representation of the original connection & query as a key), and you can store some implementations of dictionaries in the Application object… Bingo! Arrayset Caching!
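
If that sounds abstract, here’s a minimal sketch of the idea in VBScript. It’s not the original Arrayset code, and the names (GetCachedRows, the key format) are mine, but it captures the trick: stash a query’s GetRows() array in the Application object, keyed on the connection string plus the SQL, and only hit the database when the key isn’t there yet.

    ' Minimal sketch, not the original Arrayset code. The function name and
    ' key format are made up for illustration; assumes the query returns rows.
    Function GetCachedRows(connString, sql)
        Dim key, conn, rs
        key = "qcache:" & connString & "|" & sql     ' the real version encoded this key

        If IsEmpty(Application(key)) Then            ' not cached yet? run the query once
            Set conn = Server.CreateObject("ADODB.Connection")
            conn.Open connString
            Set rs = conn.Execute(sql)
            Application.Lock
            Application(key) = rs.GetRows()          ' 2-D variant array: (field, record)
            Application.Unlock
            rs.Close
            conn.Close
        End If

        GetCachedRows = Application(key)             ' every later call reads the copy
    End Function

The real version went a step further, joining the arrays into strings and tucking them into a dictionary before parking them in Application, but the effect is the same: the first caller pays for the query, and everyone after that reads memory.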

And it was good. The first run of a particularly ugly query via Arrayset might take 10 seconds to retrieve from the database and cache… but future uses of the same, cached query dropped to less than 3/10ths of a second. You could run any ugly, slow query you wanted without a second thought about its speed, so long as you accepted a slow run the first time; every other time, on any other page, up to 10 days later, it was still fast!

Cached Arraysets became my hammer.

Pulling profit center names from a cached query was so fast that I quit pulling them back in my reports. I knew I’d already have that data on the website, and the ID was smaller to transfer than the name, so I’d just pull the profit center ID and grab the name from a cached query. I made a function for doing it that I call from all over the company website. Poof! Instant access to profit center names, addresses, and phone numbers, and all you need is the ID! Mwahahaha!
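
That function looked roughly like this, building on the GetCachedRows sketch above. The names here (GetProfitCenterName, CONN_STRING, the table and columns) are placeholders rather than the real ones, but the shape is right: scan the cached result for the ID instead of asking the database.

    ' Roughly the shape of that convenience function; names are placeholders.
    ' CONN_STRING is assumed to be a connection string defined in an include.
    Function GetProfitCenterName(pcId)
        Dim rows, i
        rows = GetCachedRows(CONN_STRING, "SELECT pc_id, pc_name FROM profit_centers")
        GetProfitCenterName = ""                     ' default if the ID isn't found
        For i = 0 To UBound(rows, 2)                 ' GetRows() arrays are (field, record)
            If rows(0, i) = pcId Then
                GetProfitCenterName = rows(1, i)
                Exit For
            End If
        Next
    End Function

Call that inside a loop over a few thousand report rows and it’s practically free, provided the cache behind it actually exists.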

I pounded many, many, many “nails” this way.

During my Thanksgiving vacation, the team back at the office wanted to update data that was tidily tucked away in that cache, and noticed the website wasn’t reflecting the new database change. My suggestion of rebooting the website (which would’ve destroyed the cache) meant interrupting live users, and was disqualified. So they decided (for very good reason), “Caching is bad. Disable it.” It wasn’t a hard code change to perform, and life seemed to be okay afterward.

But slowly, surely… performance on the website began to drag. It was only a matter of weeks until the newly unveiled, redesigned, “more efficient and faster” website became a millstone around my neck. I started to blame internal users for “over-using” it instead of considering that the problem could be my code.

A commonly run report on the site, one that could return thousands of records, each displaying a profit center name, started taking over 90 seconds. It originally took 12. The disable-cache change was several weeks old by then, and the problem developed over time, so the cause-and-effect relationship wasn’t obvious. It seemed to track to a single, query-happy internal user. I asked them to scale back their use… but a weekend later, that request was forgotten. Something had to be done!

Since then, we’ve introduced new databases that consolidate far-flung data into reporting systems. The website queries that went against the far-flung systems would run faster against the new reporting system… so we changed them. Our queries dropped from 90 seconds to 5 seconds. Brilliant! Except… the website still took another 45 seconds to display the query results! That’s a shameful thing to try to explain to a very tech-savvy boss.

Eventually, it hit me. My “convenient” function for grabbing a profit center name expected a cache that no longer existed, so it fell back to running a separate query for the name. Every time. Remember: I’d quit pulling names back in my queries. I used this function for everything. And now it was the weak link in an embarrassingly slow website.

This week I’ve been adding profit center names back to the queries that only supplied their IDs before, and I’ve quit calling that function wherever possible. Timings on the rewritten report that was taking 120 seconds have dropped to 12. This kind of performance improvement was available all along, too… if only I hadn’t been so eager to rely on that stupid hammer.
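
For what it’s worth, the fix is the boring one I should have written in the first place. The table and column names below are made up, but the point stands: let the database join the name in once, and read it straight out of the result set instead of calling a lookup function for every row.

    ' The boring fix: one query, names included. Schema names are illustrative,
    ' and CONN_STRING is again an assumed connection string constant.
    Dim conn, rs, rows, sql, i
    Set conn = Server.CreateObject("ADODB.Connection")
    conn.Open CONN_STRING
    sql = "SELECT r.order_id, r.amount, pc.pc_name " & _
          "FROM report_rows r " & _
          "INNER JOIN profit_centers pc ON pc.pc_id = r.pc_id"
    Set rs = conn.Execute(sql)
    rows = rs.GetRows()
    rs.Close
    conn.Close
    For i = 0 To UBound(rows, 2)
        Response.Write Server.HTMLEncode(rows(2, i)) & "<br>"   ' name comes back with the row
    Next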

I don’t cache data like that anymore. The system that enables caching is still valid, and I could still use it if I needed to, but I choose not to, so that I stay focused on making things work as efficiently as possible with lowest-common-denominator functionality. That discipline was a lesson I spent a lot of time learning in the first place, and discarding it so eagerly once I figured out Arrayset caching was a bad move. I’m having to re-focus on things I should’ve done right the first time.

Lesson learned: Shiny, cool hammers don’t make everything into a nail. Use the best tool for the job. Or you’ll look like a “tool” when you’re explaining the performance problems and their solution to your boss.