
[Image: a man with his head in the sand]

The missing step in Problem Management


In our organisation, we loosely follow the ITIL principles for IT service management. The Problem Management process is all about trying to get to the bottom of those niggles that keep happening – the ones where turning it off and on again will work, but you know it’s going to happen again tomorrow. And the next day. And the day after.

Like many of the ITIL processes, Problem Management can look a little different in reality from the way it does in the books, and so I’ve annotated the top half of the process diagram to show what really happens:

[Figure: ITIL Problem Management process diagram, with the Prioritization step replaced by Denial]

Problem Management with a dose of reality

I think it’s all too easy for us to turn the Prioritization stage into a Denial stage and to avoid getting to grips with the Problem for as long as possible. Over time, we develop finely-tuned techniques for this.

Here are five arguments people employ to deny that a Problem exists.

It’s an isolated case

The easiest thing to do is to prioritise it away based on prevalence. If only one person is reporting it, it’s probably something specific to their PC. Issue them with a new computer, or re-image their existing one, and it will probably go away. That’s quite a drastic step, though, and a new computer can create a whole new set of problems for the user – which makes it all the more frustrating when the problem shows up on the new machine as well. Every Problem starts with a single report, and by denying problems because they’re not happening to many people (yet), we also prevent ourselves from nipping them in the bud.


It’s user error

Another avoidance approach is to blame the user: Problem Exists Between Keyboard And Chair. They’re using it wrong – if they didn’t double-click where they should be single-clicking, the application wouldn’t crash and they wouldn’t lose all their work. By classifying the Problem as a user issue, the IT professional has, at a stroke, absolved themselves of all responsibility. The software and hardware are fine; it’s just that the user keeps doing unexpected things to them. But even if that really is the cause, it doesn’t negate the existence of a Problem. It may mean the answer is not a technical fix but communication, or training, or documentation – either way, it’s still a Problem that needs to be addressed.

Labelling a Problem as a user issue can be a tough sell with the end-user population. Generally, it’s considered bad form to tell users that they’re stupid and that if only they were using it right, they wouldn’t be having the problem. Our company found itself in a delicate situation around this when investigating performance problems with our document management system (“DMS”) of the time. Poorly-constructed full-text searches were bringing the system to a grinding halt – not just for searching, but across all DMS operations. Users were searching for documents containing words like “contract” or “letter” or “agreements”, and the architecture of the DMS was such that once you asked it to run that search, it would stop and populate a temporary table with the million or so results matching the query, before checking them one by one to see whether the user had permission to view each document. Even if the user crashed out of their DMS client, the server would carry on quite merrily until someone spotted the long-running transaction on the SQL Server and killed it.
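(As a side note, spotting those long-running transactions needn’t rely on someone happening to look. Here’s a minimal sketch of the kind of check that could be automated, using Python with pyodbc against SQL Server’s sys.dm_exec_requests view – the driver, server name and threshold below are placeholders for illustration, not our real setup.)

```python
# Sketch: flag long-running requests on a SQL Server instance.
# Assumes pyodbc is installed; the driver, server and threshold
# below are placeholders, not our real configuration.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=dms-sql01;DATABASE=master;Trusted_Connection=yes;"
)

# sys.dm_exec_requests lists currently executing requests;
# total_elapsed_time is in milliseconds.
query = """
SELECT r.session_id, r.start_time, r.command,
       r.total_elapsed_time / 1000 AS elapsed_seconds
FROM sys.dm_exec_requests AS r
WHERE r.total_elapsed_time > 10 * 60 * 1000  -- longer than ten minutes
  AND r.session_id <> @@SPID                 -- ignore this query itself
ORDER BY r.total_elapsed_time DESC
"""

for row in conn.cursor().execute(query):
    print(f"Session {row.session_id}: {row.command} has been running "
          f"for {row.elapsed_seconds}s since {row.start_time}")
    # A human can then decide whether to issue KILL <session_id>;
    # killing sessions automatically would be asking for trouble.
```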

From one perspective, the Problem was the way in which users were querying the system – and we considered for a while naming and shaming the people who ran these searches. But of course the real Problem lay in the DMS architecture, which processed search requests on the same server as every other DMS activity, and in the way access rights were tied to individual documents.
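The shape of that architectural problem is easier to see in code. Here’s a hypothetical sketch – the schema, table names and the DB-API-style connection object `db` are all invented for illustration, not taken from the real DMS – contrasting the search-then-filter pattern described above with the obvious alternative of pushing the permission check into the query itself:

```python
# Hypothetical illustration only: the schema and the connection
# object `db` are invented, not the real DMS code.

def user_can_view(db, user_id, doc_id):
    # One permission lookup per document: a million matches
    # means a million of these round trips.
    row = db.execute(
        "SELECT 1 FROM document_acl WHERE doc_id = ? AND user_id = ?",
        (doc_id, user_id),
    ).fetchone()
    return row is not None

def search_slow(db, term, user_id):
    # The pattern described above: materialise every match first,
    # then check permissions one document at a time.
    all_matches = db.execute(
        "SELECT doc_id FROM documents WHERE CONTAINS(body, ?)", (term,)
    ).fetchall()
    return [doc_id for (doc_id,) in all_matches
            if user_can_view(db, user_id, doc_id)]

def search_better(db, term, user_id):
    # Push the access check into the query itself, so the server
    # only ever touches documents the user is allowed to see.
    return db.execute(
        """
        SELECT d.doc_id
        FROM documents AS d
        JOIN document_acl AS a ON a.doc_id = d.doc_id
        WHERE a.user_id = ? AND CONTAINS(d.body, ?)
        """,
        (user_id, term),
    ).fetchall()
```

In the first version, a search for “contract” does all of its work before a single permission check filters anything out; in the second, the filtering happens where the data lives. Neither would fix the other half of our Problem – search and everything else sharing one server – but it makes the trade-off visible.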

It’s perception

I hate performance problems. Obviously, I hate suffering from them, but I also hate it when we need to troubleshoot them. Typically, there’s no benchmark data to fall back on. Internet browsing is slow, the user says. But what constitutes fast enough? “Look,” says the user. “It’s taking too long when I click this link for the page to load.” Well, maybe…

Similarly, we had a problem when we rolled out a new standard desktop based on Windows 7. This was part of a hardware refresh as well, so users received a new PC or laptop to replace the doorstops we’d been using up until then. Around three months after the initial rollout, the complaints started coming in to say that the machines were taking too long to start up in the mornings. Were they? Or was it just that the honeymoon period after the rollout was over? Memories of the old machines – where you could make a cup of coffee, drink it and wash the cup afterwards whilst waiting for the Ctrl + Alt + Del prompt to appear – had faded. This particular Problem stayed stuck in the Denial stage for several weeks, until someone dug out the measurements that had been taken during the design stage, showing that start-up times had indeed grown since the rollout.
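That’s the value of having hard numbers from the design stage to fall back on. A trivial sketch of the comparison, with invented figures standing in for the real measurements:

```python
# Invented figures standing in for the design-stage baseline and the
# post-complaint measurements; the point is the comparison, not the data.
from statistics import mean

baseline_seconds = [48, 52, 45, 50, 47]    # design-stage measurements
current_seconds = [95, 110, 88, 102, 97]   # after the complaints began

baseline, current = mean(baseline_seconds), mean(current_seconds)
growth = (current - baseline) / baseline * 100

print(f"Start-up time: {baseline:.0f}s at design, {current:.0f}s now "
      f"({growth:+.0f}%)")
if growth > 20:  # arbitrary threshold for "worth investigating"
    print("Not perception: start-up times really have grown.")
```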

Can’t reproduce

We’re currently dealing with an Internet browsing performance issue at work that has been stuck at the Denial stage for quite a while. A call would come in from a user saying that browsing was slow, so we would swoop in to try to quantify it. We’d install HttpWatch or Wireshark and start capturing browsing activity, and run Internet speed tests from one of the benchmarking websites. And of course, these tests would not show an issue. We’d analyse the results, which would show websites loading almost instantaneously, and everyone would agree that there was no issue. Except the users. Because this was one of those most hated of Problems – the intermittent issue. Either the precise circumstances causing the problem are not understood, or they depend on a complex coincidence of events, or for some other reason the problem appears completely random. By the time we come to investigate, there’s nothing to see. Whoever is investigating types “Can’t reproduce” at the bottom of the Problem record, and closes it.
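One way out of the can’t-reproduce trap is to stop trying to catch the problem on demand and instead measure continuously, so the evidence is already in hand when the next complaint arrives. A rough sketch of such a probe in Python, assuming the requests library is available – the URLs, interval and file name are arbitrary placeholders:

```python
# Sketch: log page fetch times continuously so intermittent slowness
# shows up in the data rather than vanishing before we investigate.
# The URLs, interval and file name are placeholders for illustration.
import csv
import time
from datetime import datetime

import requests

URLS = ["https://www.example.com/", "https://www.bbc.co.uk/"]
INTERVAL_SECONDS = 300  # probe every five minutes

with open("browsing_probe.csv", "a", newline="") as f:
    writer = csv.writer(f)
    while True:
        for url in URLS:
            start = time.monotonic()
            try:
                requests.get(url, timeout=30)
                elapsed = time.monotonic() - start
                status = "ok"
            except requests.RequestException as exc:
                elapsed = time.monotonic() - start
                status = type(exc).__name__
            writer.writerow([datetime.now().isoformat(), url,
                             f"{elapsed:.2f}", status])
            f.flush()  # keep the log usable even if the probe is killed
        time.sleep(INTERVAL_SECONDS)
```

Run from a machine on the affected network, a log like this turns “browsing felt slow yesterday afternoon” into a row of timestamps you can actually look up.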

Old problem. Won’t fix

I actually read this in an internal Microsoft bug report. You could follow its history back through two releases of the product, with some poor Microsoftie trying to get someone to take a particular Problem seriously enough to try to fix it. The response was that this particular feature had been broken for so long now that it wasn’t worth fixing. The other way of looking at it, from the customer’s perspective, was that it had been broken for at least two releases of the software and it was about time Microsoft fixed it! There can be a complacency within an organisation that a particular Problem, though troublesome, is never going to get resolved. Perhaps someone looked at it once and couldn’t see a way round it. But fresh eyes, different surrounding circumstances and new technologies can sometimes mean that there’s a solution there after all – if you can be bothered to look for it.
