Blog: November 13, 2014
Evolution of our bug triage process
We recently had a meeting about our bug triage process and were trying to iron out the kinks. Before we met, I thought about our old “process” (or lack thereof) here, as well as how bugs were tackled at other places I’ve worked.
Problems that required change
As much as we had improved our process for grooming a backlog and managing priority for user stories, we had no process at all for doing the same with bugs. Our QA team would log a bug, it would go into a big bucket, a dev would pick something out of the bucket to work on, and we’d keep doing that for as long as time allowed. This was obviously a flawed approach. We often ran into one or more of the following problems:
- Something was logged as a bug when it was actually working as expected, so “fixing” it just created a new bug.
- Sometimes, a “bug” was really something I’d consider new scope, and should probably have been a user story that went through the normal backlog prioritization.
- Developers had no guidance in terms of which bugs to pick up. Often, lower priority ones were fixed while more important ones sat in the queue.
On some of the smaller projects, we didn’t fully appreciate the pain this caused, especially on the last point, because we were able to address all of the bugs in the time we had. It wasn’t until we hit one of our large projects, where far more bugs were being logged than we could fix, that we realized we needed a change.
Enter the triage
At this point, we implemented a triage process on this specific project to help alleviate the pain caused by some of the above issues. Here’s how it worked:
- All bugs start out in a state of Triage.
- Someone (typically the BA) periodically works through the triage list and determines the following things:
  - As it is written, is this really a bug? Does it need to be closed because it’s working as expected? Or, does it need to be converted to a user story?
  - If it is a bug, what is the priority and severity of the bug?
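The steps above can be sketched as a small state model. This is only an illustration, not how our tracker is actually configured: apart from Triage, the state names (Accepted, Closed, Converted to story) and the `Bug` and `triage` names are assumptions made up for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    TRIAGE = "Triage"                  # the only state named in this post
    ACCEPTED = "Accepted"              # hypothetical: confirmed bug, ready for development
    CLOSED = "Closed"                  # hypothetical: working as expected
    CONVERTED = "Converted to story"   # hypothetical: new scope, goes to the backlog


@dataclass
class Bug:
    title: str
    state: State = State.TRIAGE        # all bugs start out in Triage
    priority: Optional[int] = None     # 1-4, set during triage
    severity: Optional[str] = None     # Critical/High/Medium/Low, set during triage


def triage(bug, is_real_bug, is_new_scope=False, priority=None, severity=None):
    """Apply the triage decision to a bug that is still in the Triage state."""
    if bug.state is not State.TRIAGE:
        raise ValueError("bug has already been triaged")
    if is_new_scope:
        bug.state = State.CONVERTED    # should be a user story instead
    elif not is_real_bug:
        bug.state = State.CLOSED       # working as expected
    else:
        if priority is None or severity is None:
            raise ValueError("a confirmed bug needs a priority and a severity")
        bug.priority, bug.severity = priority, severity
        bug.state = State.ACCEPTED
    return bug


confirmed = triage(Bug("Save button throws an error"), is_real_bug=True,
                   priority=1, severity="High")
rejected = triage(Bug("Report is sorted by date"), is_real_bug=False)
```

The point of the sketch is that a bug cannot reach a developer without someone first deciding whether it is a bug at all, and a confirmed bug cannot leave triage without both a priority and a severity.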
Who should own triage?
We’ve found that the BA is typically the best choice for doing triage. The BA should have the best understanding of the product’s requirements, and will often be able to quickly identify whether the bug is actually a bug. Also, just as the BA digs into user stories, the BA can facilitate any necessary conversations with the product owner, the developers, or anyone else needed to determine the priority of the bug.
But, even though the BA is the primary person responsible for triage, the process works best when EVERYONE is empowered to speak up. As a BA, I’ve moved bugs through triage that a developer has later questioned. The developer might remember a conversation I forgot, or a user story that trumped the original one I was thinking of when reviewing the bug. As with anything in an agile development world, communication is key.
Reproducing bugs before pushing to development
And, speaking of mistakenly pushing bugs through triage, I should note that I don’t try to reproduce every bug as I do triage. For the most part, I assume that what the QA person has logged is accurate. Though the triage process has helped us a lot, it can be time consuming, and reproducing every bug would add a LOT of extra time.
Priority vs. severity
As noted above, part of triaging bugs is identifying the priority and severity for those items that we confirm to be bugs. This distinction is actually a point of a bit of contention right now, but here’s how I look at it. We rate priority on a 1-4 scale. Though this value is subjective, here’s what it typically means to me:
- 1 – Absolutely must get done for this release.
- 2 – Unless there is some extenuating circumstance, should get done for this release.
- 3 – Would be nice to be done for this release.
- 4 – Mostly inconsequential whether it gets done or not.
We rate severity as Critical, High, Medium, or Low:
- Critical – Either blocking testing of a user story, or something that could result in data loss, corruption, or some other major error. Typically, these are bugs that must be worked on immediately so that testing can continue. (Note: I typically rely on a QA person to let me know if the item is a blocker that’s preventing them from completing testing.)
- High – Acceptance criteria that are not met, or the feature is otherwise not functioning as it must.
- Medium – Perhaps acceptance criteria that aren’t met or a usability issue, but one that can be worked around. Also may be a new/changed requirement that makes sense to do as a bug instead of a story for some reason.
- Low – Typically something cosmetic or otherwise low-impact (like a usability issue that is a minor inconvenience instead of a significant hindrance).
So, what’s the difference? In my mind, priority is about how urgently something needs to be fixed for the release, while severity is about how bad the problem is when someone hits it. You could have something that is a high priority, even though it is a low severity. For example, maybe there is a typo or wrong logo that won’t really impede the user, but is a big deal for the perception of the product. Or, maybe there is something that, if encountered, is of high severity, but the priority is bumped to low because the feature is infrequently used, or the reproduction steps are very unlikely to be encountered.
Like I said, this is a point of some disagreement within our ranks, so our approach may change.
New or changed requirements as bugs
As another point of contention, some would argue that new or changed scope should NEVER be a bug. In a perfect world, sure. However, there are times when the issue is a quick fix OR you are in your last sprint and the functionality change is critical. I would hope these are typically the exception rather than the rule, but they do come up.
Also, though we want to follow process, we don’t want it to be so rigid that we’re wasting time with overhead. So, sometimes things that probably should be stories do end up as bugs.
How bugs are ranked
We use TFS to track bugs. Within TFS, we create a query that returns bugs sorted first by priority, then by severity. So, a 1-Medium would actually be ranked higher than a 2-High. Developers then pull from the top of the list, just as they do with user stories.
Our triage process is still evolving, but it has already had a major impact in ensuring that we’re not wasting time on invalid bugs. If you don’t have a triage process, maybe it’s time to give one a try.
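To make the ordering concrete, here is a minimal sketch of the same two-level sort in Python. It is not the TFS query itself: the bug records, the field names, and the `SEVERITY_RANK` mapping are all made up for illustration.

```python
# Hypothetical mapping so severities sort from most to least severe.
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

# Made-up sample bugs that have already been through triage.
bugs = [
    {"id": 101, "priority": 2, "severity": "High"},
    {"id": 102, "priority": 1, "severity": "Medium"},
    {"id": 103, "priority": 1, "severity": "Critical"},
    {"id": 104, "priority": 3, "severity": "Low"},
]


def rank(bugs):
    """Sort by priority first, then by severity within the same priority."""
    return sorted(bugs, key=lambda b: (b["priority"], SEVERITY_RANK[b["severity"]]))


ordered_ids = [b["id"] for b in rank(bugs)]
print(ordered_ids)  # [103, 102, 101, 104]
```

Bug 102 (a 1-Medium) lands above bug 101 (a 2-High) because severity only breaks ties between bugs that share a priority.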