By Geoff Watts
Teams adopting Scrum not only have to deal with the normal project complexities of prioritization, estimation, and turning product backlog items into potentially deployable increments of functionality, but they also often have to support a system in production or address bugs that come back to them during development. How do we track and prioritize these support activities? How do we handle emergencies? Who performs production support?
Bug of the Day
Production support can be seen as a disruption to teams that just want to get on with things but is often very dear to the heart of the system users and, therefore, the Product Owner. It can be a tricky issue to handle as it adds an extra degree of complexity to the prioritization debate. Often teams will be asked to take an approach of “production support is a necessity, so just deal with it.” After all, if the system is down, what is the point in adding more features to it? The priority must be to get it back up and running straight away.
However, this approach to production support can’t be planned. Just dealing with issues as they come up can lead us away from the most appropriate decisions as we get caught up in the “bug of the day” scenario. It’s easy to lose sight of our vision and strategic plan when we simply deal with whatever the latest system problem is.
Scrum asks us to prioritize our effort, sometimes making difficult decisions to create maximum business value as early as possible in order to help us achieve our strategic goals. One of those difficult decisions may be to question the relative value of fixing bugs against adding new functionality. Consider a user story that helps us comply with a new law. Our initial instinct might be “we have to do it, it’s the law,” yet there might be a scenario where, from a business perspective, the fine for not complying is less than the value obtained by implementing an alternative story. This is a difficult decision, but arguably the right one for the company. Applying this theory to production support, we might prefer to live with an inconvenience and gain a new feature rather than fix the problem right away. We should at least consider the relative merits.
If we follow this route, there is a perceived risk that our bugs will never get fixed. I understand that concern, but in my experience, that risk doesn’t become a reality. There still remains the concern of how we plan for production support. I have seen teams deal with the issues of production support in different ways.
First Steps
The most common solution teams employ initially is to effectively have two backlogs—one for development features and one for production support issues. The Product Owner sets a guideline ratio for planning whereby the team will take, for example, 70 percent from the development backlog and 30 percent from the support backlog. This is arguably not much different than the team just reducing their capacity to account for support tasks and is effectively hiding the issue, shying away from the conversation around priority and increasing the risk of spending effort on sub-optimal work.
In this scenario, because production support is usually not in the easily estimable form of user stories and often crops up during the sprint, teams will have a burndown for their development stories and a burnup for their production support. The team can then review its feature/bug ratio in the daily scrum, which allows it to raise issues and tradeoffs with the Product Owner. There is a risk of burning through the entire production support allocation before the sprint ends, but the shorter the sprint, the smaller the potential impact of such a situation.
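As a rough illustration of the two-backlog arrangement, the sketch below splits a sprint's capacity by a guideline ratio and keeps a cumulative burnup of support effort. The class name, the use of story points as the unit, and the 40-point capacity are assumptions made for the example; only the 70/30 split comes from the text above.

```python
from dataclasses import dataclass, field
from itertools import accumulate


@dataclass
class SprintPlan:
    """Hypothetical model of a sprint planned against two backlogs."""
    capacity_points: int                 # assumed unit: story points
    support_percent: int = 30            # guideline set by the Product Owner
    support_log: list = field(default_factory=list)  # support points spent, one entry per day

    @property
    def support_budget(self) -> float:
        return self.capacity_points * self.support_percent / 100

    @property
    def feature_budget(self) -> float:
        return self.capacity_points - self.support_budget

    def log_support(self, points: int) -> None:
        """Record support effort spent on one day of the sprint."""
        self.support_log.append(points)

    def support_burnup(self) -> list:
        """Cumulative support effort, suitable for plotting as a burnup."""
        return list(accumulate(self.support_log))

    def support_budget_exhausted(self) -> bool:
        """True once the support allocation has been used up."""
        return sum(self.support_log) >= self.support_budget


plan = SprintPlan(capacity_points=40)
plan.log_support(3)
plan.log_support(4)
print(plan.support_burnup(), plan.support_budget_exhausted())
```

A check like `support_budget_exhausted()` is the kind of signal a team might bring to the daily scrum as a prompt for a conversation with the Product Owner; it does not decide the tradeoff itself.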
Bugs as Feature Requests
In the above scenario, teams placed production support items onto a product backlog, with an associated business value and size estimate. This raises the question: why split the product backlog into two? By combining the two backlogs, we can explicitly confront the issue of whether we are doing the “right” things, which is ultimately preferable in my opinion. While this involves more complexity from a Product Owner point of view, it allows the team to concentrate on what’s important. The opportunity is still there for the team to pick synergistic items to maximize overall value, which leads us to another interesting way of dealing with production issues.
A number of teams add existing production issues as acceptance criteria of functional user stories so that when a team opens up, or touches, that area of the system, we have some pre-existing tests for when that feature is “done.” The added benefit here is that some of those bugs that might never have gotten fixed if prioritized on their own get addressed while the team is focused on the most important features. Ideally, any new problem will be defined in terms of acceptance tests that need to pass; these will then become part of the evolutionary design documentation of the system.
The key here is to try to avoid getting into an argument over whether something constitutes production support, a fault in development, or a change request. This is easier said than done, but teams that can absorb this discussion rather than draw absolute lines will be the more effective and productive teams. Finding out that the acceptance criteria pass but the feature isn’t really potentially deployable should be neither an opportunity for the Product Owner to introduce scope creep nor an excuse for the development team to hide behind a requirement spec (“you asked for this”), but rather an opportunity to be pragmatic and, most importantly, learn about what we did so we can improve the system and our test coverage for future stories.
This is additional valuable information. If production support issues come up, it usually means we missed something (perhaps an acceptance criterion) in our initial development. This is an opportunity for us to increase the test coverage of our system – a vital part of the team inspecting and adapting and maintaining (and increasing) the integrity of the system.
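The single-backlog idea discussed in this section can be sketched in a few lines. The ordering rule used here, business value divided by size, is an assumption chosen only to make the example concrete; the article does not prescribe a formula, and a Product Owner would weigh other factors as well. All item names and numbers are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacklogItem:
    title: str
    kind: str             # "feature" or "support"; informational only
    business_value: int   # hypothetical relative value
    size: int             # hypothetical size estimate, must be > 0


def merge_and_order(features, support):
    """Combine both lists into one backlog, highest value-per-size first.

    The item's kind plays no part in the ordering, so a support item
    competes with features on the same terms.
    """
    items = list(features) + list(support)
    return sorted(items, key=lambda i: i.business_value / i.size, reverse=True)


features = [
    BacklogItem("Export report", "feature", business_value=8, size=4),
    BacklogItem("New dashboard", "feature", business_value=13, size=13),
]
support = [
    BacklogItem("Login timeout", "support", business_value=9, size=3),
    BacklogItem("Typo in footer", "support", business_value=1, size=2),
]

for item in merge_and_order(features, support):
    print(item.kind, item.title)
```

With these made-up numbers the login fix outranks both features while the footer typo falls to the bottom, which is the kind of explicit comparison a single backlog forces.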
Emergencies
We still have the issue of what to do with the emergencies that crop up and aren’t part of the product backlog. The ScrumMaster or Product Owner should assess whether the issue is an actual emergency. By working in sprints we are reducing the wait time before we can plan to work on these items, often resulting in those “critical items” becoming not quite so critical.
If the issue is a true emergency, the Product Owner should have the authority to play the “emergency card,” as long as he or she is aware of the costs of doing so: not completing the items we planned and, potentially, jeopardizing the sprint goal. If this happens frequently, then it might be worth considering a maintenance sprint to clear up some of the technical debt that might be causing a lot of these problems. Another option is to shorten the sprints, thereby reducing the potential waste in these scenarios.
Who Does It?
The other issue of planning for production support is deciding who will do it. Production support issues are considered “boring,” so there is often a reluctance to sign up for them. Assigning someone (or even a pair) as the “support person” is not a popular idea and having a “support team” causes an unnecessary split along with the associated confusion. There are a number of benefits to not splitting the team, not least of which is the synergy of keeping all efforts within a sprint and, probably most importantly, the sense of responsibility the team has for a system that it not only develops but also maintains—the team is acutely aware of the impact of getting things wrong.
Some teams will rotate the support role either on a sprint-by-sprint basis or weekly basis, coupled with a rule of “if you start it, you finish it.” This can also have the secondary benefits of expanding the overall knowledge of the system for all team members and increasing cross-functionality within the self-organizing team. Doing this is difficult and will take more time if the team members are highly specialized in their particular skillsets or areas of the system but, arguably, this is a risk that could benefit from being managed proactively anyway.
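A rotation like this is simple enough to sketch. The example below assumes a sprint-by-sprint, round-robin rotation and models “if you start it, you finish it” by letting whoever started an issue keep it; the function names and team members are hypothetical.

```python
def support_person(team, sprint_number):
    """Round-robin pick of the support person for a 1-based sprint number."""
    return team[(sprint_number - 1) % len(team)]


def owner_for(started_by, team, sprint_number):
    """Apply the 'if you start it, you finish it' rule.

    An issue somebody has already started stays with that person;
    a new issue goes to whoever holds the support role this sprint.
    """
    if started_by is not None:
        return started_by
    return support_person(team, sprint_number)


team = ["Ana", "Ben", "Chloe"]
print(support_person(team, 2))
print(owner_for("Ben", team, 3))
print(owner_for(None, team, 3))
```

A weekly rotation would work the same way with a week number in place of the sprint number.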
Different options are available for teams to deal with the issue of production support and, although there is no “right way,” I have seen the biggest benefits accrued by teams that treat production support and feature requests as equivalent. I am very encouraged when a team is willing to take on the challenge of confronting the prioritization issues that come with putting support issues on a par with the development backlog items, thus sticking to one product backlog. These teams look for opportunities to improve the system as they work on it and use production support issues as already-written acceptance tests. This approach not only gives us a greater likelihood of doing the “right” things but also shows the team is up for the potentially difficult decisions during its journey towards a more agile way of working.