At the request of a reader, I’ve increased the number of old posts included in the RSS feed. I would appreciate any feedback (good or bad) on this change. I anticipate that the first time your feed reader downloads the feed, it’ll take a few extra seconds, but I’m not sure of anything else. So again, comments are appreciated.
After eight weeks of twists and turns with my publisher and with the Bowker Books in Print system, the Software Licensing Handbook, Second Edition is now available through Barnes & Noble (and other fine booksellers). It’s a long, sordid tale – not really worth telling. I’m just happy it’s available again. They’re even offering it at a discount.
P.S. I’ll give a free set of the first three levels of Software Licensing Education Series videos to one person who can name the song title spoofed by the title of this post. Post your attempt in the comments. I’ll select a random name from all correct answers by this time next week.
Two weeks ago, we started talking about service levels. Last week, we discussed how to write them and I mentioned that the best way to gain experience was to do it – repeatedly. I stand by that statement, but if you’ve never done it before or don’t have a lot of experience in writing them, then you might need some help getting started. So I’m going to provide you with some starting points for a few key service level metrics. These are the ones common for software-related contracts – so they’re not going to be universally applicable to everyone or to all situations. But they might give you a jumping off point for the creation of your own.
So, before you can measure a service level, you have to define one (or more). As I stated before, software-related services are typically measured by two major factors: Problem Response (how quickly the vendor responds to a call for help) and Problem Resolution (how quickly the vendor solves the problem). As two measures of time, they’re similar, but these are two independent measures – a vendor can do well with one and poorly with the other, for example. Additionally, embedded in both of these metrics is a key definition – the concept of Severity. So we actually have to start with the definitions and work forward.
Not all problems are created equal. Severity is the disambiguation of a particular issue’s importance. You should create at least three Severity levels, perhaps four, but never more. I like four because I think that it offers enough distinction between each Severity level without becoming so nuanced as to be irrelevant. I define Sev1 Problems as any problem resulting in a full or partial production stoppage or data inaccuracy. Sev2 Problems are a significant production inhibitor. Sev3 Problems are those where we can do our work, but only through manual intervention that causes significant production or performance inefficiency, or where reporting functions are unavailable. Finally, Sev4 Problems are any condition in/of the software other than those defined as Sev1-3, which affects the service or operation of our systems or network, but does not render such system or network unusable or inoperable.
The net result is that Sev1’s are “the sky is falling” moments; Sev2’s are “holy crap”; Sev3’s are “we’re pulling an all-nighter” and Sev4’s are “I don’t like having to do something in this really wacked-out way because the software doesn’t work to the manual’s spec”. Now, you can redefine these Severity Levels any way that you wish… but the general formula should be followed (not just because I say so… but because these are almost industry standard). As you’ll see in a moment, the distinction between each level is also important in terms of how it impacts your metrics. Additionally, the “missing” 5th severity level is one I simply don’t include anymore – but if you do include it, it would be the “user interface” issue – the color palette that makes things hard to read, the minor nit that isn’t inhibiting in any way, it’s just an annoyance.
OK, so now that you have the Severity Levels defined, you can get back to the creation of metrics for Response and Resolution time. As I said before, Response Time is how quickly the vendor is going to answer a call for help. Thinking logically then, the more severe the problem, the more quickly the vendor should respond, because a delayed response causes more damage. My standard starts with a 2 hour response time for Sev1, 4 hours for Sev2, 8 hours for Sev3 and 12 hours for Sev4. Remember, this is just response time – the time it should take the vendor to give you a PLAN for a resolution, not to actually solve the problem.
With Resolution Time, I’m measuring time, but I’m also measuring completeness, as Resolutions are dependent upon the problem being fully solved (hence the definition of the word “resolution”). For Sev1 Problems, I need immediate assistance, tempered with a little understanding of how software development works. So I ask for 100% of Problem(s) resolved in 24 hours. From there, I follow a stepped progression similar to the Response Times, adding 24 hours per level. Sev2’s should be resolved in 48 hours, Sev3’s in 72 hours and Sev4’s in 96 hours.
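If it helps to see those numbers in one place, here is a minimal sketch of how the Response and Resolution targets above might be checked against a single problem ticket. The function name, the ticket fields and the sample dates are all hypothetical, and it assumes the clock runs on plain elapsed hours from the moment the problem is reported (your contract might count business hours instead).

```python
from datetime import datetime, timedelta

# Targets from the text above: hours allowed per Severity level.
RESPONSE_HOURS = {1: 2, 2: 4, 3: 8, 4: 12}
RESOLUTION_HOURS = {1: 24, 2: 48, 3: 72, 4: 96}


def check_ticket(severity, opened, responded, resolved):
    """Return (response_met, resolution_met) for one problem ticket.

    Assumption: both clocks start when the problem is opened and run
    on elapsed wall-clock time, not business hours.
    """
    response_deadline = opened + timedelta(hours=RESPONSE_HOURS[severity])
    resolution_deadline = opened + timedelta(hours=RESOLUTION_HOURS[severity])
    return responded <= response_deadline, resolved <= resolution_deadline


# Hypothetical Sev1 ticket: answered in 1.5 hours, fixed in 30 hours.
opened = datetime(2009, 3, 2, 9, 0)
result = check_ticket(
    severity=1,
    opened=opened,
    responded=opened + timedelta(hours=1, minutes=30),
    resolved=opened + timedelta(hours=30),
)
print(result)  # (True, False)
```

The two results are returned separately on purpose: Response and Resolution are independent measures, so a vendor can meet one and miss the other, as this sample ticket does.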
Seems pretty simple, actually. And, in many cases, it can be. But again, if I didn’t have a fairly thorough understanding of the software development, testing/QA and bug identification/repair process, I might be tempted to ask for unreasonable metrics, or alternatively, be willing to agree to extremely long times as well. Again, the moral of the story is to know what you want to measure and why and go from there. Next week, we’ll talk about what happens when someone blows a service level.
If you’ve taken my advice from the Software Licensing Handbook and included maintenance fee cap language that ties any increase in fees to the Consumer Price Index or x%, “whichever is less”, well, you might be in for a treat! Depending on the index you chose, and the time schedule for it (whether you chose an all-year average, or the average as of a given date for the prior twelve month period), there’s a chance that your CPI number is going to be a negative number.
Yup, that’s right, you might have a built-in maintenance fee decreasing mechanism in your contract. Now, you only have to go find it and find your CPI number. Oh, and it might also be the time to hope that you have a contract management system and that this is one of the data points you’re tracking.
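For the arithmetic-minded, here is a rough sketch of how a “CPI or x%, whichever is less” cap plays out. The function names and the sample figures are made up for illustration, and it assumes your clause is worded so that a negative CPI actually flows through as a decrease (some language only limits increases, in which case the floor would be zero).

```python
def capped_fee_change(cpi_percent, cap_percent):
    """Percentage change under a 'CPI or x%, whichever is less' cap."""
    return min(cpi_percent, cap_percent)


def next_year_fee(current_fee, cpi_percent, cap_percent):
    """Apply the capped change to the current annual maintenance fee.

    Assumption: a negative CPI is allowed to reduce the fee.
    """
    change = capped_fee_change(cpi_percent, cap_percent)
    return round(current_fee * (1 + change / 100), 2)


# Hypothetical $10,000 maintenance fee with a 5% cap.
print(next_year_fee(10_000.00, 3.8, 5.0))   # 10380.0 (CPI is below the cap)
print(next_year_fee(10_000.00, 7.0, 5.0))   # 10500.0 (the cap wins)
print(next_year_fee(10_000.00, -0.4, 5.0))  # 9960.0 (negative CPI, fee drops)
```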
“The aim of Zen practice is to discover [this] Buddha-nature within each person, through meditation and mindfulness of daily experiences. Zen practitioners believe that this provides new perspectives and insights on existence, which ultimately lead to enlightenment.” —Wikipedia
As silly as it sounds, the way to master service levels is to draft them over and over. Yeah, this is the same way to get better at anything, contracts especially. But service levels are a little special. I think it’s because they’re going the way of the Dodo. As few people ask for them, even fewer know to even think about them. It’s the same cycle that increases the quality of service levels – just in reverse. Pirsig’s book was focused on trying to define “quality” and in the end, he settled upon a mix of rationality and romanticism.
I said before that service levels have to be SMART: Specific, Measurable, Attainable, Relevant, Time-Bound. We’ll blend the rationality and romanticism as we go.
Specific – Service levels start with an understanding of the exact quantities of some metric. This could really be anything, but tempered with the next quality, you have to be able to count it. Typically, we start with things that are time-related: uptimes and downtimes, repair times and fix times. Rationality wins here almost every day (the truly romantic notion is that service levels aren’t needed at all because everything’s going to work out as planned) – these things are really easy to measure… and frankly, ease of measurement is necessary because the folks who will be monitoring the service levels aren’t really interested in tracking them. But why not be a little romantic, too? Pick something unique about the particular situation. Maybe you’re licensing software that processes transactions (so you’d count the transactions processed), or maybe you’ve hired an outsourcer to answer your support calls and service levels could be managed based on the number of successful versus unsuccessful customer service resolutions.
Measurable – This might seem obvious, but you’ve got to be able to measure what it is you’re going to base any metrics upon. Just counting isn’t necessarily enough. Rather, you might need to be able to track start/stop times/days (and then do the math to calculate the difference). If the calculation is manual, you also need people who can keep track. This, perhaps, is the most problematic part of any service level management… as the folks who want the benefits of the service level (usually managers) are not the people watching the clock or experiencing the outages first-hand (the staff). So unless the staff has some sort of reason to monitor the metric accordingly, none of this is going to matter.
Attainable – I promised you before that the Myth of the Nines would come back into your life, and here it is. The simple truth is that Five-9 availability is a pipe dream. 5.26 minutes of downtime a year. Just think about how long your average PC takes to power-cycle. Servers are typically a little longer. Even with redundant systems, backups, high-availability resources and every other technical resource… it’s just not reasonable. Notice I didn’t say that it was impossible. It’s 100% possible. You can have 100% availability. The issue is cost. No one ever wants to PAY for that kind of availability. Not even your most demanding customers. Wanna test this theory? Price it out from your vendor(s) (as it’ll take more than one to keep even a single service up 24/7/365) and ask your most demanding customer if they’ll pay for the ENTIRE service themselves (since that’s the real cost to get it). Let me know if they’re willing to do it, because I have a bridge or two to sell them. Seriously, I’m not trying to be facetious. I’m a pretty demanding customer myself, but even I know and understand financial limits.
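To see where the 5.26 minutes comes from, here is a quick back-of-the-envelope calculation. It assumes a 365-day year and 24/7 coverage with no excluded maintenance windows; the function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent):
    """Minutes of downtime per year permitted at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Two nines through five nines.
for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {allowed_downtime_minutes(pct):.2f} minutes/year")
```

Each extra nine cuts the allowance by a factor of ten, which is a decent intuition for why the price of each extra nine climbs so steeply.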
Relevant – Tied to measurable and specific is that each of your service level metrics be relevant to whatever service you’re receiving/providing. So if you’ve chosen to measure successful versus unsuccessful customer service resolutions, but it’s not tied to the behavior of the service provider, that’s not a relevant metric. The provider doesn’t have any control over what is being measured, even with perfect behavior. So where is their incentive to work towards meeting the metrics (or agreeing to them in the first place)?
Time-Bound – Service levels are limited to time. At first, this sounds quite limiting, but we’re not talking about time in terms of the length of the relationship (service levels should extend for the entire length of the relationship). Rather, the time we’re talking about here is the time frame in which each metric will be measured. So, perhaps you’re watching uptime on a daily basis… or the number of widgets produced in a week… or the number of successful service calls completed in a year… or the average length of time it takes to fix a problem of a given severity level over the span of a quarter.
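As a sketch of the last example (average time to fix a problem of a given severity over a quarter), something like the following would do the start/stop math. The ticket log, its layout and the function name are all hypothetical, and it assumes a ticket belongs to the period in which it was opened.

```python
from datetime import datetime
from statistics import mean

# Hypothetical ticket log: (severity, opened, resolved).
tickets = [
    (1, datetime(2009, 1, 5, 9, 0), datetime(2009, 1, 6, 3, 0)),      # 18 hours
    (1, datetime(2009, 2, 10, 14, 0), datetime(2009, 2, 11, 20, 0)),  # 30 hours
    (2, datetime(2009, 3, 1, 8, 0), datetime(2009, 3, 2, 8, 0)),      # Sev2, ignored below
    (1, datetime(2009, 4, 2, 8, 0), datetime(2009, 4, 2, 20, 0)),     # opened after the quarter
]


def average_resolution_hours(tickets, severity, window_start, window_end):
    """Average hours to resolve tickets of one severity opened in the window.

    Returns None when no matching tickets fall inside the window.
    """
    hours = [
        (resolved - opened).total_seconds() / 3600
        for sev, opened, resolved in tickets
        if sev == severity and window_start <= opened < window_end
    ]
    return mean(hours) if hours else None


q1_sev1 = average_resolution_hours(
    tickets, 1, datetime(2009, 1, 1), datetime(2009, 4, 1)
)
print(q1_sev1)  # 24.0
```

Notice that an average can hide a miss: the quarter averages 24 hours even though one of the two Sev1 tickets took 30. Whether you measure averages or every individual ticket is exactly the kind of choice the time frame forces you to make.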
OK, so now that you’ve considered all five requirements, you should have one or more appropriate service levels. If you still need some ideas, check back with me for the next installment. Meanwhile, if you have some ideas for inclusion in the next installment, send them along!
Yet another reason for warranty language that promises proper 4-digit year date calculations.
I was reading the editorial in the current edition of Supply Chain Digest and knew I was going to respond fairly critically. But Jason Busch over at SpendMatters beat me to the punch… but it seems like he agreed with the overall tone of the article and even some of the suggestions. I significantly disagreed. What are your thoughts on this?