The Confusion of DevOps


“What we’ve got here is… failure to communicate.” –Captain of Road Prison 36, Cool Hand Luke (1967)

There is a lot of confusion around the term “DevOps” these days. Some seem to think that it stands for CI/CD pipeline tooling and Infrastructure as Code solutions. Others seem to believe that it is little more than a relabeling of “Release Engineering”, traditional “IT Ops”, or some sort of SRE function.

At its core, DevOps is not specifically about any of those things.

Delivering Outcomes More Effectively

What is central to a successful DevOps implementation is creating the conditions that ensure delivery teams can help customers reach their target outcomes. Tooling and organizational changes can be part of what helps create those conditions. On their own, however, they are no more likely to produce delivery effectiveness than buying tools and lumber and stacking them in neat piles is to produce the house of your dreams.

Let’s quickly run through each of the elements necessary to have a successful DevOps implementation.

#1 Know the Target Outcomes

One of the biggest problems with most Agile and DevOps implementations is that little attention is paid to ensuring that those doing the delivery have a good understanding of what the customer is trying to accomplish.

Rather than discussing the target outcomes and working together to discover what is currently preventing them from being achieved, managers tend to focus on the details of the solution being delivered and on adherence to the processes expected to be followed to deliver it.

This, of course, misses the entire point of delivery. 

Expecting to successfully deliver what the customer wants by focusing on internal factors is as productive as driving a car while staring at your lap. Sure, you might be successful, but it is far from the most effective way to do so.

#2 Establish Shared Situational Awareness

Knowing the target outcome is only part of the battle. Just like with driving, success also depends upon the level of situational awareness you and others have of the ecosystem you are operating in.

Situational awareness is the level of contextual information you have at hand to make decisions. What matters is less the amount of information than its contextual quality: how relevant it is to the decisions you have to make. Too much information creates distracting noise that hides what you need, while high contextual quality directly improves your ability to make timely and accurate decisions.

Similarly, effective situational awareness does not mean that you and your team need to know everything. Instead, you should strive to identify everything that is relevant to delivery in your ecosystem as either:

  • Information/knowledge you know that you know. For it to remain in this category, you need to know how and why you know that you know it, and have a means that you regularly exercise to test how accurately and how fully it is known.
  • Information/knowledge you know that you don’t know. This includes such things as underlying details of external technologies and services you depend upon but do not have full visibility into (think of a cloud-provided service or a customer’s internet provider), as well as external behaviors and developments that can have a direct impact on your ability to deliver effectively. These are your ecosystem risks, and are the sorts of things that you and your team need to defend against. Chaos Engineering approaches such as the Simian Army are designed to build in the preparedness mindset necessary to help delivery teams protect against such risks.

Anything relevant that does not fall into one of those two categories is a hazardous unknown. These are often the very things that you think you know but have not put mechanisms in place to prove. The very fact that they are unknown means that they can spring up to damage decision accuracy, destabilize teams, and ultimately undermine your ability to deliver effectively. They are bad, and need to be minimized as much as possible.
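One way to make these categories concrete is to write them down as a simple inventory. The sketch below is only an illustration in Python. The class, field, and function names are all hypothetical, and the rule it applies (a belief only counts as known if a check is regularly exercised to prove it) is a simplified reading of the categories above, not a prescribed method.

```python
from dataclasses import dataclass
from enum import Enum


class Awareness(Enum):
    KNOWN_KNOWN = "known and verified"
    KNOWN_UNKNOWN = "known gap (ecosystem risk)"
    HAZARDOUS_UNKNOWN = "hazardous unknown"


@dataclass
class KnowledgeItem:
    # All field names are hypothetical, chosen for illustration only.
    name: str
    believed_known: bool            # the team thinks it knows this
    regularly_verified: bool        # a check is regularly exercised to prove it
    acknowledged_gap: bool = False  # the team has explicitly flagged it as not known


def classify(item: KnowledgeItem) -> Awareness:
    # Known known: believed AND backed by a regularly exercised check.
    if item.believed_known and item.regularly_verified:
        return Awareness.KNOWN_KNOWN
    # Known unknown: explicitly acknowledged, so it can be defended against.
    if item.acknowledged_gap:
        return Awareness.KNOWN_UNKNOWN
    # Everything else, including beliefs that were never verified.
    return Awareness.HAZARDOUS_UNKNOWN


items = [
    KnowledgeItem("deploy rollback works", believed_known=True, regularly_verified=True),
    KnowledgeItem("cloud provider internals", believed_known=False,
                  regularly_verified=False, acknowledged_gap=True),
    KnowledgeItem("backups are restorable", believed_known=True, regularly_verified=False),
]

for item in items:
    print(f"{item.name}: {classify(item).value}")
```

Note that the third item is something the team believes but has never proven, which is exactly why it lands in the hazardous category rather than the known one.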

In its truest original form, DevOps has always been about bringing development and operations together to improve the level of shared awareness among everyone across the service delivery lifecycle. Any tools, processes, and organizational design patterns that improve the flow of quality contextual information, whether by breaking down information traps and silos or by reducing the cognitive load necessary to deliver, can be a huge help.

#3 Have Effective Decision Making Positioning

Effective decisions are those that are made with sufficient situational awareness at the right time. Traditional organizations generally try to position decision making at the managerial level, under the belief that managers are ultimately accountable.

Unfortunately, in most delivery organizations managers often are poorly placed to have enough of the right contextual information to make an accurate and timely decision. Compounding this, most service delivery work requires a long string of accurate and timely decisions. Often a subsequent decision needs to be made based upon the findings of one or more previous ones. This makes pushing decisions up the chain too slow and error-prone to ensure success.

Instead, decision making should be positioned at the level where it can be made most effectively. More often than not, this is at the level of the people performing the work. 

The best example of this is the difference in field performance between the Russian and Ukrainian armies.

The Russian military pushes decision making up the command chain much the same way most organizations do. The soldiers on the ground are told what to do and when/where to do it.

The Ukrainian military, however, has been trained to follow Western military command practices since 2014. Contrary to popular belief, most modern Western militaries push decisions down to the most relevant decision making level through a mechanism called Mission Command.

Mission Command is centered on the idea that the soldiers in the field are the best positioned to make decisions. It is the commander’s job to describe the overall objectives around an operation (the target outcome), outline what is and is not known about the current situation, and detail any situations or actions that must be avoided (anti-goals). This is called the Commander’s Intent, and is provided at a briefing.

The soldiers receiving the briefing then go off to build a plan of action to meet the commander’s intent, along with a list of resources they feel they will need to successfully execute it. This is given in a backbriefing, where the commander and the soldiers go back and forth to clarify the overall intent and agree to an approach forward.

Once the soldiers are on the mission, it is expected that they will adjust their plan and actions as necessary in order to meet the overall operational objectives. In some cases it may entail throwing out the bulk of the original plan. All of this is perfectly acceptable as long as anti-goals are avoided while the operational objectives are pursued.

For this reason, Ukrainian units have been able to outmaneuver and outperform their Russian counterparts on the battlefield despite their far inferior numbers and weaponry.

#4 Minimize Delivery Friction

Delivery friction consists of anything that slows down or gets in the way of knowledge of the target outcome, situational awareness, decision making, and/or action to deliver in a timely and effective manner. 

There are any number of places where delivery friction can exist. It can be caused by variability and rework, technical debt, poorly designed processes, excessive handoffs, poor information flow, unclear outcome measures, and badly placed decision making.

The delivery friction topic is vast, and one that requires a blog series of its own. Unfortunately, most organizations erroneously try to gauge delivery friction with measures like delivery velocity, deploy rates per unit of time, Mean Time to Respond/Recover, and others that at best only tangentially indicate an organization’s ability to effectively deliver outcomes.

#5 Enable Learning

Finally, an effective DevOps implementation needs to enable learning.

Enabling learning takes far more than providing team members the ability to build and enhance their skills. It requires creating an ecosystem that is explicitly designed to make it safe to fail. 

Most organizations do themselves a disservice by celebrating success and punishing failure. To avoid damaging their standing, staff feel incentivized to hide any mistakes or failures. Such hiding not only stops important information from flowing, which obscures awareness of ecosystem conditions, it also stops anyone from understanding why the mistake or failure happened so that it can be prevented from happening again in the future.

This means that by making it unsafe to fail, organizations are creating the very conditions that create more failure. More failure creates more friction, destroying the situational awareness and effective decision making necessary to successfully pursue target outcomes.