For me, one of the best things about working in software is that ninety-nine percent of the time, nobody dies. Of course, there are critical systems out there that people rely upon for their safety, but in the world of business software, it’s rare for a glitch to cause irreparable harm to life and limb.
However, there are times when the excrement hits the extractor so hard that revenue, resources, or reputation teeter in the balance whilst engineers frantically grep logfiles and account managers fire off endless “Any uPDatse client on phon?” messages on Slack. Whether you’re facing an unforeseen logic gap, a network outage, or the discovery that your company’s logo has been (unintentionally) replaced with Doge, procedures that let you react quickly and decisively are invaluable at these times.
For many years I’ve been fascinated by aviation. But only recently, thanks to YouTube channels such as Mentour Pilot and 74 Gear, have I garnered an understanding of not only the systems involved in aviation, but also the practices and procedures that keep things running smoothly and safely. Aviation has always been at the forefront of automation, redundant systems, and a no-blame safety-driven culture, but we’re still a long way from being able to simply push the button that says “Take Off”, and then four hours later, push the button that says “Land”.
Piloting a plane still requires incredible skill, training, and experience, along with detailed normal and “non-normal” checklists, procedural accuracy, and discipline. When things go badly wrong with an aircraft, they can do so very quickly, and the systems to hand haven’t always given pilots as much information about the fault as you might think. Systems such as Airbus’s ECAM and the EICAS used by Boeing and other manufacturers have greatly improved this, but it is still imperative that aircrew have the ability, tools, and procedures to make critical decisions very quickly.
One of the reasons I’ve always loved working in software, and one of the reasons I’m so fascinated by aviation, is the myriad acronyms and initialisms, and the models for critical decision making in aviation are no exception. There are several out there, from T-DODAR to FORDEC, but the one I prefer is PIOSEE. The great thing about PIOSEE is that it is entirely portable to the software realm and, in fact, to pretty much any critical situation you can think of. If you have to, you can even apply PIOSEE when “flying solo”, but it works most effectively when you’ve got a team of relevant, skilled, and knowledgeable people around you.
P - PROBLEM
The first step is to understand the problem. What is actually going wrong? What behaviours are occurring? What can you see, hear, smell that isn’t right?
I - INFORMATION
What do we know? Gather as much information as you can. What are the systems, the data, and your customers telling you? Don’t forget that a key nugget of information may actually be in someone else’s head, and not in a logfile buried somewhere on a load-balancer.
O - OPTIONS
What are your options? Sometimes there’s only one course of action, and that’s great! Your decision at the next step has been made for you. However, more often than not, there are multiple courses of action you could take. List them out, ensuring that they are based on the information gathered at the “INFORMATION” step.
S - SELECT
Pick something! As humans, our “gut” is often quick to tell us which action looks most appropriate, or most likely to yield the best result. But before deciding which action to take, make sure you rely not only on your instincts and experience, but also on the information gathered earlier.
E - EXECUTE
Do something! Take your selected option and put it into action. Move swiftly and decisively, but accurately, to give the option the best chance of succeeding. Do what you have decided to do. If you can instantly see that it’s causing further problems, stop and move swiftly on to the next step, before heading back to the “PROBLEM” step.
E - EVALUATE
What happened? Inspect and measure the effect your action had. See where it made a difference, and where it didn’t. Use this new information to look for refinements, and then begin your journey at “PROBLEM” again. Starting at “P” again is important because your problem might actually be different now that you’ve tried something. Or at least your understanding of it might be very different.
Also, don’t forget that once things start bursting into life again, it doesn’t necessarily mean that they are stable and secure. Once you’ve reached the end of the loop, keep evaluating until you’re certain that the issue is truly resolved.
One of my favourite things about the PIOSEE method is that it’s painfully obvious when you sit back and look at it. Surely this is how all problems should be addressed, isn’t it? Yes, yes, it is…
But it’s precisely that simplicity that makes it perfect for situations where stress is high and time is critical. It’s iterative and evidence-based, and it allows for a “fail-fast” approach, which keeps you moving towards a solution whilst you stay in control of your actions, methodical in your approach, and mindful of the consequences. Just as in aviation, things can, do, and will go wrong in software and technology. The key is to develop the ability to keep a clear head and a methodical approach, even when everything is on fire. Adopting PIOSEE could go a long way towards achieving that the next time you have a critical issue.
Good luck. We’re all counting on you.
Have a look at Mentour Pilot’s absolutely fantastic video about crisis management, critical decision making, and PIOSEE in aviation, and how you can apply it to your own situations here: https://www.youtube.com/watch?v=inWmAzZGijU