Spectacular IT Failures: System Upgrades Can Be Downgrades

“Those that catch, correct, and learn from failure will succeed; those that wallow in the blame game will not.”

Safwan Halabi, MD
Stanford University
May 27, 2020

Safwan Halabi, MD, a radiologist and informaticist at Stanford University, recently went through a PACS upgrade that brought the system down. Fortunately, he had not severed the interfaces to the previous PACS and was able to revert to the old system, limiting the length of the downtime.

Dr. Halabi was one of six physician informaticists who shared lessons learned in the crucible of experience during an RSNA session, “Radiology Informatics Mistakes and War Stories from the Front Lines,” moderated by Peter Sachs, MD, a diagnostic and interventional radiologist at the University of Colorado Hospital, Denver, CO.

Even with that fallback, the institution experienced an unexpected PACS downtime that interrupted both radiologists’ work and patient care. It also gave Dr. Halabi a rich learning experience, from which he identified the following points of failure that he shared with attendees.

Failure No. 1: It’s never the right time. “We put the onus of these downtimes on those times that we think will be the quietest in our health systems, but it is going to impact someone, whether it is the ER doc at 2 AM or the night radiologist who has to deal with every single downtime of speech recognition, PACS, or EHR,” he said. “On top of that, why schedule it on the day you are switching over to daylight saving time, a day when things are already complex?”

Failure No. 2: Plan for planned versus unplanned downtime. You have to be ready for a planned downtime and know how you will communicate it, but you also need a plan for what you will do when a downtime is unplanned. What do you do if the server goes down or images are not going from the modalities to the PACS? Do you read off the modality? “This happens almost on a weekly basis for us even though I am in the middle of Silicon Valley,” Dr. Halabi noted. “We go through these same IT issues day in and day out.”

Failure No. 3: Communicate. Do you have an active way of communicating whether a PACS failure is confined to the reading room or affects the whole institution? Do you reach out to the CMO, the CMIO, your physicians? “What is the chain of events as it occurs?” Halabi asked. “Sometimes the silence is deafening, and you sit there in the reading room looking at each other for such a long time. When is the next level of issue: after 5 minutes, 10 minutes? Should I restart my workstation?”
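One way to make that chain of events explicit is to write it down as a time-based escalation ladder. The following Python sketch is only an illustration and does not describe Stanford's process: the time thresholds, the `EscalationStep` type, the `who_to_notify` function, and every role other than the CMIO and CMO are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # elapsed downtime at which this step applies
    notify: str         # role to contact at this step


# Hypothetical ladder; the thresholds and most roles are assumptions.
LADDER = [
    EscalationStep(0, "reading room assistant"),
    EscalationStep(5, "on-call PACS administrator"),
    EscalationStep(10, "CMIO"),
    EscalationStep(30, "CMO"),
]


def who_to_notify(minutes_down, ladder=LADDER):
    """Return every role that should have been contacted by now."""
    ordered = sorted(ladder, key=lambda step: step.after_minutes)
    return [s.notify for s in ordered if minutes_down >= s.after_minutes]


if __name__ == "__main__":
    for minutes in (0, 7, 45):
        print(minutes, who_to_notify(minutes))
```

With a ladder like this posted in the reading room, the question of when to move to the next level has a written answer instead of a silence.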

Failure No. 4: Who is in charge, who is responsible for this? Is it the IS team, is it physician leadership, is it the reading room assistant, is it the health system? It is imperative to identify who is in charge every day of the week and every hour of the day in order to prevent chaos. “Many of you who did an internship remember those Code Blues and it was just a big cluster,” said Halabi. “Everybody came from the hallway, but nobody knew what to do. It is really important to say who is responsible on that day at that time.”

Failure No. 5: Shutting down. “We went through this upgrade of our PACS and it didn’t work,” noted Dr. Halabi. “ER docs are sitting there wondering when it is going to come back up, and it gets to the point where you have to start diverting ambulance traffic because you can’t read the CT scan that was just performed on a trauma patient or you can’t ingest or digest the CD that was just brought in,” he said. “It is an opportunity to begin again and think more intelligently.”

From the Failure Literature

In addition to sharing lessons learned from a failed PACS upgrade, Dr. Halabi also imparted some of the organizational wisdom he gleaned from a review of the literature that explores how organizations learn from failure. “We are not the only industry that has failure,” Halabi noted. “People have studied it and tried to perfect it.”

One of the most important insights from the literature is that failure is not always bad. There are three types of failure: preventable (as in failure to use a checklist); complexity-related, where a series of small accidents can produce a negative outcome (as in health care, the battlefield, and startups); and intelligent (as in research, where cycles of failure produce understanding).

“Learning from failure is anything but straightforward: You have to tackle it head on, and you can’t play the blame game,” Dr. Halabi said. “In health IT, there are implications for security and cybercrime—we can destroy medical records, we can destroy everything in our health system, we can lose images and comparisons—so it is really a high stakes endeavor.”

Learning from failure requires close attention to process and a willingness to go to the root cause, he said. “It requires discipline and enthusiasm to celebrate failure, and it also requires sophisticated analysis to ensure that the right lessons are learned and the right remedies are employed,” he emphasized. Interdisciplinary teams are essential to understanding health care’s complex failures that occur in multiple departments.

Lessons Learned

No. 1: Schedule fire drills. “How many of you have a fire drill, a complete shut-down, somebody runs into your WiFi network and it goes down?” he asked. What do you do? Are you reading off of modalities? Are you printing film? What do you do in those downtimes?

No. 2: Have a backup plan. “Can you revert back to where you were an hour ago?” he asked. For instance, when you install a new PACS and it doesn’t work, you are dead in the water if you eliminate all connections to your old PACS.

No. 3: Conduct post-mortems. Look at the root cause analyses and look back and say, “What happened, why did this fail?” Sit down and have an honest post-mortem discussion.

“Those that catch, correct, and learn from failure will succeed; those that wallow in the blame game will not,” he cautioned.
