Spectacular IT Failures: Hope Is Not a Downtime Solution

James Whitfill, MD
Chief Transformation Officer, HonorHealth
July 30, 2020

James Whitfill, MD, is chief transformation officer at HonorHealth, Scottsdale, Ariz., and past chair of the Society for Imaging Informatics in Medicine (SIIM) Board of Directors. The former CIO of a large private radiology practice in the Southwest shared his least proud moment in IT, and the treasure trove of lessons it yielded, during a unique and valuable 2019 RSNA session, “Radiology Informatics Mistakes and War Stories from the Front Lines,” moderated by Peter Sachs, MD, a diagnostic and interventional radiologist at the University of Colorado Hospital, Denver, Colo.

As an imaging informaticist working in private practice, Whitfill confessed to feeling envious when large, luminary organizations discuss their IT infrastructure at the RSNA. “When you are in private practice, you often have very few resources to do things,” he said. “In private practice, if you need technology, your capital budget is the wallet of the physicians who run the practice, and when you make a call for more capital, you know somebody is taking a pay cut. It gives you a very different perspective on things.”

Whitfill then proceeded to share one of the worst days of his professional life. “I am not proud of having gotten into this spot, but it taught me a lot of lessons,” he said.

PACS was a relatively new concept in 2005, and after the group’s community hospital installed a system, the practice decided to follow suit. “The amount of money it took just to get the system in, upgrade the network, and put a server room in was inordinate,” he said. The practice had 10 imaging centers and was reading about 200,000 exams per year, but once the initial investment was budgeted, there was no money left for a backup or downtime solution.

The Odds of PACS Going Down

The IT team did consider whether the PACS might go down, and wondered whether that was a real possibility. “We thought, ‘Hmmm, probably not,’” said Whitfill, drawing gales of laughter from the audience. The team devised a low-cost, nominal downtime solution, an open-source viewer on a workstation plus a DICOM router, but never tested it in a production workflow.

As volumes grew, physicians began to complain about slow response times in the worklist applications, and Whitfill’s team discovered that the database indices were not updating correctly. “For nontechnical people in the room, indices are like a table of contents, and if your table of contents is broken, you have to read the whole book every time you want to look for something. When your database indices are not working correctly, it definitely slows things down.”
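To make the table-of-contents analogy concrete, here is a minimal Python sketch that is not tied to any PACS product. A "full scan" reads every row until it finds a match, while an index (here a plain dict mapping a hypothetical accession number to a row position) jumps straight to the record. The `Exam` record, its field names, and the row counts are illustrative assumptions; real databases typically use B-tree indices rather than a hash map, so the dict only stands in for the idea. The sketch also shows what a stale index costs: records added after the index was built are unknown to it, so the lookup falls back to reading the whole "book."

```python
from dataclasses import dataclass


@dataclass
class Exam:
    accession: str  # hypothetical field names, for illustration only
    patient: str
    status: str


def full_scan(rows, accession):
    """Read the whole 'book': compare rows one by one until a match is found."""
    comparisons = 0
    for row in rows:
        comparisons += 1
        if row.accession == accession:
            return row, comparisons
    return None, comparisons


def build_index(rows):
    """Build the 'table of contents': accession number -> row position."""
    return {row.accession: pos for pos, row in enumerate(rows)}


def indexed_lookup(rows, index, accession):
    """Use the index when it knows the key (counted as one step); else scan."""
    pos = index.get(accession)
    if pos is not None:
        return rows[pos], 1
    return full_scan(rows, accession)


rows = [Exam(f"A{i:06d}", f"patient-{i}", "unread") for i in range(100_000)]
index = build_index(rows)

# New exams keep arriving, but this index is never rebuilt, so it goes stale.
rows.extend(Exam(f"A{i:06d}", f"patient-{i}", "unread") for i in range(100_000, 100_100))

_, cost_indexed = indexed_lookup(rows, index, "A050000")  # known to the index
_, cost_stale = indexed_lookup(rows, index, "A100050")    # missing from stale index
print(cost_indexed, cost_stale)  # prints "1 100051"
```

The same query goes from one step to more than 100,000 comparisons once the index falls behind, which is the kind of slowdown physicians notice on a worklist.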

Whitfill and his team were still considering how to fix the database indices when the vendor’s remote support team noticed the problem and decided to re-index the system between midnight and 6 AM. Based on their research into other imaging center outfits in the region, the support team concluded they could finish the job in six hours and look like heroes when everyone came in the next morning. “What did not make matters fun is we did not have the best communication with our vendor’s remote support team,” Whitfill observed.

When he started getting calls from angry physicians at 6 AM saying, “Our PACS is not running,” Whitfill discovered that the re-indexing job was still underway and only 25% done. This was the moment the team had not expected to come, and it arrived at exactly the wrong time: just before the weekend. “We were doing incremental backups, so to do a restore from backup, we had to go back to the previous weekend and restore each incremental backup,” he explained. That would have taken 18 hours, no better than waiting for the indices to update.
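The restore problem Whitfill describes follows from how incremental backups work: each incremental holds only the changes since the previous backup, so getting back to a given day means restoring the most recent full backup and then every incremental after it, in order. The Python sketch below illustrates that rule under assumed conditions (one full backup each Sunday, one incremental each other day); the `Backup` type and the dates are hypothetical and are not details of the practice’s actual system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Backup:
    taken_on: date
    kind: str  # "full" or "incremental" (hypothetical labels)


def restore_chain(backups, target):
    """Return, in restore order, every backup needed to rebuild `target`'s state.

    That is the most recent full backup on or before `target`, followed by
    each incremental taken after it, up to and including `target`.
    """
    history = sorted((b for b in backups if b.taken_on <= target),
                     key=lambda b: b.taken_on)
    full_positions = [i for i, b in enumerate(history) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup on or before the target date")
    return history[full_positions[-1]:]


# Assumed schedule: full backups on two Sundays, incrementals on the other days.
backups = (
    [Backup(date(2023, 12, 31), "full")]
    + [Backup(date(2024, 1, d), "incremental") for d in range(1, 7)]
    + [Backup(date(2024, 1, 7), "full")]
    + [Backup(date(2024, 1, d), "incremental") for d in range(8, 12)]
)

# Restoring Thursday night's state needs Sunday's full plus four incrementals.
chain = restore_chain(backups, date(2024, 1, 11))
for b in chain:
    print(b.taken_on, b.kind)
```

Every link in the chain has to be restored in sequence, so the total restore time grows through the week, and a single bad link stops the restore. Differential backups, each holding all changes since the last full backup, shorten the chain to at most two steps at the cost of larger nightly backups.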

That day, the practice put the untested backup solution into service: the DICOM router and viewers no one had seen before. It took about 12 hours to get the PACS back up, and in the meantime, the team got to work together in a crisis. In the end, the practice got through just 30% of the work that came in that day, and a few physicians had to come in over the weekend to catch up.

Lessons Learned

  1. Hold quarterly fire drills to test your downtime solution: the PACS is down; now read off the backup system.
  2. Take the worst-case scenario into account when deciding where to economize. “Because we were trying to be economical, we could not afford more processors and we had an older CPU that was good enough to keep up with the daily work, but when you got into some of these intensive jobs around database indexing it was not able to keep up,” he said.
  3. Quarterly testing of an inferior backup system is likely to inspire the owners to invest in a better one.
  4. Have a real-time database replication solution. Imaging informaticists talk a lot about high-availability solutions and whether to use virtual machines or database replication, but not all disasters come in the form of an act of God. “Today, when I go to organizations and I ask about business continuity plans, it’s not just what happens if a bomb goes off, but also think about what happens if your database gets corrupted, how do you recover from that real time?” said Whitfill.

Hub is the monthly newsletter published for the membership of Strategic Radiology practices. It includes coalition and practice news as well as news and commentary of interest to radiology professionals.