News | June 03, 2007

Even though PACS are designed to run in mission-critical environments without interruption, they still sometimes fail. When a PACS does go down, the PACS administrator faces an immediate hurricane of angry physicians from the ED and Surgery while watching the radiology department grind to a stop. It’s not a fun time for anyone involved, and we all try to prevent these episodes as much as possible. Drawing on the collective wisdom of veteran PACS administrators in ClubPACS, we have come up with a list of the most frequent failure modes to watch out for and prevent if at all possible.

10. Dongle keys are the Devil.

Dongle keys are hardware license keys that some vendors use to unlock or control usage of their software. These hardware keys usually take the form of a USB device. If a dongle fails or, more likely, is removed, the system will lock up and refuse to let users view images. Often the keys are only good for a fixed period of time, expiring unexpectedly and requiring a new key to be sent in the mail. We feel that PACS have enough potential failure modes without vendors introducing new single points of failure that can take out their system. (There are far better ways to ensure your user is abiding by their service agreement.) This technique is a throwback from the 1980s and has absolutely no place on any modern IT system.

9. Maintenance crews unplugging things.

This happens more often than you might expect. Maintenance crews may come into your datacenter to clean floors or perform other maintenance and have been known to bump power switches or kick out cables. This happens most frequently when your ‘data center’ also doubles as a maintenance closet. We have learned that being obsessive compulsive about cabling is not a bad thing. Keeping your cables neatly tucked in and attached to cable arms allows you to pull out the server for maintenance and not pull out the cables. Label your cables accurately with hostname and port (including ports on your switch at the other end) so that you know where to plug things back in if they do come apart.

8. Stuffing your servers in a broom closet.

Servers need air to breathe and can generate more heat than you think. Without air circulation you will be amazed and dismayed at how short the lifetime of your servers can be. Usually the first things to go in an overheated space are your hard drives. If you lose more than a couple of drives over a short period, you should look suspiciously at environmental factors inducing those failures. Data centers also provide very nice amenities like uninterruptible power, constant humidity, controlled access, and halon fire suppression. A water sprinkler can ruin your day in a hurry.

7. Not watching the shop.

Are you notified when a disk drive fails? Hard drive failures are the most common hardware failures because drives have moving parts. These failures should be expected and accounted for by using a RAID to prevent loss of data. RAID stands for Redundant Array of Inexpensive Disks. A RAID 5 configuration, often built from about 8-12 drives, sets aside one drive’s worth of capacity for parity (a checksum-like value computed from the data), spread across the drives in the array. This means that if one of the drives fails, no data is lost, because the missing data can be reconstructed from the remaining drives and the parity. A common problem we see is that having a redundant configuration makes people overconfident, and they think they will be protected in the event of multiple failures. Many times they neglect to notice that one of the drives has failed. If you don’t have a spare drive to rebuild the RAID automatically, you are running in a very vulnerable state where the loss of any one additional drive will ensure the complete loss of the entire array. We’ve heard of situations where a drive loss in a RAID goes undetected for six months and is then followed by another failed drive. Make sure you have a hot spare drive configured and that you at least get an automatic system email when you do lose a drive so you can get a new one in ASAP.
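To see why a single failed drive is survivable but a second one is not, here is a minimal Python sketch of XOR parity, the kind of calculation RAID 5 relies on. The three four-byte "drives" and the helper name xor_blocks are made up for illustration; a real controller works on much larger stripes and rotates parity across the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: one small data block per drive.
data = [b"CT01", b"MR02", b"XR03"]

# Parity is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# Simulate losing the second drive, then rebuild it from the
# surviving data blocks plus the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])

# With two drives gone, the same calculation no longer recovers the data.
not_recovered = xor_blocks([data[0], parity])
print(not_recovered == data[1])
```

The second check is the point of this section: once the array is running degraded, parity has nothing left to give.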

6. Running out of disk space for images.

Images suck up space on your storage system better than a Hoover. Most people subscribe to the just-in-time strategy of buying storage each year to enjoy the dramatic reduction in price; however, you should keep a watchful eye on your available capacity. Adding new or upgraded modalities (e.g., a new 64-channel CT scanner) will make your old burn rate inaccurate. Be ever-vigilant about how much disk space you have left so that you don’t overestimate it and run out of gas.
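A rough way to keep an eye on this is to turn the burn rate into a number of days remaining and recompute it whenever a modality is added. The sketch below is only an illustration: the function name and every figure in it are hypothetical, and it assumes growth is constant from day to day.

```python
def days_until_full(capacity_tb, used_tb, daily_growth_tb):
    """Estimate days of storage left at a constant daily growth rate."""
    if daily_growth_tb <= 0:
        return float("inf")
    return max(capacity_tb - used_tb, 0.0) / daily_growth_tb


# Hypothetical archive: 40 TB total, 31 TB already used.
capacity_tb = 40.0
used_tb = 31.0

old_rate = 0.02  # TB per day before the new scanner (assumed)
new_rate = 0.05  # TB per day after the new scanner (assumed)

before = days_until_full(capacity_tb, used_tb, old_rate)
after = days_until_full(capacity_tb, used_tb, new_rate)

print(f"Old burn rate: about {before:.0f} days left")
print(f"New burn rate: about {after:.0f} days left")
```

The same free space lasts less than half as long at the new rate, which is why last year's purchasing schedule cannot be trusted after a modality upgrade.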

5. System is too complex.

If your PACS system uses multiple databases, like Oracle and SQL Server, and/or multiple types of operating systems, like Windows and Linux/Unix, you are asking for sleepless nights. These complex architectures are precarious and often have many single points of failure. These patchwork systems are often a result of a haphazard technology strategy that plugged various components from acquired companies together. It requires too much expertise in all the operating systems and databases to keep the system running smoothly. Keep it simple -- complexity is the enemy of reliability.

4. Upgrades can turn off your lights.

Did you ever notice just how often those scheduled downtimes boil over into unscheduled downtimes? This failure mode has several causes:
- An unforeseen complication with software or hardware during the upgrade causes the software to crash. This might come from DLL versioning conflicts or device drivers.
- Often the new software requires more hardware resources (the technical word is bloat), and the computers you had can’t handle the load.
- In the vendor’s exuberance to upgrade you, they blew away your configuration data. Laugh now, but it happens.

Word to the wise: Before an upgrade, always ask for a back-out plan -- or Plan B -- that spells out what to do when things go south, and ask for the last possible abort time, also known as the Point of No Return. Those plans might need to be implemented to make sure you don’t disrupt operations at peak times like Monday morning. If you are really good, you will have a test system where you can try out what will happen in your production environment. And always remember to take a full backup BEFORE you upgrade.

3. Network outages.

Network outages are frequently out of our control but can still cause us a great deal of grief. CIOs have woken up quickly to the fact that their networks are no longer just for billing and are now part of the delivery of care. A faulty network is probably the most frequent cause of a CIO’s early dismissal. There are things you can do to add network redundancy to your system at very little cost. Most data centers will have two separate network trunks available. For an extra $100 or so, make sure to have either two network interface cards (NICs) or a dual-port NIC in each of your servers so you can have a connection to both network trunks at the same time. If one of the trunks becomes unavailable, network traffic will automatically route through the other trunk and your PACS users will never know about the failure. The side benefit is that, when both links are active, you can get up to twice the network throughput out of your servers, which might make things faster.

2. Lack of database space.

The database is the central controller of your workflow and operations. Whenever you ask for a worklist of cases that haven’t been read or search for a case, the database is the one answering your questions. Databases store lots of data (although not the actual images) and need space, too. If a database system runs out of space, your system will stop dead in its tracks. This is readily solved with automated active monitoring, but it still seems to be a frequent failure that bites the unwary.
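Automated monitoring for this can be very simple. The following Python sketch checks free space on the volume that holds the database files and classifies it against two thresholds. The thresholds, the function names, and the path are assumptions for illustration; a database that uses fixed-size tablespaces or data files also has to be checked from inside the database itself, which this sketch does not do.

```python
import shutil


def classify_free_space(free_fraction, warn_below=0.20, critical_below=0.10):
    """Map a free-space fraction (0.0 to 1.0) to an alert level."""
    if free_fraction < critical_below:
        return "CRITICAL"
    if free_fraction < warn_below:
        return "WARNING"
    return "OK"


def check_volume(path):
    """Return (alert level, free fraction) for the volume containing path."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    return classify_free_space(free_fraction), free_fraction


# "." stands in for the (site-specific) directory holding the database files.
level, free_fraction = check_volume(".")
print(f"{level}: {free_fraction:.0%} free")
```

Run from a scheduler and wired to email or a pager, a check like this gives you days of warning instead of a dead worklist.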

1. Hardware failures.

In a recent survey of database professionals regarding failures in the server room, hardware failures led the causes, accounting for 49% of unplanned outages over the past year. (See http://www.dmreview.com/editorial/newsletter_article.cfm?articleId=1061….) Redundant servers have become very inexpensive, especially compared to the damage done to your department’s credibility during a downtime. A loaded enterprise server with dual power supplies and a dual-core processor costs only $3,000 - $5,000. Relative to the cost of the overall system, the cost of system redundancy is minimal. Vendors still don’t take full advantage of modern IT principles in fault tolerance and could benefit from selling redundancy as part of their standard configuration. The simple fact is that the vast majority of failures are likely attributable to poor systems management; they are avoidable, and avoiding them doesn’t have to cost much, sometimes nothing at all. The best thing you can do to prevent a failure is to keep a watchful eye on your system and keep open lines of communication with both your users and your IT department. Hardware failures are almost always preceded by many error messages on the system. The question is, are you listening? It’s far easier to recover from the warning tremors of a failure than from a complete failure.
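Listening can start with something as small as counting warning keywords in the system log each day. The Python sketch below does that; the keyword list, the function name, and the sample log lines are all invented for illustration and are not taken from any particular operating system or PACS product.

```python
import re
from collections import Counter

# Keywords that often accompany failing hardware (an assumed, incomplete list).
WARNING_PATTERN = re.compile(
    r"\b(error|fail(?:ed|ure)?|degraded|predictive)\b", re.IGNORECASE
)


def count_warnings(lines):
    """Count the first warning keyword found on each log line."""
    counts = Counter()
    for line in lines:
        match = WARNING_PATTERN.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


# Hypothetical log excerpt; real log formats vary by system.
sample_log = [
    "Jun 01 02:14:07 pacs01 kernel: sda: I/O error, sector 123456",
    "Jun 01 02:14:09 pacs01 raid: array md0 degraded",
    "Jun 01 03:00:00 pacs01 cron: nightly backup completed",
    "Jun 02 11:30:41 pacs01 kernel: sda: I/O error, sector 123460",
]

summary = count_warnings(sample_log)
for keyword, count in summary.most_common():
    print(f"{keyword}: {count}")
```

A count that climbs from one day to the next is the warning tremor; the time to act is before it becomes an outage.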

