By Kevin Allen, Northern NSW Local Health District
Troubleshooting today’s networks can be a constant challenge, especially when you’re responsible for hunting down business-critical issues across multiple distributed locations. As the Network Manager for Northern NSW (New South Wales) Local Health District, I lead a team responsible for network, server, Wi-Fi and PC support across an environment that spans 12 hospitals and approximately 22 community health centers throughout northern New South Wales. We also manage the point-to-point microwave links, security radio and paging systems, as well as the IT infrastructure behind the Building Management System, and our network includes some 1,800 wireless access points (APs). As you can imagine, troubleshooting normal network functions in this environment is not for the faint of heart, let alone managing all of the unique connected devices in our hospitals.
We recently faced one specific troubleshooting challenge involving hospital Wi-Fi and our duress tag alarm devices. You might find it interesting, so I’d like to share it with you and explain how we troubleshot and solved the problem using detailed packet analysis from Omnipeek.
The main issue was the response time of our Real Time Location Service duress tags, which are worn by approximately 600 staff members at hospitals and community health centers in the Northern District. The tags allow staff to discreetly raise an alarm if they need assistance, whether they are inside a hospital or out in the parking lot. When a tag is activated, it silently sends an alert message across the network.
All the tags are tracked by security in real time. When an alert is raised, a large red dot appears on a map with additional information about the alert. A message is then sent to all tags in close proximity, with the location of the alert displayed in text. Alert messages are also forwarded via paging and SMS services to appropriate personnel. The system also tracks the location of the duress tag, in case the staff member has moved. Since these tags are used to call for help in emergency situations, it’s vital that there is minimal latency and the alert message goes out as soon as possible after a tag is activated.
Our team began receiving a high volume of complaints from a health center in Ballina (south of Brisbane). When a staff member presses the alarm button, we expect an alert to be raised within three seconds. However, in Ballina, where there are some 150 tags, we found that alerts from some tags were arriving three to four minutes after activation. That is unacceptable when duress situations demand a fast response. At first, we suspected a Wi-Fi roaming problem was delaying the tags, but we could only see the symptoms and were forced to guess at what was happening in the background.
After a lot of frustrating and unproductive troubleshooting, we purchased and installed Omnipeek, a network protocol analysis and troubleshooting solution from Savvius (now a LiveAction company). With Omnipeek, my team was immediately able to capture packets traversing the local network and Wi-Fi. We could see where a tag left one AP and tried to connect to another, which showed us where access was failing. This let us confirm that the Wi-Fi in Ballina was up, with a good signal, but that the problem lay in the connection to the APs themselves.
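To illustrate the kind of question packet capture lets you answer, here is a minimal Python sketch that summarizes association attempts per client. It assumes the capture has already been exported to simple records; the `AssocResponse` type, its fields and the sample values are hypothetical, not an Omnipeek export format. For each client it counts refusals before the first successful association and the time from the first response to the successful one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AssocResponse:
    """Hypothetical, simplified record of one AP response to a client's
    connection attempt, exported from a capture (not an Omnipeek format)."""
    time_s: float   # capture timestamp in seconds
    client: str     # client (tag) identifier, e.g. its MAC address
    ap: str         # AP that sent the response
    accepted: bool  # True if the AP admitted the client


def summarize(responses):
    """Return {client: (refusals_before_join, seconds_to_join)}.

    The delay is measured from the client's first captured response to
    its first accepted one, so it approximates the join delay.
    """
    refusals = defaultdict(int)
    first_seen = {}
    joined = {}
    for r in sorted(responses, key=lambda r: r.time_s):
        if r.client in joined:
            continue  # already connected; ignore later traffic
        first_seen.setdefault(r.client, r.time_s)
        if r.accepted:
            joined[r.client] = (refusals[r.client], r.time_s - first_seen[r.client])
        else:
            refusals[r.client] += 1
    return joined


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        AssocResponse(0.0, "tag-a", "ap-1", False),
        AssocResponse(1.0, "tag-a", "ap-1", False),
        AssocResponse(2.0, "tag-a", "ap-1", False),
        AssocResponse(3.5, "tag-a", "ap-1", True),
        AssocResponse(0.5, "tag-b", "ap-2", True),
    ]
    for client, (refused, delay) in summarize(sample).items():
        print(f"{client}: refused {refused} times, joined after {delay:.1f} s")
```

A client that is refused several times in a row by the same AP, while others join on the first try, is the pattern worth investigating further.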
Basically, the APs were saying ‘go away’ and were not letting authentication packets through. Once Omnipeek showed us this, we knew exactly where to look for the root of the problem: the AP controllers. After reviewing their settings, we discovered that once a controller had received more than 20 clients, the AP would reject the next client three times in an attempt to push the session to another AP. In other words, it was a load-balancing issue.
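The rule described above can be modelled in a few lines. The following Python is a toy sketch, not the controller vendor’s implementation: the class and function names are hypothetical, and it assumes the count refers to clients already associated with one AP, that rejection starts once that count exceeds 20, and that a refused client is simply admitted after its third rejection. It shows why the rule hurts when a tag can hear only one AP: there is nowhere else to push the session, so the refusals only add retries before the tag can send its alert.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPoint:
    """Toy model of an AP under the load-balancing rule described above.

    Assumptions (illustrative, not vendor behaviour): balancing applies
    once more than `threshold` clients are associated, and a client is
    refused `max_rejects` times before the AP admits it anyway.
    """
    name: str
    threshold: int = 20
    max_rejects: int = 3
    clients: set = field(default_factory=set)
    rejects: dict = field(default_factory=dict)

    def try_associate(self, client_id: str) -> bool:
        if client_id in self.clients:
            return True
        if len(self.clients) > self.threshold:
            count = self.rejects.get(client_id, 0)
            if count < self.max_rejects:
                # Refuse, hoping the client roams to a less-loaded AP.
                self.rejects[client_id] = count + 1
                return False
        # Under the threshold, or the client has used up its refusals.
        self.rejects.pop(client_id, None)
        self.clients.add(client_id)
        return True


def attempts_to_join(ap: AccessPoint, client_id: str, max_attempts: int = 10) -> int:
    """Count the attempts a client needs when this AP is the only one in range."""
    for attempt in range(1, max_attempts + 1):
        if ap.try_associate(client_id):
            return attempt
    raise RuntimeError("client never associated")


if __name__ == "__main__":
    ap = AccessPoint("ap-1")  # hypothetical AP name
    for i in range(21):       # fill the AP past the threshold
        ap.try_associate(f"tag-{i}")
    print(attempts_to_join(ap, "tag-21"))
```

Here the 22nd tag needs four attempts (three refusals, then acceptance) even though no other AP is available. How long each retry actually takes depends on the tag and the controller, which this sketch does not model.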
Seeing the packets involved in these situations was critical; before we had that information, all we knew was that the AP was not letting clients connect. Without this level of visibility and analysis, we would probably still be scratching our heads. Full packet capture was a great help in identifying the cause of the problem, and it also highlighted a few issues with the system that we have since rectified by applying new settings. This has led to great outcomes for both our staff and patients. NSW Health has done further testing since all the settings were changed, and I can happily report that the tags now respond in less than three seconds.
About The Author
Kevin Allen is Network Manager at Northern NSW Local Health District.