We know that your EHR is mission-critical, which is why we and our service partners work hard to make sure Cerbo is running smoothly. When unavoidable performance issues or outages occur, we’ll keep this page updated in real time with status updates and resolution information.
All times below are in Pacific Time.
7:40AM Pacific - Quest's Webservices Servers are down (Update 7:52AM - Tentative Resolution)
Relaying orders to/from Quest appears to be down (an issue on Quest's end), and submitting an order may lock up your browser as it attempts to communicate with Quest. If your browser locks up after trying to send an order to Quest, you may need to completely restart your browser. Please refrain from sending Quest orders for at least an hour while we wait to hear back from Quest.
UPDATE 7:52AM: We're seeing Quest's servers coming back up now, though connections are slower than usual. We don't have confirmation that everything is fixed but it seems like connections to Quest should be working again.
UPDATE 8:05 AM: Quest continues to improve operational speed and is within tolerable limits now. No official word from Quest but we're going to mark this as "resolved" for now.
March 13, 2023 7:50AM Pacific - Server outage affecting some clients - Fully resolved by 12:50PM Pacific
A CPU spike on one of our physical nodes made some of our client builds inaccessible via the network, causing connection issues for ~8% of our user base. As of 8AM Pacific we had restored connectivity for some clients, and by 9:30AM Pacific the majority of clients who had contacted us were able to resume normal operations. A smaller number of clients required additional escalation to get full networking restored, and between 10AM and 1PM Pacific all remaining clients were brought back online.
We're still performing a post-event analysis, but what we know so far is that all of the impacted customers were hosted on a specific physical server rack that experienced a sudden CPU spike, which disabled some networking services. Restoring connectivity required reconnecting each client's networking services on a per-customer basis (performing a graceful reset of some services), but for reasons we are still investigating, around 20 clients required more extensive intervention.
We've also isolated the source of the original CPU spike and moved it to a dedicated environment pending further investigation.
March 6, 2023 8:41AM Pacific - Quest's network "communication failures" - Resolved
Immediate resolution. Quest's interface network appears to be having issues this AM. We've reached out to their interface team, but you might see messages reporting "communication failures" until they've resolved the issue.
Feb 2, 2023 7:40AM Pacific - Reports of Slowness - Resolved 12:34AM Pacific
7:40AM Pacific we received a few reports of slowness and confirmed that there was some packet loss across our firewall. We spoke to our host, who confirmed that a small-scale DDOS attack was occurring and that they were working on mitigating it. As of 8:10AM traffic appears to be moving more normally, but our host is still working on full mitigation.
UPDATE 10:05AM Pacific: Speed issues appear to be largely resolved but the new rules implemented on our firewall to mitigate the DDOS attack appear to be causing some builds to intermittently not load from specific IP addresses. It does not appear to be widespread but we've confirmed that at least a handful of clinics are periodically unable to connect from specific IP addresses. We're currently working with our host to better understand what might be causing this issue.
UPDATE 1:10PM Pacific: A small number of users are still having intermittency issues. Overall, performance seems quite good, but we're collecting the IP addresses of users who report connectivity issues to see if we can establish why they're being specifically blocked. As a longer-term solution, our host will perform an "interim" network upgrade tonight at 9PM Eastern/6PM Pacific. This patch distributes our inbound traffic over a larger number of firewalls, which should let us tone down the aggressiveness of the firewall rules and avoid further disruption. Parts of our network will be moved behind a different firewall to isolate some higher-volume clients from the rest of our traffic. Some of this load will shift tonight, and if everything goes well we'll shift additional load over the next few days.
UPDATE 6:55PM Pacific: Our host is currently in the process of moving parts of our network behind a mirrored firewall. We expect this work to be complete in the next hour or so. In the meantime we're still seeing some intermittency but once the switch is complete we'll be running a few more tests. If you're still having issues with intermittent unavailability please contact firstname.lastname@example.org
UPDATE 9:15PM Pacific: Traffic to our US data center is now successfully load-balancing between multiple firewalls. We will begin slowly moving more traffic onto the newly deployed firewall as we become more comfortable that it is performing as expected. Our host is reporting that there are some remaining issues at the data-center level related to the host's Edge network (outside the Cerbo private cloud) that they are currently working on, but moving forward our own firewall set-up should be more resilient. We'll continue to post updates until our host has confirmed they've resolved everything.
UPDATE 12:34AM PACIFIC: Intermittency issues are confirmed as resolved. Our host found a faulty load-balancing route that was causing some traffic to not route correctly just after their edge network. This has been repaired and all systems are behaving normally.
Feb 1, 2023 11:14AM Pacific - Reports of Slowness - Resolved 11:57AM Pacific
We had intermittent reports of clinics having latency issues for ~40 minutes. It didn't appear to impact all clinics, and it appeared to pop up fairly randomly (speeds were high for the vast majority of calls, but periodically a specific attempt to load a resource would take several seconds to resolve). Our host has made a firewall adjustment that appears to have stabilized things, but we're still monitoring and will be meeting with them to get an update on a scheduled replacement of our firewall infrastructure that is currently being built out.
Jan 10, 2023 11:38AM-11:48AM Pacific, 12-12:05PM Pacific - Brief outage occurred on a single server node, affecting ~4% of builds
A faulty power cable that powers one server node briefly knocked that node offline. This affected ~4% of client EHRs, which were inaccessible for about 15 minutes. The power cable was replaced, and the node appears to be operating normally.
Jan 9, 2023 7:17AM-12:00PM Pacific - Slow connectivity reported - Resolved
11:30AM Pacific: Our host was able to migrate our firewall infrastructure to a new environment that appears to handle the spike in DDOS traffic dramatically better. We're seeing latency rates return to normal, and you should not notice any significant lagging. We've also had them re-enable SFTP traffic, so office scanners should be working again. We're continuing to monitor and will post a longer debrief once we've been able to properly summarize this event.
10:30AM Pacific: We've continued to communicate with our host about the ongoing performance issues. It appears that this is related to a DDOS attack that is overwhelming our geographic IP filters, but they are still looking for a work-around. They may restart the firewall (2-4 minutes of downtime) as part of applying new settings. We sincerely apologize for the ongoing performance issues. We know that it makes it difficult to work when pages are taking 20 seconds to load, and we're doing everything in our power to mitigate this issue.
9:45AM Pacific: Our host's networking team restarted the firewall (resulting in ~1 minute of downtime), but it does not appear to have made a significant impact on performance. They're continuing to investigate.
9:15AM Pacific: Our host migrated some of our firewall infrastructure to a new node, which has improved speeds, but latency is still worse than normally expected. They're continuing to investigate other complicating factors.
8:54AM Pacific: Our host believes they've identified the cause of the issue and will be performing a live migration of some of our firewall infrastructure. They believe this will resolve the issue. We should have an update in 15-20 minutes.
8:17AM Pacific: Our host has been working on mitigating an issue with our firewall load, and we're seeing improvement in overall speeds, but the issue is not yet resolved. Our host should be supplying an update shortly.