Feb 2, 2023 7:40AM Pacific - Reports of Slowness - Resolved 12:34AM Pacific
7:40AM Pacific: We got a few reports of slowness and confirmed that there was some packet loss coming across our firewall. We spoke to our host, who confirmed that a small-scale DDOS attack was occurring and that they were working on mitigating it. As of 8:10AM traffic appears to be moving more normally, but our host is still working on full mitigation.
UPDATE 10:05AM Pacific: Speed issues appear to be largely resolved but the new rules implemented on our firewall to mitigate the DDOS attack appear to be causing some builds to intermittently not load from specific IP addresses. It does not appear to be widespread but we've confirmed that at least a handful of clinics are periodically unable to connect from specific IP addresses. We're currently working with our host to better understand what might be causing this issue.
UPDATE 1:10PM Pacific: It seems like a small number of users are still having intermittency issues. Overall, performance seems quite good, but we're collecting the IP addresses of users who are reporting connectivity issues to see if we can establish why they're being specifically blocked. As a longer-term solution, our host has scheduled an "interim" network upgrade for tonight at 9PM Eastern/6PM Pacific. This is a patch to distribute our inbound traffic over a larger number of firewalls, but it should let us tone down the aggressiveness of the firewall rules to ensure that we aren't seeing disruption. Parts of our network will be moved behind a different firewall to isolate some higher-volume clients from the rest of our traffic. Some of this load will shift tonight, and if everything goes well we'll shift additional load over the next few days.
UPDATE 6:55PM Pacific: Our host is currently in the process of moving parts of our network behind a mirrored firewall. We expect this work to be complete in the next hour or so. In the meantime we're still seeing some intermittency but once the switch is complete we'll be running a few more tests. If you're still having issues with intermittent unavailability please contact firstname.lastname@example.org
UPDATE 9:15PM Pacific: Traffic to our US data center is now successfully load-balancing between multiple firewalls. We will begin slowly moving more traffic onto the newly deployed firewall as we become more comfortable that it is performing as expected. Our host is reporting that there are some remaining issues at the data-center level related to the host's Edge network (outside the Cerbo private cloud) that they are currently working on, but moving forward our own firewall set-up should be more resilient. We'll continue to post updates until our host has confirmed they've resolved everything.
UPDATE 12:34AM Pacific: Intermittency issues are confirmed as resolved. Our host found a faulty load-balancing route that was causing some traffic to not route correctly just after their edge network. This has been repaired and all systems are behaving normally.
Feb 1, 2023 11:14AM Pacific - Reports of Slowness - Resolved 11:57AM Pacific
We had intermittent reports of clinics having latency issues for ~40 minutes. It didn't appear to impact all clinics, and it appears to have popped up fairly randomly (speeds would be high for the vast majority of calls but periodically a specific attempt to load a resource would take several seconds to resolve). Our host has made a firewall adjustment that appears to have stabilized the issue, but we're still monitoring and will be meeting with them to get an update on a scheduled replacement of our firewall infrastructure that is currently being built out.
Jan 10, 2023 11:38AM-11:48AM Pacific, 12-12:05PM Pacific - Brief outage occurred on a single server node, affecting ~4% of builds
A faulty power cable that powers one server node briefly knocked that node offline. This affected ~4% of client EHRs, which were inaccessible for about 15 minutes. The power cable was replaced and appears to be operating normally.
Jan 9, 2023 7:17AM-12:00PM Pacific - Slow connectivity reported - Resolved
11:30AM Pacific: Our host was able to migrate our firewall infrastructure to a new environment that appears to be able to handle the spike in DDOS traffic dramatically better. We're seeing latency rates return to normal, and you should not notice any significant lagging. We've also had them re-enable SFTP traffic so office scanners should be working again. We're continuing to monitor and will be posting a longer debrief once we've been able to properly summarize this event.
10:30AM Pacific: We've continued to communicate with our host about the ongoing performance issues. It appears that this is related to a DDOS attack that is causing our geographic IP filters to become overwhelmed, but they are still looking at a work-around. They may restart the firewall (2-4 minutes of downtime) as part of these new settings. We sincerely apologize for the ongoing performance issues. We know that it makes it difficult to work when pages are taking 20 seconds to load, and we're doing everything in our power to mitigate this issue.
9:45AM Pacific: Our host's networking team restarted the firewall (resulting in ~1 minute of downtime) but it does not appear to have made a significant impact on performance. They're continuing to investigate.
9:15AM Pacific: Our host migrated some of our firewall infrastructure to a new node, which has improved speeds, but latency is still worse than normally expected. They're continuing to investigate other complicating factors.
8:54AM Pacific: Our host believes they've identified the cause of the issue and will be performing a live migration of some of our firewall infrastructure. They believe that this will resolve the issue. We should have an update in 15-20 minutes.
8:17AM Pacific: Our host has been working on mitigating an issue with our firewall load, and we're seeing improvement in overall speeds, but the issue is not yet resolved. Our host should be supplying an update shortly.
Dec 21, 2022 8:20AM-12:05PM Pacific - Slow connectivity reported - Resolved
This issue appears to have been caused by the GeoIP blocking Firewall service getting overloaded by a DDOS attack on one of our IP sub-ranges. Our host has adjusted some settings that should reduce this load but we'll be restarting the firewall at 10:15PM Pacific time tonight and are scheduled for a major firewall replacement early next month.
Oct. 18, 2022 9:03-10:38am Pacific - Slow connectivity Reported - Resolved
High CPU load on our primary firewall caused some builds to be intermittently slower than normal. The source of this CPU load was identified and all services appear to be running at normal speeds again.
Oct. 13, 2022 9:06-9:47am Pacific - Reported latency - Resolved
Hint Health experienced issues this morning that caused clinics linked to Hint to see increased latency when loading certain pages. Hint is working on this issue and will be posting updates here: https://status.hint.com/
Sept 29, 2022 Potential impact from Hurricane Ian - no incident to report
Thankfully, there is no incident to report, but we continue to monitor Hurricane Ian's progress through Florida, where our primary servers are located. About a third of the area as a whole is without power, but the server data centers are powered, with backup generators on standby. Networks are up and running and data is flowing normally.
Sept 21, 2022 12:10-12:35pm Pacific - ~10% of builds reported connectivity issues - Resolved
A server node locked up at 12:10pm Pacific time, causing the builds that are hosted on that node to become inaccessible. We worked with the server company to get the builds back online. Services began coming back online at 12:30pm.
August 12, 2022 10:30am - Bluefin/Pax Chrome Issues - Resolved
Bluefin has confirmed that the most recent version of Chrome is causing issues with the SaasConex system, which causes Pax A80 devices to routinely fail to report the result of a transaction back to Cerbo. This can result in a payment going through but not registering in Cerbo. Bluefin is working on the issue, but in the meantime we recommend that users who use the A80 devices for checkout switch to Firefox pending a confirmed resolution.
July 19, 2022 10-10:40am Pacific - Connectivity and slowness issues - Resolved
Widespread slowness issues were reported, and even instances of users being unable to log in to or access their EHR, starting around 10am Pacific time. This was caused by the ePrescribing relay server locking up while part of the service was still reporting as live. This caused a massive number of connections to be held open on our networks, putting the Firewall under heavy load and causing slowness across most of our network. Some clinics experienced lagging, while others were unable to connect at all for several minutes. We have resolved the issue, and will continue to work with our server company to better understand the issue and prevent it in the future.
June 22, 2022 ~1:14pm - Interfax outage - Resolved
Interfax is reporting an outage. We have pushed a fix for the Sent tab in the meantime.
May 2, 2022 ~1:30-2:15pm Pacific - Error Sending Emails - Resolved
Some clients are experiencing errors sending emails (which may also affect adding appts with the email notice enabled). This may have to do with a fix put in place to remedy the Chrome security certificate bug. We are actively investigating.
May 2, 2022 ~11:12am-1:25pm Pacific - Chrome Error Generating Certificate - "Your connection is not secure" - Resolved
A recent update to the Chrome browser is causing an issue with the certificate generation. The connection IS secure via HTTPS - this is just an issue with Chrome not recognizing the valid security certificate. Pending a solution, please try using a different browser (Firefox or, for Mac users, Safari).
Mar 1, 2022 ~9:40am-12:00pm - Twilio issues: Creating or joining telemedicine calls - Resolved
Twilio (the network behind Cerbo's integrated telemed functionality and appointment reminder SMS messages) is experiencing failures with creating or joining video call rooms. More information here: https://status.twilio.com/incidents/rcbm2q15rlkl. As of 12pm Pacific time, this appears to be resolved for new calls. Trying to join a call that you had previously tried to join during the outage may still not work.
Mar 30th, 7:40AM Pacific - Quest's Webservices Servers Outage (Update 7:52AM - Resolved)
Relaying orders to/from Quest appears to be down (an issue on Quest's end), and submitting an order may lock up your browser as it attempts to communicate with Quest. If your browser is locked up after trying to send an order to Quest, you may need to completely restart your browser. Please refrain from sending Quest orders for at least an hour while we wait to hear back from Quest.
UPDATE 7:52AM: We're seeing Quest's servers coming back up now, though connections are slower than usual. We don't have confirmation that everything is fixed but it seems like connections to Quest should be working again.
UPDATE 8:05 AM: Quest continues to improve operational speed and is within tolerable limits now. No official word from Quest but we're going to mark this as "resolved" for now.