We're here to help

Service Outage Announcement - Hosted Environments Impacted [Resolved] - July 19, 2024

Dear Valued Customers,

A global outage is currently affecting our hosted software services. It has affected many organizations around the world, grounding planes and disrupting major banking institutions. The issue has been identified as stemming from a defect in the CrowdStrike software, which has impacted a large number of organizations worldwide that use the Microsoft Windows operating system.

This is not a security issue, nor is it in any way related to Keyfactor's or CrowdStrike's security posture.

Our team is actively working through our Business Disaster Recovery (BDR) procedures to mitigate the effects of this defect and restore our services as quickly as possible. We understand the critical nature of the services we provide, and we want to assure you that we are doing everything in our power to resolve this issue promptly.

Here’s what we are doing to address the situation:

  • Collaboration with CrowdStrike: We are in direct communication with CrowdStrike, and a fix has already been implemented. At this point, we are working to reverse the adverse effects of this defect.
  • Implementation of BDR Procedures: Our technical teams are executing our BDR plans to minimize disruption and restore full functionality as quickly as possible.
  • Continuous Monitoring and Updates: We are closely monitoring the situation and will keep you updated with any significant developments.

We apologize for any inconvenience this may cause and appreciate your patience and understanding as we work through this. Your business is important to us, and we are committed to ensuring that you are in good hands during this time.

We will be providing hourly updates below that you can refer to for the current status of Keyfactor Global Services.

 

Type Date/Time Update
Initial 7/19/2024 2:53 AM EST

Initial Notification

Update 7/19/2024 10:00 AM EST
  • Multi-Region and Single-Region HA customers with CLAaaS are back online and undergoing additional verification from our operations teams.
  • Multi-Region and Single-Region HA customers with PKIaaS v2 (EJBCA backed PKI) are back online and undergoing additional verification from our operations teams.
  • Multi-Region and Single-Region HA customers with PKIaaS v1 (ADCS backed PKI) are partially back online and undergoing additional verification from our operations teams. The CA/PKI component services are still being restored but we expect these to be fully restored as well within the next 90 minutes.
  • Single-Region non-HA and Mid-Enterprise services are still being restored, and more details will follow in subsequent updates.
Update 7/19/2024 11:00 AM EST
  • All Production Multi-Region and Single-Region HA services have been restored. Our operations teams are continuing to validate and monitor, but services should be back online at this time.
  • All Non-Production and lower assurance services and Single-Region non-HA and Mid-Enterprise services are still being restored, and more details will follow in subsequent updates.
Update 7/19/2024 12:00 PM EST
  • The services have been restored for the below products. Our operations teams are continuing to validate and monitor, but services should be back online at this time:
    • All Production Multi-Region HA
    • All Production Single-Region HA
  • The services for all products below are still being restored, and more details will follow in subsequent updates:
    • Non-Production Multi-Region HA
    • Non-Production Single-Region HA
    • Production/Non-Production Single-Region non-HA
    • Production/Non-Production Mid-Enterprise
Update 7/19/2024 1:00 PM EST
  • The services have been restored for the below products. Our operations teams are continuing to validate and monitor, but services should be back online at this time:
    • All Production Multi-Region HA
    • All Production Single-Region HA
  • The services for all products below are still being restored, and more details will follow in subsequent updates:
    • Non-Production Multi-Region HA
    • Non-Production Single-Region HA
    • Production/Non-Production Single-Region non-HA
    • Production/Non-Production Mid-Enterprise
Update 7/19/2024 2:00 PM EST
  • The services have been restored for the below products. Our operations teams are continuing to validate and monitor, but services should be back online at this time:
    • All Production Multi-Region HA
    • All Production Single-Region HA
  • The services for all products below are still being restored and are continuing to come back online. If you are in this group, your services may be partially or fully back online at this point. However, we are still working to fully restore these services:
    • Non-Production Multi-Region HA
    • Non-Production Single-Region HA
    • Production/Non-Production Single-Region non-HA
    • Production/Non-Production Mid-Enterprise
Update 7/19/2024 3:00 PM EST
  • The services have been restored for the below products. Our operations teams are continuing to validate and monitor, but services should be back online at this time:
    • All Production Multi-Region HA
    • All Production Single-Region HA
  • A large portion of the below services have been fully restored, but all products below are still actively being restored and coming back online. If you are in this group, your services may be partially or fully back online at this point. However, we are still working to fully restore these services:
    • Non-Production Multi-Region HA
    • Non-Production Single-Region HA
    • Production/Non-Production Single-Region non-HA
    • Production/Non-Production Mid-Enterprise
Update 7/19/2024 4:00 PM EST
  • We have received intermittent reports of HA customers experiencing issues and have traced these back to connectivity issues caused by traffic routing in our load balancers. We are resolving this issue at the moment, and our operations teams are continuing to validate and monitor, but the majority of these services should be back online at this time:
    • All HA Production Environments (Single-Region or Multi-Region)
  • A large portion of the below services have been restored, but all products below are still actively being restored and coming back online. If you are in this group, your services may be partially or fully back online at this point. However, we are still working to fully restore these services:
    • All non-HA Customers (independent of SKU or Production/Non-Production)
    • All HA non-production Environments (Single-Region or Multi-Region)
Update 7/19/2024 5:00 PM EST
  • We have received intermittent reports of HA customers experiencing issues and have traced these back to connectivity issues caused by traffic routing in our load balancers. We are resolving this issue at the moment, and our operations teams are continuing to validate and monitor, but the majority of these services should be back online at this time:
    • All HA Production Environments (Single-Region or Multi-Region)
  • A large portion of the below services have been restored, but all products below are still actively being restored and coming back online. If you are in this group, your services may be partially or fully back online at this point. However, we are still working to fully restore these services:
    • All non-HA Customers (independent of SKU or Production/Non-Production)
    • All HA non-production Environments (Single-Region or Multi-Region)
Update 7/19/2024 6:00 PM EST
  • The services have been restored for the below products. Our operations teams are continuing to validate and monitor, but services should be back online at this time:
    • All HA Production Environments (Single-Region or Multi-Region)
  • A large portion of the below services have been restored, but all products below are still actively being restored and coming back online. If you are in this group, your services may be partially or fully back online at this point. However, we are still working to fully restore these services:
    • All non-HA Customers (independent of SKU or Production/Non-Production)
    • All HA non-production Environments (Single-Region or Multi-Region)
Update 7/19/2024 7:00 PM EST
  • The services have been restored for the below products. Our operations teams are continuing to validate and monitor, but services should be back online at this time:
    • All HA Production Environments (Single-Region or Multi-Region)
  • A large portion of the below services have been restored, but all products below are still actively being restored and coming back online. If you are in this group, your services may be partially or fully back online at this point. However, we are still working to fully restore these services:
    • All non-HA Customers (independent of SKU or Production/Non-Production)
    • All HA non-production Environments (Single-Region or Multi-Region)
Update 7/19/2024 8:00 PM EST
  • The services have been restored for the below products. Our operations teams are continuing to validate and monitor, but services should be back online at this time:
    • All HA Production Environments (Single-Region or Multi-Region)
  • A large portion of the below services have been restored, but all products below are still actively being restored and coming back online. If you are in this group, your services may be partially or fully back online at this point. However, we are still working to fully restore these services:
    • All non-HA Customers (independent of SKU or Production/Non-Production)
    • All HA non-production Environments (Single-Region or Multi-Region)
Resolved 7/19/2024 8:00 PM EST
  • All services have been restored to operation. We will continue to monitor the situation from our side, and we encourage you to verify that all of your services are operational as well. If you experience any further issues, please open a support ticket (for Severity 2+) or call our support hotline for Severity 1 cases.
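
For customers who would like to script the verification suggested above, the following is a minimal, illustrative sketch only. It assumes each service exposes an HTTPS URL that returns a 2xx status when healthy. The function names (`http_ok`, `check_services`, `failing`) and any endpoint names or URLs you supply are hypothetical placeholders, not Keyfactor-provided tooling or addresses.

```python
from typing import Callable, Dict, List
from urllib.error import URLError
from urllib.request import urlopen


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (URLError, OSError, ValueError):
        # Covers HTTP errors, timeouts, DNS failures, and malformed URLs.
        return False


def check_services(
    endpoints: Dict[str, str],
    probe: Callable[[str], bool] = http_ok,
) -> Dict[str, bool]:
    """Probe each named endpoint (hypothetical names and URLs) once."""
    return {name: probe(url) for name, url in endpoints.items()}


def failing(results: Dict[str, bool]) -> List[str]:
    """List the names of endpoints that did not respond successfully."""
    return sorted(name for name, ok in results.items() if not ok)
```

You would call `check_services` with a dictionary of your own service names and URLs, then pass the result to `failing` to see which ones, if any, still need attention. A successful response confirms only that the endpoint is reachable, so functional checks such as test enrollments remain advisable.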

 

Outage RCA - CrowdStrike Defective Patch

Cause of Issue:

On 19 July 2024 at approximately 01:10 EST, Windows-based Keyfactor hosted platforms began alerting the Keyfactor technical teams to a broad outage within our environments. It was determined that CrowdStrike had pushed a defective patch to their customers that affected Windows-based endpoints (https://www.crowdstrike.com/blog/falcon-update-for-windows-hosts-technical-details/).

Due to the nature of this issue, and its broad impact across our environments, Keyfactor began remediation for affected endpoints. Various Keyfactor SaaS hosted environments were affected, including those with high availability. 

Steps Taken During Resolution: 

To address the impacted workloads, affected systems were reconstituted from backups and brought back online. Once the systems were back online, we performed an extensive validation process and remediated any issues as they were discovered.

Post-Resolution Steps: 

Keyfactor hosted environments are designed to be insulated against a myriad of scenarios such as workload failures, networking issues, or application-level difficulties. Keyfactor, like many other global organizations, had to enact broad disaster recovery procedures across the platform to address this outage. As such, Keyfactor is continuing to review our hosted platform architectures to determine how we can better insulate our hosted platforms moving forward.  

 

Keyfactor Global Support