Tom Wojcinski, partner, cybersecurity and technology management at Wipfli, discusses the outage that resulted from an update to CrowdStrike software and why it surprised security professionals around the globe.
By Dava Stewart
If you were flying Delta during the week of July 18, 2024, you probably encountered some issues. The airline was one of thousands of companies affected by an outage caused by a software update by the cybersecurity company CrowdStrike.
CrowdStrike offers software that is installed on an endpoint, which can be any machine from a server to a laptop to a self-serve kiosk in an airport. Tom Wojcinski, a partner in cybersecurity and technology management at Wipfli, says the software itself is called the agent. Configuration files tell the agent what to do and how to behave, and it was an update to those configuration files that caused the outage.
“You can test upgrades to the agent,” says Wojcinski. “Those do not have to be automatically deployed. But the configuration updates don’t arrive in a format the customer can test before applying. The software provider is doing those configuration updates in the background.”
The update to CrowdStrike’s Falcon platform, which hooks into Microsoft Windows at the kernel level, caused an error that crashed the Windows kernel and therefore the entire OS. It didn’t affect machines running other operating systems such as Linux or macOS. The Falcon update wasn’t at all unusual; configuration updates for the agent sometimes occur multiple times daily.
Microsoft-based systems were affected at companies around the world.
Companies like Delta, with strong security programs, found themselves in even worse situations. When the update crashed the Microsoft systems, the entire system shut down, greeting users with the infamous “blue screen of death.” That meant that endpoints protected by Microsoft’s BitLocker, a security feature that encrypts entire volumes, had to be manually rebooted, one at a time.
“BitLocker is a great way to protect data,” according to Wojcinski. “If a computer is stolen, or somebody steals a hard drive, they can’t get access to the data. You have to have a recovery key to decrypt the laptop or hard drive.”
In other words, someone had to visit every single one of the 8.5 million affected Microsoft devices and (if it was running BitLocker) enter a unique 48-character key. “The key is automatically generated and assigned, then stored in the Active Directory. But if you had CrowdStrike running on your Active Directory server, it was blue screened,” explains Wojcinski. In some organizations, no one knew how to find their recovery keys.
Delta alone had 80,000 endpoints. A person had to visit each one and put in the 48-character decryption key and reboot it.
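To get a feel for that scale, here is a rough back-of-the-envelope sketch in Python. Only the 80,000 endpoint count comes from the reporting above; the ten minutes per device and the 500 technicians are purely hypothetical assumptions chosen for illustration, not figures from Delta or Wojcinski.

```python
# Back-of-the-envelope estimate of manual recovery effort.
ENDPOINTS = 80_000           # endpoint count cited for Delta
MINUTES_PER_ENDPOINT = 10    # hypothetical: walk up, enter the key, reboot
TECHNICIANS = 500            # hypothetical staffing level


def recovery_estimate(endpoints, minutes_per_endpoint, technicians):
    """Return (total person-hours, hours of work per technician)."""
    person_hours = endpoints * minutes_per_endpoint / 60
    hours_each = person_hours / technicians
    return person_hours, hours_each


person_hours, hours_each = recovery_estimate(
    ENDPOINTS, MINUTES_PER_ENDPOINT, TECHNICIANS
)
print(f"{person_hours:,.0f} person-hours, about {hours_each:.1f} hours per technician")
```

Under those assumed numbers, the work comes to roughly 13,000 person-hours. Changing either assumption shifts the result proportionally, which is the point: hands-on recovery scales linearly with the number of devices.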
Although the CrowdStrike incident showed a vulnerability, Wojcinski points out the airline was following excellent protocol. “Kudos to Delta for having BitLocker everywhere, even on devices like kiosks designed for the public to use. They were really protecting their infrastructure,” he says.
The CrowdStrike outage may have undesirable ramifications, according to Wojcinski. Some organizations may decide to disable or discontinue their security subscriptions. Plenty of mid-sized companies outsource security to firms like CrowdStrike either because they don’t have staffing for an in-house security team, or because they want an extra layer of security.
Wojcinski says it’s plausible to imagine some “organizations might have ‘CrowdStrike sensitivity’ and decide it’s too risky and so decide to disable it and just run without it.” This is a bad decision for several reasons, but one of the most important ones is that when something like this happens, bad actors are ready to take advantage.
In a note to customers and partners, CrowdStrike founder and CEO George Kurtz wrote, “We know that adversaries and bad actors will try to exploit events like this. I encourage everyone to remain vigilant and ensure that you’re engaging with official CrowdStrike representatives.” Disabling security software only creates more openings for those bad actors to enter.
Along with keeping your security program running, Wojcinski says there’s a more basic and crucial step to securing operations, and that is knowing what your responsibilities are. “I’m a cloud evangelist,” he says. “I think most cloud services have an inherent security design, and it’s typically better than what most organizations can do on their own.”
But companies still need to have their own security, he adds. The first step is understanding that “cloud” can be Infrastructure as a Service (IaaS), Platform as a Service (PaaS) or Software as a Service (SaaS), and that each level carries different responsibilities for both the consumer and the cloud provider.
Service model | Cloud provider | Customer
IaaS | Least responsibility | Most responsibility
PaaS | Some responsibility | Some responsibility
SaaS | Most responsibility | Least responsibility
Even at the SaaS level, where more of the security burden is on the cloud provider, customers still have responsibilities, such as making sure the right people have access and correct permissions, says Wojcinski.
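For teams that want to keep this split in front of them, the breakdown can be captured as a small lookup table. The following is a minimal Python sketch, not a standard or a vendor API: the names `Share`, `RESPONSIBILITY`, `CUSTOMER_ALWAYS` and `describe` are hypothetical, and the relative shares simply mirror the breakdown described above.

```python
from enum import Enum


class Share(Enum):
    LEAST = "least"
    SOME = "some"
    MOST = "most"


# (cloud provider share, customer share) for each service model
RESPONSIBILITY = {
    "IaaS": (Share.LEAST, Share.MOST),
    "PaaS": (Share.SOME, Share.SOME),
    "SaaS": (Share.MOST, Share.LEAST),
}

# Duties that stay with the customer at every level, per Wojcinski
CUSTOMER_ALWAYS = ["user access", "permissions"]


def describe(model: str) -> str:
    """Summarize who carries how much of the security burden."""
    provider, customer = RESPONSIBILITY[model]
    return f"{model}: provider={provider.value}, customer={customer.value}"


for model in RESPONSIBILITY:
    print(describe(model), "| customer always owns:", ", ".join(CUSTOMER_ALWAYS))
```

Writing the split down this way makes one thing obvious: no row gives the customer zero responsibility.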
Companies should also assess their security and run tests to make sure both their cloud and on-prem environments are actually secure and that the configuration in place does what it’s supposed to do. Running tests gives companies a chance to fix problems proactively, before cybercriminals get in and do real damage.
One of the most remarkable things about the CrowdStrike incident is how surprising it was to security professionals. It simply wasn’t a scenario anyone had ever imagined.
Plenty of organizations have disaster recovery plans, but few, if any, had considered that their endpoint security software might cause an outage, or planned for what would happen if they couldn’t get their BitLocker keys. “Now we know,” says Wojcinski.
Companies need to begin by understanding their own responsibilities as well as those of their cloud providers. Once that is established, they need to have a way to validate that their providers and partners are doing what they say they are doing.
Once an organization has a security plan that covers its responsibilities, it needs to make sure its operations are resistant to attack and that applications are available to users when and where they need them. Next, it needs a recovery plan in case of disaster or attack that answers the question, “How can I get back online and continue my mission?”
Tom Wojcinski is a partner in Wipfli’s cybersecurity and technology management practice. He specializes in helping organizations reduce and manage the risks to modern technology and information systems. In today’s business environment, customers, trading partners, regulators and employees expect continuously available and secure information systems. To help meet this expectation, Tom works with clients to increase the security, availability and recoverability of their information assets.
Tom leads a variety of engagements designed to help improve organizations’ cybersecurity posture, including cybersecurity risk assessment, control program development and implementation, incident response planning and simulation, vulnerability and penetration testing, security audit, control verification and managed security services. He is a frequent author and speaker on cybersecurity and IT risk management topics.