Shared Responsibility: Cloud Security Lessons From the CrowdStrike Outage

Tom Wojcinski, partner, cybersecurity and technology management at Wipfli, discusses the outage that resulted from an update to CrowdStrike software and why it surprised security professionals around the globe.

By Dava Stewart

If you were flying Delta during the week of July 18, 2024, you probably encountered some issues. The airline was one of thousands of companies affected by an outage caused by a software update from the cybersecurity company CrowdStrike.

Although the media mostly covered Delta, the outage affected companies in many different sectors and of varying sizes. The CrowdStrike incident was something that not even the most seasoned security experts could have predicted, and it brings up new scenarios for people concerned about security to consider.

July 19: Software update from CrowdStrike causes systems using Microsoft Windows to crash

July 19: CrowdStrike issues apology

July 20: Microsoft issues instructions on how to remedy the problem

July 19–July 25: More than 10,000 flights were grounded globally, with many airlines and airports affected

July 25: Delta, the hardest hit airline, returns to normal operations

July 31: Delta announces plans to sue CrowdStrike

August 8: Delta confirms filing with the SEC, estimating $550 million in losses due to the outage

Breaking Down What Happened

CrowdStrike, a cybersecurity company, offers software that’s installed on an endpoint, which can be any machine from a server to a laptop to a self-serve kiosk in an airport. Tom Wojcinski, partner, cybersecurity and technology management, Wipfli, says the software itself is called the agent. Configuration files tell the agent what to do and how to behave, and it was an update to those configuration files that caused the outage.

“You can test upgrades to the agent,” says Wojcinski. “Those do not have to be automatically deployed. But the configuration updates don’t arrive in a testable format for the customer to test before applying. The software provider is doing those configuration updates in the background.”

The update to CrowdStrike’s Falcon platform, which hooks into Microsoft Windows as a kernel process, caused an error that crashed the Windows kernel and therefore the entire OS. It didn’t affect machines running other operating systems, like Linux or macOS. The Falcon update wasn’t at all unusual: configuration updates like this one sometimes occur multiple times daily.

8.5 million Microsoft-based systems were affected, including companies around the world in transportation, finance, healthcare and education.

Good Security Made It All Worse

Companies with strong security programs, like Delta, found themselves in even worse situations. When the update crashed Windows, the entire machine shut down, greeting users with the infamous “blue screen of death.” That meant that endpoints protected by Microsoft’s BitLocker, a security feature that encrypts entire volumes, had to be manually rebooted, one at a time.

“BitLocker is a great way to protect data,” according to Wojcinski. “If a computer is stolen, or somebody steals a hard drive, they can’t get access to the data. You have to have a recovery key to decrypt the laptop or hard drive.”

In other words, someone had to visit every single one of the 8.5 million affected Microsoft devices and (if it was running BitLocker) enter a unique 48-character key. “The key is automatically generated and assigned, then stored in the Active Directory. But if you had CrowdStrike running on your Active Directory server, it was blue screened,” explains Wojcinski. In some organizations, no one knew how to find their recovery keys.

Delta alone had 80,000 endpoints. A person had to visit each one and put in the 48-character decryption key and reboot it.
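To get a feel for what that means in labor, here is a rough back-of-the-envelope sketch in Python. Only the 80,000-endpoint figure comes from the article; the 10 minutes per device and the 500 technicians are illustrative assumptions, and the function name is hypothetical.

```python
def manual_recovery_estimate(endpoints, minutes_per_endpoint, technicians):
    """Estimate the effort of a hands-on, device-by-device recovery.

    Returns (person_hours, elapsed_hours), assuming technicians work in
    parallel and every device takes the same amount of time.
    """
    if endpoints < 0 or minutes_per_endpoint < 0:
        raise ValueError("endpoints and minutes_per_endpoint must be non-negative")
    if technicians < 1:
        raise ValueError("at least one technician is required")
    person_hours = endpoints * minutes_per_endpoint / 60
    elapsed_hours = person_hours / technicians
    return person_hours, elapsed_hours


# 80,000 endpoints is the figure cited for Delta.
# 10 minutes per device and 500 technicians are assumptions for illustration.
person_hours, elapsed_hours = manual_recovery_estimate(80_000, 10, 500)
print(f"{person_hours:,.0f} person-hours, about {elapsed_hours:.1f} hours elapsed")
```

Under those assumptions the work comes to roughly 13,333 person-hours, or a little over a day of nonstop effort for 500 people. Travel between sites and time spent hunting for keys would push the real number higher.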

Although the CrowdStrike incident showed a vulnerability, Wojcinski points out the airline was following excellent protocol. “Kudos to Delta for having BitLocker everywhere, even on devices like kiosks designed for the public to use. They were really protecting their infrastructure,” he says.

Organizations really need to understand where they are in the stack, and what type of cloud they are consuming, and they have to be educated—they have to educate themselves.

—Tom Wojcinski, partner, cybersecurity and technology management, Wipfli

Security Tips

The CrowdStrike outage may have undesirable ramifications, according to Wojcinski. Some organizations may decide to disable or discontinue their security subscriptions. Plenty of mid-sized companies outsource security to firms like CrowdStrike either because they don’t have staffing for an in-house security team, or because they want an extra layer of security.

Wojcinski says it’s plausible to imagine some “organizations might have ‘CrowdStrike sensitivity’ and decide it’s too risky and so decide to disable it and just run without it.” This is a bad decision for several reasons, but one of the most important ones is that when something like this happens, bad actors are ready to take advantage.

In a note to customers and partners, CrowdStrike founder and CEO George Kurtz wrote, “We know that adversaries and bad actors will try to exploit events like this. I encourage everyone to remain vigilant and ensure that you’re engaging with official CrowdStrike representatives.” Disabling security software only creates more openings for those bad actors to enter.

Along with keeping your security program running, Wojcinski says there’s a more basic and crucial step to securing operations, and that is knowing what your responsibilities are. “I’m a cloud evangelist,” he says. “I think most cloud services have an inherent security design, and it’s typically better than what most organizations can do on their own.”

But companies still need to have their own security, he adds. The first step is understanding that “cloud” can be Infrastructure as a Service (IaaS), Platform as a Service (PaaS) or Software as a Service (SaaS), and that each level carries different responsibilities for both the consumer and the cloud provider.

        Provider                Customer

IaaS    Least responsibility    Most responsibility

PaaS    Some responsibility     Some responsibility

SaaS    Most responsibility     Least responsibility

Even at the SaaS level, where more of the security burden is on the cloud provider, customers still have responsibilities, such as making sure the right people have access and correct permissions, says Wojcinski.
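The provider/customer breakdown above can be written down as a small lookup table, which some teams may find handy when cataloging the cloud services they consume. The Python below is only a sketch: the names `Share`, `RESPONSIBILITY`, `ALWAYS_CUSTOMER` and `describe` are hypothetical, and treating PaaS as "some responsibility" on both sides is an interpretation of the breakdown, not a formal standard.

```python
from enum import Enum


class Share(Enum):
    LEAST = "least responsibility"
    SOME = "some responsibility"
    MOST = "most responsibility"


# Who carries how much of the security burden at each service level.
RESPONSIBILITY = {
    "IaaS": {"provider": Share.LEAST, "customer": Share.MOST},
    "PaaS": {"provider": Share.SOME, "customer": Share.SOME},  # interpretation
    "SaaS": {"provider": Share.MOST, "customer": Share.LEAST},
}

# Duties the customer keeps at every level, per Wojcinski's SaaS example.
ALWAYS_CUSTOMER = ("user access", "permissions")


def describe(model):
    """Return a one-line summary of the split for a service model name."""
    names = {name.lower(): name for name in RESPONSIBILITY}
    key = names.get(model.strip().lower())
    if key is None:
        raise ValueError(f"unknown service model: {model!r}")
    split = RESPONSIBILITY[key]
    return (
        f"{key}: provider has {split['provider'].value}, "
        f"customer has {split['customer'].value}; "
        f"customer always keeps {' and '.join(ALWAYS_CUSTOMER)}"
    )


print(describe("saas"))
```

Keeping the "always the customer's job" items separate from the matrix makes the point that even the lightest customer burden never drops to zero.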

Companies should also be assessing their security and running tests to make sure both their cloud and on-prem environments are actually secure and that the configuration put in place does what it’s supposed to do. Running tests gives companies a chance to fix problems proactively, before cybercriminals or hackers get in and do real damage.

This incident put a whole new light on the recoverability aspect of security.

—Tom Wojcinski

The Future

One of the most remarkable things about the CrowdStrike incident is how surprising it was to security professionals. It simply wasn’t a scenario anyone had ever imagined.

Plenty of organizations have disaster recovery plans, but few, if any, considered that their endpoint security software might cause an outage, or planned for what would happen if they couldn’t get their BitLocker keys. “Now we know,” says Wojcinski.
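One way to fold that lesson into a recovery plan is to check, ahead of time, that every encrypted endpoint has a recovery key stored somewhere that does not depend on the directory server being up. The Python sketch below assumes a hypothetical inventory of endpoint names and a hypothetical offline copy of keys; the endpoint names and key values are made up, and the function only checks the shape of a key (48 digits in eight groups of six), not whether it actually unlocks the device.

```python
import re

# A BitLocker recovery key is 48 digits, shown as eight groups of six.
RECOVERY_KEY_PATTERN = re.compile(r"^\d{6}(-\d{6}){7}$")


def find_recovery_gaps(encrypted_endpoints, offline_keys):
    """Report endpoints whose offline recovery key is missing or malformed.

    encrypted_endpoints: iterable of endpoint names (hypothetical inventory).
    offline_keys: dict mapping endpoint name to recovery key, kept somewhere
        reachable even when the directory server is down (an assumption).
    """
    missing, malformed = [], []
    for name in sorted(encrypted_endpoints):
        key = offline_keys.get(name)
        if key is None:
            missing.append(name)
        elif not RECOVERY_KEY_PATTERN.match(key):
            malformed.append(name)
    return {"missing": missing, "malformed": malformed}


# Made-up sample data for illustration only.
inventory = ["kiosk-01", "laptop-07", "server-02"]
offline_copy = {
    "kiosk-01": "111111-222222-333333-444444-555555-666666-777777-888888",
    "laptop-07": "12345",
}
print(find_recovery_gaps(inventory, offline_copy))
```

A check like this would have to run against wherever an organization really keeps its keys, and the offline copy itself needs strong protection, since it holds the keys to every encrypted device.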

Companies need to begin by understanding their own responsibilities as well as those of their cloud providers. Once that is established, they need to have a way to validate that their providers and partners are doing what they say they are doing.

Once an organization has a security plan that includes its responsibilities, it needs to make sure its operations are resistant to attack, but also that applications are available to users when and where they need them. Next, it needs a recovery plan in case of disaster or attack that answers the question, “How can I get back online and continue my mission?”

Tom Wojcinski

Wipfli

Tom Wojcinski is a partner in Wipfli’s cybersecurity and technology management practice. He specializes in helping organizations reduce and manage the risks to modern technology and information systems. In today’s business environment, customers, trading partners, regulators and employees expect continuously available and secure information systems. To help meet this expectation, Tom works with clients to increase the security, availability and recoverability of their information assets.

Tom leads a variety of engagements designed to help improve organizations’ cybersecurity posture, including cybersecurity risk assessment, control program development and implementation, incident response planning and simulation, vulnerability and penetration testing, security audit, control verification and managed security services. He is a frequent author and speaker on cybersecurity and IT risk management topics.
