On Friday, July 19, 2024, an update to CrowdStrike's malware detection software took down Windows workstations and servers around the world. Many affected computers ended up showing Windows' critical error screen, better known as the Blue Screen of Death (BSOD). Rebooting did not fix the problem, because the BSOD appeared again each time the machine restarted.
CrowdStrike stated that the issue was caused by an update that contained a defect. There is no indication that it was the result of a breach or an attack. Microsoft and CrowdStrike published guidance on how to fix affected machines, which involved either rebooting repeatedly or deleting a specific file.
Fixing the crash generally took only a short time on any single machine, but the after-effects of the crashes are why the disruption lingered. Crashes often left data or files corrupted, and that corruption had to be repaired before programs would operate normally again.
It appears that this outage was caused by a mistake rather than a malicious attack, but it highlighted vulnerabilities that a malicious attacker could exploit. Antivirus clients have grown in scope, from identifying malware by checking against a list of known malware to monitoring the behavior of running software and checking whether security policies are being followed. This newer, multi-purpose anti-malware software is referred to as endpoint protection, because it runs on the endpoints (the individual workstations and servers) of a network. Endpoint protection has to run with high privileges on a machine, since it must detect and stop malicious software. If an attacker were able to insert malicious code into it, that code would have the same sweeping control over the machine.
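To make the difference between the two detection approaches concrete, here is a minimal Python sketch. It is a toy illustration, not how CrowdStrike or any real product works: the known-bad hash list, the `ProcessEvent` record, and the policy table are all hypothetical.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical signature list: SHA-256 digests of files already known to be malicious.
# Here it holds only the digest of the placeholder bytes b"known-bad-sample".
KNOWN_BAD_HASHES = {hashlib.sha256(b"known-bad-sample").hexdigest()}


def signature_match(file_bytes: bytes) -> bool:
    """Classic antivirus check: flag a file only if its hash is on the known-bad list."""
    return hashlib.sha256(file_bytes).hexdigest() in KNOWN_BAD_HASHES


@dataclass
class ProcessEvent:
    """Hypothetical record of something a running program tried to do."""
    process: str
    action: str  # e.g. "write_file", "read_credentials", "spawn_shell"
    target: str


# Hypothetical behavioral policy: (process, action) pairs that should never happen.
# "*" means the action is forbidden for every process.
FORBIDDEN_ACTIONS = {
    ("winword.exe", "spawn_shell"),
    ("*", "read_credentials"),
}


def violates_policy(event: ProcessEvent) -> bool:
    """Behavioral check: judge what a process does, not what its file hash is."""
    return (event.process, event.action) in FORBIDDEN_ACTIONS or (
        ("*", event.action) in FORBIDDEN_ACTIONS
    )
```

A brand-new malware variant slips past `signature_match` because its hash is not on any list yet, but `violates_policy(ProcessEvent("unknown.exe", "read_credentials", "lsass.exe"))` still flags it based on its behavior. That is the shift in scope described above, and it is also why such software needs deep access to the machine it watches.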
Although CrowdStrike was founded only in 2011, it has grown quickly, employing thousands of people and booking billions of dollars in revenue. A major source of this growth has been its Falcon client, which combines antivirus with endpoint detection and response (EDR). The client is also much less resource-intensive than many other endpoint protection products, because it relies more on behavioral analysis than on signature-based lookups. The client's large installed base is why its failure had such a large impact on so many companies.
Businesses were affected across the world and in many sectors of the economy, from airlines to financial institutions. The effects of the outage are still being felt and will continue to be felt for some time.
One of the takeaways from this incident is that businesses should test updates on a small subset of less critical machines before rolling them out company-wide. The need to deploy security updates quickly has to be weighed against the risk of damage those updates could cause.
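The staged approach can be sketched as a ring-based rollout: the update goes to the least critical machines first, and the rollout stops as soon as any machine in a ring becomes unhealthy. This is a minimal Python sketch under stated assumptions; the `staged_rollout` function, the ring layout, and the `apply_update` and `is_healthy` callbacks are hypothetical stand-ins for whatever deployment and monitoring tools a business actually uses.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    rings: Sequence[Sequence[str]],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Apply an update ring by ring, least critical ring first.

    Halts after the first ring that contains an unhealthy machine and
    returns the list of machines that received the update.
    """
    updated: List[str] = []
    for ring in rings:
        for machine in ring:
            apply_update(machine)
            updated.append(machine)
        failed = [m for m in ring if not is_healthy(m)]
        if failed:
            # Later (more critical) rings never receive the bad update.
            print(f"Halting rollout; unhealthy after update: {failed}")
            return updated
    return updated
```

For example, with rings such as `[["test-1", "test-2"], ["office-1", "office-2"], ["db-server"]]`, an update that crashes every machine it touches reaches only the two test machines, while the database server is never updated. The trade-off is the one described above: each extra ring adds protection but delays the update for everyone after it.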