
How the Emergence of Artificial Intelligence Will Affect Cybersecurity

The threat posed by cybercriminals continues to rise. As a result, cybersecurity has become a major focus of businesses, government entities, and individuals alike. With so much valuable data at risk of being stolen or misused, organizations are looking for innovative solutions to protect themselves from cyber threats. According to IBM’s “Cost of a Data Breach 2022” report, it takes a security team an average of 277 days to identify and contain a breach. Artificial intelligence (AI) has emerged as one of the most promising solutions for enhancing cybersecurity.

The introduction of Artificial Intelligence (AI) into the world of cybersecurity should be considered a game-changer. AI can help organizations protect themselves from cyber threats and also aid in incident response and investigation. AI will enable businesses to become more efficient and cost-effective by helping them quickly identify suspicious behavior, investigate security incidents faster, and accurately identify attackers.

First of all, AI will be used to detect malicious activity faster than ever before. AI-powered security systems will use sophisticated algorithms to quickly detect potential threats and malicious activity, giving companies time to act before a breach occurs. With the right system in place, companies can identify an attack almost as soon as it starts and act accordingly to stop it before any significant damage is done.

Second, AI will help improve incident response time. Currently, when a cyberattack takes place, organizations often have difficulty responding quickly because they must manually investigate each event individually. However, with the help of AI, they will be able to automatically investigate each incident at lightning speed without human involvement. This will enable them to react quickly and effectively if an attack does occur.

Third, AI-powered cybersecurity solutions can provide more accurate identification of attackers. AI can learn how different attacks work over time and use that knowledge to detect more accurately when a new attack is launched against a system. It can then trace the attack back to its source to help identify the attacker and their intent. This information can then inform protection strategies against similar attacks in the future.

Finally, AI-driven solutions are already being deployed in areas such as network security analytics, which examines network traffic patterns in real time so that malicious activity can be spotted much more quickly than humans could manage manually. These solutions are becoming increasingly advanced as they incorporate machine learning techniques that adjust their scanning criteria over time as new threats emerge, making them even better at identifying possible threats before they have a chance to do harm.
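As a rough illustration of the idea of learning a baseline from traffic and flagging deviations, here is a minimal sketch. It is not how any particular product works: the z-score rule, the window of 5 intervals, the threshold of 3.0, and the function name `flag_anomalies` are all assumptions chosen for illustration, and production systems rely on far richer features and models.

```python
from statistics import mean, stdev

def flag_anomalies(counts, window=5, threshold=3.0):
    """Return indices whose value deviates sharply from the preceding window.

    counts: per-interval event counts (e.g. connections per minute).
    window and threshold are illustrative assumptions, not tuned values.
    """
    anomalies = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # Guard against a perfectly flat baseline (sigma == 0).
        if sigma == 0:
            if counts[i] != mu:
                anomalies.append(i)
            continue
        if abs(counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Made-up traffic with one obvious spike at index 7.
traffic = [100, 102, 98, 101, 99, 103, 100, 950, 101, 97]
print(flag_anomalies(traffic))
```

One limitation worth noting: once the spike enters the sliding window it inflates the baseline for the next few intervals, which is one reason real systems tend to prefer more robust statistics.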

All in all, AI will bring unprecedented levels of efficiency and accuracy into cybersecurity operations as it becomes increasingly deployed across multiple industries worldwide. With its ability to scan huge volumes of data quickly while providing deep insight into network activity and possible attackers’ intentions, companies are poised to make greater gains than ever before against the threat landscape with minimal effort required on their part. This makes it imperative for businesses today to take steps towards implementing these innovative solutions if they wish to stay secure now and in the future.

The Cyber Compliance Market

Recently, someone asked me to quantify the federal cyber market. 

FedRAMP is now law, underlining the Government’s Cloud First mandate. After years of ambiguity and excessive costs to become FedRAMP certified and demonstrate data protection controls based on each agency’s needs, the law now sets a level playing field for mid-size service enterprises that want to tap into the Federal market. The new law puts in place a system of reciprocity that allows federal agencies to certify vendors and obtain the same level of data protection more easily.

While this law is appealing, the certification rules have not changed. Readiness is still a mountain to climb, even with an understanding of the intent of the NIST 800-53 controls and their applicability to the service provider’s environment. The NIST requirements are complex, and it is no small feat for cloud security architects and DevOps teams to design and implement the service within an approved boundary with appropriate data controls. The demand for these cloud security professionals is very high.

Once you are FedRAMP certified, continuous monitoring is an ongoing program: you must report on incidents and security events, scan for vulnerabilities, and ensure that new product features don’t cause a “significant change.”

“Let us do the numbers” from my favorite NPR show Marketplace by Kai Ryssdal.

  • While 2022 saw the federal government spend over $11B in cloud technologies, the new bill signed in Dec. 2022 increases the spending
  • The Federal market is a long-term revenue stream with a market of 440 agencies          
  • Government agencies in 10 states have adopted FedRAMP and renamed it StateRAMP
  • FedRAMP is the security gate that will open the gates to these agencies
  • FedRAMP requires validation from a pool of 40 3PAOs
  • The shortage of cloud security and application security professionals will further strain service providers’ ability to get certified quickly

The numbers are interesting, but where do you start?

  • Does your compliance team or security team understand the NIST security framework?
  • Is your commercial cloud deployment aligned to security benchmarks or regulations?
  • Don’t let the 1000 NIST controls intimidate you. These are common-sense cyber hygiene controls, broken into domains, that your information security team has probably already implemented
  • 3PAOs can offer guidance, but your FedRAMP readiness team should have cloud security engineers who can map current security tools and processes to NIST requirements
  • While AWS, GCP and Azure offer “FedRAMP Ready” GovCloud, see if it makes sense to implement your cloud software in the GovCloud and continuously monitor it
  • This is not a security tool game or FedRAMP ready “blueprint” but an assessment of your security controls and process to meet a slightly higher security requirement
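The mapping exercise described in the list above can be sketched as a simple gap report. This is only an illustration under stated assumptions: the tool names are placeholders, the tool-to-control assignments are hypothetical, and the five control IDs (AC-2, AU-6, IR-6, RA-5, SI-4) are a tiny sample from NIST 800-53 rather than a FedRAMP baseline.

```python
# Hypothetical inventory: which NIST 800-53 controls each existing tool or
# process is believed to support. Tool names are placeholders.
current_coverage = {
    "identity-provider": {"AC-2"},
    "siem": {"AU-6", "SI-4"},
    "vulnerability-scanner": {"RA-5"},
}

# Illustrative subset of controls in scope; a real baseline is far larger.
required_controls = {"AC-2", "AU-6", "IR-6", "RA-5", "SI-4"}

def gap_report(coverage, required):
    """Return (covered, missing) control IDs as sorted lists."""
    covered = set().union(*coverage.values()) & required
    missing = required - covered
    return sorted(covered), sorted(missing)

covered, missing = gap_report(current_coverage, required_controls)
print("covered:", covered)
print("missing:", missing)
```

In this made-up inventory nothing supports incident reporting (IR-6), so that control surfaces as the gap for the readiness team to close before engaging a 3PAO.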

There is a small battalion of certified assessors who can provide guidance and certification. The shortage of certified auditors is increasing timelines as many of us are now getting ready for CMMC, a DoD mandate, that impacts 300,000+ DoD subcontractors in 2023.

FedRAMP Authorization Act

https://www.linkedin.com/pulse/fedramp-authorization-act-securitybricks-inc

FedRAMP provides a standardized approach to security authorizations for Cloud Service Offerings within the Federal ecosystem and is a crucial cybersecurity certification that cloud service providers must obtain prior to working with U.S. government data. Gaining this certification in advance means placement in the FedRAMP marketplace, from which government divisions and agencies can choose a provider at the level of security they choose.

Cloud Service providers have a multi-billion dollar federal market to address with some clarity on security requirements.

President Joe Biden has signed legislation that will reform the Federal Risk and Authorization Management Program (FedRAMP), a cybersecurity authorization program, as part of the National Defense Authorization Act (NDAA). The act is designed to promote the implementation of FedRAMP government-wide.

The latest iteration of the FedRAMP Authorization Act ensures that the FedRAMP program has a board to continue improving quality and shortening the time it takes a Cloud Service Provider (CSP) to attain an Authorization to Operate (ATO). The act also creates a new cloud advisory committee consisting of five representatives from cloud service companies, with the specification that two of those positions will be filled by small cloud vendors.

Why is this important to cloud service providers (CSPs)?

There are hundreds, if not thousands, of cloud service providers who need to be FedRAMP certified, and for many the journey has been long, with millions of dollars in investment. The old rules made it difficult to cross-sell to federal agencies, as each agency could impose additional security requirements, extending the sales process.

One of the most significant aspects of the FedRAMP reform language is a “presumption of adequacy” clause, which would allow FedRAMP-authorized tools to be used by any federal agency without additional cost or time, increasing CSPs’ market size.

The cloud advisory board will give CSPs a voice, making the rules relevant and effective for continuous compliance and ensuring the highest level of data protection. The shortage of 3PAOs has increased assessment timelines, and the single-assessment approach will free up 3PAOs to get more CSPs certified.

If a CSP wants to make a business case to pursue the Federal market, it can start by benchmarking against the NIST controls using approved FedRAMP services from GCP, AWS and Azure GovCloud instances. Once it completes its readiness work and demonstrates compliance with the NIST 800-53 controls, it can find a 3PAO to validate the controls and submit the package to the FedRAMP board for approval. Once approved, the CSP is listed, and every Federal agency can subscribe to the service. A FedRAMP certified CSP has demonstrated the highest level of security control implementation and monitoring, eliminating the need to chase lesser-known commercial security certifications. FedRAMP now has millions of dollars in funding to market its program to State agencies. Many states are adopting FedRAMP as their security framework, and this only increases the addressable market.

Tablet Command Partners with Credio, Inc. to Strengthen Cybersecurity Profile

San Rafael, CA, October 26, 2021—

Tablet Command is pleased to announce a partnership with cybersecurity advisory firm Credio, Inc. to ensure the continuous data protection of all Tablet Command systems and services.

Reports of several recent cyberattacks such as the Colonial Pipeline shutdown reveal that it’s more important than ever to ensure systems that impact public welfare are safe from hackers trying to disrupt infrastructure. Tablet Command’s software is saving lives in the middle of a global pandemic, and this partnership with Credio, Inc. safeguards their ability to do so securely.

“Tablet Command elected to use Credio, an independent Cybersecurity partner, in order to ensure objective assessments of our systems and services,” said William Pigeon, Tablet Command CTO. “We are committed to avoid any internal bias based on the fact that we created these systems. Credio was selected after an exhaustive search, and the service they have provided has exceeded our expectations in every way.”

The increase in cloud adoption and remote work since the Covid-19 pandemic has resulted in a dramatic increase in cyber-attacks. Credio, Inc. helps enterprises of all sizes implement relevant security and privacy controls to protect their digital assets. “We are humbled by the opportunity provided to us by Tablet Command, and Credio is honored to be able to make an impact in public welfare services,” added Raj Raghavan, CEO of Credio, Inc.

About Tablet Command

Tablet Command provides a best-in-class emergency incident response and management solution to approximately 200 public safety agencies across the United States and Canada. The software delivers increased margins of safety for emergency responders on the ground by providing a complete picture of the scene and tracking more precise information. Tablet Command also creates operational performance data as a byproduct of the incident management process. This data, and the operational improvements that can stem from it, have never before existed in the public safety sector. For more information, please visit www.tabletcommand.com

About Credio, Inc.

Credio, Inc. helps its clients successfully gain cloud adoption by balancing security and compliance with digital experience. An ISO17020 accredited security advisory firm focused on Cloud security posture management (CSPM) and compliance, Credio helps clients build secure cloud environments with a team of industry experts that includes military veterans. To learn more visit www.crediopartners.com.

Alicia Perez
Credio, Inc
+1 888-682-9616
alicia.perez@crediopartners.com

Press Release: Credio Launches Managed Application Security Service

SAN FRANCISCO, CALIFORNIA, UNITED STATES, September 20, 2021
/EINPresswire.com/ —

Credio, Inc. is pleased to announce the launch of its Managed Application Security service. Driven by customer demand, Credio’s subscription platform enables customers to perform a source code security scan directly from their code repositories. Offered as a white glove service, Credio’s platform provides a combination of automated tools backed by security experts to help remediate vulnerabilities within the SSDLC process.

Credio’s Managed Application Security service is targeted at the millions of developers who use open-source code bases, libraries, containers, and Kubernetes applications. A recent survey noted that over 84% of commercial applications have some sort of open-source component.

This managed service is an extension to Credio’s secure code training and integration of security programs within DevOps teams.

“In response to recent software supply chain attacks, increased ransomware attacks and new contracts from our customers to help detect security vulnerabilities within developer IDE, we are excited to join the recent initiative led by Microsoft, Google, AWS in building secure software supply chains. Our managed service is another commitment to address the growing shortage of cybersecurity expertise” – Raj Raghavan, CEO of Credio, Inc.

The platform provides

  • OnDemand source code scanning directly from code repositories
  • API Integration to CI/CD pipeline and JIRA for ticketing
  • Knowledge base of open-source vulnerabilities
  • Advisory services for threat analysis and remediation assistance
  • Secure code training on fundamentals of code security

About Credio, Inc.

Credio, Inc. helps its clients successfully gain cloud adoption by balancing security and compliance with digital experience. An ISO17020 accredited security advisory firm focused on Cloud security posture management (CSPM) and compliance, Credio, Inc. is helping clients build secure cloud environments with a team of industry experts that includes military veterans. To learn more visit us at www.crediopartners.com.

Alicia Perez
Credio, Inc
+1 888-682-9616
alicia.perez@crediopartners.com

Data Privacy Day: What Will Privacy Look Like Under a Biden Presidency?

January 28 is Data Privacy Day, when we all get to spend the day thinking critically about the importance of protecting our personal data online. Did you know that Data Privacy Day falls on the 28th because the Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data was opened for signature by the Council of Europe on this day in 1981? On January 20, 2021, Joe Biden was sworn in as America’s 46th President, so we thought it would be fitting to take a deep dive into how the new Biden Presidency might approach privacy over the next few years.

Will Privacy be a Priority?

In truth, Biden has been rather light on details when it comes to specifics around data privacy. There are a few signals, however, that Biden may be a positive influence for advancing stronger privacy and data security protections.

On the record: Biden stated in January 2020 that the U.S. should be “setting standards not unlike the Europeans are doing relative to privacy.” In the same interview, he also suggested that any Supreme Court nominees should have a strong recognition of the right of privacy.

Foreign Policy: Biden’s Foreign Policy Plan laid out a vision for advancing the “security, prosperity, and values of the United States” by renewing alliances, strengthening our own democratic principles at home, and ensuring a level playing field in trade. This includes bolstering protections for data privacy, and ensuring adequate protections against cyber theft.

Domestic Policy: Biden’s plans specifically call out the importance of considering diverse stakeholders when it comes to data protection. For example, Biden promises to take account of the “needs of the disability community when strengthening and enforcing data privacy protections,” and to ensure that adequate privacy protections are enforced when collecting data on LGBTQ+ people. A Biden-Sanders Unity Task Force issued recommendations in August which also cited the need to develop best practices around preventing student data sharing by for-profit organizations, curbing civil rights and personal privacy abuses around police use of body cameras, and setting guidelines regarding the use of biometric surveillance and information sharing at the border. It’s noteworthy that with regard to the reforms around immigration, the Biden-Sanders recommendations outline five of the seven GDPR principles: transparency, accuracy, accountability, fit for purpose, and timely.

While Biden’s technocratic approaches often favor more data collection, it’s helpful to note that in most cases, sentences on data collection are followed by the importance of disaggregation of data, transparency and accuracy, to ensure privacy is maintained.

Big Tech: Biden has emphasized the importance of reining in Big Tech, signaling that he plans to pursue antitrust actions and potentially repeal or reform Sec. 230 of the Communications Decency Act, which gives broad immunity to online platforms for content posted by users. He has called out privacy concerns and excessive data collection by firms such as Facebook, Google and others as one of the reasons that Big Tech needs another look.

Biden Appointments: Biden is also surrounding himself with experts in privacy, tech and AI from the Obama administration, including:

  • Christopher Hoff (U.S. Department of Commerce) – Hoff will serve as deputy assistant secretary for services at the U.S. Department of Commerce, overseeing the U.S. Privacy Shield negotiations with the EU. He has an extensive privacy background, and has had a long career in the public and private sector in privacy matters. [IAPP Profile]
  • Robert Silvers (U.S. Department of Homeland Security Cybersecurity and Infrastructure Security Agency) – Silvers is expected to be appointed to lead the U.S. Department of Homeland Security’s Cybersecurity and Infrastructure Security Agency, a position formerly held by Christopher Krebs. Silvers is currently a partner at Paul Hastings, and is the vice chair of the firm’s privacy and cybersecurity practice. [Paul Hastings bio]
  • Alondra Nelson (OSTP Deputy Director) – Nelson is a professor at the Institute for Advanced Study, who studies societal impacts of emerging technologies, including AI and algorithmic impacts on bias, data privacy and corporate influence on research. [Wikipedia]

The VP, Kamala Harris, also has hands-on experience pushing privacy and consumer protections, both during her tenure as California’s AG, and in Congress.

Will There be a National Privacy Law?

The US’ byzantine system of patchwork, sectoral federal and state laws has made privacy compliance tough for business. Currently, all states have mostly sectoral laws (e.g., medical privacy, social security protections) on the books, but more states are looking to follow the lead of states like California and Maine in crafting broader legislation. All 50 states, plus the District of Columbia, Guam, and Puerto Rico, also have data breach notice laws in place. It’s no secret that big tech firms are heavy political donors, and that compliance with dozens of disparate laws is far more costly than compliance with wholesale approaches like the GDPR. Privacy is also one of the few issue areas where bipartisan support is possible (albeit via very different means). That raises the question: will Congress push for a new federal Privacy Act? While many have speculated for years that a national privacy law is ‘on the horizon’, at best, we can offer only hopeful optimism.

Cross-Border Data Protection

In July 2020, the Court of Justice of the European Union (CJEU) invalidated the U.S. Privacy Shield, a mechanism used by many US firms to transfer data between the EU and US. In the case of Irish Data Protection Commissioner v. Facebook, Schrems, et al. (Schrems II), the CJEU found that the US’ broad surveillance powers, lack of notice to affected EU data subjects, and virtually no right of redress meant that US law did not provide the level of data protection necessary to meet adequacy requirements under the GDPR. This ruling has the potential to nullify countless cross-border transfers for organizations large and small. The Court’s broad declaration of invalidity hasn’t stopped the EU and US from trying to work things out. Currently, this task is undertaken by the Deputy Undersecretary for the Department of Commerce, and the recent appointment of Hoff signals that such talks may be top of mind for the administration. That said, it’s highly unlikely that the US will reform the broad surveillance powers currently granted to the three-letter agencies, so the prospects for meeting the spirit of the GDPR’s broad data protection obligations seem slim.

Just When You Thought it was Bad Enough: The SolarWinds Attack

This year has been … a wild ride to say the least. 2020 has packed more in its yearly trip around the sun than some decades. First, there were the fires in Australia, Brazil, and California. Then came March, and the collective realization that things were never going to be what they were, even after the pandemic. Oh, and there was a presidential election that left everyone on edge, ongoing racial, economic, and political turmoil, and even a Brexit deal (of sorts). In short, we’ve all seen some things.

But this crazy year still had a bit more crazy to give us, and so on December 13, FireEye disclosed one of the largest, most sophisticated global intrusion & espionage campaigns in modern history, the SolarWinds supply chain attack. The compromise, which has been initially attributed to APT 29 (Cozy Bear), Russia’s foreign intelligence service, has affected at least 200 organizations directly (and potentially affected thousands more) around the world. Details are still being uncovered by the day.

A Quick Overview of the Attack

On December 13, FireEye disclosed that it had been the victim of a supply chain attack via the SolarWinds Orion platform, used to monitor and manage IT health. Attackers used digitally-signed certificates issued from the SolarWinds website to install an infected update package masquerading as a legitimate Orion software update. Once the payload was installed, communication with third-party servers was established, allowing for remote access by the attackers. Then the payload removed itself and restored legitimate update files. With remote access, the attackers were able to gain additional credentials and move laterally throughout the network against specific targets. Current timelines project that the attack has been ongoing since at least March 2020, with the initial exploit going back to October-November 2019.

SolarWinds Malware Infection Chain — Microsoft Defender Research Team

The initial disclosure noted that one of the payloads, SUNBURST, had been used to conduct espionage against victim sites, and leveraged multiple sophisticated techniques to evade detection, obscure activity, and maintain persistence. One of the more clever aspects was the use of local IPs and dynamically-generated hostnames that match the victim’s environment, making the attack even more difficult to detect. There’s also potentially a second attack vector, known as SUPERNOVA, that is still being investigated and may be piggybacking on the SUNBURST vulnerability.

The attack is complex, many-pronged, highly technical, and worth a deeper dive. We’ve compiled a list of great resources to read over to better understand how the attack works (and what mitigations can be taken).

Why Supply Chain Attacks Are Spreading

We’ve talked before about the risk of supply chain attacks. Senior Consultant Carey Lening has given a talk about the growth of supply chain attacks across numerous industries, including finance, the maritime sector and other industries.

What makes these attacks so challenging is that organizations have limited control over the security posture of downstream providers. Even a Zero Trust security model is unlikely to have stopped the SolarWinds attack, as the tool itself had privileged access to enterprise servers. And despite what opportunistic vendors may be claiming, no single tool or service can prevent this.

Unfortunately, the best solutions to mitigate against future SolarWinds-style attacks tend to require buy-in from the top, both in terms of cost and resources and in a willingness to fundamentally change how security is practiced internally. In short, they call for a mature, defense-in-depth security model that emphasizes:

  • Thorough network and device hardening, as well as adherence to baseline best practices for security;
  • Comprehensive visibility of system and network activities;
  • Regularly sharing and updating threat data across industries, domains and tools;
  • Timely review and actioning of relevant threat indicators, including temporal analysis of compromised devices to understand lateral movements;
  • Isolation and prompt investigation of machines where known-bad file signatures have been detected;
  • Identification of compromised (or likely compromised) accounts.
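One item in the list above, finding machines that hold files with known-bad signatures, can be sketched in a few lines. This is a minimal illustration, not a substitute for an endpoint detection tool: the function names are hypothetical, and the set of bad hashes is assumed to come from a threat feed or advisory that you trust.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_known_bad(root, bad_hashes):
    """Return sorted paths under root whose SHA-256 is in bad_hashes.

    bad_hashes: a set of lowercase hex digests taken from a threat feed
    (assumed input; no real indicators are included here).
    """
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and sha256_of(path) in bad_hashes:
            hits.append(str(path))
    return sorted(hits)
```

A call such as `find_known_bad("/opt/app", bad_hashes)` would return the matching paths, which can then drive the isolation and investigation steps. Hash matching only catches files already known to be bad, so it complements, rather than replaces, the visibility and temporal analysis items.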

Additionally, standards bodies, government regulators, and big industry players (looking at you, Microsoft, Amazon, Google, Apple, etc.) also need to step up and begin to enforce industry-wide changes. As the Atlantic Council notes in their detailed report on supply chain attacks, ‘Breaking trust: Shades of crisis across an insecure software supply chain’, support for robust, widely-compatible secure standards and code practice is paramount. Improving open source libraries is another critical component that will take a village.

Finally, there should also be an emphasis on holding vendors and third party providers to account for their own security practices (or lack thereof). While there’s no such thing as perfect security, in the case of SolarWinds, security was … not exactly a priority. Rewarding firms with dollars despite lackluster security practice sends a message that security isn’t a critical concern, and increases the attack surface.

In short, we’re all in this together, and we need to start acting accordingly.

My Brand! The Rise of the Elevated Spoof.

At Credio, we’ve written before about how COVID-19 is having an outsized impact on everything in daily life, including cybersecurity and privacy threats.

As users have been forced to leave the confines of hardened on-prem networks and turn to cloud and other hosted services, organizations have faced greater challenges, often with fewer resources at hand.

The Rise of the Elevated Spoof

For this week’s issue, we wanted to delve into one growing area of concern for organizations: the rise of domain spoofing and improved phishing techniques.

According to a recent report by F5 Labs, “55% of phishing sites made use of target brand names and identities in their URLs.”

Gone are the days of the poorly-written, dodgy sites that boasted exaggerated urgency and laughable spelling mistakes. Now criminals are going for sites that genuinely look and act like their targets — right down to the domain names.

More TLDs = More Opportunities to Wreak Havoc

So how are fraudsters doing it? It turns out, there are a number of techniques available.

Domain Name Spoofing: Fraudsters are learning that, thanks to the wonders of hundreds of top level domain names, it’s still easy to register a deceptively-similar looking domain name, clone a target’s login page, and blast out a link to the user.

For example, let’s say you’re interested in mimicking the domain name for apple.com. Obviously, Apple has already registered all the apple.* domains that most of us can think of. But as more top-level domains and ccTLDs (country-code TLDs) come online, it becomes a game of whack-a-mole to keep up.

Compounding this is the rise of free and low-cost domain registrars such as Freenom and DotTK, which provide inexpensive (and sometimes free) domain registrations, even of popular domains.

Unicode and IDNs: Add to that the problems that come from our multilingual world and the Unicode standard. Unicode is a great equalizing force, opening up opportunities for non-Western speakers to be heard, but providing text for most of the world’s writing systems has a cost: it gives criminals a bigger sandbox to play in.

IDNs, or Internationalized Domain Names, use the power of the Unicode standard to allow organizations to connect online in local languages. IDN registrars are still few and far between, but some will allow fairly convincing-looking registrations, for example applе (where the final letter is the Cyrillic small letter Ie, U+0435, rather than the Latin e).

Punycode: But say you’re a scammer, and committed to keeping the domain in the .com TLD. Now, GoDaddy won’t accept non-ASCII unicode character sets, so your plans for applе.com likely won’t fly. Enter punycode.

Punycode is a way of representing Unicode characters that cannot be written in ASCII using only ASCII characters. Using punycode, you can include non-ASCII characters within a domain name by generating a “bootstring” encoding of the Unicode string. Here’s the punycode for applе.com: xn--appl-y4d.com (which can be registered for around $11).

On certain vulnerable browsers (and especially on mobile devices, where eyeballs are contending with smaller screens, shrunken URLs, and the inability to hover a mouse), the address renders as applе.com, which looks surprisingly legitimate, especially if you’re clicking on a link and might be distracted, or haven’t had your first cup of coffee. Throw in a free Let’s Encrypt TLS certificate, and it’s a very convincing-looking fraud opportunity.
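The punycode conversion described above can be reproduced with Python’s built-in `idna` codec, and a crude check for mixed-script labels takes only a few more lines. This is a minimal sketch: the helper names are hypothetical, and inferring a character’s script from the first word of its Unicode name is a simplifying assumption rather than a proper script lookup.

```python
import unicodedata

def to_ascii_domain(domain):
    """Encode a Unicode domain to its ASCII (xn--) form via Python's idna codec."""
    return domain.encode("idna").decode("ascii")

def scripts_in_label(label):
    """Rough script detection using the first word of each letter's Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch).split()[0])
    return scripts

def looks_spoofed(domain):
    """Flag any label that mixes letters from more than one script."""
    return any(len(scripts_in_label(label)) > 1 for label in domain.split("."))

spoof = "appl\u0435.com"  # final letter is Cyrillic small letter ie (U+0435)
print(to_ascii_domain(spoof))
print(looks_spoofed(spoof), looks_spoofed("apple.com"))
```

A mixed-script check like this catches the example above, but it would miss a lookalike written entirely in one non-Latin script, so it is best treated as one signal among several.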

Site Cloning: Even the practice of cloning a website can often be trivially easy. One need only visit the target’s website, save the page, extract the HTML, CSS, images and other elements, and upload that content to a hosting site. With a few alterations, a fraudster can make a rather convincing-looking site at https://applе.com that might confuse even the most skeptical among us.

While phishing attacks will only continue to improve so long as there are victims to be had, it’s important that awareness, security controls, and the tools we use, also continue to evolve with the threat. Our eyes can’t do it alone.

Resilience: Planning for When the Clouds Go Away

On November 25, AWS suffered a major service disruption in its Northern Virginia (US-East-1) region. It left the region out of commission for over 17 hours, and took thousands of sites and services, including Adobe, Twilio, Autodesk, the New York City Metropolitan Transit Authority, and The Washington Post, offline.

When Planned Upgrades Go Wrong

In a post-mortem, Amazon described how a planned capacity increase to their front-end fleet of Kinesis servers led to the servers exceeding the maximum number of threads allowed by the current configuration. As the servers reached their thread limits, fleet information, including membership and shard-map ownership, became corrupted. The front-end servers in turn were generating useless information, which they were propagating to neighboring Kinesis servers, causing a cascade of failures.

Additionally, restarting the fleet appears to be a slow and rather painful process, in part because AWS relies on this neighboring servers model to propagate bootstrap information (rather than using an authoritative metadata store). Among other things, this meant that servers had to be brought up in small groups over a period of hours. Kinesis was only fully operational at 10:23pm, some 17 hours after Amazon received the initial alerts.

Finally, the failures with Kinesis also took out a number of periphery services, including CloudWatch, Cognito, and the Service Health and Personal Health Dashboards, used to communicate outages to clients. For reasons that aren’t totally clear, these dashboards are dependent on Cognito, and may not be sharded across regions. Essentially, the post-mortem seems to imply that if Cognito goes down for a region, affected customers in that region will have no way of knowing.

Resiliency, or How to Survive the Next Outage

While Amazon posted a number of lessons learned in their post-mortem, which are all worth reading, today we wanted to discuss what customers can do to limit their own risks in the Cloud:

  • Build Outages into your BCP and IRP: While providing continuous service and support is the main goal, we should all be mindful of worst-case-scenarios. That means identifying critical workloads and applications, considering and having a plan to execute fail-over options, and ensuring that customers can be alerted when a failure can’t be avoided. Build response and recovery plans around these considerations.
  • House Critical Workloads in Multiple Regions and Availability Zones (AZs): Since the AWS outage appears to have been isolated to a single region, organizations that hosted their critical systems across more than one region were less impacted when US-East-1 went down. Consider services like AWS’ Multi-Region Application Architecture to fail over to a backup region. These features are not available by default, however, and must be built into an organization’s overall architecture plan.
  • Use Amazon Route 53 Region Routing: Another best practice for ensuring geographic distribution is to use Route 53 to route users to infrastructure outside of AWS, whether it’s another cloud provider, or a more minimalist on-prem backup service.
  • Test for Your Worst Case: Adopt Netflix’s ‘Chaos Engineering’ model to test what happens when networks, applications, or infrastructures go down, and develop a road-tested plan for how to work around those failures.
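The failover and worst-case testing ideas in the list above can be combined into a small sketch: pick the first healthy endpoint from an ordered list, then simulate an outage to confirm the fallback actually works. Everything here is an assumption for illustration. The endpoint names are placeholders, `pick_endpoint` and `simulate_outage` are hypothetical helpers, and a real health check would be a network probe with a timeout rather than a lookup in a set.

```python
def pick_endpoint(endpoints, is_healthy):
    """Return the first endpoint that passes its health check, else None.

    endpoints: ordered by preference (primary first).
    is_healthy: callable taking an endpoint and returning True/False; in a
    real deployment this would wrap an HTTP or TCP probe with a timeout.
    """
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # Treat a failing probe as an unhealthy endpoint.
            continue
    return None

# Hypothetical endpoints; the names are placeholders, not real hosts.
ENDPOINTS = ["primary.example.com", "secondary.example.com", "onprem.example.com"]

def simulate_outage(down):
    """Build a health check that reports the given endpoints as down."""
    return lambda endpoint: endpoint not in down

print(pick_endpoint(ENDPOINTS, simulate_outage({"primary.example.com"})))
```

Running the selection logic against simulated outages, including the case where every endpoint is down, is a lightweight way to rehearse the plan before a real incident forces the issue.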