In Cloud We Trust? Not So Much – Implementing an Enterprise Zero Trust Model

If you’re following all the infosec trends as of late, you’ve no doubt heard the phrase ‘zero trust’ bandied about nearly every week for the past few years, usually with glowing promises from vendors that their particular solution provides massive gains in security and lowers risk. This interest has grown beyond the normal marketing hype cycle, as COVID has forced more organizations to adopt at least a hybrid cloud model to deal with the growth in BYOD, WFH, and insider threat.

What Do We Mean When We Talk About Zero Trust?

So, what is zero trust, and what might a native cloud implementation of zero trust look like?

First, it’s good to start with the basics. Zero Trust Architecture (ZTA) is a design philosophy where everything is presumed insecure (and potentially malicious). Accordingly, anything touching or accessing the network must first be validated. That means each application, endpoint, or component is treated as an untrusted microservice. It’s a big shift from the secured perimeter model, which assumes that the main threats sit outside the network and therefore prioritizes hardening the network edge with strong access controls, VPNs, and firewalls.

A ZTA approach encompasses everything from identity and access management, credentialing, endpoints, and device management to data security and operational processes, such as context-aware policies. One central aspect of ZTA is that access is granted via a decisioning and enforcement process. Decisions are made at the Policy Decision Point (PDP), which consists of two components: a Policy Engine (PE) and a Policy Administrator (PA). The PE handles the ultimate decisioning on whether or not to grant access, weighing a number of factors, including location, time of day, and previous access attempts, to arrive at a confidence rating for the request. The PA is responsible for establishing or ending communication between a subject and a given resource, based on that decision. A separate component, known as the Policy Enforcement Point (PEP), communicates with the PA and enables, monitors, and terminates the actual connections. Everything beyond the PEP is the “implicit trust zone”: the PDP/PEP applies a set of controls so that all traffic beyond the PEP has a common level of trust.
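To make the division of labor a little more concrete, here is a minimal sketch of how a PE, PA, and PEP might fit together. Everything in it is an illustrative assumption rather than anything prescribed by NIST: the names (`AccessRequest`, `confidence_score`, `TRUSTED_COUNTRIES`), the point weights, and the threshold of 70 are all made up for the example.

```python
from dataclasses import dataclass

# Hypothetical inputs and weights, chosen only for illustration.
TRUSTED_COUNTRIES = {"US", "CA"}
THRESHOLD = 70


@dataclass
class AccessRequest:
    subject: str
    resource: str
    device_managed: bool
    country: str
    hour: int             # local hour of the request, 0-23
    failed_attempts: int  # recent failed logins for this subject


def confidence_score(req: AccessRequest) -> int:
    """Score a request from 0 to 100 using location, time, and history."""
    score = 100
    if not req.device_managed:
        score -= 30
    if req.country not in TRUSTED_COUNTRIES:
        score -= 30
    if not 7 <= req.hour <= 19:
        score -= 10
    score -= 10 * min(req.failed_attempts, 5)
    return max(score, 0)


def policy_engine(req: AccessRequest) -> bool:
    """PE: make the grant/deny decision."""
    return confidence_score(req) >= THRESHOLD


def policy_administrator(req: AccessRequest, sessions: set) -> bool:
    """PA: act on the PE decision by opening or closing the session."""
    key = (req.subject, req.resource)
    if policy_engine(req):
        sessions.add(key)
        return True
    sessions.discard(key)
    return False


def enforcement_point(req: AccessRequest, sessions: set) -> str:
    """PEP: sits in the traffic path and forwards or blocks the request."""
    return "forward" if policy_administrator(req, sessions) else "block"
```

In this sketch, a request from a managed device in a trusted country during working hours is forwarded, while the same subject on an unmanaged device at 3 a.m. with recent failed logins is blocked and any existing session is torn down. A real deployment would pull these signals from identity, device management, and threat intelligence systems rather than from a hand-built object.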

The National Institute of Standards and Technology (NIST) published an excellent guidance document, SP 800-207, on setting up ZTA for enterprise environments that is definitely worth a read. The NIST ZTA document helpfully analogizes the PDP/PEP process to passenger screening at an airport:

The “implicit trust zone” represents an area where all the entities are trusted to at least the level of the last PDP/PEP gateway. For example, consider the passenger screening model in an airport. All passengers pass through the airport security checkpoint (PDP/PEP) to access the boarding gates. The passengers, airport employees, aircraft crew, etc., mill about in the terminal area, and all the individuals are considered trusted. In this model, the implicit trust zone is the boarding area.

Image from NIST SP 800-207

That said, moving to a zero trust approach is not without risk or tradeoffs. The most distinct threat to ZTA environments is, of course, compromise of the decision or enforcement processes. Misconfigured PE rules or a compromised PA could allow unauthorized access, while DoS attacks or network disruption aimed at the PDP/PEP could deny legitimate access requests. Similarly, a ZTA environment is not wholly immune to phishing, insider threat, or other account compromises. Finally, undertaking a radical shift in IT architecture often demands significant resources and costs.

Implementing Zero Trust Natively in Cloud Environments

All of the major providers offer at least some degree of zero trust implementation. Usually, it’s applicable not only within the cloud environment, but to other systems and services as well.

AWS: As noted in their guidance documentation, AWS offers numerous tools and strategies as part of the AWS Well-Architected Framework. Native tools such as Elastic Load Balancing (ELB) and its Application Load Balancer (ALB), DNS domain management through Amazon Route 53, encryption via AWS Key Management Service (AWS KMS), and edge security and caching through Amazon CloudFront allow customers to create an effective Zero Trust implementation and adhere to AWS’ five pillars:

  • Security
  • Operational Excellence
  • Performance Efficiency
  • Cost Optimization
  • Reliability

Here’s a visual example of their AWS Well-Architected Framework as applied to a web hosting provider:

Additional guidance can be found here and here.

GCP: The GCP-native solution for ZTA is based on Google’s well-known BeyondCorp offering. BeyondCorp builds directly on Google’s own in-house zero trust network model, along with industry best practices. BeyondCorp is based on three principles:

  • Connecting from a particular network must not determine which services you can access
  • Access to services is granted based on what we know about you and your device
  • All access to services must be authenticated, authorized, and encrypted

For its decisioning, BeyondCorp relies heavily on ‘context-aware access’, an approach that evaluates rules and conditions about the user, the device, and the access request itself in order to grant access at the level of an individual user or device.
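As a rough illustration of the idea (and not of Google’s actual API), the sketch below models context-aware access in a few lines. The access levels, service names, and the `Context` fields are all hypothetical. Note that the source network is recorded but deliberately never consulted, in keeping with the first BeyondCorp principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    user_groups: frozenset
    device_encrypted: bool
    device_company_owned: bool
    source_network: str  # recorded for logging, never used in decisions


# Hypothetical access levels: named conditions over user/device context.
ACCESS_LEVELS = {
    "basic": lambda c: c.device_encrypted,
    "trusted": lambda c: c.device_encrypted and c.device_company_owned,
}

# Hypothetical services, each requiring a group and an access level.
SERVICES = {
    "wiki": {"group": "employees", "level": "basic"},
    "billing": {"group": "finance", "level": "trusted"},
}


def can_access(service: str, ctx: Context) -> bool:
    """Grant access based on who the user is and the state of the device."""
    rule = SERVICES[service]
    in_group = rule["group"] in ctx.user_groups
    meets_level = ACCESS_LEVELS[rule["level"]](ctx)
    return in_group and meets_level
```

With these rules, a finance user on a company-owned, encrypted laptop can reach the billing service from a coffee shop, while the same user on a personal device is limited to the wiki even when sitting on the office LAN.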

Here’s a view of Google’s Context-Aware approach:

Additional guidance can be found here.

Azure: Finally, Microsoft’s offering relies heavily on app detection and control through its Microsoft Cloud App Security solution. While there’s a good deal of monitoring and identification (including ‘Shadow IT’ application detection), and more granular app control, the Azure solution seems less developed compared to those offered by GCP or AWS.

Additional guidance can be found here.

Don’t Be Like Uber — Have Your Data Wear a Mask

For this week’s blog article, I want to start with a story.

Imagine you’re the CISO of an up-and-coming ride-hailing service. The business has been so successful that you’ve been able to scale at a record pace. And while your developers and QA teams are now knowledgeable on secure coding practices, this wasn’t always the case.

Back in the early days, your engineers were using a GitHub repo for some testing code that accidentally was made public. A little Google dorking led hackers to discover that one of your engineers had hardcoded the login credentials to your test database. Unfortunately for you, the test database was using real production data, which included thousands of records on your customers.

But all was not lost, because your company, unlike Uber, was smart, and had recently implemented a data masking policy to protect that sensitive user data in that test database.

What is Data Masking?

So what is data masking anyway? Data masking, or obfuscation, refers to techniques that can be used to hide or de-identify data, usually by substituting it with some form of modified content. In its simplest form, think of the humble password prompt:

Here, the password text is ‘masked’ with bullets. The key is to keep the data format similar, without exposing the values. While the password example is a simple one, data masking is also commonly used between production and development environments, or as a way of sharing data with third parties such as call-center personnel.

For example, during the software development and testing phases, developers and QA team members may need to use real data for testing and debugging applications. Rather than copy live production data into the development environment, data masking can be used to allow for testing formats without the risk of exposing sensitive personal data. There are a number of data masking techniques, including fictive data or substitution (e.g., ‘John Doe’, ‘Anytown, USA’, ‘123-45-6789’), full or partial redaction (‘XXXXXX’), encryption, or using null values.
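Here is a small sketch of three of those techniques: substitution, redaction, and nulling. The function names and the list of fictive names are made up for illustration, and encryption is left out because it needs a proper cryptographic library rather than a few lines of example code.

```python
import random

# Hypothetical pool of fictive replacement values.
FICTIVE_NAMES = ["John Doe", "Jane Roe", "Sam Poe"]


def substitute_name(_real_name: str, rng: random.Random) -> str:
    """Substitution: swap the real value for a realistic fictive one."""
    return rng.choice(FICTIVE_NAMES)


def redact(value: str, keep_last: int = 0, mask_char: str = "X") -> str:
    """Full or partial redaction that keeps separators so the format survives."""
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - keep_last, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def nullify(_value):
    """Nulling: drop the value entirely."""
    return None
```

For instance, `redact("123-45-6789", keep_last=4)` yields `XXX-XX-6789`, which still looks like an SSN to the application under test, while `redact("secret")` yields `XXXXXX`.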

While data masking can be applied to any field or data element, it’s most commonly used in the context of protecting personal data, such as name, address, phone number, IDs (e.g., SSN, passport number), and payment card details. No matter how it’s used, in order to be effective, your data masking technique must change the live data in such a way that it becomes impossible to reverse engineer the identity or sensitive data.

An Ounce of Prevention Avoids a Pound of Compliance Headaches

In addition to being a general best practice in software development, data masking can also be effective at limiting data breach risk, including third party breaches, as well as offering significant benefits when it comes to meeting compliance objectives under PCI-DSS, the GDPR and the HIPAA Privacy Rule.

PCI-DSS Requirement 3 explicitly instructs merchants to mask primary account number (PAN) data whenever it is displayed. At most the first six and last four digits may be shown; everything in between should be masked. And while the GDPR doesn’t specifically mention the idea of ‘data masking,’ Recital 78 does outline how pseudonymization, of which data masking is a subset, can be implemented as an effective technical and organizational control for meeting compliance obligations. Thus, when implemented correctly, data masking can be a means to achieve those ends.
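The PAN display rule is simple enough to sketch in a few lines. The helper below is a hypothetical example, not code taken from the PCI standard: it strips spaces and dashes, shows at most the first six and last four digits, and masks the rest.

```python
def mask_pan(pan: str, show_first: int = 6, show_last: int = 4,
             mask_char: str = "*") -> str:
    """Mask a card number for display, keeping at most first six / last four."""
    if not 0 <= show_first <= 6 or not 0 <= show_last <= 4:
        raise ValueError("may show at most the first six and last four digits")
    digits = [c for c in pan if c.isdigit()]
    n = len(digits)
    if n <= show_first + show_last:
        raise ValueError("PAN too short to mask safely")
    return "".join(
        d if i < show_first or i >= n - show_last else mask_char
        for i, d in enumerate(digits)
    )
```

Called on the well-known test number `4111 1111 1111 1111`, this returns `411111******1111`; with `show_first=0` it returns `************1111`, which is what most receipts display.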

Fortunately, there are many solutions that can be used to implement a data masking process, including within cloud environments. Regardless of what application or tool you use, it’s important to look for something that provides the following features:

  • Offers a range of masking techniques
  • Uses rules-based data masking that can be applied to different categories/subsets of data
  • Utilizes format-preserving encryption (FPE) transformation
  • Uses realistic fictive data types
  • Has centralized policy management
  • Provides robust access controls
  • Maintains audit trails and compliance tracking
  • Supports sharing subsets of data
  • Scales with your data volume
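To give a feel for what rules-based masking looks like in practice, here is a minimal sketch in which each field is assigned to a category and each category has its own masking rule. The categories, field names, and the keyed-hash `tokenize` helper are assumptions made for this example. The tokens it produces are deterministic but are not format-preserving, so this is not a stand-in for true FPE.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Deterministic keyed token: same input and key give the same token."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:12]


# Hypothetical rules: one masking function per data category.
RULES = {
    "identifier": lambda value, key: tokenize(value, key),
    "contact": lambda value, key: "REDACTED",
    "free_text": lambda value, key: None,
}

# Hypothetical mapping of fields to categories.
FIELD_CATEGORIES = {
    "ssn": "identifier",
    "email": "contact",
    "notes": "free_text",
}


def mask_record(record: dict, key: bytes) -> dict:
    """Apply the rule for each field's category; unmapped fields pass through."""
    masked = {}
    for field, value in record.items():
        rule = RULES.get(FIELD_CATEGORIES.get(field))
        masked[field] = rule(value, key) if rule else value
    return masked
```

Because the tokens are deterministic, the same SSN maps to the same token across tables, so joins in the test database keep working. The key has to stay secret and out of the test environment, otherwise the tokens could be recomputed from guessed inputs.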

Native Solutions for the Cloud

GCP: Google offers a native data masking/de-identification solution via its Cloud Data Loss Prevention (DLP) service. This fully managed service identifies, classifies, and performs de-identification methods such as masking and tokenization across Cloud Storage, BigQuery, and Datastore, and also offers a streaming content API that enables support for other data sources. Key features include:

  • Scanning and classification of over 120 types of sensitive data, or InfoTypes. This includes name, email, cardholder data, gender, IP Address, ID numbers and other types.
  • Automatic data masking capabilities for both structured and unstructured data
  • Detailed findings can be sent to BigQuery for further analysis and auditing
  • Allows organizations to add and manage custom data types
  • Volume-based pricing.

Additional documentation and a full list of features can be found here.

AWS: While Amazon does not appear to have a single or complete solution, they do offer a native tool known as Macie which allows for discovery and protection of sensitive data at scale. According to Macie documentation, “Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS.” While Macie does not appear to actually perform data masking functions, its identification features do allow for continuous monitoring and response. Other key features include:

  • Automatic inventory of Amazon S3 buckets, including a list of unencrypted, publicly accessible, and shared buckets.
  • Identification and alerting of sensitive data, such as personally identifiable information (PII)
  • Searchable findings in the AWS Management console
  • Integration with Amazon EventBridge (formerly CloudWatch Events) and AWS Step Functions workflows.

Additional documentation can be found here.

Azure: Azure also offers a limited set of native data masking features for SQL databases, configurable through the Azure Portal. The Dynamic Data Masking recommendation engine automatically flags fields and data types for masking. Existing logic for masking credit card details, email, and other custom text fields is built in.

Other key features include:

  • Implementation through the portal, Azure SQL Database cmdlets or REST API
  • Masking rules are customizable and columns can be added manually
  • Simple drop-down functionality to set masking type (e.g., truncation, custom prefixes, random number generation)

Additional documentation can be found here.