DevSecOps methodologies to improve the robustness of your organization’s cloud infrastructure.

As you are most likely already aware, digital security is becoming ever more important. As we transform the way we conduct business, our organizations hold more and more data and are increasingly liable for protecting it. In Europe, under the GDPR, failing to report a data breach can cost an organization up to €10 million or 2% of its worldwide annual turnover, whichever is higher, and the most serious infringements can cost up to €20 million or 4%!

A modern cloud infrastructure can be attacked from many angles: subtly, via malware in the supply chain; noisily, with a DDoS attack on your website; or sneakily, through credential phishing. This is why it is important to foster a culture of security hygiene, so that potential security issues are detected as early as possible in the process. In DevSecOps terminology this is called “shifting left”.

At Gluo, we researched a set of tools that extend a classic CI/CD setup to enforce DevSecOps best practices. They help you improve security as early as the development phase, instead of fixing security issues in production (where fixes can impact operations). During the project we focused on security at three different stages: during development, during deployment and at runtime.

Security during development

Traditionally, developers strive “to make things work”. Unfortunately, because of time constraints and business pressure, security issues sometimes go unnoticed. That is why we investigated a few tools that enforce security in the CI/CD process, so that developers are alerted during development and nudged to fix issues immediately.

Talisman (https://github.com/thoughtworks/talisman) is a tool that scans for secrets before they end up in a Git repository. It runs as a Git hook, so if a developer on your team tries to commit a secret (e.g. database credentials) to the repository, Talisman blocks the commit and reports its findings to the developer.

Of course, a developer can skip a local Git hook (for example by passing --no-verify to git commit), so we chose to run a second scan whenever a merge request is opened from a feature branch to the development branch!
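Talisman's own detection rules are considerably more elaborate, but the basic idea of a secret scan can be sketched in a few lines of Python. This is not Talisman's implementation: the regular expressions, the entropy threshold of 4.0 bits per character and the function names are hypothetical choices for illustration. Running the same kind of check both in a local hook and in the merge-request pipeline is what gives you the two layers described above.

```python
import math
import re
from collections import Counter

# Hypothetical example patterns; real scanners ship much larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "password_assignment": re.compile(r"(?i)password\s*[:=]\s*['\"][^'\"]+['\"]"),
}
TOKEN = re.compile(r"[A-Za-z0-9+/=_-]{20,}")  # long runs of key-like characters


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character; random-looking tokens score high."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def find_secrets(text: str, entropy_threshold: float = 4.0) -> list[str]:
    """Return findings such as 'line 2: password_assignment'."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Known secret formats.
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append(f"line {lineno}: {name}")
        # Random-looking tokens that no pattern anticipates.
        for token in TOKEN.findall(line):
            if shannon_entropy(token) > entropy_threshold:
                findings.append(f"line {lineno}: high_entropy_string")
    return findings


if __name__ == "__main__":
    staged = 'db_host = "localhost"\ndb_password = "s3cr3t-Pa55"\n'
    for finding in find_secrets(staged):
        print(finding)  # line 2: password_assignment
    # A real hook would exit non-zero when there are findings, blocking the commit.
```

Both checks produce false positives now and then, which is why real tools let you record reviewed findings as ignored.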

Once code is built and packaged as a container image, we found Trivy (https://github.com/aquasecurity/trivy) to be a very good tool for scanning it for vulnerable dependencies. This way developers get feedback on the security of their new container image, and can make sure it uses only secure, up-to-date libraries.

Depending on your risk appetite, you can configure Trivy to accept images that contain only low-severity vulnerabilities, or only images with no known vulnerabilities at all.
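Trivy can enforce such a threshold on its own, for example with trivy image --severity HIGH,CRITICAL --exit-code 1 <image>, which makes the command fail when vulnerabilities of those severities are found. To make the gating logic explicit, here is a minimal Python sketch that reads a report shaped like Trivy's JSON output (--format json). The function name and the example report are made up for illustration, and only the fields the sketch needs are shown.

```python
import json

# Trivy's severity levels, from least to most severe.
SEVERITY_ORDER = ["UNKNOWN", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def blocking_vulnerabilities(report: dict, min_severity: str = "HIGH") -> list[str]:
    """List vulnerabilities at or above min_severity as 'ID (package)'."""
    threshold = SEVERITY_ORDER.index(min_severity)
    blocking = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" can be missing or null for a clean target.
        for vuln in result.get("Vulnerabilities") or []:
            if SEVERITY_ORDER.index(vuln.get("Severity", "UNKNOWN")) >= threshold:
                blocking.append(f'{vuln["VulnerabilityID"]} ({vuln["PkgName"]})')
    return blocking


if __name__ == "__main__":
    # Made-up report in the shape of `trivy image --format json` output.
    report = json.loads("""
    {"Results": [
      {"Target": "app-image", "Vulnerabilities": [
        {"VulnerabilityID": "EXAMPLE-1", "PkgName": "libfoo", "Severity": "LOW"},
        {"VulnerabilityID": "EXAMPLE-2", "PkgName": "libbar", "Severity": "CRITICAL"}
      ]},
      {"Target": "python-deps", "Vulnerabilities": null}
    ]}
    """)
    blocking = blocking_vulnerabilities(report, min_severity="HIGH")
    print("FAIL" if blocking else "PASS", blocking)
```

Lowering min_severity to "LOW" corresponds to the strict end of the risk-appetite scale; raising it to "CRITICAL" to the lenient end.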

As you might already have guessed, (container) workloads are only as secure as the underlying infrastructure. With the help of Terrascan (https://github.com/tenable/terrascan), it is possible to check infrastructure-as-code definitions for issues before the cloud infrastructure is even created! Terrascan is a very nifty tool that detects misconfigurations such as insecure firewall policies, publicly accessible storage, and much more.
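Terrascan's policies are written in Rego and cover many providers; the Python sketch below only illustrates the idea of checking planned resources before they exist. The resource dictionary is a hypothetical, simplified stand-in loosely modelled on Terraform's aws_security_group and aws_s3_bucket arguments, and the two rules are examples, not Terrascan's rule set.

```python
# Simplified, hypothetical resource model, loosely based on Terraform's
# aws_security_group and aws_s3_bucket arguments.
SENSITIVE_PORTS = {22, 3389}  # SSH and RDP
PUBLIC_ACLS = {"public-read", "public-read-write"}


def check_resource(name: str, resource: dict) -> list[str]:
    """Return the misconfigurations found in one planned resource."""
    issues = []
    if resource["type"] == "aws_security_group":
        for rule in resource.get("ingress", []):
            open_to_world = "0.0.0.0/0" in rule.get("cidr_blocks", [])
            ports = range(rule["from_port"], rule["to_port"] + 1)
            if open_to_world and any(p in ports for p in SENSITIVE_PORTS):
                issues.append(f"{name}: SSH/RDP open to 0.0.0.0/0")
    elif resource["type"] == "aws_s3_bucket":
        if resource.get("acl") in PUBLIC_ACLS:
            issues.append(f"{name}: bucket is publicly accessible")
    return issues


def scan(resources: dict[str, dict]) -> list[str]:
    """Check every planned resource; an empty list means no findings."""
    return [issue for name, res in resources.items() for issue in check_resource(name, res)]


if __name__ == "__main__":
    planned = {
        "web_sg": {"type": "aws_security_group", "ingress": [
            {"from_port": 443, "to_port": 443, "cidr_blocks": ["0.0.0.0/0"]},
            {"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]},
        ]},
        "logs": {"type": "aws_s3_bucket", "acl": "private"},
        "assets": {"type": "aws_s3_bucket", "acl": "public-read"},
    }
    for issue in scan(planned):
        print(issue)
```

Because the check runs on the plan rather than on live resources, the misconfigured security group and bucket never exist in the cloud account.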

Security when deploying software

Even with a very secure development process, there is still a slight chance that insecure software ends up in your infrastructure, since no single tool can cover all bases.

That is why it is important to put safeguards on your infrastructure, so that insecure workloads are prevented from running, even if they managed to pass the security checks during development.

To make this possible, we explored admission control, which makes sure all new workloads adhere to the required security standards before they are allowed to run. You can compare admission control software to a nightclub bouncer: if a workload is (too) insecure, it just doesn’t get in.
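Under the hood, tools such as Connaisseur and OPA Gatekeeper plug into Kubernetes as admission webhooks: the API server sends each new object to the webhook in an AdmissionReview request and only creates it if the response says it is allowed. The Python sketch below shows just that request-to-response step for a Pod. The rule it applies (images must come from a hypothetical registry.example.com registry) is a placeholder, and the HTTPS server and webhook registration are left out.

```python
TRUSTED_REGISTRY = "registry.example.com/"  # hypothetical allowed registry


def review(admission_review: dict) -> dict:
    """Build an AdmissionReview response for a Pod in an AdmissionReview request."""
    request = admission_review["request"]
    spec = request["object"].get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    untrusted = [c["image"] for c in containers if not c["image"].startswith(TRUSTED_REGISTRY)]

    # The response echoes the request's uid so the API server can match them.
    response = {"uid": request["uid"], "allowed": not untrusted}
    if untrusted:
        response["status"] = {"code": 403, "message": "untrusted images: " + ", ".join(untrusted)}
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}


if __name__ == "__main__":
    request = {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "request": {
            "uid": "example-uid-0001",
            "object": {"kind": "Pod", "spec": {"containers": [
                {"name": "app", "image": "registry.example.com/shop/app:1.4.2"},
                {"name": "sidecar", "image": "docker.io/unknown/tool:latest"},
            ]}},
        },
    }
    print(review(request)["response"])
```

The bouncer only ever answers yes or no (with a reason); the actual policy can be swapped out without changing this request/response contract.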

We discovered that Connaisseur (https://sse-secure-systems.github.io/connaisseur/) makes it very easy to cryptographically verify container images. If every image used in an environment is signed, Connaisseur can admit only those workloads whose images carry a valid signature. This prevents attackers from spinning up rogue workloads, should they find a way to breach the infrastructure.
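The verification flow can be sketched as follows: the pipeline signs the digest of each image it has built and scanned, and the admission step refuses any image whose signature does not check out. Connaisseur verifies public-key signatures (for example Notary or Cosign signatures); to keep this Python sketch dependency-free, it swaps in an HMAC from the standard library, which uses a single shared secret instead of a private/public key pair. The key and the function names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical shared secret. Real image signing uses a private key in the
# pipeline and a public key on the cluster; HMAC uses one secret for both.
SIGNING_KEY = b"hypothetical-shared-secret"


def sign(image_digest: str) -> str:
    """Pipeline side: sign the digest of an image that passed all checks."""
    return hmac.new(SIGNING_KEY, image_digest.encode(), hashlib.sha256).hexdigest()


def admit(image_digest: str, signature: Optional[str]) -> bool:
    """Cluster side: admit the image only if its signature verifies."""
    if signature is None:
        return False  # unsigned images never get in
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign(image_digest), signature)


if __name__ == "__main__":
    digest = "sha256:" + hashlib.sha256(b"example image contents").hexdigest()
    signature = sign(digest)
    rogue = "sha256:" + hashlib.sha256(b"rogue image contents").hexdigest()
    print(admit(digest, signature))  # True: signed image
    print(admit(rogue, signature))   # False: different image, reused signature
    print(admit(digest, None))       # False: unsigned image
```

With a shared secret, anyone who can verify can also sign, which is exactly why real image signing keeps the private key in the pipeline and distributes only the public key to the cluster.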

On top of that, you can chain OPA Gatekeeper (https://github.com/open-policy-agent/gatekeeper), which adds policy checks that Connaisseur does not provide. We use it to deny misconfigured workloads that slipped through the cracks! Examples of policies that we enforce:

workloads that request root privileges are denied

- The risk is that if the workload gets breached, the underlying host gets exposed too (and thus all other workloads).

workloads that expose a service using HTTP (plain-text) are denied

- The risk is that attackers could perform a MITM (man-in-the-middle) attack to intercept sensitive network traffic.

workloads that can request unlimited resources (CPU/RAM) are denied

- The risk is that if the workload gets compromised, an attacker can use it to consume all available resources, opening up the possibility of denial-of-service attacks.
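Gatekeeper policies are written in Rego as constraint templates; the Python sketch below only mirrors the logic of the three example policies so that it is easy to read. It is a simplified illustration with hypothetical function names: "root privileges" is approximated as privileged: true or a missing runAsNonRoot: true, and "exposing a service over HTTP" as an Ingress without a tls section.

```python
def pod_violations(pod: dict) -> list[str]:
    """Check a Pod against the root and resource-limit policies."""
    violations = []
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext", {})
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})
        # Policy 1: no root privileges. A container-level runAsNonRoot
        # overrides the pod-level setting, as in Kubernetes.
        non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        if ctx.get("privileged", False) or not non_root:
            violations.append(f"{name}: may run privileged or as root")
        # Policy 3: CPU and memory limits are mandatory.
        limits = container.get("resources", {}).get("limits", {})
        missing = [r for r in ("cpu", "memory") if r not in limits]
        if missing:
            violations.append(f"{name}: missing limits for {', '.join(missing)}")
    return violations


def ingress_violations(ingress: dict) -> list[str]:
    """Policy 2: no plain-HTTP exposure, approximated as 'Ingress without TLS'."""
    if not ingress.get("spec", {}).get("tls"):
        return [f"{ingress['metadata']['name']}: no TLS configured"]
    return []


def violations(obj: dict) -> list[str]:
    """Dispatch on the object's kind; other kinds pass through unchecked."""
    kind = obj.get("kind")
    if kind == "Pod":
        return pod_violations(obj)
    if kind == "Ingress":
        return ingress_violations(obj)
    return []


if __name__ == "__main__":
    pod = {"kind": "Pod", "spec": {"containers": [
        {"name": "app", "securityContext": {"privileged": True},
         "resources": {"limits": {"cpu": "500m"}}}]}}
    for v in violations(pod):
        print("denied:", v)
```

An admission webhook would turn a non-empty list into a denial, as in the bouncer analogy above.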

Effectively implementing admission controls on your infrastructure greatly reduces security risks!

Runtime security

Even with a super secure build process and really tight admission controls, smart (or nation-state sponsored) attackers can still find vectors to exploit. Therefore, it is very important to continuously monitor security at runtime.

To make sure our production infrastructure is properly configured (and stays that way), we regularly scan it using kube-bench (https://github.com/aquasecurity/kube-bench). It is an audit tool that checks a Kubernetes cluster against the CIS Kubernetes Benchmark, part of the CIS Benchmarks (https://www.cisecurity.org/cis-benchmarks/), a collection of consensus-based best practices for securely configuring systems. A kube-bench scan detects misconfigurations that can potentially leave your infrastructure open to attacks. Using the report, a systems administrator can fix security issues with surgical precision.
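kube-bench defines its checks declaratively and evaluates them against the flags and configuration files of the running Kubernetes components. As a simplified illustration (not kube-bench's code), the Python sketch below audits a kube-apiserver command line for three settings of the kind the CIS Kubernetes Benchmark covers: anonymous authentication disabled, profiling disabled, and an authorization mode other than AlwaysAllow. The function names and check labels are hypothetical.

```python
import shlex


def parse_flags(cmdline: str) -> dict[str, str]:
    """Turn '--key=value' arguments into a dict (only that form is handled here)."""
    flags = {}
    for arg in shlex.split(cmdline)[1:]:  # skip the binary name
        if arg.startswith("--"):
            key, _, value = arg[2:].partition("=")
            flags[key] = value
    return flags


def audit_apiserver(cmdline: str) -> dict[str, str]:
    """Return PASS/FAIL for a few CIS-style kube-apiserver checks."""
    flags = parse_flags(cmdline)
    # A missing --authorization-mode is treated as a failure in this sketch.
    modes = [m for m in flags.get("authorization-mode", "").split(",") if m]
    checks = {
        "anonymous-auth is false": flags.get("anonymous-auth") == "false",
        "profiling is false": flags.get("profiling") == "false",
        "authorization-mode is not AlwaysAllow": bool(modes) and "AlwaysAllow" not in modes,
    }
    return {name: "PASS" if ok else "FAIL" for name, ok in checks.items()}


if __name__ == "__main__":
    cmd = "kube-apiserver --anonymous-auth=false --profiling=true --authorization-mode=Node,RBAC"
    for check, status in audit_apiserver(cmd).items():
        print(f"[{status}] {check}")
```

Each FAIL line points at one specific flag to change, which is what makes this kind of report easy to act on.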

To detect attacks as they happen, we tested Falco (https://falco.org/), which monitors infrastructure at the kernel level. By monitoring kernel activity (system calls), it is possible to detect suspicious behavior very early on, whether it takes place on the host or in a containerized workload.

Combined with alerting, this lets your incident response team take appropriate action immediately when Falco detects, for instance, that an attacker has managed to open a shell inside a critical workload!
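Falco expresses detections as rules with conditions over system-call fields, and its default rule set includes a check for a terminal shell being spawned inside a container. The Python sketch below imitates that kind of rule over simplified, hypothetical event records and raises the alert priority when the container is on a list of critical workloads. It is not Falco's rule syntax or engine.

```python
from dataclasses import dataclass

SHELLS = {"sh", "bash", "zsh", "ash", "dash"}


@dataclass
class Event:
    """Hypothetical, simplified syscall event record."""
    evt_type: str      # syscall name, e.g. "execve"
    container_id: str  # "host" for processes running outside a container
    proc_name: str     # name of the process being executed
    tty: int           # non-zero when the process is attached to a terminal


def is_shell_in_container(e: Event) -> bool:
    """Rule condition: an interactive shell was started inside a container."""
    return (e.evt_type == "execve" and e.container_id != "host"
            and e.proc_name in SHELLS and e.tty != 0)


def alerts(events: list[Event], critical_containers: set[str]) -> list[str]:
    """Evaluate the rule on each event and format alerts for the on-call team."""
    out = []
    for e in events:
        if is_shell_in_container(e):
            priority = "CRITICAL" if e.container_id in critical_containers else "WARNING"
            out.append(f"{priority}: {e.proc_name} shell spawned in container {e.container_id}")
    return out


if __name__ == "__main__":
    events = [
        Event("execve", "host", "bash", 34816),    # admin shell on the host: ignored
        Event("execve", "a1b2c3", "bash", 34816),  # interactive shell in a container
        Event("execve", "d4e5f6", "python", 0),    # ordinary process start
    ]
    for alert in alerts(events, critical_containers={"a1b2c3"}):
        print(alert)  # CRITICAL: bash shell spawned in container a1b2c3
```

Routing CRITICAL alerts to a pager and WARNING alerts to a dashboard is one way to keep the incident response team focused on the workloads that matter most.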

Finally, a word about workload isolation, since real-world constraints do not always allow for the most secure option. In the end, the level of security you maintain should be the result of a conscious risk assessment.

If your organization has workloads that need elevated privileges, or that rely on older libraries but are still mission critical, you can run those sensitive workloads in an isolated environment. The idea is that if one of those workloads gets breached, the spillover is limited to that isolated environment.

Conclusion

Cloud security is more important than ever, and more and more tools and techniques are becoming available to audit, enforce and ensure that your cloud infrastructure adheres to the highest security standards. With the help of a specialized partner like Gluo, you can benefit from this fast-paced evolution without your development team needing to become security ninjas!