Devops 읽어볼 정보
The objective is to document findings and recommendations from the review of DevOps processes and artifacts.
Table of Contents
Istio Recommendations
options for Istio on GKE are: GKE with Istio addon, OSS Istio, CSM(Cloud Service Mesh) in the near future.
Telemetry with Stackdriver
This example demonstrates the ways you can use Stackdriver to gain insights into and debug microservices deployments running on GKE with Istio’s telemetry support: monitoring, tracing and logging. It is by default and configurable via Rule in GKE with Istio addon. It can be enabled with OSS Istio as well.
Telemetry with Kiali
Installing Istio on a GKE cluster shows how to install a version of istio on a GKE cluster. Enable the stackdriver metrics if desired. Furthermore, prometheus, grafana, tracing withJaeger and kiali can be installed. Service toplogy with kiali provides answers to the question: What microservices are part of my Istio service mesh and how are they connected?
Canary Deployment
This example demonstrates how to use VirtualService and DestinationRule to implement carnary deployment. Furthermore, Istio by example shows lots of use cases for traffic managment & security.
Mesh-wide security options
Strict mTLS mode is recommended for Istio on GKE. Otherwise for the Permissive mTLS(MTLS_PERMISSIVE) mode, introduction of Istio shows an example incrementally adopting Istio mutual TLS authentication across the service mesh. Configure mTLS with VirtualService and Policy for both sides of TLS.
Network security logging
- Palo Alto Networks Security adapter for Istio
- full visibility into container activity from a network perspective for container workloads running on Kubernetes environments with Istio.
- Ability to use the logs and information to perform forensics to understand exploits and vulnerabilities.
Managed Istio CSM(Cloud Service Mesh)
Istio plays a key role in Anthos’s CSM. This is still in alpha and it would be nice to be aware of the longer term offering from GCP on service mesh, then plan accordingly. Microservices architectures present a range of benefits, but they introduce many challenges. Google Cloud Service Mesh provides a fully managed platform that simplifies operating services across the board, from traffic management and mesh telemetry to securing communications between services, thereby taking a significant burden off your operations and development teams. For example, Traffic Director is a Google managed Pilot.
For a demo, please check out Understanding SLOs and Error Budgets With Istio (Cloud Next ‘19)
istio-operator for OSS Istio
If you choose to use OSS Istio, Istio operator can be a great aid to automate and simplify these and enable popular service mesh use cases (multi cluster federation, canary releases, resource reconciliation, etc) by introducing easy higher level abstractions. It can be enabled to send metrics to the Stackdriver.
YAML Recommendations
- Can use VSCode as text editor, plus Red Hat’s YAML Support plugin as it is very handy for validation, autocompletion, and formatting.
- An interesting idea to customize the helm chart via Kustomize tool. Kustomize is a project that came out of the CLI Special interest group. Kustomize lets you customize raw, template-free YAML files for multiple purposes, leaving the original YAML untouched and usable as is. This can be leveraged by solving customization of upstream Helm charts without PRs. Customizing Upstream Helm Charts with Kustomize shows pro and con side on using Helm with Kustomize.
Helm Recommendations
Developing chart
Please check out the following official guides
- Charts best practice as it the official best practice for structing charts.
- Tips and Tricks has lots of useful tips for developing charts.
- Chart template developer’s guide
Always use helm create for a new chart because it is always up to date with the recommended practices.
helm lint your chart including custom values. It is a good practice to lint your charts before trying to install them. The linting will apply the templating and verify that the output is a well formatted yaml. A good practice would be to create a ci folder on the same level as your templates one and put there the additional values files you want to verify. So if you have a file called ingress-enabled-values.yaml in your ci folder just run
helm lint --values ci/ingress-enables-values.yaml
.Use dry-run and debug to see what the chart will install, e.g.
helm install stable/postgresql --name standalone --dry-run --debug
helm test:Write funcitonal test according to Chart Tests. It is implemnented via helm lifecycle hooks. For example we can verify the network endpoint or database credentials.
Customizing Upstream Helm Charts with Kustomize shows pro and con side on using Helm with Kustomize.
References
- Helm Chart Patterns - Vic Iglesias, Google is a pretty awesome talk on Helm Charts.
- Charts best practice
- Helm Documentation
Security
https://blog.ropnop.com/attacking-default-installs-of-helm-on-kubernetes/ walks through how an attacker who compromises a running pod could abuse the lack of security controls to completely take over the cluster and become full admin.
Securing your Helm Installation captures the current best practice
how to create strong SSL/TLS connections between Helm and Tiller.
Install Secure Helm In Google Kubernetes Engine (GKE) illustrates with a real example for the best practice above.
Another option is not to use Tiller. For example,how to install istio without Tiller is feasible with
helm template
.
Terraform
Refactor
resource "null_resource" "kubernetes-ready" { triggers { dependency_id = "${var.cluster_api_endpoint}" } }
can be added to terraform-helm-tiller
module so that tiller deployment happens after the GKE cluster is ready. Hence there is no need for callers of the module to check dependency.
Production grade
Deploying a production-grade Helm release on GKE with Terraform from Gruntwork.
Manage GCP resources with Kubernetes CRD
Ever wonder if we can manage GCP resources in Helm charts just like istio, knative etc? Config Connector is a Kubernetes addon that allows you to manage your Google Cloud Platform (GCP) resources through Kubernetes configuration such as CloudSQL, Cloud memory store, GCS and GKE. This enables a consistent tooling for resources. I think it is supplemental to Terraform.
References
- Exploring the Security of Helm
- running-helm-in-production
- Deploying a production-grade Helm release on GKE with Terraform
- Helm Chart Patterns - Vic Iglesias, Google is a pretty awesome talk on Helm Charts.
- Charts best practice
- Helm Documentation
Terraform
Review
Pretty solid design and follow best practices for Terraform in general
- Clean seperation of remote state by application and environment.
- Clean seperation of git repository and folder by application and environment.
- Code reuse with modules on GCP services such as GCP project creation, GKE, CloudSQL, memorystore,filestore etc.
- Don’t Repeat Yourself (DRY) by reusing
terraform-google-modules
git repo as much as possible. null_resource
technique is used for dependency management between modules.- Versions are pinned for various providers and modules.
Recommendations
- Remote state as data can be used in terraform-gcp-projects-admin/environments to refer the parent_folder_id dynamically by adding output variables in terraform-gcp-projects-admin/master-hierarchy for GCP folder ids. Mentioned here as well
- Test as code: Kitchen-Terraform enables verification of Terraform state with InSpec.
- Nice to have some DevSecOps? Terraform Validator can be used to validate terraform plans before they are applied. Validations are ran using Forseti Config Validator. Forseti Config Validator Efforts describes how Terraform validator works with Forseti. Forseti Config Validator in GCP
Network Security
Defense-in-depth strategy is enabled in GCP with a comprehensive portfolio of security controls.
Review
- Deploy your VMs with only private IPs. Even better this can be enforced with an Org policy.
- Access Google managed services and GCP managed services (CloudSQL,MemoryStore,Filestore) privately.
- Provide secure outbound internet connections with Cloud NAT.
Recommendations
- Define an organization policy per project, per folder, or per organization.
- You can define a constraint to restrict virtual machine instances from having an external IP address
- You can restrict the set of identities that are allowed to be used in Cloud Identity and Access Management policies per folder or per project. For example, restrict a folder “engineer” for your domain only users while adding contractor outside under another folder.
- By deploying Google Cloud Armor security policies, you can block malicious or otherwise unwanted traffic at the edge of Google’s network, far upstream from your infrastructure. Use preconfigured WAF rules to protect against the most common application vulnerabilities like Cross-site Scripting (XSS) and SQL injection (SQLi).
- Cloud Identity-Aware Proxy (IAP): you can permit access for authorized users to applications over the internet based on their identity and other contexts without requiring them to connect to a VPN.
- VPC Service Controls: tl;dr Mitigate exfiltration risks by preventing your data from moving outside the boundaries of a trusted perimeter. VPC Service Controls allows you to build a trusted private perimeter and ensure that data access is not allowed outside the boundaries of that perimeter. Similarly, the data can’t move outside of the perimeter boundaries, mitigating exfiltration risks.
- The Web Security Scanner identifies security vulnerabilities in your bGoogle Kubernetes Engine web applications. It crawls your application, following all links within the scope of your starting URLs, and attempts to exercise as many user inputs and event handlers as possible.
- Forseti is used by Spotify to create a notification pipeline that proactively informs us about risky misconfigurations in GCP. Read the story
GKE
Review
- Regional cluster provides the benefits of resilience from single zone failure as well as zero downtime master upgrades, master resize, and reduced downtime from master failures.
- Node auto repair is enabled.
- Node auto upgrade is enabled. Keeping the version of Kubernetes up to date is one of the simplest things you can do to improve your security.
- Use custom node pool by removing the defaut node pool and create new ones with custom machine types.
- Cluster autoscaler and horizontal pod autoscaling are enabled.
- Daily maintenance window is specified.
- Stackdriver Kubernetes Engine Monitoring and Prometheus are used instead of the legacy one.
- Istio is used.
- kubernetes dashboard is disabled.
Recommendations
Here are some relevant best practices on GKE. There are many more can be found in the reference section.
GKE Networking
There are overlapping functionalities between istio and GKE such as ingress and ingress-gateway. The purpose here is to list what GCP offers as managed servcies as an option.
- Google managed TLS certificates When you configure an HTTP(S) load balancer through Ingress, you can configure the load balancer to present up to ten TLS certificates to the client. It is created via a manifest.
- For GKE ingress traffic via HTTP(S) Load Balancer GCP HTTP(S) Load Balancing is implemented at the edge of Google’s network in Google’s points of presence around the world.
- Cloud Armor enforces web application firewall (WAF) policies at the edge How to use a BackendConfig custom resource to configure Google Cloud Armor in Google Kubernetes Engine.
- Cloud CDN similar to Cloud Armor, How to use a backendConfig custom resource is used.
- Container-native load balancing optimizes the HTTP(S) load balancers performance. Container-native load balancing uses a data model called network endpoint groups (NEGs), which are collections of network endpoints represented by IP-port pairs.
- It bypasses the kube_proxy and iptables, this solves the “double hop” problem shown in the Pods are first-class citizens for load balancing
- It reduces the cost for egress traffic across zones in a regional cluster, which priced at 0.01 per GB.
- The NEG data model is required to use Traffic Director, GCP’s fully managed traffic control plane for service mesh.
- Managed Service Mesh: Traffic Director vs Istio
GKE Security
Securing GKE cluster by either whitelisting via MAN(master authorized network) for a public cluster or using a private GKE cluster completely. In the case of private GKE cluster, private Google access is enabled by default so a VPC peering is created between your own VPC which hosts the cluster nodes and Google-owned VPC which hosts the master nodes. For setting up the network path from office and home to the private API endpoint, one option is
kubectl exec
via VPN, kubectl can use the internal-ip based Kubernetes context pointing to the private API endpoint. The other option is to create a private compute instance in the same VPC of the GKE cluster, so the instance can SSH forwarding via Cloud IAP(Identify Aware Proxy) to the private API endpoint. When a user runs SSH from the gcloud command-line tool, SSH traffic is tunneled over a TLS connection to Cloud IAP, which applies any relevant context-aware access policies. If access is allowed, the tunneled SSH traffic is transparently forwarded to the VM instance.Use Least Privilege Service Accounts for your Nodes. By default the compute engine default servic account is used, which has project editor role granted. In the terraform, set the service_account as “create” can create the node service account with least privileges, granting more privileges if needed. Alternativley, Reduce your Node Service Account Scopes if you don’t want to use a custom service account.
One of the key security concerns for running Kubernetes clusters is knowing what container images are running inside each pod and being able to account for their origin. With Binary Authorization, you can ensure that internal processes that safeguard the quality and integrity of your software have successfully completed before an application is deployed to your production environment.GKE Binary Authorization Demo can be a starting point.
Restrict Pod Permissions with a Pod Security Policy. By default, Pods in Kubernetes can operate with capabilities beyond what they require. You should constrain the Pod’s capabilities to only those required for that workload. Kubernetes offers controls for restricting your Pods to execute with only their minimum required capabilities.
Authenticating to Cloud Platform with Service Accounts. The recommended way to authenticate to Google Cloud Platform services from applications running on GKE is to create your own service accounts. Ideally you must create a new service account for each application that makes requests to Cloud Platform APIs.
References
- https://cloud.google.com/solutions/prep-kubernetes-engine-for-prod
- https://cloud.google.com/blog/products/networking/google-cloud-networking-in-depth-three-defense-in-depth-principles-for-securing-your-environment
- https://cloud.google.com/kubernetes-engine/docs/how-to/hardening-your-cluster
- https://cloud.google.com/solutions/best-practices-for-operating-containers
- https://cloud.google.com/solutions/best-practices-for-building-containers
- https://cloud.google.com/docs/enterprise/best-practices-for-enterprise-organizations
- https://cloud.google.com/solutions/best-practices-vpc-design
- Container Native Load Balancing
- https://cloud.google.com/blog/products/gcp/with-forseti-spotify-and-google-release-gcp-security-tools-to-open-source-community15