Trying out PodSecurity admission

Try out the PodSecurity admission controller, which graduated to beta in 1.23, following the blog post below.

Setup

Start a cluster with Kind.

$ kind create cluster --image kindest/node:v1.23.0
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.23.0) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a nice day! 👋

Check the cluster.

$ kubectl cluster-info --context kind-kind
Kubernetes control plane is running at https://127.0.0.1:57523
CoreDNS is running at https://127.0.0.1:57523/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
$ kubectl get nodes
NAME                 STATUS   ROLES                  AGE   VERSION
kind-control-plane   Ready    control-plane,master   11m   v1.23.0

Confirm that PodSecurity is in the list of admission plugins enabled by default.

$ kubectl -n kube-system exec kube-apiserver-kind-control-plane -it -- kube-apiserver -h | grep "default enabled ones"
      --enable-admission-plugins strings       admission plugins that should be enabled in addition to default enabled ones (NamespaceLifecycle, LimitRanger, ServiceAccount, TaintNodesByCondition, PodSecurity, Priority, DefaultTolerationSeconds, DefaultStorageClass, StorageObjectInUseProtection, PersistentVolumeClaimResize, RuntimeClass, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, MutatingAdmissionWebhook, ValidatingAdmissionWebhook, ResourceQuota). Comma-delimited list of admission plugins: AlwaysAdmit, AlwaysDeny, AlwaysPullImages, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, DefaultStorageClass, DefaultTolerationSeconds, DenyServiceExternalIPs, EventRateLimit, ExtendedResourceToleration, ImagePolicyWebhook, LimitPodHardAntiAffinityTopology, LimitRanger, MutatingAdmissionWebhook, NamespaceAutoProvision, NamespaceExists, NamespaceLifecycle, NodeRestriction, OwnerReferencesPermissionEnforcement, PersistentVolumeClaimResize, PersistentVolumeLabel, PodNodeSelector, PodSecurity, PodSecurityPolicy, PodTolerationRestriction, Priority, ResourceQuota, RuntimeClass, SecurityContextDeny, ServiceAccount, StorageObjectInUseProtection, TaintNodesByCondition, ValidatingAdmissionWebhook. The order of plugins in this flag does not matter.

Create a Namespace.

$ kubectl create namespace verify-pod-security
namespace/verify-pod-security created

Privileged level

The section is titled "Privileged level", but note that the Pod created here is actually a Baseline-level Pod.

Label the Namespace to enforce restricted and also audit against restricted.

$ kubectl label --overwrite ns verify-pod-security \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted
namespace/verify-pod-security labeled

Key                                  Value
pod-security.kubernetes.io/enforce   restricted
pod-security.kubernetes.io/audit     restricted
pod-security.kubernetes.io/warn      (not set)

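allowPrivilegeEscalation: true is permitted by baseline but forbidden by restricted, so the Pod we are about to create is a baseline Pod. The step seems meant to test creating a privileged Pod, so this looks like it misses its intent, but the post later adds the following note: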

UPDATE: The baseline policy permits allowPrivilegeEscalation. While I cannot see the Pod Security default levels of enforcement, they are there. Let's try to provide a manifest that violates the baseline by requesting hostNetwork access.

Create the baseline Pod.

$ cat <<EOF | kubectl -n verify-pod-security apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox-privileged
spec:
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "1000000"
    securityContext:
      allowPrivilegeEscalation: true
EOF
Error from server (Forbidden): error when creating "STDIN": pods "busybox-privileged" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "busybox" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "busybox" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "busybox" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "busybox" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

It violates restricted, so it cannot be created.

Change the labels: enforce privileged (which allows everything), and warn and audit on baseline violations.

$ kubectl label --overwrite ns verify-pod-security \
  pod-security.kubernetes.io/enforce=privileged \
  pod-security.kubernetes.io/warn=baseline \
  pod-security.kubernetes.io/audit=baseline
namespace/verify-pod-security labeled

Key                                  Value
pod-security.kubernetes.io/enforce   privileged
pod-security.kubernetes.io/audit     baseline
pod-security.kubernetes.io/warn      baseline

Create the baseline Pod again.

$ cat <<EOF | kubectl -n verify-pod-security apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox-privileged
spec:
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "1000000"
    securityContext:
      allowPrivilegeEscalation: true
EOF
pod/busybox-privileged created

It does not violate privileged, so it is created, and since it does not violate baseline either, no warning is shown.
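
The behavior of the three modes so far can be summarized in a small model. This is a minimal sketch, not the admission plugin's code: it assumes the level ordering privileged < baseline < restricted from the Pod Security Standards, treats a missing mode label as privileged (never fires), and the names LEVELS, satisfies and evaluate are hypothetical.

```python
# Hypothetical model of how enforce / warn / audit react to a single Pod.
LEVELS = ["privileged", "baseline", "restricted"]  # least to most restrictive


def satisfies(pod_level: str, required: str) -> bool:
    """A Pod that meets pod_level also meets every less restrictive level."""
    return LEVELS.index(pod_level) >= LEVELS.index(required)


def evaluate(pod_level: str, ns_labels: dict) -> dict:
    """ns_labels maps a mode ("enforce", "warn", "audit") to a level.

    pod_level is the strictest level the Pod satisfies. A mode without a
    label is treated as "privileged" here, so it never reports a violation.
    """
    violated = {
        mode: not satisfies(pod_level, ns_labels.get(mode, "privileged"))
        for mode in ("enforce", "warn", "audit")
    }
    return {
        "allowed": not violated["enforce"],
        "warning": violated["warn"],
        "audit_annotation": violated["audit"],
    }
```

For example, evaluate("baseline", {"enforce": "restricted", "audit": "restricted"}) reports the Pod as not allowed, matching the first attempt above, while the privileged/baseline/baseline labels allow it without a warning.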

Delete the Pod.

$ kubectl -n verify-pod-security delete pod busybox-privileged
pod "busybox-privileged" deleted

Baseline level

Change the labels to enforce restricted. This is the same command as the first one, but because warn is not specified, it keeps the baseline value set in the previous step.

$ kubectl label --overwrite ns verify-pod-security \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted
namespace/verify-pod-security labeled

Key                                  Value
pod-security.kubernetes.io/enforce   restricted
pod-security.kubernetes.io/audit     restricted
pod-security.kubernetes.io/warn      baseline
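
Why warn stays at baseline can be seen by replaying the label commands. A minimal sketch, assuming only the documented `kubectl label --overwrite` semantics (keys given on the command line replace existing values, other keys are left as they are); apply_labels is a hypothetical helper, not a kubectl API.

```python
# Replay `kubectl label --overwrite` on a plain dict of Namespace labels.
PREFIX = "pod-security.kubernetes.io/"


def apply_labels(current: dict, **modes: str) -> dict:
    """Return a new label dict with the given modes overwritten."""
    updated = dict(current)  # keep the previous state unchanged
    for mode, level in modes.items():
        updated[PREFIX + mode] = level
    return updated


# The three label commands used so far in this walkthrough.
step1 = apply_labels({}, enforce="restricted", audit="restricted")
step2 = apply_labels(step1, enforce="privileged", warn="baseline", audit="baseline")
step3 = apply_labels(step2, enforce="restricted", audit="restricted")

for key in sorted(step3):
    print(key, step3[key])
```

The third command never mentions warn, so the baseline value written by the second command survives, exactly as in the table above.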

Create a baseline Pod.

$ cat <<EOF | kubectl -n verify-pod-security apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox-baseline
spec:
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "1000000"
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
          - NET_BIND_SERVICE
          - CHOWN
EOF
Error from server (Forbidden): error when creating "STDIN": pods "busybox-baseline" is forbidden: violates PodSecurity "restricted:latest": unrestricted capabilities (container "busybox" must set securityContext.capabilities.drop=["ALL"]; container "busybox" must not include "CHOWN" in securityContext.capabilities.add), runAsNonRoot != true (pod or container "busybox" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "busybox" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

It violates restricted, so it cannot be created.
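
The error messages in this walkthrough keep naming the same four restricted checks. The sketch below re-implements just those four on a Pod given as a plain dict (for example parsed from the YAML above). It is illustrative and partial: the real plugin also checks volume types, runAsUser != 0 and every baseline rule, and the function name restricted_violations and its message strings are hypothetical.

```python
from typing import List


def restricted_violations(pod: dict) -> List[str]:
    """Check allowPrivilegeEscalation, capabilities, runAsNonRoot and seccomp."""
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext", {})
    problems = []
    for c in spec.get("containers", []):
        name = c["name"]
        sc = c.get("securityContext", {})
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation != false")
        caps = sc.get("capabilities", {})
        if "ALL" not in caps.get("drop", []):
            problems.append(f'{name}: capabilities.drop must include "ALL"')
        for cap in caps.get("add", []):
            if cap != "NET_BIND_SERVICE":  # the only capability restricted lets you add
                problems.append(f'{name}: must not add "{cap}"')
        # runAsNonRoot and seccompProfile may be set on the Pod or the container;
        # a container-level value takes precedence over the Pod-level one.
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot != true")
        seccomp = sc.get("seccompProfile", pod_sc.get("seccompProfile", {}))
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile.type must be RuntimeDefault or Localhost")
    return problems
```

Fed the busybox-baseline manifest above, it reports the same three groups of problems as the Forbidden message (capabilities, including CHOWN, runAsNonRoot and seccompProfile) and nothing about allowPrivilegeEscalation.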

Change the labels again to enforce baseline, and change the warn level to restricted.

$ kubectl label --overwrite ns verify-pod-security \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted
namespace/verify-pod-security labeled

Key                                  Value
pod-security.kubernetes.io/enforce   baseline
pod-security.kubernetes.io/audit     restricted
pod-security.kubernetes.io/warn      restricted

Create the baseline Pod.

$ cat <<EOF | kubectl -n verify-pod-security apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox-baseline
spec:
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "1000000"
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
          - NET_BIND_SERVICE
          - CHOWN
EOF
Warning: would violate PodSecurity "restricted:latest": unrestricted capabilities (container "busybox" must set securityContext.capabilities.drop=["ALL"]; container "busybox" must not include "CHOWN" in securityContext.capabilities.add), runAsNonRoot != true (pod or container "busybox" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "busybox" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
pod/busybox-baseline created

It does not violate baseline, so it is created, but it violates restricted, so a warning is shown.

Delete the Pod.

$ kubectl -n verify-pod-security delete pod busybox-baseline
pod "busybox-baseline" deleted

Restricted level

Change the labels again to enforce restricted. warn stays at restricted from the previous step.

$ kubectl label --overwrite ns verify-pod-security \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted
namespace/verify-pod-security labeled

Key                                  Value
pod-security.kubernetes.io/enforce   restricted
pod-security.kubernetes.io/audit     restricted
pod-security.kubernetes.io/warn      restricted

Create a baseline Pod (this time it adds only NET_BIND_SERVICE).

$ cat <<EOF | kubectl -n verify-pod-security apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox-restricted
spec:
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "1000000"
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
          - NET_BIND_SERVICE
EOF
Error from server (Forbidden): error when creating "STDIN": pods "busybox-restricted" is forbidden: violates PodSecurity "restricted:latest": unrestricted capabilities (container "busybox" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "busybox" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "busybox" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

It violates restricted, so it cannot be created.

Modify the manifest so that it complies with restricted, and create the Pod.

$ cat <<EOF | kubectl -n verify-pod-security apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox-restricted
spec:
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "1000000"
    securityContext:
      seccompProfile:
        type: RuntimeDefault
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
        add:
          - NET_BIND_SERVICE
EOF
pod/busybox-restricted created

The Pod was created, but its container fails to start.

$ kubectl -n verify-pod-security get pods
NAME                 READY   STATUS                       RESTARTS   AGE
busybox-restricted   0/1     CreateContainerConfigError   0          48s

Check the Pod details.

$ kubectl -n verify-pod-security describe pod busybox-restricted
Name:         busybox-restricted
Namespace:    verify-pod-security
...
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  103s                default-scheduler  Successfully assigned verify-pod-security/busybox-restricted to kind-control-plane
  Normal   Pulled     101s                kubelet            Successfully pulled image "busybox" in 1.7259118s
  Normal   Pulled     99s                 kubelet            Successfully pulled image "busybox" in 1.7005545s
  Normal   Pulled     86s                 kubelet            Successfully pulled image "busybox" in 1.5442063s
  Normal   Pulled     69s                 kubelet            Successfully pulled image "busybox" in 1.6893928s
  Normal   Pulled     56s                 kubelet            Successfully pulled image "busybox" in 1.6754146s
  Normal   Pulled     40s                 kubelet            Successfully pulled image "busybox" in 1.5302337s
  Normal   Pulled     25s                 kubelet            Successfully pulled image "busybox" in 1.661749s
  Normal   Pulling    11s (x8 over 103s)  kubelet            Pulling image "busybox"
  Warning  Failed     9s (x8 over 101s)   kubelet            Error: container has runAsNonRoot and image will run as root (pod: "busybox-restricted_verify-pod-security(b6afe0b4-83c8-4838-9444-4f8969444c68)", container: busybox)
  Normal   Pulled     9s                  kubelet            Successfully pulled image "busybox" in 1.6799024s

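runAsNonRoot: true is set, but the busybox image runs as root (UID 0) by default, so the kubelet refuses to start the container and the Pod stays in CreateContainerConfigError. Note that this check happens in the kubelet, after admission has already accepted the Pod.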
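
The kubelet-side check can be sketched as follows. This is a simplified model under stated assumptions, not kubelet source: it assumes an image with no USER is treated as root (the busybox image sets no USER, and the Events above confirm it resolves to root), it ignores USER values of the form uid:gid, and only the root-image message is copied from the Events above; the other messages are paraphrased. Both function names are hypothetical.

```python
from typing import Optional, Tuple


def image_user_to_uid(image_user: str) -> Tuple[Optional[int], str]:
    """Map an image's USER setting to (uid, username), in simplified form."""
    if image_user == "":
        return 0, ""  # no USER in the image: treated as root
    if image_user.isdigit():
        return int(image_user), ""
    return None, image_user  # non-numeric user name, UID unknown


def verify_run_as_non_root(run_as_non_root: bool,
                           run_as_user: Optional[int],
                           image_user: str) -> Optional[str]:
    """Return an error message if the container must not start, else None."""
    if not run_as_non_root:
        return None  # nothing to verify
    if run_as_user is not None:
        # An explicit runAsUser decides; the image's USER is not consulted.
        return "runAsUser 0 breaks the non-root policy" if run_as_user == 0 else None
    uid, username = image_user_to_uid(image_user)
    if uid == 0:
        return "container has runAsNonRoot and image will run as root"
    if uid is None and username:
        return f"image has non-numeric user ({username}), cannot verify user is non-root"
    return None
```

With run_as_user=None and the busybox image (no USER) this returns the error seen in the Events; setting run_as_user=65534, as in the next step, makes it return None.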

Delete the Pod for now.

$ kubectl -n verify-pod-security delete pod busybox-restricted
pod "busybox-restricted" deleted

Recreate the Pod with runAsUser set to a non-root UID (65534) at the Pod level.

$ cat <<EOF | kubectl -n verify-pod-security apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox-restricted
spec:
  securityContext:
    runAsUser: 65534
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "1000000"
    securityContext:
      seccompProfile:
        type: RuntimeDefault
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
        add:
          - NET_BIND_SERVICE
EOF
pod/busybox-restricted created

This time, also confirm that it is actually running.

$ kubectl -n verify-pod-security get pods
NAME                 READY   STATUS    RESTARTS   AGE
busybox-restricted   1/1     Running   0          52s

Delete the Namespace.

$ kubectl delete namespace verify-pod-security
namespace "verify-pod-security" deleted

Applying policy cluster-wide

Applying a policy to the whole cluster requires passing an AdmissionConfiguration file to the API server, which cannot be changed at runtime, so delete the cluster first.

$ kind delete cluster
Deleting cluster "kind" ...

Create the configuration file. It sets the cluster-wide defaults and exempts the kube-system namespace.

$ cat <<EOF > pod-security.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1beta1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "baseline"
      enforce-version: "latest"
      audit: "baseline"
      audit-version: "latest"
      warn: "restricted"
      warn-version: "latest"
    exemptions:
      # Array of authenticated usernames to exempt.
      usernames: []
      # Array of runtime class names to exempt.
      runtimeClasses: []
      # Array of namespaces to exempt.
      namespaces: [kube-system]
EOF

Create a Kind cluster configuration file that passes this file to the API server.

$ cat <<EOF > kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
        # enable admission-control-config flag on the API server
        extraArgs:
          admission-control-config-file: /etc/kubernetes/policies/pod-security.yaml
        # mount new file / directories on the control plane
        extraVolumes:
          - name: policies
            hostPath: /etc/kubernetes/policies
            mountPath: /etc/kubernetes/policies
            readOnly: true
            pathType: "DirectoryOrCreate"
  # mount the local file on the control plane
  extraMounts:
  - hostPath: ./pod-security.yaml
    containerPath: /etc/kubernetes/policies/pod-security.yaml
    readOnly: true
EOF

Create the cluster.

$ kind create cluster --image kindest/node:v1.23.0 --config kind-config.yaml
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.23.0) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂

Create a Namespace.

$ kubectl create namespace test-defaults
namespace/test-defaults created
$ kubectl describe namespace test-defaults
Name:         test-defaults
Labels:       kubernetes.io/metadata.name=test-defaults
Annotations:  <none>
Status:       Active

No resource quota.

No LimitRange resource.

There are no Pod Security labels, but thanks to the defaults it should behave as if the following labels were set.

Key                                  Value
pod-security.kubernetes.io/enforce   baseline
pod-security.kubernetes.io/audit     baseline
pod-security.kubernetes.io/warn      restricted
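
How the defaults in pod-security.yaml combine with Namespace labels can be sketched like this. It is a minimal model assuming that each mode independently falls back to its configured default when its label is absent, and that exempt namespaces are skipped entirely; the version settings are left out, and effective_levels is a hypothetical name.

```python
from typing import Optional

# Values copied from the PodSecurityConfiguration above.
DEFAULTS = {"enforce": "baseline", "audit": "baseline", "warn": "restricted"}
EXEMPT_NAMESPACES = {"kube-system"}
PREFIX = "pod-security.kubernetes.io/"


def effective_levels(namespace: str, ns_labels: dict) -> Optional[dict]:
    """Per-mode levels for a namespace, or None if the namespace is exempt."""
    if namespace in EXEMPT_NAMESPACES:
        return None  # exempt: requests are not evaluated at all
    return {mode: ns_labels.get(PREFIX + mode, default)
            for mode, default in DEFAULTS.items()}
```

For test-defaults, whose only label is kubernetes.io/metadata.name, this yields the table above; a Namespace labeled only with enforce=restricted would still inherit the audit and warn defaults.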

Create a baseline-level Pod.

$ cat <<EOF | kubectl -n test-defaults apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox-privileged
spec:
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "1000000"
    securityContext:
      allowPrivilegeEscalation: true
EOF
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "busybox" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "busybox" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "busybox" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "busybox" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
pod/busybox-privileged created

It does not violate baseline, so it is created, but it violates restricted, so a warning is shown.

Delete the Pod.

$ kubectl -n test-defaults delete pod/busybox-privileged
pod "busybox-privileged" deleted

Next, create a privileged Pod.

$ cat <<EOF | kubectl -n test-defaults apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox-privileged
spec:
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "1000000"
  hostNetwork: true
EOF
Error from server (Forbidden): error when creating "STDIN": pods "busybox-privileged" is forbidden: violates PodSecurity "baseline:latest": host namespaces (hostNetwork=true)

It violates baseline, so it is indeed rejected.
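
The rule that fired here is the baseline "host namespaces" check: hostNetwork, hostPID and hostIPC must be unset or false. A minimal sketch on a Pod spec given as a plain dict; the function names are hypothetical, and the message only imitates the single-field form seen in the error above.

```python
from typing import List, Optional

HOST_NAMESPACE_FIELDS = ("hostNetwork", "hostPID", "hostIPC")


def host_namespace_violations(pod_spec: dict) -> List[str]:
    """List the host namespace fields that are explicitly set to true."""
    return [f"{field}=true" for field in HOST_NAMESPACE_FIELDS
            if pod_spec.get(field) is True]


def host_namespace_message(pod_spec: dict) -> Optional[str]:
    """Build a message like the one in the Forbidden error, or None if OK."""
    found = host_namespace_violations(pod_spec)
    return f"host namespaces ({', '.join(found)})" if found else None
```

For the spec above (hostNetwork: true) it returns "host namespaces (hostNetwork=true)", while the earlier allowPrivilegeEscalation Pod passes because this check does not look at the container securityContext.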

Look at the metrics.

$ kubectl get --raw /metrics | grep pod_security_evaluations_total
# HELP pod_security_evaluations_total [ALPHA] Number of policy evaluations that occurred, not counting ignored or exempt requests.
# TYPE pod_security_evaluations_total counter
pod_security_evaluations_total{decision="allow",mode="enforce",policy_level="baseline",policy_version="latest",request_operation="create",resource="pod",subresource=""} 2
pod_security_evaluations_total{decision="allow",mode="enforce",policy_level="privileged",policy_version="latest",request_operation="create",resource="pod",subresource=""} 0
pod_security_evaluations_total{decision="allow",mode="enforce",policy_level="privileged",policy_version="latest",request_operation="update",resource="pod",subresource=""} 0
pod_security_evaluations_total{decision="deny",mode="audit",policy_level="baseline",policy_version="latest",request_operation="create",resource="pod",subresource=""} 1
pod_security_evaluations_total{decision="deny",mode="enforce",policy_level="baseline",policy_version="latest",request_operation="create",resource="pod",subresource=""} 1
pod_security_evaluations_total{decision="deny",mode="warn",policy_level="restricted",policy_version="latest",request_operation="create",resource="controller",subresource=""} 2
pod_security_evaluations_total{decision="deny",mode="warn",policy_level="restricted",policy_version="latest",request_operation="create",resource="pod",subresource=""} 2
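
To read these counters at a glance, they can be totaled by decision and mode. A rough sketch that handles only the simple `name{labels} value` lines shown above; it is not a full Prometheus exposition-format parser, and summarize is a hypothetical name.

```python
import re
from collections import Counter

LINE_RE = re.compile(
    r'^pod_security_evaluations_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def summarize(text: str) -> Counter:
    """Sum pod_security_evaluations_total by (decision, mode)."""
    totals = Counter()
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skips the # HELP / # TYPE comments and other metrics
        labels = dict(LABEL_RE.findall(m.group("labels")))
        totals[(labels["decision"], labels["mode"])] += float(m.group("value"))
    return totals
```

Applied to the output above, it gives 2 allow/enforce, 1 deny/enforce, 1 deny/audit and 4 deny/warn evaluations, the last one split between resource="pod" and resource="controller".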

I would like to check the audit log as well, but that requires enabling audit logging, so I skip it this time.

Workload resources

Check what happens with a Deployment instead of a bare Pod.

Create a Deployment whose Pod template is privileged.

$ cat <<EOF | kubectl -n test-defaults apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      hostNetwork: true
      containers:
      - image: nginx
        name: nginx
EOF
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true), allowPrivilegeEscalation != false (container "nginx" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "nginx" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "nginx" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "nginx" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/nginx created

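A warning is shown, but the Deployment resource itself is created. Its Pod, however, is never created, because the hostNetwork Pod template violates the enforced baseline level when the ReplicaSet tries to create the Pod.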

$ kubectl get deploy -n test-defaults
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   0/1     0            0           73s
$ kubectl get po -n test-defaults
No resources found in test-defaults namespace.

Delete the Deployment.

$ kubectl delete deploy nginx -n test-defaults
deployment.apps "nginx" deleted

Create a Deployment whose Pod template is baseline.

$ cat <<EOF | kubectl -n test-defaults apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        securityContext:
          allowPrivilegeEscalation: true
EOF
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "nginx" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "nginx" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "nginx" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "nginx" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/nginx created

A warning is shown in this case too, and this time the Pod is created.

$ kubectl get deploy -n test-defaults
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   1/1     1            1           119s
$ kubectl get po -n test-defaults
NAME                     READY   STATUS    RESTARTS   AGE
nginx-688777c4cf-ftn7m   1/1     Running   0          2m

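In other words, it seems safe to conclude that warn applies to the Deployment itself while enforce does not; enforce only takes effect when the Pods themselves are created, which is why the hostNetwork Deployment ended up with no Pods.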
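
This matches the Pod Security Admission documentation: warn and audit are also evaluated against workload resources (through their Pod template), but enforce only rejects Pod objects. A minimal sketch of that split, reusing the level ordering from the Pod Security Standards; applicable_modes and admit are hypothetical names, and the model ignores audit output.

```python
LEVELS = ["privileged", "baseline", "restricted"]  # least to most restrictive


def applicable_modes(kind: str) -> tuple:
    """Pods get all three modes; workload resources only warn and audit."""
    if kind == "Pod":
        return ("enforce", "warn", "audit")
    return ("warn", "audit")  # e.g. Deployment, checked via its Pod template


def admit(kind: str, pod_level: str, ns_levels: dict) -> dict:
    """pod_level is the strictest level the Pod (or Pod template) satisfies."""
    fired = {mode: LEVELS.index(pod_level) < LEVELS.index(ns_levels[mode])
             for mode in applicable_modes(kind)}
    return {"allowed": not fired.get("enforce", False),
            "warning": fired.get("warn", False)}
```

With the cluster defaults (enforce baseline, audit baseline, warn restricted), admit("Deployment", "privileged", ...) is allowed with a warning, while admit("Pod", "privileged", ...) is rejected, which is what the empty Pod list above shows.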