A memo on retrying Cilium on EKS. When I tried Cilium on EKS before, it didn't work out of the box, so about half a year later I tried again with the latest version of each component.
Component | Version | Notes
---|---|---
eksctl | 0.54.0 |
Kubernetes version | 1.20 |
Platform version | eks.1 |
VPC CNI Plugin | 1.7.10 |
Cilium | 1.10.1 |
The latest VPC CNI Plugin is 1.8.0, but the latest 1.7 patch release appears to be the recommended one, so that is what I use here.
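To see which versions are actually available as an EKS addon for this Kubernetes version, the AWS CLI can list them (a quick check; assumes the AWS CLI is configured for the target account and region):

```
# list the vpc-cni addon versions offered for Kubernetes 1.20
aws eks describe-addon-versions \
  --addon-name vpc-cni \
  --kubernetes-version 1.20 \
  --query 'addons[].addonVersions[].addonVersion'
```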
## Creating the cluster
Create the cluster.
```
cat <<EOF > cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: cilium
  region: ap-northeast-1
  version: "1.20"
vpc:
  cidr: "10.0.0.0/16"
  availabilityZones:
    - ap-northeast-1a
    - ap-northeast-1c
managedNodeGroups:
  - name: managed-ng-1
    minSize: 2
    maxSize: 2
    desiredCapacity: 2
    ssh:
      allow: true
      publicKeyName: default
      # enableSsm: true
cloudWatch:
  clusterLogging:
    enableTypes: ["*"]
iam:
  withOIDC: true
EOF
```
```
eksctl create cluster -f cluster.yaml
```
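Cluster creation takes a while. Once it finishes, a quick sanity check that both managed nodes have joined (using the same `k` alias for `kubectl` as the rest of this memo):

```
# both nodes in managed-ng-1 should be Ready
k get nodes -o wide
```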
## Updating the VPC CNI Plugin
Check the current version. The Cilium documentation also says to use 1.7.9 or later.
```
$ k get ds -n kube-system -o wide
NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   CONTAINERS   IMAGES                                                                                 SELECTOR
aws-node     2         2         2       2            2           <none>          23m   aws-node     602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/amazon-k8s-cni:v1.7.5-eksbuild.1    k8s-app=aws-node
kube-proxy   2         2         2       2            2           <none>          23m   kube-proxy   602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.20.4-eksbuild.2   k8s-app=kube-proxy
```
eksctl automatically sets up aws-node with IRSA, so look up the ARN of that role.
```
$ k get sa -n kube-system aws-node -o yaml | grep role-arn
    eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXXXXX:role/eksctl-cilium-addon-iamserviceaccount-kube-s-Role1-PUQJWEEGQJXC
```
Migrate it to an EKS addon and upgrade the version.
```
eksctl create addon --cluster cilium \
  --name vpc-cni --version 1.7.10 \
  --service-account-role-arn=arn:aws:iam::XXXXXXXXXXXX:role/eksctl-cilium-addon-iamserviceaccount-kube-s-Role1-PUQJWEEGQJXC \
  --force
```
```
$ k get ds -n kube-system -o wide
NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   CONTAINERS   IMAGES                                                                                 SELECTOR
aws-node     2         2         2       2            2           <none>          31m   aws-node     602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/amazon-k8s-cni:v1.7.10-eksbuild.1   k8s-app=aws-node
kube-proxy   2         2         2       2            2           <none>          31m   kube-proxy   602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.20.4-eksbuild.2   k8s-app=kube-proxy
```
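The addon state can also be checked from the addon side rather than the DaemonSet (a quick cross-check; `eksctl get addon` should work on this eksctl version, and `aws eks describe-addon` would be an alternative):

```
# show the vpc-cni addon and its version/status
eksctl get addon --cluster cilium
```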
## Installing Cilium
Add the Helm repository.
```
helm repo add cilium https://helm.cilium.io/
helm repo update
```
Install Cilium with Helm.
```
helm install cilium cilium/cilium --version 1.10.1 \
  --namespace kube-system \
  --set cni.chainingMode=aws-cni \
  --set enableIPv4Masquerade=false \
  --set tunnel=disabled \
  --set nodeinit.enabled=true \
  --set endpointRoutes.enabled=true
```
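The chaining-related values should end up in the `cilium-config` ConfigMap, which gives a quick way to double-check them (the key names here are as I remember them, so treat them as an assumption):

```
# cni-chaining-mode should show aws-cni
k -n kube-system get cm cilium-config -o yaml | grep -i chaining
```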
Confirm that Cilium has been installed.
```
$ kubectl get po -A
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
kube-system   aws-node-v8zjq                     1/1     Running   0          4m11s
kube-system   aws-node-zsc4s                     1/1     Running   0          3m37s
kube-system   cilium-57vtb                       1/1     Running   0          39s
kube-system   cilium-dfr7x                       1/1     Running   0          39s
kube-system   cilium-node-init-5cxj2             1/1     Running   0          39s
kube-system   cilium-node-init-rnt69             1/1     Running   0          39s
kube-system   cilium-operator-689d85cb47-bmtjd   1/1     Running   0          39s
kube-system   cilium-operator-689d85cb47-jk4tm   1/1     Running   0          39s
kube-system   coredns-54bc78bc49-bmqkk           1/1     Running   0          13s
kube-system   coredns-54bc78bc49-kphgv           1/1     Running   0          28s
kube-system   kube-proxy-n6sdq                   1/1     Running   0          20m
kube-system   kube-proxy-rcp65                   1/1     Running   0          20m
```
After installing Cilium, CoreDNS is automatically restarted so that Cilium can apply policies to it. This time the restarted CoreDNS came back up without the startup failure I hit previously, so things look fine (see the quick check after the issue list below).
The following issues have also been closed:
- New pods failing to start with `FailedCreatePodSandBox` warning for CNI versions 1.7.x with Cilium #1265
- Coredns stuck on ContainerCreating with `FailedCreatePodSandBox` warning for CNI versions 1.7.6 with Cilium 1.9.1 #1314
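As a quick check that the restarted CoreDNS Pods really are managed by Cilium, one can look for a CiliumEndpoint per Pod (`cep` is the resource's short name; this is a sanity check of my own, not part of the official procedure):

```
# each Cilium-managed Pod should have a matching CiliumEndpoint
k get cep -n kube-system
```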
As described in the setup instructions, check for Pods that need to be restarted.
```
for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
    ceps=$(kubectl -n "${ns}" get cep \
        -o jsonpath='{.items[*].metadata.name}')
    pods=$(kubectl -n "${ns}" get pod \
        -o custom-columns=NAME:.metadata.name,NETWORK:.spec.hostNetwork \
        | grep -E '\s(<none>|false)' | awk '{print $1}' | tr '\n' ' ')
    ncep=$(echo "${pods} ${ceps}" | tr ' ' '\n' | sort | uniq -u | paste -s -d ' ' -)
    for pod in $(echo $ncep); do
        echo "${ns}/${pod}";
    done
done
```
Nothing came up.
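Nothing needed a restart here, but if the loop had printed any `namespace/pod` pairs, recreating those Pods so they come back up under Cilium would look something like this (a sketch; it assumes each Pod is owned by a Deployment or other controller that recreates it):

```
# for each "namespace/pod" line printed by the loop above:
kubectl -n <namespace> delete pod <pod>
```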
## Testing
Connectivity testing is now available from a CLI, so let's test with that.
Install the CLI.
```
curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/latest/download/cilium-darwin-amd64.tar.gz{,.sha256sum}
shasum -a 256 -c cilium-darwin-amd64.tar.gz.sha256sum
tar xzvfC cilium-darwin-amd64.tar.gz ${HOME}/bin
rm cilium-darwin-amd64.tar.gz{,.sha256sum}
```
```
$ cilium version
cilium-cli: v0.8.2 compiled with go1.16.5 on darwin/amd64

$ cilium status --wait
    /¯¯\
 /¯¯\__/¯¯\    Cilium:         OK
 \__/¯¯\__/    Operator:       OK
 /¯¯\__/¯¯\    Hubble:         disabled
 \__/¯¯\__/    ClusterMesh:    disabled
    \__/

DaemonSet         cilium             Desired: 2, Ready: 2/2, Available: 2/2
Deployment        cilium-operator    Desired: 2, Ready: 2/2, Available: 2/2
Containers:       cilium             Running: 2
                  cilium-operator    Running: 2
Image versions    cilium             quay.io/cilium/cilium:v1.10.1@sha256:f5fcdfd4929af5a8903b02da61332eea41dcdb512420b8c807e2e2904270561c: 2
                  cilium-operator    quay.io/cilium/operator-generic:v1.10.1@sha256:a1588ee00a15f2f2b419e4acd36bd57d64a5f10eb52d0fd4de689e558a913cd8: 2
```
Run the tests.
```
$ cilium connectivity test
ℹ️  Monitor aggregation detected, will skip some flow validation steps
✨ [cilium.ap-northeast-1.eksctl.io] Creating namespace for connectivity check...
✨ [cilium.ap-northeast-1.eksctl.io] Deploying echo-same-node service...
✨ [cilium.ap-northeast-1.eksctl.io] Deploying same-node deployment...
✨ [cilium.ap-northeast-1.eksctl.io] Deploying client deployment...
✨ [cilium.ap-northeast-1.eksctl.io] Deploying client2 deployment...
✨ [cilium.ap-northeast-1.eksctl.io] Deploying echo-other-node service...
✨ [cilium.ap-northeast-1.eksctl.io] Deploying other-node deployment...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for deployments [client client2 echo-same-node] to become ready...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for deployments [echo-other-node] to become ready...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for CiliumEndpoint for pod cilium-test/client-7b7bf54b85-75nmw to appear...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for CiliumEndpoint for pod cilium-test/client2-666976c95b-n29pg to appear...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for CiliumEndpoint for pod cilium-test/echo-other-node-697d5d69b7-qxfm5 to appear...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for CiliumEndpoint for pod cilium-test/echo-same-node-7967996674-qvm6t to appear...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for Service cilium-test/echo-other-node to become ready...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for Service cilium-test/echo-same-node to become ready...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for NodePort 10.0.1.129:30561 (cilium-test/echo-other-node) to become ready...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for NodePort 10.0.1.129:31548 (cilium-test/echo-same-node) to become ready...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for NodePort 10.0.58.170:30561 (cilium-test/echo-other-node) to become ready...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for NodePort 10.0.58.170:31548 (cilium-test/echo-same-node) to become ready...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for Cilium pod kube-system/cilium-57vtb to have all the pod IPs in eBPF ipcache...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for Cilium pod kube-system/cilium-dfr7x to have all the pod IPs in eBPF ipcache...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for pod cilium-test/client-7b7bf54b85-75nmw to reach kube-dns service...
⌛ [cilium.ap-northeast-1.eksctl.io] Waiting for pod cilium-test/client2-666976c95b-n29pg to reach kube-dns service...
🔭 Enabling Hubble telescope...
⚠️  Unable to contact Hubble Relay, disabling Hubble telescope and flow validation: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp [::1]:4245: connect: connection refused"
ℹ️  Expose Relay locally with: kubectl port-forward -n kube-system deployment/hubble-relay 4245:4245
🏃 Running tests...

[=] Test [no-policies] .............................
[=] Test [client-ingress] ..
[=] Test [echo-ingress] ....
[=] Test [to-fqdns] ..
ℹ️  📜 Applying CiliumNetworkPolicy 'client-egress-to-fqdns-google' to namespace 'cilium-test'..
[-] Scenario [to-fqdns/pod-to-world]
[.] Action [to-fqdns/pod-to-world/https-to-google: cilium-test/client2-666976c95b-n29pg (10.0.39.210) -> google-https (google.com:443)]
[.] Action [to-fqdns/pod-to-world/http-to-google: cilium-test/client-7b7bf54b85-75nmw (10.0.31.143) -> google-http (google.com:80)]
❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://google.com:80" failed: command terminated with exit code 22
[.] Action [to-fqdns/pod-to-world/http-to-www-google: cilium-test/client-7b7bf54b85-75nmw (10.0.31.143) -> www-google-http (www.google.com:80)]
ℹ️  📜 Deleting CiliumNetworkPolicy 'client-egress-to-fqdns-google' from namespace 'cilium-test'..
[=] Test [to-entities-world] ...
[=] Test [allow-all] .........................
[=] Test [dns-only] .......
[=] Test [client-egress] ....
[=] Test [to-cidr-1111] ....

📋 Test Report
❌ 1/9 tests failed (1/81 actions), 0 warnings, 0 tests skipped, 0 scenarios skipped:
Test [to-fqdns]:
  ❌ to-fqdns/pod-to-world/http-to-google: cilium-test/client-7b7bf54b85-75nmw (10.0.31.143) -> google-http (google.com:80)
Error: Connectivity test failed: 1 tests failed
```
The connectivity check to Google fails.
Without the policy applied, there is no problem.
```
$ k run pod1 --image=nginx
pod/pod1 created
$ k get po
NAME   READY   STATUS    RESTARTS   AGE
pod1   1/1     Running   0          10s
$ k exec -it pod1 -- bash
root@pod1:/# curl http://google.com/
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
root@pod1:/# exit
exit
```
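To dig further into why the FQDN policy blocks plain HTTP to google.com, one approach would be to re-run the test while watching drop events from the Cilium agents (a debugging sketch I haven't verified here; `kubectl exec` against `ds/cilium` picks one agent Pod, so it may need to be run against the agent on the client's node):

```
# stream packet drop events from a Cilium agent
kubectl -n kube-system exec -it ds/cilium -- cilium monitor --type drop
```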
The caveats in the documentation note that several advanced features are limited in this mode, so this failure may be one of those limitations.
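Incidentally, the connectivity test leaves its resources behind in the `cilium-test` namespace, so cleanup afterwards should just be a matter of deleting that namespace (and the cluster itself once done; both commands below assume the names and files used above):

```
# remove the connectivity-test resources
kubectl delete ns cilium-test

# tear down the whole cluster when finished
eksctl delete cluster -f cluster.yaml
```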