Preparation for using kubectl top and HPA.
If you deploy only Prometheus by itself, no metrics server exists, so neither of these can be used.
[zaki@cloud-dev helm-prometheus]$ kubectl describe apiservice v1beta1.metrics.k8s.io
Error from server (NotFound): apiservices.apiregistration.k8s.io "v1beta1.metrics.k8s.io" not found
On a cluster where Prometheus is already running, deploying prometheus-adapter hooks into Prometheus and enables a metrics server.
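If the prometheus-community chart repository has not been registered on your machine yet, it can be added first. A minimal sketch, assuming Helm 3 and the repository URL published by the prometheus-community project:

```shell
# Register the prometheus-community chart repository (skip if already added)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# Refresh the local chart index so "helm search repo" sees the latest versions
helm repo update
```

After this, the helm search repo command below should return the prometheus-adapter chart.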
[zaki@cloud-dev helm-prometheus]$ helm search repo prometheus-adapter
NAME                                     CHART VERSION   APP VERSION     DESCRIPTION
prometheus-community/prometheus-adapter  2.7.1           v0.7.0          A Helm chart for k8s prometheus adapter
stable/prometheus-adapter                2.5.1           v0.7.0          DEPRECATED A Helm chart for k8s prometheus adapter
However, deploying it with the default settings (no customization YAML) does not work; additional configuration is required.
Environment
This is a continuation of this article.
metrics server
Configuration
Get the configuration template with helm show values prometheus-community/prometheus-adapter > prometheus-adapter.yaml
or similar.
You could also pull it from GitHub, but exporting it from the local Helm chart avoids any concern about version differences.
Prometheus server
First, there is a setting for the address of the Prometheus instance to connect to, so adjust it to your environment.
With a default Prometheus setup, path can be "/" (or simply left at its blank initial value).
Setting it to something like /metrics does not work.
prometheus:
  url: http://prometheus-server.monitoring.svc.cluster.local
  port: 80
  path: "/"
If you installed only Prometheus itself from the Helm chart, the settings should look like the above.
The URL is written as an FQDN here, but if the adapter is deployed into the same namespace, the Service resource name alone is sufficient.
To fill in these values, check the Service resource name and port number used for in-cluster access.
In an Operator-based setup like the following, it would be prometheus-stack-kube-prom-prometheus:9090 or similar.
NAME                                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                       ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   3m48s
service/prometheus-operated                         ClusterIP   None             <none>        9090/TCP                     3m41s
service/prometheus-stack-grafana                    ClusterIP   10.109.142.166   <none>        80/TCP                       4m4s
service/prometheus-stack-kube-prom-alertmanager     ClusterIP   10.108.73.186    <none>        9093/TCP                     4m5s
service/prometheus-stack-kube-prom-operator         ClusterIP   10.96.43.230     <none>        8080/TCP,443/TCP             4m4s
service/prometheus-stack-kube-prom-prometheus       ClusterIP   10.97.2.124      <none>        9090/TCP                     4m5s
service/prometheus-stack-kube-state-metrics         ClusterIP   10.99.165.64     <none>        8080/TCP                     4m5s
service/prometheus-stack-prometheus-node-exporter   ClusterIP   10.100.97.230    <none>        9100/TCP                     4m4s
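For reference, in the Operator-based setup above the Prometheus address portion of the customization YAML would look something like this (assuming the stack is deployed in the monitoring namespace):

```yaml
prometheus:
  url: http://prometheus-stack-kube-prom-prometheus.monitoring.svc.cluster.local
  port: 9090
  path: "/"
```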
Resource metrics rules
In addition to the Prometheus address, add the resource metrics rules below.
(At first I hadn't set these, and got stuck because the metrics never became available.)
The custom and external entries are probably unnecessary (unverified), but since I was at it I added them as well.
They are commented out by default, so uncommenting them and aligning the indentation is all that's needed.
Full customization YAML
# Url to access prometheus
prometheus:
  url: http://prometheus-server.monitoring.svc.cluster.local
  port: 80
  path: "/"

rules:
  default: true
  custom:
  - seriesQuery: '{__name__=~"^some_metric_count$"}'
    resources:
      template: <<.Resource>>
    name:
      matches: ""
      as: "my_custom_metric"
    metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
  # Mounts a configMap with pre-generated rules for use. Overrides the
  # default, custom, external and resource entries
  existing:
  external:
  - seriesQuery: '{__name__=~"^some_metric_count$"}'
    resources:
      template: <<.Resource>>
    name:
      matches: ""
      as: "my_external_metric"
    metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
  resource:
    cpu:
      containerQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>}[3m])) by (<<.GroupBy>>)
      nodeQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>, id='/'}[3m])) by (<<.GroupBy>>)
      resources:
        overrides:
          instance:
            resource: node
          namespace:
            resource: namespace
          pod:
            resource: pod
      containerLabel: container
    memory:
      containerQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
      nodeQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,id='/'}) by (<<.GroupBy>>)
      resources:
        overrides:
          instance:
            resource: node
          namespace:
            resource: namespace
          pod:
            resource: pod
      containerLabel: container
    window: 3m
Deploy
[zaki@cloud-dev helm-prometheus]$ helm upgrade --install prometheus-adapter -n monitoring prometheus-community/prometheus-adapter -f prometheus-adapter-config.yaml
Release "prometheus-adapter" does not exist. Installing it now.
W1116 23:43:18.367539  985712 warnings.go:67] apiregistration.k8s.io/v1beta1 APIService is deprecated in v1.19+, unavailable in v1.22+; use apiregistration.k8s.io/v1 APIService
W1116 23:43:18.368426  985712 warnings.go:67] apiregistration.k8s.io/v1beta1 APIService is deprecated in v1.19+, unavailable in v1.22+; use apiregistration.k8s.io/v1 APIService
W1116 23:43:18.369245  985712 warnings.go:67] apiregistration.k8s.io/v1beta1 APIService is deprecated in v1.19+, unavailable in v1.22+; use apiregistration.k8s.io/v1 APIService
W1116 23:43:18.422367  985712 warnings.go:67] apiregistration.k8s.io/v1beta1 APIService is deprecated in v1.19+, unavailable in v1.22+; use apiregistration.k8s.io/v1 APIService
W1116 23:43:18.423967  985712 warnings.go:67] apiregistration.k8s.io/v1beta1 APIService is deprecated in v1.19+, unavailable in v1.22+; use apiregistration.k8s.io/v1 APIService
W1116 23:43:18.424502  985712 warnings.go:67] apiregistration.k8s.io/v1beta1 APIService is deprecated in v1.19+, unavailable in v1.22+; use apiregistration.k8s.io/v1 APIService
NAME: prometheus-adapter
LAST DEPLOYED: Mon Nov 16 23:43:18 2020
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
prometheus-adapter has been deployed.
In a few minutes you should be able to list metrics using the following command(s):

  kubectl get --raw /apis/metrics.k8s.io/v1beta1
  kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
  kubectl get --raw /apis/external.metrics.k8s.io/v1beta1
[zaki@cloud-dev helm-prometheus]$
[zaki@cloud-dev helm-prometheus]$ helm ls -n monitoring
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
prometheus              monitoring      1               2020-11-16 23:41:24.108793125 +0900 JST deployed        prometheus-11.16.9              2.21.0
prometheus-adapter      monitoring      1               2020-11-16 23:43:18.23814061 +0900 JST  deployed        prometheus-adapter-2.7.1        v0.7.0
After waiting a while, everything reaches Running.
[zaki@cloud-dev helm-prometheus]$ kc get pod,svc -n monitoring
NAME                                                READY   STATUS    RESTARTS   AGE
pod/prometheus-adapter-864d8994b8-9ls6v             1/1     Running   0          98s
pod/prometheus-alertmanager-79ff85d65b-8wftc        2/2     Running   0          3m32s
pod/prometheus-kube-state-metrics-95d956569-klsnr   1/1     Running   0          3m32s
pod/prometheus-node-exporter-6gwvr                  1/1     Running   0          3m32s
pod/prometheus-node-exporter-hlmgb                  1/1     Running   0          3m32s
pod/prometheus-pushgateway-6d7c8cc74b-76bzp         1/1     Running   0          3m32s
pod/prometheus-server-7749bb7b9-fssfz               2/2     Running   0          3m32s

NAME                                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/prometheus-adapter              ClusterIP   10.96.107.178   <none>        443/TCP    98s
service/prometheus-alertmanager         ClusterIP   10.96.41.6      <none>        80/TCP     3m32s
service/prometheus-kube-state-metrics   ClusterIP   10.96.75.25     <none>        8080/TCP   3m32s
service/prometheus-node-exporter        ClusterIP   None            <none>        9100/TCP   3m32s
service/prometheus-pushgateway          ClusterIP   10.96.45.242    <none>        9091/TCP   3m32s
service/prometheus-server               ClusterIP   10.96.232.208   <none>        80/TCP     3m32s
Once the pod is Running, the metrics server status can be checked.
[zaki@cloud-dev helm-prometheus]$ kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       app=prometheus-adapter
              app.kubernetes.io/managed-by=Helm
              chart=prometheus-adapter-2.7.1
              heritage=Helm
              release=prometheus-adapter
Annotations:  meta.helm.sh/release-name: prometheus-adapter
              meta.helm.sh/release-namespace: monitoring
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2020-11-16T14:43:18Z
  Resource Version:    1745
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 49c3d747-ae47-450d-ba85-9eaa1b27aed5
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            prometheus-adapter
    Namespace:       monitoring
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2020-11-16T14:43:59Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:  <none>
[zaki@cloud-dev helm-prometheus]$
Note that while the pod is not yet Running, the status shows False.
Status:
  Conditions:
    Last Transition Time:  2020-11-16T14:43:18Z
    Message:               endpoints for service/prometheus-adapter in "monitoring" have no addresses with port name ""
    Reason:                MissingEndpoints
    Status:                False
    Type:                  Available
kubectl top
With this, the kubectl top command can now show resource usage for pods and nodes.
[zaki@cloud-dev helm-prometheus]$ kubectl top node
NAME                              CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
monitoring-sample-control-plane   75m          1%     825Mi           10%
monitoring-sample-worker          25m          0%     315Mi           4%
monitoring-sample-worker2         40m          1%     568Mi           7%
[zaki@cloud-dev helm-prometheus]$ kubectl top pod -A
NAMESPACE            NAME                                                      CPU(cores)   MEMORY(bytes)
kube-system          coredns-f9fd979d6-5h542                                   4m           38Mi
kube-system          coredns-f9fd979d6-px55p                                   3m           36Mi
kube-system          etcd-monitoring-sample-control-plane                      33m          113Mi
kube-system          kindnet-57ksh                                             0m           29Mi
kube-system          kindnet-p9p2w                                             0m           28Mi
kube-system          kindnet-s9hdq                                             0m           25Mi
kube-system          kube-apiserver-monitoring-sample-control-plane            102m         1222Mi
kube-system          kube-controller-manager-monitoring-sample-control-plane   21m          175Mi
kube-system          kube-proxy-h96r7                                          0m           62Mi
kube-system          kube-proxy-ksv2k                                          0m           43Mi
kube-system          kube-proxy-w2gwh                                          0m           55Mi
kube-system          kube-scheduler-monitoring-sample-control-plane            6m           93Mi
local-path-storage   local-path-provisioner-78776bfc44-4xqn7                   1m           27Mi
monitoring           prometheus-adapter-864d8994b8-9ls6v                       25m          129Mi
monitoring           prometheus-alertmanager-79ff85d65b-8wftc                  1m           40Mi
monitoring           prometheus-kube-state-metrics-95d956569-klsnr             0m           27Mi
monitoring           prometheus-node-exporter-6gwvr                            0m           26Mi
monitoring           prometheus-node-exporter-hlmgb                            0m           27Mi
monitoring           prometheus-pushgateway-6d7c8cc74b-76bzp                   0m           21Mi
monitoring           prometheus-server-7749bb7b9-fssfz                         10m          464Mi
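With the metrics API working, HPA mentioned at the beginning also becomes usable. A minimal sketch of a CPU-based HorizontalPodAutoscaler (the Deployment name sample-app and the thresholds here are hypothetical; adjust them to your workload):

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

Applying this with kubectl apply and checking kubectl get hpa should show the current CPU utilization instead of <unknown> once the adapter is serving metrics.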
Managed services
Incidentally, some managed services, such as AKS, ship with a metrics server enabled by default. EKS is close to a vanilla cluster, so no metrics server runs by default.
On AKS
[zaki@cloud-dev ~]$ kubectl get pod -A
NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE
kube-system   coredns-869cb84759-lngs4                    1/1     Running   0          106s
kube-system   coredns-869cb84759-npb9c                    1/1     Running   0          3m39s
kube-system   coredns-autoscaler-5b867494f-fpgxz          1/1     Running   0          3m38s
kube-system   dashboard-metrics-scraper-6f5fb5c4f-5xw6h   1/1     Running   0          3m37s
kube-system   kube-proxy-265pc                            1/1     Running   0          99s
kube-system   kube-proxy-kbgmz                            1/1     Running   0          2m1s
kube-system   kubernetes-dashboard-849d5c99ff-6k4w2       1/1     Running   0          3m37s
kube-system   metrics-server-5f4c878d8-cxdv4              1/1     Running   0          3m39s
kube-system   tunnelfront-557f878cb5-m4l5z                1/1     Running   0          3m36s
[zaki@cloud-dev ~]$ kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       addonmanager.kubernetes.io/mode=Reconcile
              kubernetes.io/cluster-service=true
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2020-11-16T15:04:44Z
  Resource Version:    963
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 6c07d1be-ba24-4d71-a599-894d0072c1c5
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2020-11-16T15:07:05Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:  <none>
On EKS
[zaki@cloud-dev ~]$ kubectl get pod -A
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE
kube-system   aws-node-hq9rc            1/1     Running   0          109s
kube-system   coredns-79769ff86-48f6s   1/1     Running   0          8m11s
kube-system   coredns-79769ff86-57llm   1/1     Running   0          8m11s
kube-system   kube-proxy-s28dx          1/1     Running   0          109s