zaki work log

Work logs, life logs, and whatnot

[Kubernetes + Prometheus] Deploying a metrics server with prometheus-adapter to enable kubectl top

This is the groundwork for using kubectl top and HPA.
Deploying Prometheus by itself doesn't provide a metrics server, so neither of these works.

[zaki@cloud-dev helm-prometheus]$ kubectl describe apiservice v1beta1.metrics.k8s.io
Error from server (NotFound): apiservices.apiregistration.k8s.io "v1beta1.metrics.k8s.io" not found

On a cluster where Prometheus is already running, deploying prometheus-adapter hooks into it and enables a metrics server.


[zaki@cloud-dev helm-prometheus]$ helm search repo prometheus-adapter
NAME                                    CHART VERSION   APP VERSION     DESCRIPTION                                       
prometheus-community/prometheus-adapter 2.7.1           v0.7.0          A Helm chart for k8s prometheus adapter           
stable/prometheus-adapter               2.5.1           v0.7.0          DEPRECATED A Helm chart for k8s prometheus adapter

However, deploying with the default settings (no customization YAML) doesn't work as-is; additional configuration is required.

Environment

This is a continuation of this article:

zaki-hmkc.hatenablog.com

metrics server

Configuration

Get a configuration template with helm show values prometheus-community/prometheus-adapter > prometheus-adapter.yaml or similar.
You could also grab it from GitHub, but generating it from the local Helm chart avoids any worry about version mismatches.

Prometheus server

First, there is a setting for the address of the Prometheus instance to connect to; adjust it to match your environment.

With a default Prometheus layout, path can be "/" (or simply left at its initial blank value).
Setting it to something like /metrics does not work.

prometheus:
  url: http://prometheus-server.monitoring.svc.cluster.local
  port: 80
  path: "/"

If Prometheus itself was installed from the Helm chart, the settings should look like the above.
The URL is written as an FQDN here, but if the adapter is deployed into the same namespace, the Service resource name alone is enough.
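Before wiring it up, it may be worth confirming the URL is actually reachable from inside the cluster. A throwaway pod like the following works (the image and the endpoint queried here are just one way to check, not something the chart requires):

```shell
# One-off pod that queries the Prometheus HTTP API from inside the cluster,
# then removes itself (--rm)
kubectl run prom-check --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s http://prometheus-server.monitoring.svc.cluster.local/api/v1/status/buildinfo
```

If the Service name or namespace is wrong, curl fails to resolve the host and you know to fix the url setting before deploying.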

What to set here comes from checking the Service resource name and port used for in-cluster access.
For an Operator-based setup like the one below, it would be prometheus-stack-kube-prom-prometheus:9090.

NAME                                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                       ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   3m48s
service/prometheus-operated                         ClusterIP   None             <none>        9090/TCP                     3m41s
service/prometheus-stack-grafana                    ClusterIP   10.109.142.166   <none>        80/TCP                       4m4s
service/prometheus-stack-kube-prom-alertmanager     ClusterIP   10.108.73.186    <none>        9093/TCP                     4m5s
service/prometheus-stack-kube-prom-operator         ClusterIP   10.96.43.230     <none>        8080/TCP,443/TCP             4m4s
service/prometheus-stack-kube-prom-prometheus       ClusterIP   10.97.2.124      <none>        9090/TCP                     4m5s
service/prometheus-stack-kube-state-metrics         ClusterIP   10.99.165.64     <none>        8080/TCP                     4m5s
service/prometheus-stack-prometheus-node-exporter   ClusterIP   10.100.97.230    <none>        9100/TCP                     4m4s
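If you want to pull the port out non-interactively instead of eyeballing the listing, a jsonpath query works (the Service name here assumes the Operator-based install above):

```shell
# Print the first port of the Prometheus Service
kubectl get svc -n monitoring prometheus-stack-kube-prom-prometheus \
  -o jsonpath='{.spec.ports[0].port}'
```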

Resource metrics rules

In addition to the Prometheus address, add the resource metrics rules below.
(I initially left these out and got stuck because metrics never became available.)
The custom and external rules are probably not needed (unverified), but I've included them while I was at it.


These rules are commented out by default, so uncommenting them and aligning the indentation is all it takes.

Full customization YAML

# Url to access prometheus
prometheus:
  url: http://prometheus-server.monitoring.svc.cluster.local
  port: 80
  path: "/"

rules:
  default: true
  custom:
  - seriesQuery: '{__name__=~"^some_metric_count$"}'
    resources:
      template: <<.Resource>>
    name:
      matches: ""
      as: "my_custom_metric"
    metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
  # Mounts a configMap with pre-generated rules for use. Overrides the
  # default, custom, external and resource entries
  existing:
  external:
  - seriesQuery: '{__name__=~"^some_metric_count$"}'
    resources:
      template: <<.Resource>>
    name:
      matches: ""
      as: "my_external_metric"
    metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
  resource:
    cpu:
      containerQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>}[3m])) by (<<.GroupBy>>)
      nodeQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>, id='/'}[3m])) by (<<.GroupBy>>)
      resources:
        overrides:
          instance:
            resource: node
          namespace:
            resource: namespace
          pod:
            resource: pod
      containerLabel: container
    memory:
      containerQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
      nodeQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,id='/'}) by (<<.GroupBy>>)
      resources:
        overrides:
          instance:
            resource: node
          namespace:
            resource: namespace
          pod:
            resource: pod
      containerLabel: container
    window: 3m
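Before installing, the overrides file can be sanity-checked by rendering the chart locally; helm template touches nothing in the cluster, so a YAML typo surfaces here instead of at install time (the filename matches the one used in the deploy step below):

```shell
# Render the chart with the overrides; a parse error in the values file fails here
helm template prometheus-adapter prometheus-community/prometheus-adapter \
  -f prometheus-adapter-config.yaml > /dev/null && echo "renders OK"
```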

Deploy

[zaki@cloud-dev helm-prometheus]$ helm upgrade --install prometheus-adapter -n monitoring prometheus-community/prometheus-adapter -f prometheus-adapter-config.yaml 
Release "prometheus-adapter" does not exist. Installing it now.
W1116 23:43:18.367539  985712 warnings.go:67] apiregistration.k8s.io/v1beta1 APIService is deprecated in v1.19+, unavailable in v1.22+; use apiregistration.k8s.io/v1 APIService
W1116 23:43:18.368426  985712 warnings.go:67] apiregistration.k8s.io/v1beta1 APIService is deprecated in v1.19+, unavailable in v1.22+; use apiregistration.k8s.io/v1 APIService
W1116 23:43:18.369245  985712 warnings.go:67] apiregistration.k8s.io/v1beta1 APIService is deprecated in v1.19+, unavailable in v1.22+; use apiregistration.k8s.io/v1 APIService
W1116 23:43:18.422367  985712 warnings.go:67] apiregistration.k8s.io/v1beta1 APIService is deprecated in v1.19+, unavailable in v1.22+; use apiregistration.k8s.io/v1 APIService
W1116 23:43:18.423967  985712 warnings.go:67] apiregistration.k8s.io/v1beta1 APIService is deprecated in v1.19+, unavailable in v1.22+; use apiregistration.k8s.io/v1 APIService
W1116 23:43:18.424502  985712 warnings.go:67] apiregistration.k8s.io/v1beta1 APIService is deprecated in v1.19+, unavailable in v1.22+; use apiregistration.k8s.io/v1 APIService
NAME: prometheus-adapter
LAST DEPLOYED: Mon Nov 16 23:43:18 2020
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
prometheus-adapter has been deployed.
In a few minutes you should be able to list metrics using the following command(s):

  kubectl get --raw /apis/metrics.k8s.io/v1beta1
  kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1

  kubectl get --raw /apis/external.metrics.k8s.io/v1beta1
[zaki@cloud-dev helm-prometheus]$ 
[zaki@cloud-dev helm-prometheus]$ helm ls -n monitoring 
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
prometheus              monitoring      1               2020-11-16 23:41:24.108793125 +0900 JST deployed        prometheus-11.16.9              2.21.0     
prometheus-adapter      monitoring      1               2020-11-16 23:43:18.23814061 +0900 JST  deployed        prometheus-adapter-2.7.1        v0.7.0     

After waiting a bit, everything reaches Running.

[zaki@cloud-dev helm-prometheus]$ kc get pod,svc -n monitoring 
NAME                                                READY   STATUS    RESTARTS   AGE
pod/prometheus-adapter-864d8994b8-9ls6v             1/1     Running   0          98s
pod/prometheus-alertmanager-79ff85d65b-8wftc        2/2     Running   0          3m32s
pod/prometheus-kube-state-metrics-95d956569-klsnr   1/1     Running   0          3m32s
pod/prometheus-node-exporter-6gwvr                  1/1     Running   0          3m32s
pod/prometheus-node-exporter-hlmgb                  1/1     Running   0          3m32s
pod/prometheus-pushgateway-6d7c8cc74b-76bzp         1/1     Running   0          3m32s
pod/prometheus-server-7749bb7b9-fssfz               2/2     Running   0          3m32s

NAME                                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/prometheus-adapter              ClusterIP   10.96.107.178   <none>        443/TCP    98s
service/prometheus-alertmanager         ClusterIP   10.96.41.6      <none>        80/TCP     3m32s
service/prometheus-kube-state-metrics   ClusterIP   10.96.75.25     <none>        8080/TCP   3m32s
service/prometheus-node-exporter        ClusterIP   None            <none>        9100/TCP   3m32s
service/prometheus-pushgateway          ClusterIP   10.96.45.242    <none>        9091/TCP   3m32s
service/prometheus-server               ClusterIP   10.96.232.208   <none>        80/TCP     3m32s

Once the pods are Running, you can check the state of the metrics server.

[zaki@cloud-dev helm-prometheus]$ kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:    
Labels:       app=prometheus-adapter
              app.kubernetes.io/managed-by=Helm
              chart=prometheus-adapter-2.7.1
              heritage=Helm
              release=prometheus-adapter
Annotations:  meta.helm.sh/release-name: prometheus-adapter
              meta.helm.sh/release-namespace: monitoring
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2020-11-16T14:43:18Z
  Resource Version:    1745
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 49c3d747-ae47-450d-ba85-9eaa1b27aed5
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            prometheus-adapter
    Namespace:       monitoring
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2020-11-16T14:43:59Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>
[zaki@cloud-dev helm-prometheus]$ 
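Once the APIService is Available, the metric list can also be pulled straight from the API, as suggested in the chart's NOTES (assuming jq is installed):

```shell
# List the names of the resource metrics the adapter is serving
kubectl get --raw /apis/metrics.k8s.io/v1beta1 | jq -r '.resources[].name'
```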

Note that while the pod is not yet Running, the status shows False.

Status:
  Conditions:
    Last Transition Time:  2020-11-16T14:43:18Z
    Message:               endpoints for service/prometheus-adapter in "monitoring" have no addresses with port name ""
    Reason:                MissingEndpoints
    Status:                False
    Type:                  Available

kubectl top

With this in place, the kubectl top command can show resource usage for pods and nodes.

[zaki@cloud-dev helm-prometheus]$ kubectl top node
NAME                              CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
monitoring-sample-control-plane   75m          1%     825Mi           10%
monitoring-sample-worker          25m          0%     315Mi           4%
monitoring-sample-worker2         40m          1%     568Mi           7%
[zaki@cloud-dev helm-prometheus]$ kubectl top pod -A
NAMESPACE            NAME                                                      CPU(cores)   MEMORY(bytes)
kube-system          coredns-f9fd979d6-5h542                                   4m           38Mi
kube-system          coredns-f9fd979d6-px55p                                   3m           36Mi
kube-system          etcd-monitoring-sample-control-plane                      33m          113Mi
kube-system          kindnet-57ksh                                             0m           29Mi
kube-system          kindnet-p9p2w                                             0m           28Mi
kube-system          kindnet-s9hdq                                             0m           25Mi
kube-system          kube-apiserver-monitoring-sample-control-plane            102m         1222Mi
kube-system          kube-controller-manager-monitoring-sample-control-plane   21m          175Mi
kube-system          kube-proxy-h96r7                                          0m           62Mi
kube-system          kube-proxy-ksv2k                                          0m           43Mi
kube-system          kube-proxy-w2gwh                                          0m           55Mi
kube-system          kube-scheduler-monitoring-sample-control-plane            6m           93Mi
local-path-storage   local-path-provisioner-78776bfc44-4xqn7                   1m           27Mi
monitoring           prometheus-adapter-864d8994b8-9ls6v                       25m          129Mi
monitoring           prometheus-alertmanager-79ff85d65b-8wftc                  1m           40Mi
monitoring           prometheus-kube-state-metrics-95d956569-klsnr             0m           27Mi
monitoring           prometheus-node-exporter-6gwvr                            0m           26Mi
monitoring           prometheus-node-exporter-hlmgb                            0m           27Mi
monitoring           prometheus-pushgateway-6d7c8cc74b-76bzp                   0m           21Mi
monitoring           prometheus-server-7749bb7b9-fssfz                         10m          464Mi
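Since the resource metrics API is what HPA consumes, autoscaling should also work at this point. As a sketch, assuming a Deployment named myapp exists (the name and thresholds are hypothetical):

```shell
# Create an HPA targeting 50% average CPU utilization, scaling between 1 and 5 replicas
kubectl autoscale deployment myapp --cpu-percent=50 --min=1 --max=5
kubectl get hpa myapp
```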

Managed services

Incidentally, some managed services, like AKS, have a metrics server enabled by default. EKS is close to a vanilla cluster, so no metrics server runs out of the box.

AKS

[zaki@cloud-dev ~]$ kubectl get pod -A
NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE
kube-system   coredns-869cb84759-lngs4                    1/1     Running   0          106s
kube-system   coredns-869cb84759-npb9c                    1/1     Running   0          3m39s
kube-system   coredns-autoscaler-5b867494f-fpgxz          1/1     Running   0          3m38s
kube-system   dashboard-metrics-scraper-6f5fb5c4f-5xw6h   1/1     Running   0          3m37s
kube-system   kube-proxy-265pc                            1/1     Running   0          99s
kube-system   kube-proxy-kbgmz                            1/1     Running   0          2m1s
kube-system   kubernetes-dashboard-849d5c99ff-6k4w2       1/1     Running   0          3m37s
kube-system   metrics-server-5f4c878d8-cxdv4              1/1     Running   0          3m39s
kube-system   tunnelfront-557f878cb5-m4l5z                1/1     Running   0          3m36s
[zaki@cloud-dev ~]$ kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:    
Labels:       addonmanager.kubernetes.io/mode=Reconcile
              kubernetes.io/cluster-service=true
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2020-11-16T15:04:44Z
  Resource Version:    963
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 6c07d1be-ba24-4d71-a599-894d0072c1c5
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2020-11-16T15:07:05Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>

EKS

[zaki@cloud-dev ~]$ kubectl get pod -A
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE
kube-system   aws-node-hq9rc            1/1     Running   0          109s
kube-system   coredns-79769ff86-48f6s   1/1     Running   0          8m11s
kube-system   coredns-79769ff86-57llm   1/1     Running   0          8m11s
kube-system   kube-proxy-s28dx          1/1     Running   0          109s