OpenShift ships with it out of the box, but apart from a few managed K8s services, a vanilla K8s cluster, EKS and AKS included, has no metrics server deployed in its initial state.
With the Prometheus Operator, you can easily deploy Prometheus, the currently popular choice, to act as a metrics server.
For now, here are the deployment steps.
I don't really understand the configuration and usage yet, so that will have to wait...
Environment
A local environment deployed with kind, with MetalLB installed.
[zaki@cloud-dev ~]$ kubectl version --short
Client Version: v1.19.2
Server Version: v1.18.2
Initial pod state
[zaki@cloud-dev kube-prometheus]$ kc get pod -A
NAMESPACE            NAME                                                 READY   STATUS    RESTARTS   AGE
kube-system          coredns-66bff467f8-92f5g                             1/1     Running   0          116m
kube-system          coredns-66bff467f8-jx2vk                             1/1     Running   0          116m
kube-system          etcd-prac-cluster-control-plane                      1/1     Running   0          116m
kube-system          kindnet-2bh96                                        1/1     Running   0          116m
kube-system          kindnet-5sbfj                                        1/1     Running   0          116m
kube-system          kindnet-kfw7s                                        1/1     Running   1          116m
kube-system          kube-apiserver-prac-cluster-control-plane            1/1     Running   0          116m
kube-system          kube-controller-manager-prac-cluster-control-plane   1/1     Running   0          116m
kube-system          kube-proxy-gz89m                                     1/1     Running   0          116m
kube-system          kube-proxy-hf466                                     1/1     Running   0          116m
kube-system          kube-proxy-j6cpf                                     1/1     Running   0          116m
kube-system          kube-scheduler-prac-cluster-control-plane            1/1     Running   0          116m
local-path-storage   local-path-provisioner-bd4bb6b75-h4bnb               1/1     Running   0          116m
metallb-system       controller-57f648cb96-vwntj                          1/1     Running   0          91m
metallb-system       speaker-mtvfj                                        1/1     Running   0          91m
metallb-system       speaker-nsh46                                        1/1     Running   0          91m
metallb-system       speaker-z672b                                        1/1     Running   0          91m
sample-app           sample-http-744f56bdc6-bdvt6                         1/1     Running   0          72m
sample-app           sample-http-744f56bdc6-hqc6z                         1/1     Running   0          72m
Deploying Prometheus Operator
Clone the manifest repository locally (we will be applying every YAML under the directory):
$ git clone https://github.com/coreos/kube-prometheus
$ cd kube-prometheus
Deploy
# Create the namespace and CRDs, and then wait for them to be available before creating the remaining resources
kubectl create -f manifests/setup
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
kubectl create -f manifests/
That alone deploys everything. Easy.
Example run
Running the above:
[zaki@cloud-dev kube-prometheus]$ # Create the namespace and CRDs, and then wait for them to be available before creating the remaining resources
[zaki@cloud-dev kube-prometheus]$ kubectl create -f manifests/setup
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
[zaki@cloud-dev kube-prometheus]$ until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
No resources found
[zaki@cloud-dev kube-prometheus]$ kubectl create -f manifests/
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
servicemonitor.monitoring.coreos.com/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-operator created
prometheus.monitoring.coreos.com/k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
A huge number of resources get created, as shown.
Note that the CRDs and other prerequisites live under manifests/setup, so running kubectl create -f manifests/ first will fail with errors.
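As a quick sanity check before the second apply, you can confirm that the CRDs registered by manifests/setup actually exist; a minimal sketch (the CRD names match the creation log above):

$ # should list alertmanagers, servicemonitors, prometheuses, etc.
$ kubectl get crd | grep monitoring.coreos.com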
[zaki@cloud-dev kube-prometheus]$ kc get pod -A
NAMESPACE            NAME                                                 READY   STATUS    RESTARTS   AGE
kube-system          coredns-66bff467f8-92f5g                             1/1     Running   0          138m
kube-system          coredns-66bff467f8-jx2vk                             1/1     Running   0          138m
kube-system          etcd-prac-cluster-control-plane                      1/1     Running   0          138m
kube-system          kindnet-2bh96                                        1/1     Running   0          138m
kube-system          kindnet-5sbfj                                        1/1     Running   0          138m
kube-system          kindnet-kfw7s                                        1/1     Running   1          138m
kube-system          kube-apiserver-prac-cluster-control-plane            1/1     Running   0          138m
kube-system          kube-controller-manager-prac-cluster-control-plane   1/1     Running   1          138m
kube-system          kube-proxy-gz89m                                     1/1     Running   0          138m
kube-system          kube-proxy-hf466                                     1/1     Running   0          138m
kube-system          kube-proxy-j6cpf                                     1/1     Running   0          138m
kube-system          kube-scheduler-prac-cluster-control-plane            1/1     Running   1          138m
local-path-storage   local-path-provisioner-bd4bb6b75-h4bnb               1/1     Running   1          138m
metallb-system       controller-57f648cb96-vwntj                          1/1     Running   0          113m
metallb-system       speaker-mtvfj                                        1/1     Running   0          113m
metallb-system       speaker-nsh46                                        1/1     Running   0          113m
metallb-system       speaker-z672b                                        1/1     Running   0          113m
monitoring           alertmanager-main-0                                  2/2     Running   0          3m37s
monitoring           alertmanager-main-1                                  2/2     Running   0          3m37s
monitoring           alertmanager-main-2                                  2/2     Running   0          3m37s
monitoring           grafana-86445dccbb-gx4p7                             1/1     Running   0          3m38s
monitoring           kube-state-metrics-5b67d79459-z4xh6                  3/3     Running   0          3m38s
monitoring           node-exporter-7rq9n                                  2/2     Running   0          3m38s
monitoring           node-exporter-cfxqr                                  2/2     Running   0          3m38s
monitoring           node-exporter-rzsh5                                  2/2     Running   0          3m38s
monitoring           prometheus-adapter-66b855f564-cst25                  1/1     Running   0          3m38s
monitoring           prometheus-k8s-0                                     3/3     Running   1          3m37s
monitoring           prometheus-k8s-1                                     3/3     Running   1          3m37s
monitoring           prometheus-operator-78fcb48ccf-zbx77                 2/2     Running   0          3m39s
sample-app           sample-http-744f56bdc6-bdvt6                         1/1     Running   0          95m
sample-app           sample-http-744f56bdc6-hqc6z                         1/1     Running   0          95m
Accessing the dashboards
Most of this is covered in "Access the dashboards" in the kube-prometheus documentation.
Deploying creates a Service resource for each component, so basically you just access those.
This time I follow the documentation and use port-forward.
A NodePort or LoadBalancer Service would of course work too (see the sketch after the Service listing below).
[zaki@cloud-dev kube-prometheus]$ kc get svc -n monitoring
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main       ClusterIP   10.99.231.146    <none>        9093/TCP                     14h
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   14h
grafana                 ClusterIP   10.106.181.97    <none>        3000/TCP                     14h
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP            14h
node-exporter           ClusterIP   None             <none>        9100/TCP                     14h
prometheus-adapter      ClusterIP   10.96.68.195     <none>        443/TCP                      14h
prometheus-k8s          ClusterIP   10.100.171.252   <none>        9090/TCP                     14h
prometheus-operated     ClusterIP   None             <none>        9090/TCP                     14h
prometheus-operator     ClusterIP   None             <none>        8443/TCP                     14h
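Since this cluster has MetalLB, one alternative to port-forward is switching a Service to LoadBalancer; a minimal sketch, not part of the original walkthrough:

$ # turn the grafana Service into a LoadBalancer so MetalLB assigns an external IP
$ kubectl -n monitoring patch svc grafana -p '{"spec": {"type": "LoadBalancer"}}'
$ kubectl -n monitoring get svc grafana    # EXTERNAL-IP is now drawn from the MetalLB pool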
Prometheus
[zaki@cloud-dev kube-prometheus]$ kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090
Now accessing http://localhost:9090/ brings up the Prometheus dashboard.
Here's what the Targets page looks like.
And the Graph page, displaying node_load15 as an example.
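Incidentally, the same query can also be run against Prometheus's HTTP API through the port-forward; a minimal sketch:

$ # ask the API for the current node_load15 value (returns JSON)
$ curl -s 'http://localhost:9090/api/v1/query?query=node_load15'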
Grafana
Set up port-forwarding for Grafana in the same way.
[zaki@cloud-dev kube-prometheus]$ kubectl --namespace monitoring port-forward svc/grafana 3000
Forwarding from 127.0.0.1:3000 -> 3000
Forwarding from [::1]:3000 -> 3000
Unlike Prometheus, a login screen appears.
Log in with username: admin / password: admin, and you'll be prompted to change the password; set it to something suitable.
For creating dashboards there's also an official tutorial, so see that as well.
From "Data Sources", select "Prometheus".
For the URL, enter Prometheus's URL. This is the URL as seen from the Grafana pod, so the Prometheus Service name is enough: http://prometheus-k8s:9090 (or fully qualified with the namespace: http://prometheus-k8s.monitoring.svc:9090).
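If you want to confirm that the Service name actually resolves via in-cluster DNS, a throwaway pod works; a sketch (busybox:1.28 is the commonly recommended image for nslookup):

$ kubectl run -n monitoring dns-check --rm -it --restart=Never --image=busybox:1.28 \
    -- nslookup prometheus-k8s.monitoring.svc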
Then press "Save & Test" at the bottom of the screen.
Create a new panel with "Create" → "Dashboard".
A graph of some value I can't quite identify is already displayed; in the Query tab at the bottom, select the data source configured earlier. (I created Prometheus-1, but it turns out a default "Prometheus" with the same settings already existed...)
A bit further down, in the input field showing "Enter a PromQL query (run with Shift+Enter)", type node_load15 (the same metric we viewed in Prometheus) and press Shift+Enter as indicated.
The graph is previewed.
There are plenty of things you'd want to tweak, like the graph title, but for now press "Apply" at the top right: the panel is done and you're returned to the dashboard screen, where you save it and give it a name.
Now the node_load15 metrics are immediately accessible from the Search dashboard list.
※ I named the dashboard "node load15", but that should really have been the panel name; the dashboard deserved a more general name.
Metrics server
Before deploying Prometheus, the metrics server does not exist, as shown below:
$ kubectl describe apiservice v1beta1.metrics.k8s.io
Error from server (NotFound): apiservices.apiregistration.k8s.io "v1beta1.metrics.k8s.io" not found
After deploying Prometheus:
[zaki@cloud-dev kube-prometheus]$ kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2020-10-10T15:23:33Z
  Resource Version:    26324
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 1ccaec3c-955e-47f2-8c28-8828743b767e
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            prometheus-adapter
    Namespace:       monitoring
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2020-10-10T15:23:38Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:  <none>
The APIService definition is in place, and kubectl top now works:
[zaki@cloud-dev ~]$ kubectl top node
NAME                         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
prac-cluster-control-plane   310m         7%     6495Mi          83%
prac-cluster-worker          253m         6%     6495Mi          83%
prac-cluster-worker2         247m         6%     6495Mi          83%
[zaki@cloud-dev ~]$ kubectl top pod -n kube-system
NAME                                                 CPU(cores)   MEMORY(bytes)
coredns-66bff467f8-92f5g                             3m           52Mi
coredns-66bff467f8-jx2vk                             2m           59Mi
etcd-prac-cluster-control-plane                      23m          228Mi
kindnet-2bh96                                        0m           26Mi
kindnet-5sbfj                                        0m           27Mi
kindnet-kfw7s                                        0m           25Mi
kube-apiserver-prac-cluster-control-plane            54m          1078Mi
kube-controller-manager-prac-cluster-control-plane   15m          129Mi
kube-proxy-gz89m                                     0m           36Mi
kube-proxy-hf466                                     0m           33Mi
kube-proxy-j6cpf                                     0m           33Mi
kube-scheduler-prac-cluster-control-plane            4m           47Mi
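Under the hood, kubectl top reads from this aggregated metrics API (served here by prometheus-adapter), which you can also hit directly; a quick sketch:

$ # raw call to the aggregated API registered by the APIService above
$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes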
I'd like to cover this separately some time, but HPA and the like should now be usable as well (see the sketch below).
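For example, an HPA against the sample-http Deployment from the pod listing above could look like this; a sketch, assuming the pods have CPU resource requests set (required for CPU% targets):

$ kubectl -n sample-app autoscale deployment sample-http --cpu-percent=50 --min=2 --max=5
$ kubectl -n sample-app get hpa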
References
Incidentally, besides the kube-prometheus repository used here, there's also a prometheus-operator repository.
The difference is... I'm probably not understanding this perfectly, but roughly: prometheus-operator is the Operator itself, while kube-prometheus bundles prometheus-operator together with example custom resources for a complete monitoring stack.
It also looks like you can deploy with a Helm chart (sketch below).
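A sketch of the Helm route, per the prometheus-community charts repository (not tested here):

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm install monitoring prometheus-community/kube-prometheus-stack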