※ Update (2019-11-22)
This has been resolved:
zaki-hmkc.hatenablog.com
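I know the world has moved on to OpenShift 4 and Prometheus, but at work I still use OpenShift 3.11 with Hawkular.
So (not that that's much of a reason), this is a work log of setting up an environment on my own OKD (OpenShift Origin) cluster and trying to install Hawkular there. (It did not succeed.)
To state the conclusion first: I hit an error much like the one in the ticket below, and as far as I tried between mid-September and early October 2019, the deployment never succeeded.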
1619497 – [3.11]hawkular-metrics pod failed to start up due to unsuccessful version check
As of October 5 the ticket is marked "VERIFIED", so I'm waiting for a release that includes the fix…
※ Incidentally, the Enterprise edition (openshift_deployment_type=openshift-enterprise) does not have this problem.
Target versions: openshift-ansible 3.11.146 to 3.11.148
※ The contents (logs, etc.) are stitched together from many rounds of trial and error.
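The easiest way to get OKD running is Minishift, but I couldn't figure out how to deploy Hawkular Metrics with it.
So, as with the Enterprise edition, I use Ansible to deploy OKD as an on-premises multi-node cluster.
(I haven't written that part up yet; I'll do an article on it at some point.)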
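Inventory file
To enable Hawkular Metrics on OKD, the relevant part of the inventory file used by the playbook is roughly the following (when using a PV on an existing NFS server):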
# Metrics deployment
# See: https://docs.openshift.com/container-platform/latest/install_config/cluster_metrics.html
#
# By default metrics are not automatically deployed, set this to enable them
openshift_metrics_install_metrics=true
#
# metrics-server deployment
# By default, metrics-server is not automatically deployed, unless metrics is also
# deployed. Deploying metrics-server is necessary to use the HorizontalPodAutoscaler.
# Set this to enable it.
openshift_metrics_server_install=true

# Storage Options
# If openshift_metrics_storage_kind is unset then metrics will be stored
# in an EmptyDir volume and will be deleted when the cassandra pod terminates.
# Storage options A & B currently support only one cassandra pod which is
# generally enough for up to 1000 pods. Additional volumes can be created
# manually after the fact and metrics scaled per the docs.

# Option B - External NFS Host
# NFS volume must already exist with path "nfs_directory/_volume_name" on
# the storage_host. For example, the remote volume path using these
# options would be "nfs.example.com:/exports/metrics". "exports" is
# is the name of the export served by the nfs server. "metrics" is
# the name of a directory inside of "/exports".
openshift_metrics_storage_kind=nfs
openshift_metrics_storage_access_modes=['ReadWriteOnce']
openshift_metrics_storage_host=nfs-server.example.org
openshift_metrics_storage_nfs_directory=/export/nfs
openshift_metrics_storage_volume_name=metrics
openshift_metrics_storage_volume_size=10Gi
openshift_metrics_storage_labels={'storage': 'metrics'}
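Since the NFS directory has to exist before the playbook runs, it can help to work out which remote path these settings actually point at. The following is a minimal sketch, not part of openshift-ansible: the helper names parse_inventory_vars and expected_nfs_path are hypothetical, and it assumes the path is simply host:nfs_directory/volume_name, as the comment in the inventory describes.

```python
import posixpath


def parse_inventory_vars(text):
    """Collect key=value pairs, skipping blank lines, # comments and section headers."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def expected_nfs_path(settings):
    """Build host:directory/volume_name (assumed layout, per the inventory comment)."""
    host = settings["openshift_metrics_storage_host"]
    directory = settings["openshift_metrics_storage_nfs_directory"]
    volume = settings["openshift_metrics_storage_volume_name"]
    return f"{host}:{posixpath.join(directory, volume)}"


# Values copied from the inventory excerpt in this article.
INVENTORY = """
# Option B - External NFS Host
openshift_metrics_storage_kind=nfs
openshift_metrics_storage_host=nfs-server.example.org
openshift_metrics_storage_nfs_directory=/export/nfs
openshift_metrics_storage_volume_name=metrics
"""

print(expected_nfs_path(parse_inventory_vars(INVENTORY)))
```

With the values above this prints nfs-server.example.org:/export/nfs/metrics, which is the directory that should be prepared on the NFS server beforehand.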
State of pods and events after deployment
Running ansible-playbook playbooks/deploy_cluster.yml with these settings deploys the OpenShift cluster itself successfully, but the metrics-related pods don't start properly.
[zaki@okd-master ~]$ oc get pod -n openshift-infra
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-vgp2c   1/1       Running   0          15h
hawkular-metrics-wq89q       0/1       Running   146        15h
heapster-fnkrm               0/1       Running   102        15h
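It looks roughly like this: the pods are Running but READY stays at 0, and because the probe checks are failing internally, the Service doesn't work.
cassandra is up, but hawkular-metrics fails to connect to cassandra, and heapster goes down with it because hawkular-metrics never comes up.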
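The "Running but not Ready" state is easy to miss when only the STATUS column is checked. As a rough sketch, the listing can be filtered for pods whose READY count is below the total; the function name not_ready_pods is hypothetical, and it assumes the default column layout of oc get pod (NAME, READY, STATUS, RESTARTS, AGE).

```python
def not_ready_pods(oc_output):
    """Return (name, restarts) for pods that are Running but not fully Ready."""
    pods = []
    # Skip the header row; assumes the default `oc get pod` columns.
    for line in oc_output.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) < 5:
            continue
        name, ready, status, restarts = fields[:4]
        ready_count, total = (int(part) for part in ready.split("/"))
        if status == "Running" and ready_count < total:
            pods.append((name, int(restarts)))
    return pods


# Output copied from the listing in this article.
SAMPLE = """\
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-vgp2c   1/1       Running   0          15h
hawkular-metrics-wq89q       0/1       Running   146        15h
heapster-fnkrm               0/1       Running   102        15h
"""

for pod_name, restart_count in not_ready_pods(SAMPLE):
    print(pod_name, restart_count)
```

For the listing above, this reports hawkular-metrics and heapster along with their restart counts, while the healthy cassandra pod is left out.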
Checking the pod's state shows roughly the following (not the same environment as above, but much the same):
Events:
  Type     Reason     Age                From                          Message
  ----     ------     ----               ----                          -------
  Normal   Scheduled  10m                default-scheduler             Successfully assigned openshift-infra/hawkular-metrics-wwfbl to okd-node2.openshift
  Normal   Pulling    10m                kubelet, okd-node2.openshift  pulling image "docker.io/openshift/origin-metrics-hawkular-metrics:v3.11.0"
  Normal   Pulled     7m                 kubelet, okd-node2.openshift  Successfully pulled image "docker.io/openshift/origin-metrics-hawkular-metrics:v3.11.0"
  Warning  Unhealthy  7m                 kubelet, okd-node2.openshift  Liveness probe failed:
  Warning  Unhealthy  7m                 kubelet, okd-node2.openshift  Readiness probe errored: rpc error: code = Unknown desc = container not running (293c05ce4382bb4a36a304748a64ce0e7675d412a1c22629fd324c33f7b9d254)
  Warning  Unhealthy  6m (x2 over 7m)    kubelet, okd-node2.openshift  Readiness probe failed:
  Normal   Pulled     6m (x2 over 7m)    kubelet, okd-node2.openshift  Container image "docker.io/openshift/origin-metrics-hawkular-metrics:v3.11.0" already present on machine
  Normal   Created    6m (x3 over 7m)    kubelet, okd-node2.openshift  Created container
  Normal   Started    6m (x3 over 7m)    kubelet, okd-node2.openshift  Started container
  Warning  Unhealthy  5m (x4 over 7m)    kubelet, okd-node2.openshift  Liveness probe failed: Failed to access the status endpoint : <urlopen error [Errno 111] Connection refused>.
Traceback (most recent call last):
  File "/opt/hawkular/scripts/hawkular-metrics-liveness.py", line 48, in <module>
    if int(uptime) < int(timeout):
ValueError: invalid literal for int() with base 10: ''
  Warning  Unhealthy  5m (x5 over 7m)    kubelet, okd-node2.openshift  Readiness probe failed: Failed to access the status endpoint : <urlopen error [Errno 111] Connection refused>. This may be due to Hawkular Metrics not being ready yet. Will try again.
  Normal   Killing    5m (x2 over 6m)    kubelet, okd-node2.openshift  Killing container with id docker://hawkular-metrics:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy  5s (x28 over 6m)   kubelet, okd-node2.openshift  Readiness probe failed: Failed to access the status endpoint : timed out. This may be due to Hawkular Metrics not being ready yet. Will try again.
[zaki@okd-master ~]$
Pod logs
Running oc logs gives something like this.
Looking closely, it differs slightly from what's in the Bugzilla report..
2019-09-23 01:36:11,539 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-2) WFLYUT0019: Host default-host stopping
2019-09-23 01:36:13,556 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: unconfigured table sys_config
2019-09-23 01:36:13,556 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2019-09-23 01:36:16,473 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0190: Step handler org.jboss.as.controller.AbstractAddStepHandler$1@2e18853d for operation add at address [
    ("socket-binding-group" => "standard-sockets"),
    ("socket-binding" => "txn-recovery-environment")
] failed handling operation rollback -- java.util.concurrent.TimeoutException: java.util.concurrent.TimeoutException
    at org.jboss.as.controller.OperationContextImpl.waitForRemovals(OperationContextImpl.java:522)
    at org.jboss.as.controller.AbstractOperationContext$Step.handleResult(AbstractOperationContext.java:1483)
    at org.jboss.as.controller.AbstractOperationContext$Step.finalizeInternal(AbstractOperationContext.java:1437)
    at org.jboss.as.controller.AbstractOperationContext$Step.finalizeStep(AbstractOperationContext.java:1420)
    at org.jboss.as.controller.AbstractOperationContext$Step.access$400(AbstractOperationContext.java:1284)
    at org.jboss.as.controller.AbstractOperationContext.executeResultHandlerPhase(AbstractOperationContext.java:857)
    at org.jboss.as.controller.AbstractOperationContext.processStages(AbstractOperationContext.java:709)
    at org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:450)
    at org.jboss.as.controller.OperationContextImpl.executeOperation(OperationContextImpl.java:1402)
    at org.jboss.as.controller.ModelControllerImpl.boot(ModelControllerImpl.java:516)
    at org.jboss.as.controller.AbstractControllerService.boot(AbstractControllerService.java:468)
    at org.jboss.as.controller.AbstractControllerService.boot(AbstractControllerService.java:430)
    at org.jboss.as.server.ServerService.boot(ServerService.java:437)
    at org.jboss.as.server.ServerService.boot(ServerService.java:396)
    at org.jboss.as.controller.AbstractControllerService$1.run(AbstractControllerService.java:370)
    at java.lang.Thread.run(Thread.java:748)
2019-10-03 21:32:14,237 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: unconfigured table sys_config
2019-10-03 21:32:14,237 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2019-10-03 21:32:24,245 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: unconfigured table sys_config
2019-10-03 21:32:24,246 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
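My impression is that something is wrong with a table that should be created in cassandra (the sys_config table).
(For example, it isn't being created correctly during deployment, or something along those lines.)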
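To confirm that the pod is stuck on the same failure every time rather than on a mix of errors, the saved oc logs output can be summarized by failure reason. This is only a sketch: version_check_failures is a hypothetical helper, and the regular expression assumes the log line format shown in this article.

```python
import re
from collections import Counter

# Matches the SchemaVersionChecker lines as they appear in the logs above.
FAILURE_PATTERN = re.compile(
    r"SchemaVersionChecker\].*Version check failed: (?P<reason>.+)$"
)


def version_check_failures(log_text):
    """Count 'Version check failed' messages, grouped by the reported reason."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILURE_PATTERN.search(line)
        if match:
            counts[match.group("reason").strip()] += 1
    return counts


# Log lines copied from the excerpt in this article.
SAMPLE_LOG = """\
2019-10-03 21:32:14,237 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: unconfigured table sys_config
2019-10-03 21:32:14,237 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2019-10-03 21:32:24,245 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: unconfigured table sys_config
2019-10-03 21:32:24,246 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
"""

for reason, count in version_check_failures(SAMPLE_LOG).items():
    print(f"{count} x {reason}")
```

If the only reason that shows up is "unconfigured table sys_config", that is consistent with the guess that the schema was never set up on the cassandra side.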
Pulled image
The hawkular-metrics image that can be pulled from Docker Hub hasn't changed in a year; does that mean it has been in this state the whole time?
…Was nobody using it? 😂
[zaki@okd-master ~]$ sudo docker images docker.io/openshift/origin-metrics-hawkular-metrics
REPOSITORY                                              TAG     IMAGE ID       CREATED         SIZE
docker.io/openshift/origin-metrics-hawkular-metrics     v3.11   59e2258250c4   11 months ago   860 MB
TODO