author | Suren A. Chilingaryan <csa@suren.me> | 2018-07-05 06:29:09 +0200 |
---|---|---|
committer | Suren A. Chilingaryan <csa@suren.me> | 2018-07-05 06:29:09 +0200 |
commit | 2c3f1522274c09f7cfdb6309adc0719f05c188e9 (patch) | |
tree | e54e0c26f581543f48e945f186734e4bd9a8f15a /docs/troubleshooting.txt | |
parent | 8af0865a3a3ef783b36016c17598adc9d932981d (diff) | |
Update monitoring scripts to track leftover OpenVSwitch 'veth' interfaces and clean them up periodically to avoid performance degradation, split kickstart
Diffstat (limited to 'docs/troubleshooting.txt')
-rw-r--r-- | docs/troubleshooting.txt | 18 |
1 files changed, 18 insertions, 0 deletions
diff --git a/docs/troubleshooting.txt b/docs/troubleshooting.txt
index ae43c52..9fa6f91 100644
--- a/docs/troubleshooting.txt
+++ b/docs/troubleshooting.txt
@@ -134,6 +134,22 @@ etcd (and general operability)
 pods (failed pods, rogue namespaces, etc...)
 ====
+ - Scheduling of pods may fail on one (or more) of the nodes after a long wait, with 'oc logs' reporting
+   a timeout and 'oc describe' reporting 'failed to create pod sandbox'. This can be caused by a failure to
+   clean up properly after a terminated pod, which leaves rogue network interfaces in the OpenVSwitch fabric.
+    * This can be detected from errors reported by 'ovs-vsctl show' or present in the log '/var/log/openvswitch/ovs-vswitchd.log',
+      which may quickly grow beyond 100MB:
+        could not open network device vethb9de241f (No such device)
+    * The work-around is to delete the rogue interfaces with
+        ovs-vsctl del-port br0 <iface>
+      More info:
+        ovs-ofctl -O OpenFlow13 show br0
+        ovs-ofctl -O OpenFlow13 dump-flows br0
+      This does not solve the underlying problem, however. New interfaces will still get abandoned by OpenShift.
+    * The issue is discussed here:
+        https://bugzilla.redhat.com/show_bug.cgi?id=1518684
+        https://bugzilla.redhat.com/show_bug.cgi?id=1518912
+
  - After crashes / upgrades some pods may end up in 'Error' state. This quite often happens to
     * kube-service-catalog/controller-manager
     * openshift-template-service-broker/api-server
@@ -185,6 +201,8 @@ pods (failed pods, rogue namespaces, etc...)
         docker ps -aq --no-trunc | xargs docker rm
+
+
 Builds
 ======
  - After changing storage for integrated docker registry, it may refuse builds with HTTP error 500. It is necessary
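
The work-around described in the added troubleshooting entry (delete rogue 'veth' ports from br0) could be automated roughly as follows. This is only a minimal sketch and not the monitoring script shipped with this commit. The function names stale_veth_ports and cleanup_bridge, the APPLY variable, and the use of /sys/class/net to decide whether a device still exists are all assumptions made for illustration. Only the bridge name br0 and the 'ovs-vsctl del-port' command come from the text above.

```shell
#!/bin/sh
# Read port names from stdin and print the 'veth*' ones that have no
# matching network device. $1 is the directory listing existing devices
# (assumed to be /sys/class/net on a Linux host).
stale_veth_ports() {
    netdir="${1:-/sys/class/net}"
    while IFS= read -r port; do
        case "$port" in
            veth*) [ -e "$netdir/$port" ] || printf '%s\n' "$port" ;;
        esac
    done
}

# List the ports of an OVS bridge and remove the stale ones.
# Dry run by default; set APPLY=1 (hypothetical switch) to really delete.
cleanup_bridge() {
    bridge="${1:-br0}"
    ovs-vsctl list-ports "$bridge" | stale_veth_ports | while IFS= read -r port; do
        if [ "${APPLY:-0}" = 1 ]; then
            ovs-vsctl --if-exists del-port "$bridge" "$port"
        else
            echo "would delete: $port"
        fi
    done
}
```

Running 'cleanup_bridge br0' on a node prints the candidate ports without touching anything, and 'APPLY=1 cleanup_bridge br0' removes them. As the troubleshooting entry notes, this only treats the symptom, so such a script would have to run periodically (for example from cron).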