author Suren A. Chilingaryan <csa@suren.me> 2018-07-05 06:29:09 +0200
committer Suren A. Chilingaryan <csa@suren.me> 2018-07-05 06:29:09 +0200
commit 2c3f1522274c09f7cfdb6309adc0719f05c188e9
tree e54e0c26f581543f48e945f186734e4bd9a8f15a /docs/troubleshooting.txt
parent 8af0865a3a3ef783b36016c17598adc9d932981d
Update monitoring scripts to track leftover OpenVSwitch 'veth' interfaces and clean them up periodically to avoid performance degradation, split kickstart
Diffstat (limited to 'docs/troubleshooting.txt')
-rw-r--r-- docs/troubleshooting.txt 18
1 files changed, 18 insertions, 0 deletions
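
A minimal sketch of the periodic clean-up mentioned in the commit message; it is not the actual monitoring script from this repository. It assumes the bridge is br0 (as in the work-around documented in the patch) and that a port is stale when it is named veth* and has no matching device under /sys/class/net, which is consistent with the 'No such device' error quoted in the patch. The function names and the BRIDGE / SYSFS_NET variables are hypothetical.

```shell
#!/bin/sh
# Sketch only: report (and optionally delete) OVS ports whose veth device is gone.
# Assumptions: bridge name and the "veth* without a netdev is stale" heuristic.

BRIDGE="${BRIDGE:-br0}"
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Read port names on stdin; print the veth* ports that have no network device.
filter_stale_veths() {
    while read -r port; do
        case "$port" in
            veth*) [ -e "$SYSFS_NET/$port" ] || printf '%s\n' "$port" ;;
        esac
    done
}

# Dry run by default; pass --delete to actually remove the stale ports.
cleanup_stale_veths() {
    ovs-vsctl list-ports "$BRIDGE" | filter_stale_veths | while read -r port; do
        if [ "${1:-}" = "--delete" ]; then
            ovs-vsctl --if-exists del-port "$BRIDGE" "$port"
        else
            echo "stale: $port"
        fi
    done
}
```

Calling cleanup_stale_veths without arguments only lists candidates, which makes it safer to run from cron or a monitoring check before enabling --delete.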
diff --git a/docs/troubleshooting.txt b/docs/troubleshooting.txt
index ae43c52..9fa6f91 100644
--- a/docs/troubleshooting.txt
+++ b/docs/troubleshooting.txt
@@ -134,6 +134,22 @@ etcd (and general operability)
pods (failed pods, rogue namespaces, etc...)
====
+ - Scheduling of pods may fail on one (or more) of the nodes after a long wait, with 'oc logs' reporting
+ a timeout and 'oc describe' reporting 'failed to create pod sandbox'. This can be caused by a failure to
+ clean up properly after a terminated pod, which leaves rogue network interfaces in the OpenVSwitch fabric.
+ * This can be diagnosed from errors reported by 'ovs-vsctl show' or present in the log '/var/log/openvswitch/ovs-vswitchd.log',
+ which may quickly grow beyond 100MB:
+ could not open network device vethb9de241f (No such device)
+ * The work-around is to delete rogue interfaces with
+ ovs-vsctl del-port br0 <iface>
+ More info:
+ ovs-ofctl -O OpenFlow13 show br0
+ ovs-ofctl -O OpenFlow13 dump-flows br0
+ This does not solve the underlying problem, however: OpenShift will keep abandoning new interfaces.
+ * The issue is discussed here:
+ https://bugzilla.redhat.com/show_bug.cgi?id=1518684
+ https://bugzilla.redhat.com/show_bug.cgi?id=1518912
+
- After crashes / upgrades some pods may end up in 'Error' state. This quite often happens to
* kube-service-catalog/controller-manager
* openshift-template-service-broker/api-server
@@ -185,6 +201,8 @@ pods (failed pods, rogue namespaces, etc...)
docker ps -aq --no-trunc | xargs docker rm
+
+
Builds
======
- After changing storage for integrated docker registry, it may refuse builds with HTTP error 500. It is necessary