author     Suren A. Chilingaryan <csa@suren.me>  2019-12-11 22:01:13 +0100
committer  Suren A. Chilingaryan <csa@suren.me>  2019-12-11 22:01:13 +0100
commit     e0b1b53f21095707af87a095934e971d788a90c7 (patch)
tree       f0a4e42e710d9895a5bf39e046873624c92b6d2e /docs/troubleshooting.txt
parent     d9ce77409a6a3dc04e8b0808db81b7f3bea98492 (diff)
Document further possible problems with MySQL replication
Diffstat (limited to 'docs/troubleshooting.txt')
-rw-r--r--  docs/troubleshooting.txt  33
1 file changed, 31 insertions(+), 2 deletions(-)
diff --git a/docs/troubleshooting.txt b/docs/troubleshooting.txt
index a47d133..fd57150 100644
--- a/docs/troubleshooting.txt
+++ b/docs/troubleshooting.txt
@@ -319,11 +319,40 @@ MySQL
=====
- MySQL may stop replicating from the master. There is some kind of deadlock in the multi-threaded slave SQL applier.
  This can be seen by executing the following (which should show a lot of slave threads waiting on the coordinator to provide
- load).
+ the load).
SHOW PROCESSLIST;
  The remedy is to restart the slave MySQL with 'slave_parallel_workers=0', give it some time to catch up, and then
- restart back in the standard multithreading mode.
+ restart it back in the standard multithreading mode. This can be achieved by editing 'statefulset/mysql-slave-0'
+ and setting the environment variable 'MYSQL_SLAVE_WORKERS' to 0 and, later, back to the original value
+ (currently 16), as sketched below.
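+ A possible command sequence for this with the OpenShift client (a sketch; it assumes 'oc' access to the project
+ hosting the slave and uses the resource and variable names mentioned above):
+     oc set env statefulset/mysql-slave-0 MYSQL_SLAVE_WORKERS=0
+     # wait until the slave catches up, then restore the original value
+     oc set env statefulset/mysql-slave-0 MYSQL_SLAVE_WORKERS=16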
+- This may not be the end of it. The execution of statements from the relay log can get stuck because of some "bad"
+ transaction. This can be detected by looking at the replication status on the slave database:
+ SHOW SLAVE STATUS\G
+ It may happen that while 'Retrieved_Gtid_Set' increments (the binary log is successfully transferred from the master
+ and contains new transactions), the 'Executed_Gtid_Set' stays constant, i.e. new transactions are not applied
+ to the slave server. In this case, 'Relay_Log_Pos' and 'Exec_Master_Log_Pos' will not advance either, and the
+ 'Slave_SQL_Running_State' will likely be stuck in the 'Reading event from the relay log' state. The solution is
+ to skip the misbehaving transaction and continue from the next one. The following assumes that GTID_MODE is 'on',
+ as it is on the current KaaS platform.
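+ If in doubt, the mode can be verified on the slave with a standard query:
+     SHOW VARIABLES LIKE 'gtid_mode';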
+ * First, we need to find the next transaction. Likely it is just an increment of the 'Executed_Gtid_Set', i.e.
+ if 'Executed_Gtid_Set=3beaee24-2e55-11e8-9612-0a580a80000c:1-953787306' on the slave, the next GTID would be:
+ 'Executed_Gtid_Set=3beaee24-2e55-11e8-9612-0a580a80000c:953787307' (yes, the "1-" also goes away). But this is not
+ guaranteed. The more reliable way to find the misbehaving transaction is to look into the binary log on the master
+ server (the file name and position are taken from SHOW SLAVE STATUS executed on the slave). In '/var/lib/mysql/data' run
+ mysqlbinlog --start-position=<Exec_Master_Log_Pos> <Relay_Master_Log_File> | head -n 50
+ Here you will find which transaction has likely caused the problem. Furthermore, there will be a line looking like
+ SET @@SESSION.GTID_NEXT='4ab8feff-5272-11e8-9320-08002715584a:201840'
+ This is the GTID of the next transaction.
+ * So, the following commands should be executed on the slave MySQL server (for details see https://www.thegeekdiary.com/how-to-skip-a-transaction-on-mysql-replication-slave-when-gtids-are-enabled/)
+ STOP SLAVE;
+ SET @@SESSION.GTID_NEXT='<found_gtid_of_next_transaction>';
+ BEGIN;
+ COMMIT;
+ SET GTID_NEXT='AUTOMATIC';
+ START SLAVE;
+ * It is also possible to review the stuck transaction on the slave MySQL node. In '/var/lib/mysql/data' run
+ mysqlbinlog --start-position=<Relay_Log_Pos> <Relay_Log_File>
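+ * After the skip, it is worth re-checking on the slave that replication is advancing again ('Executed_Gtid_Set'
+   should now grow and 'Slave_SQL_Running' should report 'Yes'):
+     SHOW SLAVE STATUS\G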
+
Administration
==============
- Some management tasks may require logging in on the ipekatrin* nodes. Thereafter, the password-less execution of