We have a number of customer using MySQL replication. For the uninitiated, replication copies changes from a master server to a number of slave servers. This makes scaling your service mush easier and helps improve redundancy and failover. An introduction to setting that up is available in our Mysql Replication guide.
Once you have replication running it is easy to check if the slave servers are keeping up, and not reporting any errors. For example on one of the slave servers you might run…
SHOW SLAVE STATUS\G
You can also use a small script to automate checking that, and to send you alerts if there is a problem, using crontab. As in the following example.
#!/usr/bin/perl use DBI; my $DSN='DBI:mysql:mysql:localhost'; my $DBUSER='mysqdbuser'; my $DBPASSWD='mydbpassword'; # when the backup is running, replication regularly gets over 2500 seconds behind. this is fine. my $MAXTIME = 3000; my $dbh=DBI->connect($DSN, $DBUSER, $DBPASSWD) or die print "$!\n"; my $sql='SHOW SLAVE STATUS'; my $sth=$dbh->prepare($sql); $sth->execute(); while(my $info=$sth->fetchrow_hashref()){ if (exists $info->{Slave_SQL_Running} and $info->{Slave_SQL_Running} eq 'No') { warn "slave SQL thread has stopped\n"; } elsif (exists $info->{Slave_IO_Running} and $info->{Slave_IO_Running} eq 'No') { warn "slave IO thread has stopped\n"; } elsif (exists $info->{Slave_Running} and $info->{Slave_Running} eq 'No') { warn "slave has stopped\n"; } elsif (exists $info->{Seconds_Behind_Master} and $info->{Seconds_Behind_Master} >= $MAXTIME ) { warn "Replication $info->{Seconds_Behind_Master} seconds old\n"; } }
Often this is good enough, however as your databases starts using GBs of storage, you may want to check individual tables to insure they are stay 100% correct. For example, in case a developer alters something in one of the slaves by mistake, or there could be hidden disk corruption. You might not notice that until replication breaks trying to change the affected data, or a web page that depends on that shows unexpected results.
There are a few different ways to fix that, but really what is most important is actually being aware when and where a problem is occurring.
Percona provide a nice set of tools that essentially records checksums of replicated tables on the master server, and then copies that info to the slaves so the contents of tables can be compared quickly. In particular the pt-table-checksum and pt-table-sync tools are most useful.
If you are interested in leveraging these tools and need a hand getting setup, if you are starting out with replication or have a mature setup already, let us know and we can advise you. Pop a ticket in at https://rimuhosting.com/ticket/startticket.jsp with details on how we can help.