分类: MySQL
CentOS7.4搭建MHA环境

MHA(Master HA)是一款开源的MySQL的高可用工具,能在MySQL主从复制的基础上,实现自动化主服务器故障转移。虽然MHA试图从宕机的主服务器上保存二进制日志,但并不是总是可行的。例如,如果主服务器硬件故障或无法通过ssh访问,MHA没法保存二进制日志,只进行故障转移而丢失最新数据。

MHA监控复制架构的主服务器,一旦检测到主服务器故障,就会自动进行故障转移。即使有些从服务器没有收到最新的relay log,MHA自动从最新的从服务器上识别差异的relay log并把这些日志应用到其他从服务器上,因此所有的从服务器保持一致性了。还可以设置优先级指定其中的一台slave作为master的候选,由于MHA在slaves之间修复一致性,因此可以将slave变成新的master,其他的slave都以其作为新master。
MHA集群架构
目前MHA主要支持一主多从的架构,要搭建MHA,要求一个复制集群中必须最少有三台数据库服务器,一主二从,即一台充当master,一台充当备用master,另外一台充当从库,因为至少需要三台服务器,阿里也在该基础上进行了改造,目前淘宝TMHA已经支持一主一从。
MHA作用:

1)从宕机崩溃的master保存二进制日志事件(binlog events);

2)识别含有最新更新的slave;

3)应用差异的中继日志(relay log)到其他的slave;

4)应用从master保存的二进制日志事件(binlog events);

5)提升一个slave为新的master;

6)使其他的slave连接新的master进行复制;

MHA有两部分组成,MHA Manager(管理节点)和MHA Node(数据节点):

1:MHA Manager可以单独部署在一台独立机器上管理多个master-slave集群,也可以部署在一台slave上。MHA Manager探测集群的node节点,当master出现故障的时它可以自动将具有最新数据的slave提升为新的master,然后将所有其它的slave导向新的master上。整个故障转移过程对应用程序是透明的。
2:MHA node运行在每台MySQL服务器上(master/slave/manager),它通过监控具备解析和清理logs功能的脚本来加快故障转移的。

Manager工具包主要包括以下几个工具:

masterha_check_ssh              检查MHA的SSH配置状况
masterha_check_repl             检查MySQL复制状况
masterha_manger                 启动MHA
masterha_check_status           检测当前MHA运行状态
masterha_master_monitor         检测master是否宕机
masterha_master_switch          控制故障转移(自动或者手动)
masterha_conf_host              添加或删除配置的server信息
Node工具包(这些工具通常由MHA Manager的脚本触发,无需人为操作)主要包括以下几个工具:

save_binary_logs                保存和复制master的二进制日志
apply_diff_relay_logs           识别差异的中继日志事件并将其差异的事件应用于其他的slave
filter_mysqlbinlog              去除不必要的ROLLBACK事件(MHA已不再使用这个工具)
purge_relay_logs                清除中继日志(不会阻塞SQL线程)

MHA搭建步骤如下:
1、系统初始化
防火墙和selinux都关闭,修改时区、校验时间

systemctl stop firewall
systemctl stop firewalld
cat /etc/selinux/config
SELINUX=disabled
SELINUXTYPE=targeted 
[root@master tmp]# getenforce 
Disabled
[root@manager ~]# crontab -l
*/30 * * * * /usr/sbin/ntpdate pool.ntp.org
[root@manager ~]# date -R
Mon, 22 Jan 2018 20:28:29 +0800

MySQL安装参见:Centos7.3编译安装MySQL 5.7.17

2、根据实际场景修改主机名,并加入/etc/hosts中

192.168.121.163     master
192.168.121.165     slave1
192.168.121.166     slave2
192.168.121.169    manager

用ssh-keygen实现四台主机之间相互免密钥登录:
生成密钥,四台机器都依次操作
[master,slave1,slave2,manager]

cd /root
ssh-keygen -t rsa 
scp .ssh/id_rsa.pub master:/root/.ssh/master.pub 
scp .ssh/id_rsa.pub master:/root/.ssh/slave1.pub
scp .ssh/id_rsa.pub master:/root/.ssh/manager.pub

导入公钥

cat ~/.ssh/*.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys slave1:/root/.ssh/authorized_keys 
scp ~/.ssh/authorized_keys slave2:/root/.ssh/authorized_keys 
scp ~/.ssh/authorized_keys manager:/root/.ssh/authorized_keys 

最后实现直接ssh hostname即可登录主机;
3、安装MHAmha4mysql-node,mha4mysql-manager 软件包:

yum install epel-release -y 
yum install perl cpan perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y
wget https://downloads.mariadb.com/MHA/mha4mysql-node-0.54-0.el6.noarch.rpm
wget https://downloads.mariadb.com/MHA/mha4mysql-manager-0.55-0.el6.noarch.rpm
rpm -ivh mha4mysql-node-0.54-0.el6.noarch.rpm
rpm -ivh mha4mysql-manager-0.55-0.el6.noarch.rpm

4、实现master,slave1,slave2之间主从复制

参见:MySQL5.7.18基于GTID的主从复制过程实现

5、管理机manager上配置MHA文件
1.创建目录

mkdir -p /masterha/app1
mkdir /etc/masterha
vim /etc/masterha/default.cnf
[server default]
user=root
password=123456
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
remote_workdir=/masterha/app1
ssh_user=root
repl_user=repl
repl_password=123456
ping_interval=1  #设置监控主库,发送ping包的时间间隔,默认是3秒,尝试三次没有回应的时候自动进行railover
shutdown_script="" //设置故障发生后关闭故障主机脚本(该脚本的主要作用是关闭主机放在发生脑裂,这里没有使用)(可省略)
master_ip_online_change_script="" //设置手动切换时候的切换脚本(可省略)
report_script="/usr/bin/masterha_report_script" //设置发生切换后发送的报警的脚本(可省略)
master_ip_failover_script="/usr/bin/masterha_ip_failover" //设置自动failover时候的切换脚本(可省略)
[server1]
hostname=slave1
master_binlog_dir=/data/mysql/binlog
candidate_master=1 //设置为候选master,如果设置该参数以后,发生主从切换以后将会将此从库提升为主库,即使这个主库不是集群中时间最新的slave

#relay_log_purge=0
[server2]
hostname=master
master_binlog_dir=/data/mysql/binlog
candidate_master=1
[server3]
hostname=slave2
master_binlog_dir=/data/mysql/binlog
no_master=1

6、masterha_check_ssh工具验证ssh信任登录是否成功
[manager]

masterha_check_ssh --conf=/etc/masterha/default.cnf

Mon Jan 22 20:48:59 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jan 22 20:48:59 2018 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Mon Jan 22 20:48:59 2018 - [info] Reading server configurations from /etc/masterha/default.cnf..
Mon Jan 22 20:48:59 2018 - [info] Starting SSH connection tests..
Mon Jan 22 20:49:00 2018 - [debug] 
Mon Jan 22 20:48:59 2018 - [debug]  Connecting via SSH from root@slave1(192.168.121.165:22) to root@master(192.168.121.163:22)..
Mon Jan 22 20:48:59 2018 - [debug]   ok.
Mon Jan 22 20:48:59 2018 - [debug]  Connecting via SSH from root@slave1(192.168.121.165:22) to root@slave2(192.168.121.166:22)..
Mon Jan 22 20:49:00 2018 - [debug]   ok.
Mon Jan 22 20:49:00 2018 - [debug] 
Mon Jan 22 20:48:59 2018 - [debug]  Connecting via SSH from root@master(192.168.121.163:22) to root@slave1(192.168.121.165:22)..
Mon Jan 22 20:49:00 2018 - [debug]   ok.
Mon Jan 22 20:49:00 2018 - [debug]  Connecting via SSH from root@master(192.168.121.163:22) to root@slave2(192.168.121.166:22)..
Mon Jan 22 20:49:00 2018 - [debug]   ok.
Mon Jan 22 20:49:01 2018 - [debug] 
Mon Jan 22 20:49:00 2018 - [debug]  Connecting via SSH from root@slave2(192.168.121.166:22) to root@slave1(192.168.121.165:22)..
Mon Jan 22 20:49:00 2018 - [debug]   ok.
Mon Jan 22 20:49:00 2018 - [debug]  Connecting via SSH from root@slave2(192.168.121.166:22) to root@master(192.168.121.163:22)..
Mon Jan 22 20:49:01 2018 - [debug]   ok.
Mon Jan 22 20:49:01 2018 - [info] All SSH connection tests passed successfully.

7、masterha_check_repl工具验证mysql复制是否成功
[manager]

masterha_check_repl --conf=/etc/masterha/default.cnf

Mon Jan 22 20:50:27 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jan 22 20:50:28 2018 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Mon Jan 22 20:50:28 2018 - [info] Reading server configurations from /etc/masterha/default.cnf..
Mon Jan 22 20:50:28 2018 - [info] MHA::MasterMonitor version 0.55.
Mon Jan 22 20:50:29 2018 - [info] Multi-master configuration is detected. Current primary(writable) master is master(192.168.121.163:3306)
Mon Jan 22 20:50:29 2018 - [info] Master configurations are as below: 
Master slave1(192.168.121.165:3306), replicating from 192.168.121.163(192.168.121.163:3306), read-only
Master master(192.168.121.163:3306), replicating from 192.168.121.165(192.168.121.165:3306)

Mon Jan 22 20:50:29 2018 - [info] Dead Servers:
Mon Jan 22 20:50:29 2018 - [info] Alive Servers:
Mon Jan 22 20:50:29 2018 - [info]   slave1(192.168.121.165:3306)
Mon Jan 22 20:50:29 2018 - [info]   master(192.168.121.163:3306)
Mon Jan 22 20:50:29 2018 - [info]   slave2(192.168.121.166:3306)
Mon Jan 22 20:50:29 2018 - [info] Alive Slaves:
Mon Jan 22 20:50:29 2018 - [info]   slave1(192.168.121.165:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Mon Jan 22 20:50:29 2018 - [info]     Replicating from 192.168.121.163(192.168.121.163:3306)
Mon Jan 22 20:50:29 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jan 22 20:50:29 2018 - [info]   slave2(192.168.121.166:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Mon Jan 22 20:50:29 2018 - [info]     Replicating from 192.168.121.163(192.168.121.163:3306)
Mon Jan 22 20:50:29 2018 - [info]     Not candidate for the new Master (no_master is set)
Mon Jan 22 20:50:29 2018 - [info] Current Alive Master: master(192.168.121.163:3306)
Mon Jan 22 20:50:29 2018 - [info] Checking slave configurations..
Mon Jan 22 20:50:29 2018 - [warning]  relay_log_purge=0 is not set on slave slave1(192.168.121.165:3306).
Mon Jan 22 20:50:29 2018 - [warning]  relay_log_purge=0 is not set on slave slave2(192.168.121.166:3306).
Mon Jan 22 20:50:29 2018 - [info] Checking replication filtering settings..
Mon Jan 22 20:50:29 2018 - [info]  binlog_do_db= , binlog_ignore_db= 
Mon Jan 22 20:50:29 2018 - [info]  Replication filtering check ok.
Mon Jan 22 20:50:29 2018 - [info] Starting SSH connection tests..
Mon Jan 22 20:50:31 2018 - [info] All SSH connection tests passed successfully.
Mon Jan 22 20:50:31 2018 - [info] Checking MHA Node version..
Mon Jan 22 20:50:32 2018 - [info]  Version check ok.
Mon Jan 22 20:50:32 2018 - [info] Checking SSH publickey authentication settings on the current master..
Mon Jan 22 20:50:32 2018 - [info] HealthCheck: SSH to master is reachable.
Mon Jan 22 20:50:33 2018 - [info] Master MHA Node version is 0.54.
Mon Jan 22 20:50:33 2018 - [info] Checking recovery script configurations on the current master..
Mon Jan 22 20:50:33 2018 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/binlog --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.55 --start_file=binlog.000024 
Mon Jan 22 20:50:33 2018 - [info]   Connecting to root@master(master).. 
  Creating /masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/mysql/binlog, up to binlog.000024
Mon Jan 22 20:50:33 2018 - [info] Master setting check done.
Mon Jan 22 20:50:33 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Jan 22 20:50:33 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=slave1 --slave_ip=192.168.121.165 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.7.21-log --manager_version=0.55 --relay_log_info=/data/mysql/relay-log.info  --relay_dir=/data/mysql/  --slave_pass=xxx
Mon Jan 22 20:50:33 2018 - [info]   Connecting to root@192.168.121.165(slave1:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysql/relay-log.info ... ok.
    Relay log found at /data/mysql/binlog, up to mysql-relay-bin.000002
    Temporary relay log file is /data/mysql/binlog/mysql-relay-bin.000002
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Mon Jan 22 20:50:34 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=slave2 --slave_ip=192.168.121.166 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.7.21-log --manager_version=0.55 --relay_log_info=/data/mysql/relay-log.info  --relay_dir=/data/mysql/  --slave_pass=xxx
Mon Jan 22 20:50:34 2018 - [info]   Connecting to root@192.168.121.166(slave2:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysql/relay-log.info ... ok.
    Relay log found at /data/mysql/binlog, up to mysql-relay-bin.000003
    Temporary relay log file is /data/mysql/binlog/mysql-relay-bin.000003
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Mon Jan 22 20:50:34 2018 - [info] Slaves settings check done.
Mon Jan 22 20:50:34 2018 - [info] 
master (current master)
 +--slave1
 +--slave2

Mon Jan 22 20:50:34 2018 - [info] Checking replication health on slave1..
Mon Jan 22 20:50:34 2018 - [info]  ok.
Mon Jan 22 20:50:34 2018 - [info] Checking replication health on slave2..
Mon Jan 22 20:50:34 2018 - [info]  ok.
Mon Jan 22 20:50:34 2018 - [warning] master_ip_failover_script is not defined.
Mon Jan 22 20:50:34 2018 - [warning] shutdown_script is not defined.
Mon Jan 22 20:50:34 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

8、启动MHA manager,并监控日志文件

masterha_manager --conf=/etc/masterha/default.cnf 
tail -f /masterha/app1/manager.log 

9、测试master(宕机后,是否会自动切换

[master]

[root@master ~]# service mysql stop
Shutting down MySQL..... SUCCESS! 

宕掉master后,/masterha/app1/manager.log文件显示:

tail -f /masterha/app1/manager.log 
......
Generating relay diff files from the latest slave succeeded.
slave2: OK: Applying all logs succeeded. Slave started, replicating from slave1.
slave1: Resetting slave info succeeded.
Master failover to slave1(192.168.121.165:3306) completed successfully. 

上面的结果表明master成功切换。


相关博文:

发表新评论