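The author's e-commerce platform built a highly available Zabbix cluster using CentOS 7.7 + Keepalived + Zabbix + DRBD + Heartbeat + MySQL + ES-Cluster. The author is also a member of the "Zabbix Technical Exchange Group"; readers are welcome to join the discussion.
[Background] The Zabbix monitoring platform in the company's production environment had become hard to maintain in terms of performance, stability, and version upgrades. This article describes how to build a highly available Zabbix cluster with CentOS 7.7 + Keepalived + Zabbix + DRBD + Heartbeat + MySQL + ES-Cluster.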
01 Architecture

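02 Environment Initialization
Environment information
Set up passwordless SSH between the two nodes. Run on Zabbix-HA1:
ssh-keygen -q -t rsa -N '' -f ~/.ssh/id_rsa
ssh-copy-id -p36091 root@192.168.8.187
Set up hostname resolution on both nodes. Run on each node (appending, so that the existing localhost entries are kept):
cat >> /etc/hosts << EOF
192.168.8.186 Zabbix-HA1
192.168.8.187 Zabbix-HA2
EOF
Disable swap on all nodes:
swapoff -a # temporary, takes effect immediately
To disable swap permanently, comment out the swap entry in /etc/fstab and reboot the machine.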
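The hostname-resolution step can be made safe to re-run. The following is a minimal sketch, assuming a hypothetical helper named add_host_entry (not part of any tool), that appends an entry only when the name is not already present on a non-comment line.

```shell
# Append "IP NAME" to a hosts file unless NAME already appears on a
# non-comment line. add_host_entry is a hypothetical helper name.
add_host_entry() {
    hosts_file="$1"; ip="$2"; name="$3"
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$hosts_file" 2>/dev/null; then
        return 0    # already present, nothing to do
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Usage on a real node (as root):
#   add_host_entry /etc/hosts 192.168.8.186 Zabbix-HA1
#   add_host_entry /etc/hosts 192.168.8.187 Zabbix-HA2
```

Running the helper twice leaves a single entry, which avoids duplicate lines when the initialization steps are repeated.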
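Network environment
Each host has two Ethernet interfaces: one for regular network traffic and one for the heartbeat link.
The network settings of the two nodes are:
Zabbix-HA1 (master node)
eth0: 192.168.8.186 255.255.0.0 # external IP address
eth1: 172.16.38.1 255.255.255.0 # HA heartbeat address
Zabbix-HA2 (standby node)
eth0: 192.168.8.187 255.255.0.0 # external IP address
eth1: 172.16.38.2 255.255.255.0 # HA heartbeat address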
Configure the heartbeat firewall rules
On Zabbix-HA1, allow the heartbeat IP of Zabbix-HA2 on the heartbeat UDP port:
iptables -A INPUT -i eth1 -p udp -s 172.16.38.2 --dport 694 -m comment --comment "heartbeat-slave" -j ACCEPT
/usr/libexec/iptables/iptables.init save
On Zabbix-HA2, allow the heartbeat IP of Zabbix-HA1 on the heartbeat UDP port:
iptables -A INPUT -i eth1 -p udp -s 172.16.38.1 --dport 694 -m comment --comment "heartbeat-master" -j ACCEPT
/usr/libexec/iptables/iptables.init save
Time synchronization
Run on both zabbix-ha1 and zabbix-ha2:
yum -y install rdate
rdate -s time-b.nist.gov
03 Install and Configure DRBD
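Install DRBD
Install drbd9:
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum install -y drbd90-utils kmod-drbd90
Load the drbd kernel module automatically at boot:
echo drbd > /etc/modules-load.d/drbd.conf
Load the drbd kernel module manually and verify it:
modprobe drbd
lsmod | grep drbd
PS: Whether the backing device is a dedicated disk, a plain partition, or an LVM volume, DRBD needs a clean partition. Do not format it.
Configure DRBD
Move the default configuration out of the way:
mv /etc/drbd.d/global_common.conf /etc/drbd.d/global_common.confbak
Create the global configuration:
cat << EOF > /etc/drbd.d/global_common.conf
global {
usage-count no;
}
common {
net {
protocol C;
}
}
EOF
Create the resource configuration file:
cat << EOF > /etc/drbd.d/drbd0.res
resource drbd0 {
disk /dev/sdb;
device /dev/drbd0;
meta-disk internal;
on Zabbix-HA1 {
address 192.168.8.186:7789;
}
on Zabbix-HA2 {
address 192.168.8.187:7789;
}
}
EOF
PS: Adjust the host names, IP addresses, and disk in the configuration above to match your own environment.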
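Since the resource file differs between environments only in a handful of values, it can be generated from parameters. This is a sketch under the assumption that a two-node resource with internal metadata and port 7789 is wanted, as in the article; render_drbd_resource is a hypothetical helper name.

```shell
# Print a two-node DRBD resource definition to stdout.
# Arguments: resource name, backing disk, DRBD device, then host/IP pairs.
# render_drbd_resource is an illustrative helper, not a DRBD command.
render_drbd_resource() {
    res="$1"; disk="$2"; device="$3"
    host1="$4"; ip1="$5"; host2="$6"; ip2="$7"
    cat <<EOF
resource $res {
    disk $disk;
    device $device;
    meta-disk internal;
    on $host1 { address $ip1:7789; }
    on $host2 { address $ip2:7789; }
}
EOF
}

# Usage (values taken from the article):
#   render_drbd_resource drbd0 /dev/sdb /dev/drbd0 \
#       Zabbix-HA1 192.168.8.186 Zabbix-HA2 192.168.8.187 > /etc/drbd.d/drbd0.res
```

The host names passed in must match what each node reports as its own host name, since DRBD selects the `on` section by host name.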
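Node configuration
On Zabbix-HA1, initialize the device metadata and bring up the drbd0 resource:
drbdadm create-md drbd0
drbdadm up drbd0
On Zabbix-HA2, do the same:
drbdadm create-md drbd0
drbdadm up drbd0
On Zabbix-HA1, promote the node to primary:
drbdadm primary --force drbd0
On the DRBD primary node, format the drbd0 block device with an xfs file system:
mkfs.xfs /dev/drbd0
On Zabbix-HA1, upload the drbd-overview maintenance tool to /usr/sbin and make it executable:
chmod +x /usr/sbin/drbd-overview
Copy it to /usr/sbin on Zabbix-HA2:
scp -P36091 /usr/sbin/drbd-overview 192.168.8.187:/usr/sbin/
Check the role of drbd0 on each node:
drbdadm role drbd0
Check the DRBD synchronization status with either of:
cat /proc/drbd
drbd-overview
A healthy pair is reported like this:
0:drbd0/0 Connected Primary/Secondary UpToDate/UpToDate
Enable the drbd service at boot on both nodes:
systemctl enable drbd
systemctl start drbd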
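The status line shown by drbd-overview can be checked from a script. The sketch below assumes the single-line format quoted in the article; drbd_line_is_healthy is a hypothetical helper name, and other DRBD tools print different formats.

```shell
# Succeed when a drbd-overview style status line reports a connected pair
# with both disks up to date.
# drbd_line_is_healthy is an illustrative helper, not a DRBD command.
drbd_line_is_healthy() {
    case "$1" in
        *Connected*UpToDate/UpToDate*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a node:
#   drbd_line_is_healthy "$(drbd-overview)" || echo "drbd0 needs attention"
```

Any other connection state or disk state makes the check fail, which is the conservative choice for a monitoring hook.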
04 Install and Configure Heartbeat
Install Heartbeat
Download the heartbeat source package and its dependencies:
cd /usr/src
wget http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/958e11be8686.tar.bz2
wget http://hg.linux-ha.org/glue/archive/0a7add1d9996.tar.bz2
wget https://github.com/ClusterLabs/resource-agents/archive/v3.9.6.tar.gz
Install the required libraries:
yum -y install gcc gcc-c++ autoconf automake libtool glib2-devel libxml2-devel bzip2 bzip2-devel e2fsprogs-devel libxslt-devel libtool-ltdl-devel asciidoc psmisc
Create the group and user that heartbeat runs as:
groupadd haclient
useradd -g haclient hacluster -s /sbin/nologin
Build and install heartbeat and its dependencies
Install cluster-glue:
cd /usr/src
tar -jxvf 0a7add1d9996.tar.bz2
cd Reusable-Cluster-Components-glue--0a7add1d9996/
./autogen.sh
./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
make && make install
Install resource-agents:
cd /usr/src
tar -zxvf v3.9.6.tar.gz
cd resource-agents-3.9.6
./autogen.sh
./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
make && make install
Install heartbeat:
cd /usr/src
tar -jxvf 958e11be8686.tar.bz2
cd Heartbeat-3-0-958e11be8686/
./bootstrap
Set the build environment variables, then configure and build:
export CFLAGS="$CFLAGS -I/usr/local/heartbeat/include -L/usr/local/heartbeat/lib"
./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
make && make install
Copy the core heartbeat configuration files:
cp doc/{ha.cf,haresources,authkeys} /usr/local/heartbeat/etc/ha.d/
Restrict the authentication file to mode 600:
chmod 600 /usr/local/heartbeat/etc/ha.d/authkeys
Create the directory and copy the OCF helper files needed by the network plugins:
mkdir -p /usr/local/heartbeat/usr/lib/ocf/lib/heartbeat/
cp /usr/lib/ocf/lib/heartbeat/ocf-* /usr/local/heartbeat/usr/lib/ocf/lib/heartbeat/
Note: heartbeat usually reports errors at startup because directives such as ping and ucast need plugin support. Symlink the plugins under lib64 into the lib directory to avoid these errors:
ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/RAExec/* /usr/local/heartbeat/lib/heartbeat/plugins/RAExec/
ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/* /usr/local/heartbeat/lib/heartbeat/plugins/
Configure Heartbeat
Enable authentication in authkeys:
sed -i 's/#auth 1/auth 1/g' /usr/local/heartbeat/etc/ha.d/authkeys
sed -i 's/#1 crc/1 crc/g' /usr/local/heartbeat/etc/ha.d/authkeys
On both nodes, copy the DRBD drbddisk script into the heartbeat directory (a source build of heartbeat does not ship it). The script mounts and unmounts the DRBD resource group on the primary and secondary nodes:
cp -p /etc/ha.d/resource.d/drbddisk /usr/local/heartbeat/etc/ha.d/resource.d/
Configure the haresources file, which defines the cluster resources: the preferred master node, the VIP, the netmask, the broadcast address, and the services to start. Zabbix-HA1 is set as the master. Run on both nodes:
echo 'Zabbix-HA1 IPaddr::192.168.8.4/24/eth0 drbddisk::drbd0 Filesystem::/dev/drbd0::/opt::xfs' >> /usr/local/heartbeat/etc/ha.d/haresources
Note: drbd0 is the name of the resource created in DRBD and must match it exactly; otherwise the drbddisk script cannot mount and unmount the DRBD resource group.
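Each entry in the haresources line is a resource script name followed by `::`-separated arguments. As an illustration of the `IPaddr` entry's layout, the sketch below splits it into address, prefix length, and interface; parse_ipaddr_resource is a hypothetical helper, not part of heartbeat.

```shell
# Split "IPaddr::ADDRESS/PREFIX/INTERFACE" into its three parts and print them.
# parse_ipaddr_resource is an illustrative helper name.
parse_ipaddr_resource() {
    spec="${1#IPaddr::}"     # drop the resource-script prefix
    vip="${spec%%/*}"        # text before the first slash
    rest="${spec#*/}"        # text after the first slash
    prefix="${rest%%/*}"     # prefix length
    iface="${rest#*/}"       # interface name
    printf 'vip=%s prefix=%s iface=%s\n' "$vip" "$prefix" "$iface"
}

parse_ipaddr_resource 'IPaddr::192.168.8.4/24/eth0'
# prints: vip=192.168.8.4 prefix=24 iface=eth0
```

In the article's line the entries are listed in the order VIP, DRBD promotion, file system mount, so the mount only happens once the node has become DRBD primary.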
Configure Heartbeat on each node
On both nodes, back up the default main configuration file ha.cf:
mv /usr/local/heartbeat/etc/ha.d/ha.cf /usr/local/heartbeat/etc/ha.d/ha.cfbak
Run on Zabbix-HA1:
cat > /usr/local/heartbeat/etc/ha.d/ha.cf <<EOF
debugfile /var/log/ha-debug.log
logfile /var/log/heartbeat.log
ucast eth1 172.16.38.2
keepalive 2
warntime 6
deadtime 10
initdead 120
udpport 694
auto_failback off
node Zabbix-HA1
node Zabbix-HA2
ping 192.168.8.1
respawn hacluster /usr/local/heartbeat/libexec/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
EOF
Run on Zabbix-HA2:
cat > /usr/local/heartbeat/etc/ha.d/ha.cf <<EOF
debugfile /var/log/ha-debug.log
logfile /var/log/heartbeat.log
ucast eth1 172.16.38.1
keepalive 2
warntime 6
deadtime 10
initdead 120
udpport 694
auto_failback off
node Zabbix-HA1
node Zabbix-HA2
ping 192.168.8.1
respawn hacluster /usr/local/heartbeat/libexec/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
EOF
Note: the IP address in the ucast eth1 directive must be the heartbeat IP of the peer node; otherwise heartbeat cannot detect the peer or fail services over.
Start the heartbeat service on both machines:
systemctl enable heartbeat
systemctl start heartbeat # important: start heartbeat on the master node first, then on the standby node
systemctl stop heartbeat # triggers a switch between master and standby and moves the VIP
Check the listening port:
netstat -anup | grep 694
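The timing directives in ha.cf depend on one another, and the parameter notes later in the article state that initdead should be at least twice deadtime. The sketch below checks that rule and, as an assumption of this example rather than something the article states, also checks that keepalive is below warntime and warntime is below deadtime. check_ha_timing is a hypothetical helper name.

```shell
# Validate heartbeat timing values (all in whole seconds).
# Arguments: keepalive warntime deadtime initdead
# check_ha_timing is an illustrative helper, not a heartbeat command.
check_ha_timing() {
    keepalive="$1"; warntime="$2"; deadtime="$3"; initdead="$4"
    [ "$keepalive" -lt "$warntime" ] || { echo "keepalive must be below warntime"; return 1; }
    [ "$warntime" -lt "$deadtime" ] || { echo "warntime must be below deadtime"; return 1; }
    [ "$initdead" -ge $((deadtime * 2)) ] || { echo "initdead should be at least 2x deadtime"; return 1; }
    echo "timing ok"
}

check_ha_timing 2 6 10 120
# prints: timing ok
```

With the values from the ha.cf above (2, 6, 10, 120) all three checks pass.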
05 Configure MySQL High Availability
On the standby node, stop zabbix-server and disable it at boot:
systemctl stop zabbix-server && systemctl disable zabbix-server
On the standby node, move the zabbix program and web directories out of the way:
mkdir -p /data/backup && mv /opt/zabbix/ /data/backup && mv /opt/www_zabbix /data/backup/
On the master node, disable zabbix-server at boot; the service is now managed by heartbeat:
systemctl disable zabbix-server
On the master node, create the resource script that lets heartbeat start and stop zabbix-server (the heredoc delimiter is quoted so that $1 is written to the file literally):
cat > /usr/local/heartbeat/etc/ha.d/resource.d/zabbix-server <<'EOF'
#!/bin/bash
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
export PATH
case "$1" in
start)
systemctl start zabbix-server
;;
stop)
systemctl stop zabbix-server
;;
esac
exit 0
EOF
Make it executable:
chmod +x /usr/local/heartbeat/etc/ha.d/resource.d/zabbix-server
Copy the script to the standby node:
scp -P36091 -p /usr/local/heartbeat/etc/ha.d/resource.d/zabbix-server 192.168.8.187:/usr/local/heartbeat/etc/ha.d/resource.d
Add the zabbix-server script name to the haresources file:
vim /usr/local/heartbeat/etc/ha.d/haresources
Zabbix-HA1 IPaddr::192.168.8.4/24/eth0 drbddisk::drbd0 Filesystem::/dev/drbd0::/opt::xfs zabbix-server
Copy the file from the Zabbix-HA1 master node to the Zabbix-HA2 standby node:
scp -P36091 /usr/local/heartbeat/etc/ha.d/haresources 192.168.8.187:/usr/local/heartbeat/etc/ha.d/
Prevent MySQL from starting at boot before the service that disables huge pages, which would cause the TokuDB engine to fail to load (set this on both nodes):
systemctl disable mysql
echo 'systemctl start mysql' >> /etc/rc.local
echo 'systemctl start keepalived' >> /etc/rc.local
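When a script such as the zabbix-server resource script is written through a heredoc, quoting the delimiter decides whether "$1" reaches the file or is expanded while the file is being written. The following sketch demonstrates the difference; demo_heredoc_quoting and the two file names are illustrative only.

```shell
# Write the same one-line script body twice, once through an unquoted heredoc
# delimiter and once through a quoted one.
# Arguments: output directory, then any value that becomes "$1" after shift.
demo_heredoc_quoting() {
    out_dir="$1"
    shift    # after the shift, "$1" is the caller's second argument
    # Unquoted delimiter: "$1" is substituted now, at write time.
    cat > "$out_dir/unquoted.sh" <<EOF
case "$1" in start) echo starting ;; esac
EOF
    # Quoted delimiter: the text is written exactly as it appears.
    cat > "$out_dir/quoted.sh" <<'EOF'
case "$1" in start) echo starting ;; esac
EOF
}

# Usage:
#   d=$(mktemp -d); demo_heredoc_quoting "$d" whatever
#   cat "$d/unquoted.sh"   # the case statement tests the fixed word "whatever"
#   cat "$d/quoted.sh"     # the case statement tests "$1" as intended
```

A resource script generated with the unquoted form would never match `start` or `stop`, so heartbeat would call it without any effect.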
06 Configure Dual-Master Database Replication
On Zabbix-HA2, back up the Zabbix configuration tables (export from the node that holds the VIP).
Note: copy the SQL file to Zabbix-HA1 and import it there.
On Zabbix-HA1, configure my.cnf and exclude the large history tables from replication:
############ Open GTID Mode #########
gtid_mode = on
enforce_gtid_consistency = true
log_slave_updates = true
master-info-repository = TABLE
relay-log-info-repository = TABLE
slave-parallel-workers = 4
########### MySQL AB Replication ##########
relay-log = /data/mysql/relay-log
auto-increment-increment = 2
auto-increment-offset = 1
replicate-wild-ignore-table = zabbix.history
replicate-wild-ignore-table = zabbix.history_uint
replicate-wild-ignore-table = zabbix.history_str
replicate-wild-ignore-table = zabbix.history_log
replicate-wild-ignore-table = zabbix.history_text
On Zabbix-HA2, configure my.cnf:
############ Open GTID Mode #########
gtid_mode = on
enforce_gtid_consistency = true
log_slave_updates = true
master-info-repository = TABLE
relay-log-info-repository = TABLE
slave-parallel-workers = 4
########### MySQL AB Replication ##########
relay-log = /data/mysql/relay-log
auto-increment-increment = 2
auto-increment-offset = 2
replicate-wild-ignore-table = zabbix.history
replicate-wild-ignore-table = zabbix.history_uint
replicate-wild-ignore-table = zabbix.history_str
replicate-wild-ignore-table = zabbix.history_log
replicate-wild-ignore-table = zabbix.history_text
Note: the server-id of every instance taking part in replication must be unique. Restart the mysql service on each node for the settings to take effect.
Set up replication on both master nodes
On both Zabbix-HA1 and Zabbix-HA2, create a user with replication privileges:
grant replication slave on *.* to repl@'192.168.8.%' identified by 'yanghui';
flush privileges;
On Zabbix-HA1, point replication at Zabbix-HA2 (required before start slave can be run):
change master to master_host='192.168.8.187',master_user='repl',master_password='yanghui',master_auto_position=1;
flush privileges;
On Zabbix-HA2, point replication at Zabbix-HA1 (required before start slave can be run):
change master to master_host='192.168.8.186',master_user='repl',master_password='yanghui',master_auto_position=1;
flush privileges;
On each node, grant remote MySQL login so that the database can be reached through the VIP
Run on Zabbix-HA1 (8.186):
grant all on *.* to root@'192.168.8.187' identified by 'Zabbix@2021';
select user,host,password from mysql.user;
Run on Zabbix-HA2 (8.187):
grant all on *.* to root@'192.168.8.186' identified by 'Zabbix@2021';
select user,host,password from mysql.user;
Log in to MySQL on each node and start replication:
start slave;
show slave status\G
On the node where the DRBD volume is mounted, change the database address used by zabbix-web (/opt/www_zabbix/conf/zabbix.conf.php) and zabbix-server to the VIP:
sed -i 's/192.168.8.186/192.168.8.5/g' /opt/www_zabbix/conf/zabbix.conf.php
sed -i 's/192.168.8.186/192.168.8.5/g' /opt/zabbix/etc/zabbix/zabbix_server.conf
Note: 192.168.8.186 is the host node IP and 192.168.8.5 is the database VIP. Restart the zabbix-server service afterwards.
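After `start slave`, the two fields of `show slave status\G` that matter most are Slave_IO_Running and Slave_SQL_Running. The sketch below reads that output on stdin and succeeds only when both are "Yes"; replication_is_running is a hypothetical helper name, and the mysql client invocation in the usage comment assumes credentials are already configured.

```shell
# Read "show slave status\G" text on stdin; exit 0 only if both the IO thread
# and the SQL thread are reported as running.
# replication_is_running is an illustrative helper name.
replication_is_running() {
    awk -F': *' '
        /^[[:space:]]*Slave_IO_Running:/  { io  = $2 }
        /^[[:space:]]*Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Usage on either node (assumes the mysql client can log in non-interactively):
#   mysql -e 'show slave status\G' | replication_is_running || echo "replication broken"
```

Because this is a dual-master setup, the check is worth running on both nodes, since each one is a replica of the other.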
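07 Test and Verify
Test the HA architecture: VIP failover from the master node
Test Zabbix-Server + DRBD + heartbeat HA:
systemctl restart heartbeat
# Check whether the standby node takes over the Zabbix-Server VIP and whether the service runs normally
Note: the first time Zabbix-HA1 fails over to Zabbix-HA2, zabbix-server does not start properly and needs a manual systemctl restart zabbix-server. Later failovers start it automatically.
Test mysql + keepalived HA:
systemctl stop mysql
# Check whether the standby node takes over the MySQL VIP and whether the service runs normally
Note: after a failover, once the service on the failed node has been repaired, start keepalived on that node manually so that it monitors the peer again and the HA pair is re-established.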
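Whether a node has taken over a VIP can be checked from the output of `ip -o -4 addr show`. The following is a sketch assuming that output format; vip_is_present is a hypothetical helper name, and 192.168.8.4 is the heartbeat VIP used in this article.

```shell
# Read "ip -o -4 addr show" style text on stdin; exit 0 if the given address
# is configured on any interface.
# vip_is_present is an illustrative helper name.
vip_is_present() {
    awk -v vip="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "inet") {
                    split($(i + 1), parts, "/")   # strip the prefix length
                    if (parts[1] == vip) found = 1
                }
        }
        END { exit !found }
    '
}

# Usage on the node expected to hold the VIP after a failover:
#   ip -o -4 addr show | vip_is_present 192.168.8.4 && echo "VIP is here"
```

Comparing the whole address rather than using a substring match avoids false positives such as 192.168.8.40.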
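08 Refinements
Refining the keepalived + mysql HA scheme
Reason: a script run by a scheduled job monitors the keepalived service, so that after a MySQL failover keepalived is brought back automatically in the background.
Place the keepalived monitoring script in /etc/keepalived on both nodes, then schedule it:
echo '*/2 * * * * root /etc/keepalived/monitor-keepalived.sh' >> /etc/crontab
systemctl restart crond
Note: after changing the script called by the scheduled job, restart crond so that the change takes effect.
Refining the heartbeat + zabbix HA scheme
Reason: heartbeat itself does not check the state of the application service. A script is needed to detect service failures and make up for this gap in the architecture.
Deploy supervisor:
yum install -y supervisor
mkdir -p /etc/supervisor/config.d
echo_supervisord_conf > /etc/supervisor/supervisord.conf
Configure the main supervisord configuration file:
cat >> /etc/supervisor/supervisord.conf <<EOF
[include]
files = /etc/supervisor/config.d/*.ini
EOF
Configure monitoring of the zabbix-server process:
cat > /etc/supervisor/config.d/zabbix-server.ini <<EOF
[program:zabbix-heartbeat]
user=root
directory=/etc/supervisor
command=/bin/sh /etc/supervisor/heartbeat.sh
numprocs=1
autostart=true
autorestart=true
startretries=3
EOF
Start the supervisord service:
systemctl enable supervisord && systemctl start supervisord && systemctl status supervisord
Appendix: heartbeat configuration parameters explained
vim /usr/local/heartbeat/etc/ha.d/ha.cf
debugfile /var/log/ha-debug ## file for heartbeat debug messages
logfile /var/log/ha-log ## file for heartbeat log messages
logfacility local0 ## syslog facility used for heartbeat logging
keepalive 2 ## heartbeat (probe) interval: 2 seconds
deadtime 30 ## if the standby node receives no heartbeat from the master within 30 seconds, it takes over the master's resources
warntime 10 ## heartbeat delay threshold: if the standby receives no heartbeat within 10 seconds, a warning is logged but no failover happens
initdead 60 ## grace period after a system start or reboot; should be at least twice deadtime
udpport 694 ## UDP port used for broadcast/unicast communication
#bcast ens32 ## send heartbeats by broadcast on interface ens32
#mcast eth0 225.0.0.1 694 1 0 ## send heartbeats by UDP multicast on eth0, normally used on the standby node
ucast ens32 192.168.1.64 ## send heartbeats by UDP unicast on ens32; the IP address is that of the peer node
auto_failback on ## whether services move back automatically (reclaiming the VIP) when the master recovers
node xuegod63.cn ## master node name
node xuegod64.cn ## standby node name
ping 192.168.1.1 ## ping the gateway to check connectivity; used only to test the network
apiauth ipfail gid=haclient uid=hacluster ## user and group allowed to run ipfail
respawn hacluster /usr/local/heartbeat/libexec/heartbeat/ipfail ## process started and stopped together with heartbeat
Note: bcast, ucast, and mcast stand for broadcast, unicast, and multicast. They are alternative ways of carrying the heartbeat; choose one.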
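The article does not show /etc/supervisor/heartbeat.sh, the script that supervisord runs in the configuration above. The following is only a sketch of what such a watchdog might look like, not the author's script: it assumes the node holding the heartbeat VIP (192.168.8.4) is the one expected to run zabbix-server, and it stops heartbeat after a number of consecutive failed checks so that the peer takes over. All function names, the check interval, and the failure limit are assumptions.

```shell
VIP="${VIP:-192.168.8.4}"   # heartbeat-managed VIP from haresources

# Thin wrappers around the real commands so they can be replaced in tests.
holds_vip()      { ip -o -4 addr show | grep -qF " $VIP/"; }
zabbix_is_up()   { systemctl is-active --quiet zabbix-server; }
stop_heartbeat() { systemctl stop heartbeat; }

# One watchdog pass. Takes the current consecutive-failure count and the
# limit, prints the new count, and stops heartbeat once the limit is reached.
watchdog_step() {
    failures="$1"; limit="$2"
    # A node without the VIP is the standby; zabbix-server is expected to be
    # stopped there, so nothing is counted.
    if ! holds_vip || zabbix_is_up; then
        echo 0
        return 0
    fi
    failures=$((failures + 1))
    if [ "$failures" -ge "$limit" ]; then
        stop_heartbeat >/dev/null 2>&1
    fi
    echo "$failures"
}

# Main loop, to be placed at the end of the script run by supervisord:
#   failures=0
#   while sleep 10; do failures=$(watchdog_step "$failures" 3); done
```

Requiring several consecutive failures leaves time for zabbix-server to start during a takeover, when the VIP is already present but the service is not yet up. After the watchdog stops heartbeat, heartbeat has to be started again by hand on that node once the fault is fixed.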
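08 Summary
Advantages:
High security, stability, and availability, with automatic failover when a fault occurs.
Disadvantages:
Only one server provides service at a time, so the cost is relatively high, scaling out is inconvenient, and split-brain can occur.
When the zabbix service itself dies or becomes unavailable, no automatic switch takes place. This has to be handled by a script (for example, a shell script that detects that zabbix on the master is unavailable and stops heartbeat on the master node, which triggers a switch to the standby node).
Dangerous operations:
Do not stop the drbd service on either node. systemctl stop drbd causes DRBD split-brain and leaves the data on the two nodes inconsistent.
Start and stop the MySQL service manually only with systemctl start/stop mysql. Do not use /etc/init.d/mysqld stop/start, which can leave the MySQL PID in an abnormal state and make the service unavailable.
09 FAQ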
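Symptom: after a DRBD split-brain, the disk data on the primary and secondary nodes is inconsistent. Demote the standby node to secondary and discard its copy of the resource data. Run the following on the standby node (r0 stands for the resource name, which is drbd0 in this article):
drbdadm secondary r0
drbdadm connect --discard-my-data r0
Symptom: the primary node has to reconnect to the secondary. This is needed only if the primary's connection state is StandAlone; a node already in the WFConnection state reconnects on its own. Run the following on the primary node:
drbdadm connect r0
Symptom:
# drbdadm create-md r0
'r0' not defined in your config (for this host).
Causes:
A. The host name does not match the host name defined in the resource file (.res).
B. The resource name passed to drbdadm does not match the resource name defined in the resource file (.res).
Solution: make the resource name or host name consistent.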
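Cause A can be checked before running drbdadm by looking for an `on` section that matches the local host name. This is a sketch that assumes the `on` keyword and the host name are separated by whitespace, as in this article's resource file; res_defines_host is a hypothetical helper name.

```shell
# Exit 0 when the resource file contains an "on <hostname>" section for the
# given host name.
# res_defines_host is an illustrative helper, not a DRBD command.
res_defines_host() {
    res_file="$1"; host="$2"
    awk -v h="$host" '
        { for (i = 1; i < NF; i++) if ($i == "on" && $(i + 1) == h) found = 1 }
        END { exit !found }
    ' "$res_file"
}

# Usage on each node:
#   res_defines_host /etc/drbd.d/drbd0.res "$(uname -n)" \
#       || echo "no 'on' section for $(uname -n)"
```

The comparison is case-sensitive, so a host that reports itself as zabbix-ha1 does not match an `on Zabbix-HA1` section.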