A High-Availability MariaDB Cluster with DRBD + corosync

Date: June 12, 2015

I. Introduction to DRBD

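DRBD (Distributed Replicated Block Device) is made up of a kernel module and a set of helper scripts and is used to build high-availability clusters. It works by mirroring an entire block device over the network, so it can be thought of as network RAID 1: it maintains a real-time mirror of a local block device on a remote machine.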

1. Download the RPM packages

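The DRBD versions available for CentOS 5 are mainly 8.0, 8.2 and 8.3. Their RPM packages are named drbd, drbd82 and drbd83, and the matching kernel module packages are kmod-drbd, kmod-drbd82 and kmod-drbd83. For CentOS 6 the version is 8.4, packaged as drbd and drbd-kmdl. When choosing packages, keep two rules in mind: the drbd and drbd-kmdl versions must match each other, and the drbd-kmdl package must match the running kernel version (check it with uname -r; 2.6.32-431.el6 is the CentOS 6.5 kernel). Features and configuration differ slightly between versions. Our test platform is x86_64 running CentOS 6.5, so we install both the kernel module and the management tools, using the then-latest 8.4 release: drbd-8.4.3-33.el6.x86_64.rpm and drbd-kmdl-2.6.32-431.el6-8.4.3-33.el6.x86_64.rpm.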
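The two matching rules above can be checked before installing anything. The following is a minimal Python sketch, not part of DRBD's tooling: it assumes the el6 file-name patterns of the packages used here (drbd-<version>.el6.<arch>.rpm and drbd-kmdl-<kernel>-<version>.el6.<arch>.rpm), and check_packages is a hypothetical helper name.

```python
import re

# Assumed file-name patterns, modeled on the el6 packages used in this article.
DRBD_RPM = re.compile(
    r"^drbd-(?P<version>\d+\.\d+\.\d+-\d+)\.(?P<dist>el\d+)\.(?P<arch>\w+)\.rpm$")
KMDL_RPM = re.compile(
    r"^drbd-kmdl-(?P<kernel>.+)-(?P<version>\d+\.\d+\.\d+-\d+)"
    r"\.(?P<dist>el\d+)\.(?P<arch>\w+)\.rpm$")


def check_packages(drbd_rpm, kmdl_rpm, running_kernel):
    """Hypothetical helper: return a list of problems (empty means consistent)."""
    d = DRBD_RPM.match(drbd_rpm)
    k = KMDL_RPM.match(kmdl_rpm)
    if d is None or k is None:
        return ["unrecognized package file name"]
    problems = []
    # Rule 1: userland tools (drbd) and kernel module (drbd-kmdl) versions must match.
    if d["version"] != k["version"]:
        problems.append(
            f"version mismatch: drbd {d['version']} vs drbd-kmdl {k['version']}")
    # Rule 2: drbd-kmdl must be built for the running kernel (as printed by uname -r).
    built_for = f"{k['kernel']}.{k['arch']}"
    if built_for != running_kernel:
        problems.append(
            f"drbd-kmdl built for {built_for}, running kernel is {running_kernel}")
    return problems
```

On a node, pass platform.release() (the same string uname -r prints) as running_kernel; an empty list means the pair of packages fits that kernel.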

2. Preparation

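Name resolution must work on both nodes: each node name must resolve to the correct IP address. In addition, each node's hostname as reported by uname -n must match the name used for that node in the on sections of the DRBD resource file.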
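If resolution is done through /etc/hosts, the mapping can be sanity-checked with a short script. This is a minimal Python sketch that only looks at /etc/hosts-style text (it ignores DNS), keeps the first occurrence of each name, and uses the node names and addresses from the resource file defined later; parse_hosts and check_nodes are hypothetical helpers.

```python
def parse_hosts(text):
    """Map every hostname and alias in /etc/hosts-style text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first occurrence of each name
    return mapping


def check_nodes(hosts_text, expected):
    """Compare against {hostname: expected IP}; return a list of mismatches."""
    mapping = parse_hosts(hosts_text)
    return [f"{name}: expected {ip}, found {mapping.get(name)}"
            for name, ip in expected.items() if mapping.get(name) != ip]
```

On each node, something like check_nodes(open("/etc/hosts").read(), {"node1.zero1.com": "192.168.1.200", "node2.zero1.com": "192.168.1.201"}) should return an empty list, and platform.node() should print the same name used for that node in its on section.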

3. Installation. The drbd packages have no dependencies, so they can be installed directly with rpm on both nodes:

[root@node1 ~]# rpm -ivh drbd-8.4.3-33.el6.x86_64.rpm drbd-kmdl-2.6.32-431.el6-8.4.3-33.el6.x86_64.rpm
[root@node2 ~]# rpm -ivh drbd-8.4.3-33.el6.x86_64.rpm drbd-kmdl-2.6.32-431.el6-8.4.3-33.el6.x86_64.rpm

Then restart the rsyslog service.

4. Configuration overview

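The main DRBD configuration file is /etc/drbd.conf. For easier management, the configuration is nowadays usually split into several parts stored in the /etc/drbd.d directory, and the main file merely pulls these fragments together with "include" directives. Typically /etc/drbd.d contains global_common.conf plus one or more files ending in .res: global_common.conf defines the global and common sections, and each .res file defines one resource.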

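The global section may appear only once. If all configuration is kept in a single file instead of being split into several, the global section must be at the very beginning of that file. Currently the only parameters allowed in the global section are minor-count, dialog-refresh, disable-ip-verification and usage-count.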

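The common section defines parameters that every resource inherits by default; any parameter that is valid in a resource definition can also be set in common. The common section is optional, but putting the parameters shared by several resources there reduces the complexity of the configuration.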

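The resource section defines a DRBD resource. Each resource is usually defined in its own file ending in .res under /etc/drbd.d. Every resource must be given a name, which may consist of any non-whitespace ASCII characters. Each resource definition must contain at least two host subsections (on <hostname> { ... }) that name the nodes the resource belongs to; all other parameters can be inherited from the common section or from DRBD's defaults.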
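To make these structural rules concrete (global at most once across all files, every resource with at least two on sections), here is a rough Python sketch. It is a simplified tokenizer written for illustration, not DRBD's own parser: it treats everything after # as a comment even inside quotes, and it only recognizes the on <host> { ... } form of host sections. summarize and check_config are hypothetical names.

```python
import re

# Quoted strings, braces/semicolons, or runs of other non-space characters.
TOKEN = re.compile(r'"[^"]*"|[{};]|[^\s{};]+')


def summarize(text):
    """Return (number of global sections, {resource name: number of 'on' sections})."""
    # Simplification: drop everything after '#', even inside quoted strings.
    text = "\n".join(line.split("#", 1)[0] for line in text.splitlines())
    depth, header = 0, []
    globals_found, resources, current = 0, {}, None
    for tok in TOKEN.findall(text):
        if tok == "{":
            if depth == 0 and header == ["global"]:
                globals_found += 1
            elif depth == 0 and len(header) == 2 and header[0] == "resource":
                current = header[1]
                resources[current] = 0
            elif depth == 1 and current is not None and header[:1] == ["on"]:
                resources[current] += 1
            depth += 1
            header = []
        elif tok == "}":
            depth -= 1
            if depth == 0:
                current = None
            header = []
        elif tok == ";":
            header = []
        else:
            header.append(tok)
    return globals_found, resources


def check_config(files):
    """files maps file name -> contents; return a list of rule violations."""
    problems, total_globals = [], 0
    for name, text in files.items():
        found, resources = summarize(text)
        total_globals += found
        for res, hosts in resources.items():
            if hosts < 2:
                problems.append(
                    f"{name}: resource {res} has {hosts} 'on' section(s), need at least 2")
    if total_globals > 1:
        problems.append(f"global section appears {total_globals} times")
    return problems
```

drbdadm itself performs the authoritative checks (for example drbdadm dump prints the parsed configuration); a sketch like this is only useful for quick linting of generated files.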

5. The configuration file /etc/drbd.d/global_common.conf

global {
    usage-count no;    # do not report installation statistics to the DRBD project
    # minor-count dialog-refresh disable-ip-verification
}
common {    # settings shared by all resources
    handlers {    # commands run when specific events occur
        # These are EXAMPLE handlers only.
        # They may have severe implications,
        # like hard resetting the node under certain circumstances.
        # Be careful when choosing your poison.
        # pri-on-incon-degr: this node is Primary, degraded, and its local data is inconsistent
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        # pri-lost-after-sb: this node is Primary but lost the automatic split-brain recovery
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        # local-io-error: a local I/O error occurred and on-io-error is call-local-io-error
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
        # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
    }
    startup {    # optional; the defaults are usually fine
        # wfc-timeout: how long to wait for the peer to connect at startup
        # degr-wfc-timeout: used instead of wfc-timeout if the cluster was degraded before the reboot
        # outdated-wfc-timeout: used instead of wfc-timeout if the peer was outdated
        # wait-after-sb: keep waiting for the peer even after a split brain was detected
    }
    options {    # optional; the defaults are usually fine
        # cpu-mask on-no-data-accessible
    }
    disk {
        # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
        # disk-drain md-flushes resync-rate resync-after al-extents
        # c-plan-ahead c-delay-target c-fill-target c-max-rate
        # c-min-rate disk-timeout
        # on-io-error: what to do on a lower-level I/O error (detach is used here):
        #   pass_on: report the error to the upper layers
        #   call-local-io-error: run the local-io-error handler defined above
        #   detach: detach the backing disk and continue in diskless mode
        on-io-error detach;
        # resync-rate replaces the "syncer { rate ...; }" section of DRBD 8.3
        resync-rate 1000M;
    }
    net {
        protocol C;    # replication protocol C: synchronous replication
        cram-hmac-alg "sha1";    # algorithm used to authenticate the peer
        shared-secret "kjasdbiu2178uwhbj";    # must be identical on both nodes
        # protocol timeout max-epoch-size max-buffers unplug-watermark
        # connect-int ping-int sndbuf-size rcvbuf-size ko-count
        # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
        # after-sb-1pri after-sb-2pri always-asbp rr-conflict
        # ping-timeout data-integrity-alg tcp-cork on-congestion
    }
}

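6. Prepare a disk partition on both nodes. The partitions should ideally be the same size and use the same partition number (/dev/sda3 in this example). The transcript below is from node2; do the same on node1.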

[root@node2 ~]# fdisk /dev/sda
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (7859-15665, default 7859):
Using default value 7859
Last cylinder, +cylinders or +size{K,M,G} (7859-15665, default 15665): +10G
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.
[root@node2 ~]# kpartx -af /dev/sda
device-mapper: reload ioctl on sda1 failed: Invalid argument
create/reload failed on sda1
device-mapper: reload ioctl on sda2 failed: Invalid argument
create/reload failed on sda2
device-mapper: reload ioctl on sda3 failed: Invalid argument
create/reload failed on sda3
[root@node2 ~]# partx -a /dev/sda
BLKPG: Device or resource busy
error adding partition 1
BLKPG: Device or resource busy
error adding partition 2
BLKPG: Device or resource busy
error adding partition 3
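Because the kernel may keep using the old partition table, it is worth confirming that it actually knows about the new partition and that its size is the same on both nodes. A minimal Python sketch that reads the /proc/partitions format (major, minor, #blocks in 1 KiB units, name); read_partitions and partition_report are hypothetical helpers.

```python
def read_partitions(text):
    """Map device name -> size in 1 KiB blocks, parsed from /proc/partitions text."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines have four fields (major, minor, #blocks, name);
        # the header line does not start with a digit, so it is skipped.
        if len(fields) == 4 and fields[0].isdigit():
            sizes[fields[3]] = int(fields[2])
    return sizes


def partition_report(text, name="sda3"):
    """Return a one-line status for a single partition."""
    sizes = read_partitions(text)
    if name not in sizes:
        return f"{name}: not known to the kernel (run partprobe or partx -a, or reboot)"
    return f"{name}: {sizes[name] / 1024 / 1024:.2f} GiB"
```

Run partition_report(open("/proc/partitions").read()) on node1 and node2 and compare the two sizes before creating the DRBD resource.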

7. Prepare the resource configuration file, e.g. /etc/drbd.d/drbd.res

resource drbd {
    on node1.zero1.com {             # "on" names the node; it must match uname -n
        device /dev/drbd0;           # DRBD device to create
        disk /dev/sda3;              # backing partition
        address 192.168.1.200:7789;  # IP:port this node listens on for replication
        meta-disk internal;          # keep DRBD metadata on the backing partition itself
    }
    on node2.zero1.com {
        device /dev/drbd0;
        disk /dev/sda3;
        address 192.168.1.201:7789;
        meta-disk internal;
    }
}
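Because both on sections repeat the same device, disk and meta-disk settings, the file can also be generated from a small table of nodes. The sketch below is a hypothetical Python helper (render_resource is not part of DRBD) and assumes, as in this article, two nodes that share the same DRBD device, backing partition and port.

```python
def render_resource(name, nodes, device="/dev/drbd0", disk="/dev/sda3", port=7789):
    """Render a DRBD resource definition.

    nodes maps each hostname (as printed by uname -n) to its replication IP.
    """
    lines = [f"resource {name} {{"]
    for host, ip in nodes.items():
        lines += [
            f"    on {host} {{",
            f"        device {device};",
            f"        disk {disk};",
            f"        address {ip}:{port};",
            "        meta-disk internal;",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_resource("drbd", {"node1.zero1.com": "192.168.1.200",
                                   "node2.zero1.com": "192.168.1.201"}), end="")
```

Writing the output to /etc/drbd.d/drbd.res and copying it to the peer keeps both nodes' definitions identical, which DRBD expects.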

8. Copy the configuration files to the other node

[root@node1 drbd.d]# scp * node2:/etc/drbd.d/

VII. Initialization and testing

1. Create the resource metadata (run drbdadm create-md on both nodes)

[root@node1 drbd.d]# drbdadm create-md drbd
Writing meta data...
initializing activity log
NOT initializing bitmap
lk_bdev_save(/var/lib/drbd/drbd-minor-0.lkbd) failed: No such file or directory
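New drbd meta data block successfully created.
lk_bdev_save(/var/lib/drbd/drbd-minor-0.lkbd) failed: No such file or directory

The line "New drbd meta data block successfully created." shows that the metadata was written. The lk_bdev_save messages only report that the /var/lib/drbd directory does not exist; they do not prevent the metadata from being created.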

2. Start the service

[root@node2 ~]# service drbd start
Starting DRBD resources: [
create res: drbd
prepare disk: drbd
adjust disk: drbd
adjust net: drbd
]

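Start the service on both nodes at roughly the same time: during startup each node waits for its peer to connect.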

3. Check the status

[root@node2 ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by gardner@, 2013-11-29 12:28:00
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:10489084
The same information is also available from the drbd-overview command:
[root@node2 ~]# drbd-overview
0:drbd/0 Connected Secondary/Secondary Inconsistent/Inconsistent C r-----

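The output shows that both nodes are currently Secondary, and both disks are Inconsistent because no initial synchronization has taken place yet.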
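For scripting (for example in monitoring checks), the status line of each minor can be parsed into its connection state (cs), roles (ro) and disk states (ds). A minimal Python sketch, assuming the DRBD 8.4 /proc/drbd layout shown above; parse_proc_drbd is a hypothetical helper.

```python
import re

# Status line of one minor in DRBD 8.4's /proc/drbd, e.g.
#  0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
STATUS = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"\s+ro:(?P<role>\w+)/(?P<peer_role>\w+)"
    r"\s+ds:(?P<disk>\w+)/(?P<peer_disk>\w+)"
    r"\s+(?P<protocol>[ABC])\b")


def parse_proc_drbd(text):
    """Return one dict per DRBD minor with connection state, roles and disk states."""
    result = []
    for line in text.splitlines():
        m = STATUS.match(line)
        if m:
            result.append(m.groupdict())
    return result
```

The fields are ordered local/peer, so role is this node's role and peer_role the other node's, exactly as in ro:Secondary/Secondary.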

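4. Promote one node to Primary (run this on node1 only). Because both sides are still Inconsistent, the first promotion needs --force; node1's data then becomes the source of the initial full synchronization.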

[root@node1 drbd.d]# drbdadm primary --force drbd
[root@node1 drbd.d]# drbd-overview
0:drbd/0 SyncSource Primary/Secondary UpToDate/Inconsistent C r-----
[>....................] sync'ed: 4.8% (9760/10240)M

The initial synchronization is now running. Once it has finished, check the status again:

[root@node1 ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by gardner@, 2013-11-29 12:28:00
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:10490880 nr:0 dw:0 dr:10496952 al:0 bm:641 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
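Instead of re-running cat /proc/drbd by hand, a script can poll until the initial sync has finished. This Python sketch assumes the DRBD 8.4 /proc/drbd layout shown above: a status line followed by a counters line whose oos field is the amount of out-of-sync data in KiB. is_synced and wait_for_sync are hypothetical helpers; the reader and sleep functions are parameters so the logic can be exercised without a real DRBD device.

```python
import re
import time


def is_synced(text, minor=0):
    """True if the given minor is UpToDate on both sides and reports oos:0."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if re.match(rf"^\s*{minor}:\s+cs:", line):
            both_up_to_date = "ds:UpToDate/UpToDate" in line
            # The counters (ns:, nr:, ..., oos:) are on the line after the status line.
            counters = lines[i + 1] if i + 1 < len(lines) else ""
            m = re.search(r"\boos:(\d+)", counters)
            return both_up_to_date and m is not None and int(m.group(1)) == 0
    return False


def read_proc_drbd():
    with open("/proc/drbd") as f:
        return f.read()


def wait_for_sync(read=read_proc_drbd, interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll until is_synced() is true; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_synced(read()):
            return True
        sleep(interval)
    return False
```

Calling wait_for_sync() on the Primary before running mke2fs is one way to make sure the file system is created only after both sides are UpToDate, although formatting during the sync is also allowed by DRBD.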

5. Create the file system

The file system can only be mounted on the Primary node, and likewise the DRBD device can only be formatted once a Primary node has been set:

[root@node1 ~]# mke2fs -t ext4 /dev/drbd0
[root@node1 ~]# mkdir /drbd
[root@node1 ~]# mount /dev/drbd0 /drbd/

6. Switch the Primary and Secondary roles

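In a Primary/Secondary DRBD setup only one node can be Primary at any given time. To swap the roles, first demote the current Primary to Secondary (after unmounting its file system), and only then promote the former Secondary to Primary: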

On the primary node (node1):

[root@node1 ~]# cp /etc/fstab /drbd/
[root@node1 ~]# umount /drbd/
[root@node1 ~]# drbdadm secondary drbd

On the secondary node (node2):

[root@node2 ~]# drbdadm primary drbd
[root@node2 ~]# mkdir /drbd
[root@node2 ~]# mount /dev/drbd0 /drbd/
[root@node2 ~]# ll /drbd/
total 20
-rw-r--r-- 1 root root 921 Apr 19 2014 fstab
drwx------ 2 root root 16384 Apr 19 2014 lost+found

The file copied on node1 is now visible on node2.

[root@node2 ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by gardner@, 2013-11-29 12:28:00
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:24 nr:10788112 dw:10788136 dr:1029 al:2 bm:642 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

node2 is now the Primary node.
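The ordering rule of this step (demote the old Primary before promoting the peer) can be expressed as a small Python sketch. switchover_plan only builds the command list used above, for manual execution; simulate is a toy model of DRBD's single-Primary rule, assuming allow-two-primaries is not set. Neither is a DRBD tool.

```python
def switchover_plan(old_primary, new_primary, resource="drbd",
                    device="/dev/drbd0", mountpoint="/drbd"):
    """Return (host, command) pairs for a manual Primary/Secondary swap."""
    return [
        (old_primary, f"umount {mountpoint}"),
        (old_primary, f"drbdadm secondary {resource}"),
        (new_primary, f"drbdadm primary {resource}"),
        (new_primary, f"mount {device} {mountpoint}"),
    ]


def simulate(plan, roles):
    """Apply the drbdadm steps of a plan to a {host: role} dict.

    Raises RuntimeError if a host would be promoted while its peer is still
    Primary, mirroring DRBD's refusal in single-Primary mode.
    """
    roles = dict(roles)
    for host, cmd in plan:
        if cmd.startswith("drbdadm primary"):
            if any(role == "Primary" for peer, role in roles.items() if peer != host):
                raise RuntimeError(f"cannot promote {host}: peer is still Primary")
            roles[host] = "Primary"
        elif cmd.startswith("drbdadm secondary"):
            roles[host] = "Secondary"
    return roles


if __name__ == "__main__":
    for host, cmd in switchover_plan("node1", "node2"):
        print(f"[{host}] {cmd}")
```

Running the steps in reverse order fails in simulate for the same reason drbdadm primary fails on node2 while node1 is still Primary.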

VIII. Installing and configuring corosync

1. Install corosync.

2. Install crmsh

[root@node1 ~]# yum install -y crmsh-1.2.6-4.el6.x86_64.rpm pssh-2.3.1-2.el6.x86_64.rpm
[root@node2 ~]# yum install -y crmsh-1.2.6-4.el6.x86_64.rpm pssh-2.3.1-2.el6.x86_64.rpm

3. Define the DRBD resource

[root@node1 ~]# crm
crm(live)# configure
crm(live)configure# primitive mariadbdrbd ocf:linbit:drbd params drbd_resource=drbd op monitor role=Master interval=10s timeout=20s op monitor role=Slave interval=20s timeout=20s op start timeout=240s op stop timeout=120s
Define the master/slave resource for DRBD:
crm(live)configure# ms ms_mariadbdrbd mariadbdrbd meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

4. Define the file system resource and its constraints

crm(live)configure# primitive mariadbstore ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/mydata" fstype="ext4" op monitor interval=40s timeout=40s op start timeout=60s op stop timeout=60s
crm(live)configure# colocation mariadbstore_with_ms_mariadbdrbd inf: mariadbstore ms_mariadbdrbd:Master
crm(live)configure# order ms_mariadbrbd_before_mariadbstore mandatory: ms_mariadbdrbd:promote mariadbstore:start

5. Add the VIP and MariaDB resources and their constraints

crm(live)configure# primitive madbvip ocf:heartbeat:IPaddr2 params ip="192.168.1.240" op monitor interval=20s timeout=20s on-fail=restart
crm(live)configure# primitive maserver lsb:mysqld op monitor interval=20s timeout=20s on-fail=restart
crm(live)configure# verify

Define the constraints:

crm(live)configure# colocation maserver_with_mariadbstore inf: maserver mariadbstore
crm(live)configure# order mariadbstore_before_maserver mandatory: mariadbstore:start maserver:start
crm(live)configure# verify
crm(live)configure# colocation madbvip_with_maserver inf: madbvip maserver
crm(live)configure# order madbvip_before_masever mandatory: madbvip maserver
crm(live)configure# verify
crm(live)configure# commit
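To see what these constraints imply together, the following Python sketch (Python 3.9+ for graphlib) feeds the three order constraints into a topological sort and follows the colocation chain. The dictionaries simply restate the constraints defined above; this is a simplified model of what they express, not of how Pacemaker evaluates them.

```python
from graphlib import TopologicalSorter

# Order constraints defined above, as (first, then) pairs.
ORDER = [
    ("ms_mariadbdrbd:promote", "mariadbstore"),  # ms_mariadbrbd_before_mariadbstore
    ("mariadbstore", "maserver"),                # mariadbstore_before_maserver
    ("madbvip", "maserver"),                     # madbvip_before_masever
]

# Colocation constraints: resource -> what it must run together with.
COLOCATION = {
    "mariadbstore": "ms_mariadbdrbd:Master",  # mariadbstore_with_ms_mariadbdrbd
    "maserver": "mariadbstore",               # maserver_with_mariadbstore
    "madbvip": "maserver",                    # madbvip_with_maserver
}


def start_order(constraints):
    """Return one start sequence that satisfies every (first, then) pair."""
    ts = TopologicalSorter()
    for first, then in constraints:
        ts.add(then, first)  # `then` has `first` as a predecessor
    return list(ts.static_order())


def anchor(resource, colocation):
    """Follow the colocation chain to the resource everything else follows."""
    seen = set()
    while resource in colocation:
        if resource in seen:
            raise ValueError(f"colocation cycle at {resource}")
        seen.add(resource)
        resource = colocation[resource]
    return resource


if __name__ == "__main__":
    print(start_order(ORDER))
    for res in COLOCATION:
        print(res, "->", anchor(res, COLOCATION))
```

Every resource ends up anchored to the DRBD Master, and the file system is only mounted after promotion, which is why the whole stack moves as one unit on failover.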

6. View all defined resources (output of crm configure show):

node node1.zero1.com
node node2.zero1.com
primitive madbvip ocf:heartbeat:IPaddr2 \
params ip="192.168.1.240" \
op monitor interval="20s" timeout="20s" on-fail="restart"
primitive mariadbdrbd ocf:linbit:drbd \
params drbd_resource="drbd" \
op monitor role="Master" interval="30s" timeout="20s" \
op monitor role="Slave" interval="60s" timeout="20s" \
op start timeout="240s" interval="0" \
op stop interval="0s" timeout="100s"
primitive mariadbstore ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/mydata" fstype="ext4" \
op monitor interval="40s" timeout="40s" \
op start timeout="60s" interval="0" \
op stop timeout="60s" interval="0"
primitive maserver lsb:mysqld \
op monitor interval="20s" timeout="20s" on-fail="restart"
ms ms_mariadbdrbd mariadbdrbd \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation madbvip_with_maserver inf: madbvip maserver
colocation mariadbstore_with_ms_mariadbdrbd inf: mariadbstore ms_mariadbdrbd:Master
colocation maserver_with_mariadbstore inf: maserver mariadbstore
order madbvip_before_masever inf: madbvip maserver
order mariadbstore_before_maserver inf: mariadbstore:start maserver:start
order ms_mariadbrbd_before_mairadbstore inf: ms_mariadbdrbd:promote mariadbstore:start
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6_5.2-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
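Pacemaker identifies an operation by its name and interval, so the Master and Slave monitor operations of mariadbdrbd must use different intervals (30s and 60s in the output above). A quick Python sketch to check this against crm configure show output; monitor_intervals is a hypothetical helper that assumes the quoting and backslash line-continuation style shown above, and it reports monitors without a role as "Started".

```python
import re


def monitor_intervals(crm_show, primitive):
    """Return {role: interval} for the monitor operations of one primitive."""
    # crm configure show ends wrapped lines with a backslash; join them first.
    text = re.sub(r"\\[ \t]*\n", " ", crm_show)
    for stmt in text.splitlines():
        if stmt.split()[:2] == ["primitive", primitive]:
            intervals = {}
            for op in re.split(r"\bop\s+", stmt)[1:]:
                if op.startswith("monitor"):
                    attrs = dict(re.findall(r'([\w-]+)="([^"]*)"', op))
                    intervals[attrs.get("role", "Started")] = attrs.get("interval")
            return intervals
    raise KeyError(primitive)


def intervals_distinct(intervals):
    """True if no two monitor operations share an interval."""
    return len(set(intervals.values())) == len(intervals)
```

Feeding it the output of crm configure show and calling intervals_distinct(monitor_intervals(text, "mariadbdrbd")) should return True for this configuration.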

IX. Testing

1. Check the cluster status

[root@node1 ~]# crm status
Last updated: Wed Apr 23 16:24:11 2014
Last change: Wed Apr 23 16:21:50 2014 via cibadmin on node1.zero1.com
Stack: classic openais (with plugin)
Current DC: node1.zero1.com - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
5 Resources configured
Online: [ node1.zero1.com node2.zero1.com ]
Master/Slave Set: ms_mariadbdrbd [mariadbdrbd]
Masters: [ node1.zero1.com ]
Slaves: [ node2.zero1.com ]
mariadbstore (ocf::heartbeat:Filesystem): Started node1.zero1.com
madbvip (ocf::heartbeat:IPaddr2): Started node1.zero1.com
maserver (lsb:mysqld): Started node1.zero1.com

2. Switch nodes manually

[root@node1 ~]# crm node standby node1.zero1.com
[root@node1 ~]# crm status
Last updated: Wed Apr 23 16:26:05 2014
Last change: Wed Apr 23 16:25:34 2014 via crm_attribute on node1.zero1.com
Stack: classic openais (with plugin)
Current DC: node1.zero1.com - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
5 Resources configured
Node node1.zero1.com: standby
Online: [ node2.zero1.com ]
Master/Slave Set: ms_mariadbdrbd [mariadbdrbd]
Masters: [ node2.zero1.com ]
Stopped: [ node1.zero1.com ]
mariadbstore (ocf::heartbeat:Filesystem): Started node2.zero1.com
madbvip (ocf::heartbeat:IPaddr2): Started node2.zero1.com
maserver (lsb:mysqld): Started node2.zero1.com
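After node1 is put into standby, every resource should follow the DRBD Master to node2, as the output shows. That check can be done mechanically with a rough Python sketch that parses crm status output in the format shown above; placement and colocated_with_master are hypothetical helpers, and the whitespace patterns are meant to tolerate the tabs that real crm_mon output may use.

```python
import re


def placement(status_text):
    """Return (DRBD Master node, {resource: node it is started on}) from crm status text."""
    master, started = None, {}
    for line in status_text.splitlines():
        m = re.match(r"\s*Masters:\s*\[\s*(\S+)\s*\]", line)
        if m:
            master = m.group(1)
            continue
        m = re.match(r"\s*(\S+)\s+\(\S+\):\s+Started\s+(\S+)", line)
        if m:
            started[m.group(1)] = m.group(2)
    return master, started


def colocated_with_master(status_text):
    """True if every started resource runs on the node holding the DRBD Master role."""
    master, started = placement(status_text)
    return master is not None and bool(started) and all(
        node == master for node in started.values())
```

Running this against the crm status output before and after the standby command should return True both times, with the Master moving from node1 to node2.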