mysql海量数据条件删除

浏览数：36 / 时间：2015年06月12日

1. 问题描述：现在存在两个表，具体表结构及记录数如下所示：

mysql> desc user_mapping;
+------------+------------------+------+-----+---------+-------+
| Field      | Type             | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| open_id    | varchar(64)      | NO   | PRI | NULL    |       |
| platform   | tinyint(4)       | NO   | PRI | NULL    |       |
| serverid   | int(10) unsigned | NO   | PRI | 0       |       |
| uid        | int(10) unsigned | NO   |     | NULL    |       |
| updatetime | int(11)          | YES  |     | NULL    |       |
| lastlogin  | int(11)          | YES  |     | NULL    |       |
| via        | varchar(128)     | YES  |     | NULL    |       |
+------------+------------------+------+-----+---------+-------+
7 rows in set (0.00 sec)
mysql> select count(*) from user_mapping;
+----------+
| count(*) |
+----------+
| 12579610 |
+----------+
1 row in set (2.49 sec)
mysql> desc uid1202;
+-----------------+----------------------+------+-----+---------+-------+
| Field           | Type                 | Null | Key | Default | Extra |
+-----------------+----------------------+------+-----+---------+-------+
| uid             | int(10) unsigned     | NO   |     | NULL    |       |
| last_login_time | int(10) unsigned     | YES  |     | NULL    |       |
| accCharge       | int(10) unsigned     | YES  |     | 0       |       |
| level           | smallint(5) unsigned | YES  |     | NULL    |       |
+-----------------+----------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> select count(*) from uid1202;
+----------+
| count(*) |
+----------+
|  9887299 |
+----------+
1 row in set (3.25 sec)

目的是将user_mapping表中的uid在uid1202表中存在的记录删除，mysql处理大数据时，多表连接会直接将服务器单核卡死，而且还不知道猴年马月才能处理完。

解决方案：

step1. 将user_mapping表和uid1202表dump到内网，建新库导入

step2. 去掉内网user_mapping表中的主键

alter table user_mapping drop primary key;

step3. 删除内网user_mapping表中uid重复的值删除保留一条

mysql -uusername -ppassword -e"select platform,uid from user_mapping group by uid having count(*) > 1 ;" > uid_double.txt
awk ‘{print "delete from user_mapping where platform="$1" and uid="$2";"}‘ uid_double.txt  > del_double.sql
mysql -uusername -ppassword user_del < del_double.sql

step4. 修改user_mapping表，重新建立以uid为主键

alter table user_mapping add primary key(uid);

step4. 构造查询user_mapping表中uid在uid1202表中的语句

mysql -uusername -ppassword user_del -e"select uid from uid1202" > uid.txt
awk ‘{print "select open_id,platform,serverid from user_mapping where uid="$1"; "}‘ uid.txt > del_uid.sql

step5. 构造删除user_mapping表中以open_id,platform,serverid为条件的语句

mysql -uroot -p1234 user_del < del_uid.sql > del_usermapping.txt
sed -i  ‘/open_id/d‘ del_usermapping.txt #删除奇数行table头
awk ‘{print "delete from user_mapping where open_id=\""$1"\" and platform="$2" and uid="$3"  ;"}‘ del_usermapping.txt > del_usermapping.txt

step6. 分解查询语句到多个sql文件，在外网同时运行

#!/bin/bash
for i in $(seq 1 8)
do
    cat del_usermapping.sql | head -n 1300000 > del_usermapping_$i.sql
    sed -i ‘1,1300000d‘ del_usermapping.sql
    cat del_usermapping_$i.sql | wc -l 
done

for i in $(seq 1 8)
do
    mysql -uroot -p1234 user_del < del_usermapping_$i.sql &
done