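Preface
HDFS gives administrators per-directory quota controls: a name quota (the total number of files and directories allowed under a directory) and a space quota (an upper limit on the disk space consumed).
This post explores the HDFS quota feature and records the detailed steps of an experiment for each quota scenario.
The test environment is Apache Hadoop 2.5.0-cdh5.2.0.
Reposting is welcome; please credit the source: http://blog.csdn.net/u010967382/article/details/44452485
Trying out name quotas
Set a name quota, i.e. the maximum number of files and directories under the directory: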
casliyang@singlehadoop:~$ hdfs dfsadmin -setQuota 3 /Workspace/quotas/
15/03/18 14:53:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Upload a file:
casliyang@singlehadoop:~$ hdfs dfs -put slf4j-log4j12-1.6.4.jar /Workspace/quotas/
15/03/18 14:54:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Upload a second file:
casliyang@singlehadoop:~$ hdfs dfs -put dict.txt /Workspace/quotas/
15/03/18 14:55:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Uploading a third file fails:
casliyang@singlehadoop:~$ hdfs dfs -put examples.desktop /Workspace/quotas/
15/03/18 14:55:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: The NameSpace quota (directories and files) of directory /Workspace/quotas is exceeded: quota=3 file count=4
List the directory:
casliyang@singlehadoop:~$ hdfs dfs -ls /Workspace/quotas
15/03/18 17:11:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 3 casliyang supergroup 14 2015-03-18 14:55 /Workspace/quotas/dict.txt
-rw-r--r-- 3 casliyang supergroup 9748 2015-03-18 14:38 /Workspace/quotas/slf4j-log4j12-1.6.4.jar
The directory contains only two files.
Check the quota status:
casliyang@singlehadoop:~$ hdfs dfs -count -q /Workspace/quotas
15/03/18 16:00:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
3 0 none inf 1 2 9762 /Workspace/quotas
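The columns mean, in order:
- Name quota (none means not set)
- Remaining name quota (inf means not set)
- Space quota (none means not set)
- Remaining space quota (inf means not set)
- Directory count
- File count
- Content size in bytes
- Path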
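To make the column order concrete, here is a minimal Python sketch that splits one result line of `hdfs dfs -count -q` into named fields. The names `QuotaCount` and `parse_count_q` are hypothetical helpers introduced for illustration, not part of Hadoop, and the sketch assumes the eight-column layout described above.

```python
from typing import NamedTuple, Optional


class QuotaCount(NamedTuple):
    """One row of `hdfs dfs -count -q` output (hypothetical container)."""
    name_quota: Optional[int]             # None when the column reads "none"
    remaining_name_quota: Optional[int]   # None when the column reads "inf"
    space_quota: Optional[int]            # None when the column reads "none"
    remaining_space_quota: Optional[int]  # None when the column reads "inf"
    dir_count: int
    file_count: int
    content_size: int
    path: str


def _maybe_int(token: str) -> Optional[int]:
    """Map the "none"/"inf" placeholders to None, otherwise parse an integer."""
    return None if token in ("none", "inf") else int(token)


def parse_count_q(line: str) -> QuotaCount:
    """Parse a single result line, assuming the eight columns listed above."""
    # Split at most 7 times so that a path containing spaces stays in one piece.
    fields = line.split(None, 7)
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
    quota, rem_quota, space, rem_space, dirs, files, size, path = fields
    return QuotaCount(
        _maybe_int(quota), _maybe_int(rem_quota),
        _maybe_int(space), _maybe_int(rem_space),
        int(dirs), int(files), int(size), path.strip(),
    )


row = parse_count_q("3 0 none inf 1 2 9762 /Workspace/quotas")
print(row.name_quota, row.dir_count, row.file_count, row.path)
```

Log lines such as the NativeCodeLoader warning would need to be filtered out before calling the parser.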
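The remaining name quota is computed as:
Remaining name quota = name quota - (directory count + file count)
From the quota output above:
Name quota = 3
Directory count = 1
File count = 2
So the remaining name quota = 3 - (1 + 2) = 0
Uploading another file at this point would therefore exceed the name quota.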
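The calculation above can be sketched in a few lines of Python. The function names are made up for illustration; the numbers are the ones from the experiment.

```python
def remaining_name_quota(name_quota: int, dir_count: int, file_count: int) -> int:
    """Remaining name quota = name quota - (directory count + file count)."""
    return name_quota - (dir_count + file_count)


def can_create(name_quota: int, dir_count: int, file_count: int,
               new_entries: int = 1) -> bool:
    """True if `new_entries` more files or directories still fit in the quota."""
    return remaining_name_quota(name_quota, dir_count, file_count) >= new_entries


# Values from the experiment: quota 3, one directory, two files.
print(remaining_name_quota(3, 1, 2))  # 0
print(can_create(3, 1, 2))            # False
```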
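BTW, the directory itself is counted against its own quota!
Let's create another directory with no quotas set and check whether the directory itself is also included in the count: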
casliyang@singlehadoop:~$ hdfs dfs -mkdir /Workspace/quotas1
15/03/18 17:19:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
casliyang@singlehadoop:~$ hdfs dfs -count -q /Workspace/quotas1
15/03/18 17:20:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
none inf none inf 1 0 0 /Workspace/quotas1
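The experiment shows that with neither a name quota nor a space quota set, the columns read none/inf, and the directory itself is indeed still counted: the directory count is 1.
Next, clear the name quota on /Workspace/quotas: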
casliyang@singlehadoop:~$ hdfs dfs -count -q /Workspace/quotas
15/03/18 17:28:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
3 0 none inf 1 2 9762 /Workspace/quotas
casliyang@singlehadoop:~$ hdfs dfsadmin -clrQuota /Workspace/quotas
15/03/18 17:28:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
casliyang@singlehadoop:~$ hdfs dfs -count -q /Workspace/quotas
15/03/18 17:28:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
none inf none inf 1 2 9762 /Workspace/quotas
Trying out space quotas
Empty the directory /Workspace/quotas:
casliyang@singlehadoop:~$ hdfs dfs -rm -r /Workspace/quotas/*
15/03/18 17:33:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/18 17:33:11 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /Workspace/quotas/dict.txt
15/03/18 17:33:11 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /Workspace/quotas/slf4j-log4j12-1.6.4.jar
casliyang@singlehadoop:~$ hdfs dfs -ls /Workspace/quotas/
15/03/18 17:33:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
casliyang@singlehadoop:~$ hdfs dfs -count -q /Workspace/quotas
15/03/18 17:33:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
none inf none inf 1 0 0 /Workspace/quotas
Set a space quota on the directory:
casliyang@singlehadoop:~$ hdfs dfsadmin -setSpaceQuota 8000 /Workspace/quotas
15/03/18 17:36:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
casliyang@singlehadoop:~$ hdfs dfs -count -q /Workspace/quotas
15/03/18 17:36:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
none inf 8000 8000 1 0 0 /Workspace/quotas
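The space quota of /Workspace/quotas is now 8000 bytes: the total space quota is 8000, the remaining space quota is 8000, and the space used is 0.
Now try to upload slf4j-log4j12-1.6.4.jar, a file larger than 8000 bytes (it is 9748 bytes):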
casliyang@singlehadoop:~$ ll slf4j-log4j12-1.6.4.jar
-rw-r--r-- 1 casliyang casliyang 9748 Mar 6 14:50 slf4j-log4j12-1.6.4.jar
casliyang@singlehadoop:~$ hdfs dfs -put slf4j-log4j12-1.6.4.jar /Workspace/quotas
15/03/18 17:40:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/18 17:40:36 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /Workspace/quotas is exceeded: quota = 8000 B = 7.81 KB but diskspace consumed = 402653184 B = 384 MB
at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyDiskspaceQuota(DirectoryWithQuotaFeature.java:144)
at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:154)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:1815)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1650)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1625)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:373)
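The error itself was expected, but the last part of the message, 384 MB of disk space consumed, was a surprise. After some reading, the explanation is:
The space quota has to cover not the size of the file itself but the maximum disk space its blocks may occupy. In the experiment above the file is 9748 bytes (about 9.5 KB), the block size configured in hdfs-site.xml is 128 MB, and the replication factor is 3. The file fits in a single block, so the maximum disk space required (one full block per replica) is 128 MB * 3 = 384 MB!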
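As a rough model of that rule, the Python sketch below computes the space charged when a block is requested and compares it with the quota. This is a simplification of the behavior observed in this experiment, not the NameNode's actual code; the function names are hypothetical, and the block size and replication factor are the values from this test environment.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # bytes, as configured in hdfs-site.xml in this setup
REPLICATION = 3                 # replication factor used in this setup


def block_reservation(block_size: int = BLOCK_SIZE,
                      replication: int = REPLICATION) -> int:
    """Disk space charged for one new block: a full block for every replica."""
    return block_size * replication


def allocation_exceeds_quota(space_quota: int, consumed: int,
                             block_size: int = BLOCK_SIZE,
                             replication: int = REPLICATION) -> bool:
    """True if requesting one more block would push usage past the quota."""
    return consumed + block_reservation(block_size, replication) > space_quota


print(block_reservation())                # 402653184
print(allocation_exceeds_quota(8000, 0))  # True: the 8000-byte quota is far too small
```

The value 402653184 is exactly the "diskspace consumed" figure in the exception, even though the file is only 9748 bytes.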
Change the directory's quota to 384 MB:
casliyang@singlehadoop:~$ hdfs dfsadmin -setSpaceQuota 384m /Workspace/quotas
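The `384m` argument uses a binary size suffix, so it stands for 384 * 1024 * 1024 bytes. The short Python sketch below shows that conversion; `parse_size` is a hypothetical helper written for this post, not Hadoop's own parser, and it only handles the k/m/g/t suffixes.

```python
# Binary multipliers for the size suffixes (assumed set: k, m, g, t).
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_size(text: str) -> int:
    """Convert a size such as "384m" or "8000" into a number of bytes."""
    text = text.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


print(parse_size("384m"))  # 402653184
print(parse_size("8000"))  # 8000
```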
Upload the file again:
casliyang@singlehadoop:~$ hdfs dfs -put slf4j-log4j12-1.6.4.jar /Workspace/quotas
15/03/19 09:02:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
casliyang@singlehadoop:~$ hdfs dfs -count -q /Workspace/quotas
15/03/19 09:03:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
none inf 402653184 402623940 1 1 9748 /Workspace/quotas
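The upload succeeds: the space quota is 402653184, the remaining quota is 402623940, and the content size is 9748.
These three numbers are related by 402653184 - 9748 * 3 = 402623940, where 3 is the replication factor.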
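The same relation as a one-function Python sketch, assuming every file under the directory uses the same replication factor (the function name is made up for illustration):

```python
def remaining_space_quota(space_quota: int, content_size: int,
                          replication: int = 3) -> int:
    """Remaining space quota once files are closed: quota - size * replicas."""
    return space_quota - content_size * replication


print(remaining_space_quota(402653184, 9748))  # 402623940
```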
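Although the remaining quota still shows 402623940 bytes, nothing more can be uploaded: every new block is charged as a full 128 MB block times 3 replicas (402653184 bytes) when it is allocated, and that charge on top of the 29244 bytes already stored exceeds the quota!
A gripe: this way of reporting is really unfriendly, and users who do not know the rule will not be able to make sense of it!
Not ready to give up, I tried one more upload, this time a tiny file of only 14 bytes: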
casliyang@singlehadoop:~$ hdfs dfs -put dict.txt /Workspace/quotas
15/03/19 09:04:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/19 09:04:33 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /Workspace/quotas is exceeded: quota = 402653184 B = 384 MB but diskspace consumed = 402682428 B = 384.03 MB
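The upload fails; the quota really is exhausted. The reported consumption of 402682428 bytes is the 29244 bytes already stored (9748 * 3) plus the 402653184 bytes charged for one new block.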
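The figure in the exception can be reproduced with a small Python sketch. It assumes, based on the numbers in this experiment, that closed files are charged at their actual size times the replication factor while a newly requested block is charged at the full block size times the replication factor; the function name is hypothetical.

```python
from typing import Iterable


def consumed_at_allocation(closed_file_sizes: Iterable[int],
                           block_size: int, replication: int) -> int:
    """Space charged to the quota at the moment one new block is requested."""
    stored = sum(closed_file_sizes) * replication  # files already written
    reserved = block_size * replication            # full-size charge for the new block
    return stored + reserved


quota = 384 * 1024 * 1024
needed = consumed_at_allocation([9748], 128 * 1024 * 1024, 3)
print(needed)          # 402682428
print(needed > quota)  # True, so the 14-byte upload is rejected
```

Under this model, a directory that should accept one more file needs a space quota of at least the stored bytes plus one full block per replica.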
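Key takeaways
- HDFS lets you set a name quota (Name Quotas) and a space quota (Space Quotas) on a given directory.
- Name Quotas limit the total number of directories and files under the given directory, with the directory itself counted as one (see the calculation rule above); Space Quotas limit the disk space occupied by all files under the given directory, replicas included (see the calculation rule above).
- According to the official documentation, the maximum value of both the name quota and the space quota is Long.Max_Value.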