MapReduce: number of mappers
It's the other way round. The number of mappers is decided by the number of splits, not the reverse. In reality it is the job of the InputFormat you are using to create the splits, and splits are not always created based on the HDFS block size.

To better understand this, assume you are processing data stored in MySQL using MR. Since there is no concept of blocks in this case, the theory that splits are always created based on the HDFS block fails. Right? What about split creation then? One possibility is to create splits based on ranges of rows in your MySQL table (and this is what DBInputFormat does).

It is only for the InputFormats based on FileInputFormat that splits are computed from the size of the input files, with the HDFS block size as the default split size.

There is a fundamental difference between an MR split and an HDFS block. A block is a physical chunk of data, while a split is only a logical description of the data that gets fed to one mapper.

Coming back to your question. Hadoop allows many more than 200 mappers. Having said that, it doesn't make much sense to have 200 mappers for just 500 MB of data. Always remember that when you talk about Hadoop, you are dealing with very large data sets. Sending just 2.5 MB of data to each mapper would be overkill, since the cost of starting a mapper would outweigh the work it does. And yes, if there are no free CPU slots, some mappers will have to wait and run after the current mappers complete. But the MR framework tries its best to avoid this kind of situation. If the machine that holds the data to be processed doesn't have any free CPU slots, the data will be moved to a nearby node where free slots are available and processed there.

HTH
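To make the arithmetic concrete, here is a minimal Python sketch of how the number of splits (and hence mappers) falls out of the input. It is not Hadoop's code. The 128 MB block size is an assumption, the function names are hypothetical, and the file case deliberately ignores details such as the slack Hadoop allows on the last split. The split-size rule `max(min_size, min(max_size, block_size))` mirrors the one used by FileInputFormat in the newer MapReduce API.

```python
MB = 1024 * 1024


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    # Split size is the block size, clamped between the configured min and max.
    return max(min_size, min(max_size, block_size))


def file_splits(file_size, split_size):
    # Return (offset, length) pairs covering the file; one mapper per pair.
    splits = []
    offset = 0
    while offset < file_size:
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits


def row_range_splits(total_rows, num_splits):
    # Return (start_row, row_count) pairs, the idea behind row-based splits
    # for a database table, where no blocks exist at all.
    base, extra = divmod(total_rows, num_splits)
    splits = []
    start = 0
    for i in range(num_splits):
        count = base + (1 if i < extra else 0)
        if count == 0:
            break
        splits.append((start, count))
        start += count
    return splits


# 500 MB file, assumed 128 MB blocks, default min/max split size.
default_splits = file_splits(500 * MB, compute_split_size(128 * MB))

# Forcing 200 mappers would mean capping each split at 2.5 MB.
tiny_splits = file_splits(
    500 * MB, compute_split_size(128 * MB, max_size=int(2.5 * MB))
)

# A hypothetical 100-row table cut into 5 row ranges.
table_splits = row_range_splits(100, 5)

print(len(default_splits), len(tiny_splits), table_splits)
```

With these assumed numbers the default configuration yields only a handful of mappers for 500 MB, and getting 200 of them requires shrinking the split size to 2.5 MB, which is the overkill case described above.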