Hadoop big data: the MapReduce total sort mechanism

There are several ways to produce globally sorted output with MapReduce:
(1) Use a single reduce task. The output is globally sorted, but parallelism is gone and that one node carries the entire load.
(2) Write a range partitioner and configure a matching number of reduce tasks. This achieves global order, but it is hard to avoid uneven data distribution (data skew): some reduce tasks end up overloaded while others are nearly idle.
(3) Write a separate job that first scans the dataset to learn its key distribution and derive suitable range boundaries, then sort with the range partitioner from (2). This works, but running an extra job over the whole dataset is expensive.
(4) Use Hadoop's built-in sampler to sample the dataset and derive the range boundaries, then use Hadoop's built-in TotalOrderPartitioner to achieve the global sort.
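The core of approach (2) and of TotalOrderPartitioner alike is a key-to-partition lookup against sorted split points. The following is a minimal pure-Java sketch of that lookup (the class name and split points are hypothetical, for illustration only, not part of the Hadoop API):

```java
import java.util.Arrays;

// Sketch of range partitioning: a key is assigned to a reduce task by
// binary-searching sorted split points. N split points define N + 1 partitions.
public class RangePartitionSketch {

    static int getPartition(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // binarySearch returns (-(insertion point) - 1) when the key is absent;
        // a key equal to a split point goes to the partition above it
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        String[] splits = {"g", "p"};  // 2 split points -> 3 partitions
        System.out.println(getPartition("a", splits)); // 0
        System.out.println(getPartition("h", splits)); // 1
        System.out.println(getPartition("z", splits)); // 2
    }
}
```

Because every task uses the same split points, keys are ordered across partitions, so concatenating the reduce outputs in partition order yields a globally sorted result.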
Approach (4) looks like this:

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class TotalSort {

    // Identity mapper: keys pass straight through so the shuffle sorts them
    static class TotalSortMapper extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    // Identity reducer: writes each key/value pair out in sorted order
    static class TotalSortReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text v : values) {
                context.write(key, v);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(TotalSort.class);
        job.setMapperClass(TotalSortMapper.class);
        job.setReducerClass(TotalSortReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // The sampler reads keys directly from the input, so the input
        // must be a SequenceFile keyed by the sort key
        job.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Sample the input to build the partition file, then partition by key range
        job.setPartitionerClass(TotalOrderPartitioner.class);
        InputSampler.RandomSampler<Text, Text> randomSampler =
                new InputSampler.RandomSampler<Text, Text>(0.1, 100, 10);
        InputSampler.writePartitionFile(job, randomSampler);

        // Ship the partition file to every task via the distributed cache
        Configuration conf2 = job.getConfiguration();
        String partitionFile = TotalOrderPartitioner.getPartitionFile(conf2);
        job.addCacheFile(new URI(partitionFile));

        job.setNumReduceTasks(3);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
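What InputSampler.RandomSampler contributes above can be illustrated without Hadoop: sample keys at a given frequency, sort the sample, and pick evenly spaced keys as split points. This is a hypothetical sketch of the idea, not Hadoop's actual sampler code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch: derive numPartitions - 1 split points from a random
// sample of the keys, so that partitions receive roughly equal shares.
public class SplitPointSketch {

    static List<String> splitPoints(List<String> keys, double freq,
                                    int numPartitions, long seed) {
        Random rnd = new Random(seed);
        List<String> sample = new ArrayList<>();
        for (String k : keys) {
            if (rnd.nextDouble() < freq) sample.add(k);  // keep ~freq of keys
        }
        Collections.sort(sample);
        List<String> splits = new ArrayList<>();
        // take boundaries at evenly spaced positions in the sorted sample
        for (int i = 1; i < numPartitions; i++) {
            splits.add(sample.get(i * sample.size() / numPartitions));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) keys.add(String.format("key%04d", i));
        // With a 10% sample rate, the three partitions get roughly 333 keys each
        System.out.println(splitPoints(keys, 0.1, 3, 42));
    }
}
```

Because the boundaries come from a sample of the real data rather than a fixed guess, skewed key distributions still split into roughly equal partitions, which is exactly the data-skew problem that approach (2) suffers from.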