A Hadoop Word Count Example

This article walks through the classic Hadoop word-count example, shared here for reference.
The core of Hadoop is the MapReduce programming model together with the Hadoop Distributed File System (HDFS): HDFS stores the input and output data, while MapReduce splits the computation into a map phase and a reduce phase.
Step 1: Define the Map phase
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMap extends Mapper<Object, Text, Text, IntWritable> {

    // Every word is emitted with a count of 1.
    private static final IntWritable one = new IntWritable(1);
    // Reuse a single Text instance instead of allocating a new one per token.
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the input line on whitespace and emit a (word, 1) pair per token.
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
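To see concretely what the mapper emits, its tokenizing logic can be exercised outside Hadoop. Below is a minimal plain-Java sketch; the class name MapLogicDemo and the sample line are made up for illustration and are not part of the original example.

import java.util.StringTokenizer;

public class MapLogicDemo {
    public static void main(String[] args) {
        // A made-up input line, standing in for one line read from HDFS.
        String line = "hello hadoop hello world";
        StringTokenizer tokenizer = new StringTokenizer(line);
        // The mapper emits one (word, 1) pair per token:
        while (tokenizer.hasMoreTokens()) {
            System.out.println("(" + tokenizer.nextToken() + ", 1)");
        }
        // Output: (hello, 1) (hadoop, 1) (hello, 1) (world, 1)
    }
}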
Step 2: Define the Reduce phase

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all the 1s emitted by the mappers (and combiner) for this word.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
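Between the two phases, the framework shuffles the pairs and groups all values by key, so each reduce call sees one word together with all of its 1s. As a rough illustration of that aggregation (plain Java, not Hadoop code; the pairs are made up), a map can mimic the group-and-sum step:

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReduceLogicDemo {
    public static void main(String[] args) {
        // Made-up words, as the mappers might have emitted them with count 1.
        List<String> emitted = Arrays.asList("hello", "hadoop", "hello", "world");
        Map<String, Integer> counts = new HashMap<>();
        for (String w : emitted) {
            // Equivalent to the reducer summing the grouped 1s per key.
            counts.merge(w, 1, Integer::sum);
        }
        System.out.println(counts); // e.g. {world=1, hello=2, hadoop=1} (order not guaranteed)
    }
}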
Step 3: Write a driver to run the MapReduce job
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Legacy user/group setting from the original post; only meaningful on old Hadoop versions.
        conf.set("hadoop.job.ugi", "root,root123");

        Job job = new Job(conf, "Hello,hadoop! ^_^");
        job.setJarByClass(MyDriver.class);

        // Intermediate (map output) and final (reduce output) key/value types.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(MyMap.class);
        // The reducer doubles as a combiner: summing counts is associative and commutative.
        job.setCombinerClass(MyReduce.class);
        job.setReducerClass(MyReduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0] = HDFS input path, args[1] = HDFS output path (must not already exist).
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
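On current Hadoop versions, the same driver is usually written with Job.getInstance and the Tool/ToolRunner pattern, which also parses generic command-line options such as -D. Below is a hedged sketch of that variant, not the original author's code; the class name WordCountTool is hypothetical, and it reuses the MyMap and MyReduce classes above.

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() carries any -D options that ToolRunner parsed from the command line.
        Job job = Job.getInstance(getConf(), "word count");
        job.setJarByClass(WordCountTool.class);
        job.setMapperClass(MyMap.class);
        job.setCombinerClass(MyReduce.class);
        job.setReducerClass(MyReduce.class);
        // Map and reduce outputs share the same types here, so one pair of setters suffices.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCountTool(), args));
    }
}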
Reprinted from: https://blog.51cto.com/supercharles888/840723
Summary

That covers the complete Hadoop word-count example: a mapper that emits (word, 1) pairs, a reducer (also used as the combiner) that sums them, and a driver that configures and submits the job. Hopefully it helps you solve the problem you ran into.