A Hadoop Word Count Example

This article walks through the classic Hadoop word-count example, shared here for reference.
The core of Hadoop is the MapReduce programming model together with the Hadoop Distributed File System (HDFS): HDFS stores the input and output data, while MapReduce splits the computation into a map phase and a reduce phase.
Step 1: Define the Map phase
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMap extends Mapper<Object, Text, Text, IntWritable> {

    // Every word is emitted with a count of 1.
    private static final IntWritable one = new IntWritable(1);
    // Reuse a single Text instance instead of allocating a new one per token.
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the input line on whitespace and emit a (word, 1) pair per token.
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
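To see concretely what the mapper emits, its tokenizing logic can be exercised outside Hadoop. Below is a minimal plain-Java sketch; the class name MapLogicDemo and the sample line are made up for illustration and are not part of the original example.

import java.util.StringTokenizer;

public class MapLogicDemo {
    public static void main(String[] args) {
        // A made-up input line, standing in for one line read from HDFS.
        String line = "hello hadoop hello world";
        StringTokenizer tokenizer = new StringTokenizer(line);
        // The mapper emits one (word, 1) pair per token:
        while (tokenizer.hasMoreTokens()) {
            System.out.println("(" + tokenizer.nextToken() + ", 1)");
        }
        // Output: (hello, 1) (hadoop, 1) (hello, 1) (world, 1)
    }
}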
Step 2: Define the Reduce phase

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all the 1s emitted by the mappers (and combiner) for this word.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
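Between the two phases, the framework shuffles the pairs and groups all values by key, so each reduce call sees one word together with all of its 1s. As a rough illustration of that aggregation (plain Java, not Hadoop code; the pairs are made up), a map can mimic the group-and-sum step:

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReduceLogicDemo {
    public static void main(String[] args) {
        // Made-up words, as the mappers might have emitted them with count 1.
        List<String> emitted = Arrays.asList("hello", "hadoop", "hello", "world");
        Map<String, Integer> counts = new HashMap<>();
        for (String w : emitted) {
            // Equivalent to the reducer summing the grouped 1s per key.
            counts.merge(w, 1, Integer::sum);
        }
        System.out.println(counts); // e.g. {world=1, hello=2, hadoop=1} (order not guaranteed)
    }
}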
Step 3: Write a driver to run the MapReduce job
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Legacy user/group setting from the original post; only meaningful on old Hadoop versions.
        conf.set("hadoop.job.ugi", "root,root123");

        Job job = new Job(conf, "Hello,hadoop! ^_^");
        job.setJarByClass(MyDriver.class);

        // Intermediate (map output) and final (reduce output) key/value types.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(MyMap.class);
        // The reducer doubles as a combiner: summing counts is associative and commutative.
        job.setCombinerClass(MyReduce.class);
        job.setReducerClass(MyReduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0] = HDFS input path, args[1] = HDFS output path (must not already exist).
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
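On current Hadoop versions, the same driver is usually written with Job.getInstance and the Tool/ToolRunner pattern, which also parses generic command-line options such as -D. Below is a hedged sketch of that variant, not the original author's code; the class name WordCountTool is hypothetical, and it reuses the MyMap and MyReduce classes above.

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() carries any -D options that ToolRunner parsed from the command line.
        Job job = Job.getInstance(getConf(), "word count");
        job.setJarByClass(WordCountTool.class);
        job.setMapperClass(MyMap.class);
        job.setCombinerClass(MyReduce.class);
        job.setReducerClass(MyReduce.class);
        // Map and reduce outputs share the same types here, so one pair of setters suffices.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCountTool(), args));
    }
}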
Reprinted from: https://blog.51cto.com/supercharles888/840723
Summary

That covers the complete Hadoop word-count example: a mapper that emits (word, 1) pairs, a reducer (also used as the combiner) that sums them, and a driver that configures and submits the job. Hopefully it helps you solve the problem you ran into.