8.2 English Word Frequency Statistics (project)

Published: 2024/5/15 Programming Q&A 51 豆豆


Level 1: Reading the File

Level 2: Counting Words

Level 3: Counting Occurrences of Each Word

Level 4: Counting Occurrences of Non-Special Words

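Level 1: Reading the File

Task for this level: write a small program that reads a file.

Problem description

"Who Moved My Cheese?" is a fable by the American author Spencer Johnson, first published in 1998. It tells the story of four characters, two mice named Sniff (嗅嗅) and Scurry (匆匆) and two Littlepeople named Hem (哼哼) and Haw (唧唧), and their search for cheese.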

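import string


def read_file(file):
    """Take a file name as the argument and read the file's contents into a string.
    Keep only English letters and Western symbols, filter out Chinese,
    convert all characters to lowercase, replace all punctuation and symbols
    with spaces, and return the string."""
    ########## Begin ##########
    with open(file, encoding='utf-8') as f:
        txt = f.read().lower()
    # Note: this answer replaces only these four characters with spaces;
    # the read_file supplied in the later levels replaces all of string.punctuation.
    for i in ',."-':
        txt = txt.replace(i, ' ')
    return txt
    ########## End ##########


if __name__ == '__main__':
    filename = 'Who Moved My Cheese.txt'  # file name
    content = read_file(filename)  # call the function; it returns a string
    n = int(input())
    print(content[:n])  # print the first n characters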
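The cleaning steps named in the docstring (drop Chinese, lowercase, turn punctuation into spaces) can also be tried on an in-memory string, without the novel file. The following is only a minimal sketch: `clean_text` and the sample sentence are hypothetical names made up for illustration, and `str.translate` is swapped in for the repeated `replace` calls.

```python
import string


def clean_text(raw):
    """Hypothetical helper: apply the docstring's cleaning steps to a string."""
    # Keep only characters whose code point is below 256 (this drops Chinese text).
    western_only = ''.join(ch for ch in raw if ord(ch) < 256)
    # Build a table that maps every ASCII punctuation character to a space.
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    # Lowercase first, then replace all punctuation in a single pass.
    return western_only.lower().translate(table)


# Made-up sample sentence, not taken from the book.
sample = 'Hem and Haw, 哼哼 and 唧唧, said: "Cheese!"'
print(clean_text(sample).split())
```

Splitting the cleaned string on whitespace leaves only the lowercase English words, which is the form the later levels expect as input.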

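Level 2: Counting Words

Task for this level: write a small program that counts the number of words.

import string


def count_of_words(txt):
    """Take a string with punctuation and symbols already removed, and return
    the number of words and the number of distinct words in it."""
    ########## Begin ##########
    txt = txt.split()  # split on whitespace into a list of words
    counts = {}
    for i in txt:
        counts[i] = counts.get(i, 0) + 1
    return len(txt), len(counts)  # total words, distinct words
    ########## End ##########


def read_file(file):
    """Take a file name as the argument and read the file's contents into a string.
    Keep only English letters and Western symbols, filter out Chinese,
    convert all characters to lowercase, replace all punctuation and symbols
    with spaces, and return the string."""
    with open(file, 'r', encoding='utf-8') as novel:
        txt = novel.read()
    english_only_txt = ''.join(x for x in txt if ord(x) < 256)
    english_only_txt = english_only_txt.lower()
    for character in string.punctuation:
        english_only_txt = english_only_txt.replace(character, ' ')
    return english_only_txt


if __name__ == '__main__':
    filename = 'Who Moved My Cheese.txt'  # file name
    content = read_file(filename)  # call the function; it returns a string
    amount_results = count_of_words(content)
    print('文章共有单词{}个，其中不重复单词{}个'.format(*amount_results))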
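To check the counting logic by hand, it may help to run it on a short string whose counts are easy to verify. This is a sketch under assumptions: `count_words` is a hypothetical variant that uses a `set` for the distinct count instead of a dictionary, and the sample sentence is made up.

```python
def count_words(cleaned):
    """Hypothetical variant: return (total words, distinct words)."""
    words = cleaned.split()  # whitespace split, same as in the exercise
    return len(words), len(set(words))  # a set keeps one copy of each word


# Made-up sample: 8 words in total, 'the' and 'moved' each appear twice.
sample = 'the cheese moved and the mice moved on'
total, unique = count_words(sample)
print('Total words: {}, distinct words: {}'.format(total, unique))
```

Both approaches give the same two numbers; the dictionary version has the advantage that its counts can be reused in the next level.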

Level 3: Counting Occurrences of Each Word

Expected output:

the 369
he 337
to 333
and 312
cheese 214
it 187
they 166
of 158
a 146
had 142

import string


def word_frequency(txt):
    """Take a string with punctuation and symbols already removed, and count
    how many times each word occurs. Returns a dictionary with the word as
    the key and its number of occurrences as the value."""
    ########## Begin ##########
    txt = txt.split()
    counts = {}
    for i in txt:
        counts[i] = counts.get(i, 0) + 1
    return counts
    ########## End ##########


def top_ten_words(frequency, cnt):
    """Take the word-frequency dictionary and print the cnt most frequent
    words together with their counts."""
    ########## Begin ##########
    # Sort (word, count) pairs by count, highest first.
    dic = sorted(frequency.items(), key=lambda x: x[1], reverse=True)
    for i in dic[0:cnt]:
        print(*i)
    ########## End ##########


def read_file(file):
    """Take a file name as the argument and read the file's contents into a string.
    Keep only English letters and Western symbols, filter out Chinese,
    convert all characters to lowercase, replace all punctuation and symbols
    with spaces, and return the string."""
    with open(file, 'r', encoding='utf-8') as novel:
        txt = novel.read()
    english_only_txt = ''.join(x for x in txt if ord(x) < 256)
    english_only_txt = english_only_txt.lower()
    for character in string.punctuation:
        english_only_txt = english_only_txt.replace(character, ' ')
    return english_only_txt


if __name__ == '__main__':
    filename = 'Who Moved My Cheese.txt'  # file name
    content = read_file(filename)  # call the function; it returns a string
    frequency_result = word_frequency(content)  # count word frequencies
    n = int(input())
    top_ten_words(frequency_result, n)
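The count-then-sort pattern used here also exists in the standard library as `collections.Counter`. The sketch below is an alternative, not the exercise's required answer: `top_words` is a hypothetical helper and the sample string is made up.

```python
from collections import Counter


def top_words(txt, cnt):
    """Hypothetical helper: return the cnt most frequent words as (word, count) pairs."""
    # Counter tallies the words; most_common returns them by descending count.
    return Counter(txt.split()).most_common(cnt)


# Made-up sample: 'cheese' appears 3 times, 'maze' twice, the rest once.
sample = 'cheese maze cheese haw cheese maze hem'
for word, times in top_words(sample, 2):
    print(word, times)
```

Returning the pairs instead of printing them inside the function makes the result easier to test, at the cost of departing from the exercise's print-based template.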

Level 4: Counting Occurrences of Non-Special Words

Test input: 8

Expected output:

cheese 214
haw 113
what 105
change 86
hem 83
new 70
said 60
maze 46

import string


def top_ten_words_no_excludes(frequency, cnt):
    """Take the word-frequency dictionary, remove common articles, pronouns,
    linking verbs and conjunctions, then print the cnt most frequent words
    and their counts. The words to exclude are:
    excludes_words = ['a', 'an', 'the', 'i', 'he', 'she', 'his', 'my', 'we',
    'or', 'is', 'was', 'do', 'and', 'at', 'to', 'of', 'it', 'on', 'that', 'her',
    'c', 'in', 'you', 'had', 's', 'with', 'for', 't', 'but', 'as', 'not', 'they',
    'be', 'were', 'so', 'our', 'all', 'would', 'if', 'him', 'from', 'no', 'me',
    'could', 'when', 'there', 'them', 'about', 'this', 'their', 'up', 'been',
    'by', 'out', 'did', 'have']
    """
    ########## Begin ##########
    excludes_words = ['a', 'an', 'the', 'i', 'he', 'she', 'his', 'my', 'we',
                      'or', 'is', 'was', 'do', 'and', 'at', 'to', 'of', 'it', 'on', 'that', 'her',
                      'c', 'in', 'you', 'had', 's', 'with', 'for', 't', 'but', 'as', 'not', 'they',
                      'be', 'were', 'so', 'our', 'all', 'would', 'if', 'him', 'from', 'no', 'me',
                      'could', 'when', 'there', 'them', 'about', 'this', 'their', 'up', 'been',
                      'by', 'out', 'did', 'have']
    for i in excludes_words:
        # The None default avoids a KeyError if a word never occurs in the text.
        frequency.pop(i, None)
    dic = sorted(frequency.items(), key=lambda x: x[1], reverse=True)
    for i in dic[0:cnt]:
        print(*i)
    ########## End ##########


def read_file(file):
    """Take a file name as the argument and read the file's contents into a string.
    Keep only English letters and Western symbols, filter out Chinese,
    convert all characters to lowercase, replace all punctuation and symbols
    with spaces, and return the string."""
    with open(file, 'r', encoding='utf-8') as novel:
        txt = novel.read()
    english_only_txt = ''.join(x for x in txt if ord(x) < 256)
    english_only_txt = english_only_txt.lower()
    for character in string.punctuation:
        english_only_txt = english_only_txt.replace(character, ' ')
    return english_only_txt


def word_frequency(txt):
    """Take a string with punctuation and symbols already removed, and count
    how many times each word occurs. Returns a dictionary with the word as
    the key and its number of occurrences as the value."""
    frequency = dict()
    words_list = txt.split()
    for word in words_list:
        frequency[word] = frequency.get(word, 0) + 1
    return frequency


if __name__ == '__main__':
    filename = 'Who Moved My Cheese.txt'  # file name
    content = read_file(filename)  # call the function; it returns a string
    frequency_result = word_frequency(content)  # count word frequencies
    n = int(input())
    top_ten_words_no_excludes(frequency_result, n)
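Removing the excluded words with `pop` changes the dictionary that the caller passed in. If that side effect is unwanted, one possible alternative is to filter into a new list instead. This is a sketch only: `top_words_excluding`, its parameters, and the sample data are hypothetical and are not part of the exercise template.

```python
def top_words_excluding(frequency, excludes, cnt):
    """Hypothetical helper: top cnt (word, count) pairs, skipping excluded words."""
    blocked = set(excludes)  # set membership checks are fast
    # Build a new list, so the caller's dictionary is left untouched.
    kept = [(word, n) for word, n in frequency.items() if word not in blocked]
    kept.sort(key=lambda item: item[1], reverse=True)  # highest count first
    return kept[:cnt]


# Made-up frequencies for illustration; 'a' is excluded but absent, which is fine.
sample_frequency = {'the': 5, 'cheese': 4, 'he': 3, 'maze': 2, 'haw': 1}
for word, times in top_words_excluding(sample_frequency, ['the', 'he', 'a'], 2):
    print(word, times)
```

Because nothing is removed from `sample_frequency`, the same dictionary can be reused afterwards with a different exclusion list.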
