Python Web Scraping: Parsing HTML with BeautifulSoup

Published: 2022/11/16, Category: HTML, Author: 老码农

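BeautifulSoup is a Python library for extracting data from HTML and XML files. It provides simple functions for navigating, searching, and modifying the parse tree. BeautifulSoup automatically converts incoming documents to Unicode and outgoing documents to UTF-8.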
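To make those three kinds of operations concrete, here is a minimal sketch. The sample markup and variable names are made up for illustration, and it assumes Python's built-in "html.parser" so that it runs without any extra parser installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup containing a non-ASCII character.
markup = (
    "<html><head><title>Café menu</title></head>"
    "<body><p id='intro'>Hello, <b>world</b>!</p></body></html>"
)

# "html.parser" ships with Python (an assumption made for portability).
soup = BeautifulSoup(markup, "html.parser")

# Navigation: walk the tree by tag name.
title_text = soup.title.string      # text inside <title>
parent_name = soup.b.parent.name    # name of the tag that contains <b>

# Search: locate a tag by name and attribute.
intro = soup.find("p", id="intro")
intro_text = intro.get_text()

# Modification: replace the text inside <b> in place.
soup.b.string = "reader"
modified_text = intro.get_text()

# Output: encode() returns bytes, using UTF-8 by default.
encoded = soup.encode()

print(title_text, parent_name, intro_text, modified_text, sep=" | ")
print(type(encoded).__name__)
```

Everything read from the tree comes back as Unicode text. Bytes appear only when the document is explicitly encoded for output.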

This example builds a sample HTML string directly, parses it, and pretty-prints the result:

# Import the BeautifulSoup library
from bs4 import BeautifulSoup

# Build a string of sample HTML
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p><b>The Dormouse's story</b></p>

<p>Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" id="link1">Elsie</a>,
<a href="http://example.com/lacie" id="link2">Lacie</a> and
<a href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p>...</p>
"""
# Create a BeautifulSoup object from the markup.
# The "lxml" parser is a separate package (pip install lxml).
soup = BeautifulSoup(html_doc, features="lxml")
# Print the parsed HTML
print('Document as parsed by BeautifulSoup:', soup)
print('===================================')
print('Formatted with the prettify() method:', soup.prettify())

Result:

Document as parsed by BeautifulSoup: <html><head><title>The Dormouse's story</title></head>
<body>
<p><b>The Dormouse's story</b></p>
<p>Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" id="link1">Elsie</a>,
<a href="http://example.com/lacie" id="link2">Lacie</a> and
<a href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p>...</p>
</body></html>
===================================
Formatted with the prettify() method: <html>
 <head>
  <title>
   The Dormouse's story
  </title>
 </head>
 <body>
  <p>
   <b>
    The Dormouse's story
   </b>
  </p>
  <p>
   Once upon a time there were three little sisters; and their names were
   <a href="http://example.com/elsie" id="link1">
    Elsie
   </a>
   ,
   <a href="http://example.com/lacie" id="link2">
    Lacie
   </a>
   and
   <a href="http://example.com/tillie" id="link3">
    Tillie
   </a>
   ;
and they lived at the bottom of a well.
  </p>
  <p>
   ...
  </p>
 </body>
</html>
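
Once the document is parsed, a crawler usually goes on to pull specific data out of the tree. The following is a minimal sketch of that step, not part of the original example: it reuses a shortened copy of the sample markup, assumes the built-in "html.parser" instead of "lxml", and the `links` variable name is chosen here for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample document used in this article.
html_doc = """
<p>Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" id="link1">Elsie</a>,
<a href="http://example.com/lacie" id="link2">Lacie</a> and
<a href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""

# "html.parser" ships with Python, so no extra parser is needed here.
soup = BeautifulSoup(html_doc, "html.parser")

# Collect (link text, URL) pairs from every <a> tag that has an href.
links = [(a.get_text(), a["href"]) for a in soup.find_all("a", href=True)]

for text, url in links:
    print(f"{text}: {url}")
```

Passing `href=True` to `find_all()` skips anchors without an `href` attribute, so the `a["href"]` lookup cannot raise a `KeyError`.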
