Python web scraping: lxml XPath tests
XPath test 1:
main.py
XPath test 2:
test.html
main.py
# coding: utf-8
from lxml import etree

# ========================================
# Main-function test
# ========================================
if __name__ == '__main__':
    parser = etree.HTMLParser(encoding='utf-8')
    tree = etree.parse("test.html", parser=parser)

    # result = tree.xpath("/html")  # "/" separates levels; the leading "/" is the root node
    # result = tree.xpath("/html/body/ul/li/a/text()")  # text() returns the text content
    # result = tree.xpath("/html/body/ul/li[1]/a/text()")  # XPath positions start at 1; [] is an index
    # result = tree.xpath("/html/body/ol/li/a[@href='dapao']/text()")  # [@attr='value'] filters by attribute
    # print(result)

    ol_li_list = tree.xpath("/html/body/ol/li")
    for li in ol_li_list:
        # Extract the text from each li
        result = li.xpath("./a/text()")  # relative query: keep searching inside this li
        print(result)
        result = li.xpath("./a/@href")  # @attr returns the attribute value
        print(result)

    print(tree.xpath("/html/body/ul/li/a/@href"))
    print(tree.xpath("/html/body/div[1]/text()"))
    print(tree.xpath("/html/body/ol/li/a/text()"))
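The contents of test.html are not shown here, so the queries above cannot be run as they stand. The following is a minimal sketch that runs the same kinds of queries against an inline document. The `SAMPLE_HTML` string, its links and text, and the `run_queries` helper are all made up for illustration and are not the original test file.

```python
from lxml import etree

# Hypothetical stand-in for test.html; the real file is not shown in the post.
SAMPLE_HTML = """
<html>
  <body>
    <ul>
      <li><a href="http://example.com/a">Alpha</a></li>
      <li><a href="http://example.com/b">Beta</a></li>
    </ul>
    <ol>
      <li><a href="one">One</a></li>
      <li><a href="two">Two</a></li>
    </ol>
    <div>hello</div>
  </body>
</html>
"""


def run_queries(html):
    """Run a few absolute and relative XPath queries and return the results."""
    tree = etree.HTML(html)  # parse an HTML string; returns the root <html> element
    results = {
        # text() returns the text nodes of every matching <a>
        "all_ul_text": tree.xpath("/html/body/ul/li/a/text()"),
        # li[1] selects the first <li>; XPath positions start at 1
        "first_ul_text": tree.xpath("/html/body/ul/li[1]/a/text()"),
        # [@href='two'] keeps only the <a> whose href equals 'two'
        "by_attr": tree.xpath("/html/body/ol/li/a[@href='two']/text()"),
        "div_text": tree.xpath("/html/body/div[1]/text()"),
    }
    # Relative queries: "./" starts from the current <li>, not from the root
    results["ol_pairs"] = [
        (li.xpath("./a/text()"), li.xpath("./a/@href"))
        for li in tree.xpath("/html/body/ol/li")
    ]
    return results


if __name__ == "__main__":
    for name, value in run_queries(SAMPLE_HTML).items():
        print(name, value)
```

Every `xpath()` call here returns a list, even when only one node matches, so take `[0]` (after checking the list is not empty) when a single value is wanted.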