Parsing HTML with the BeautifulSoup Module

2024-04-02 11:38 • html • 836 reads

Problem: constructing a BeautifulSoup object without naming a parser produces the following warning.

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 10 of the file D:\python_work\test\test.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.

  noStarchSoup = bs4.BeautifulSoup(res.text)

Solution: pass the parser name explicitly through the features argument, as the warning suggests.

    noStarchSoup = bs4.BeautifulSoup(res.text, features='html.parser')
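The fix can be tried without any network request. The following is a minimal sketch, assuming the bs4 package is installed; the HTML string and the variable names are made up for illustration and stand in for the res.text of the original script.

```python
import bs4

# Hypothetical stand-in for res.text, so the example runs offline.
html = "<html><head><title>Example page</title></head><body><p>Hello</p></body></html>"

# Naming the parser explicitly avoids the "No parser was explicitly specified"
# warning and keeps parsing behavior the same across environments.
soup = bs4.BeautifulSoup(html, features="html.parser")

title_text = soup.title.string
print(title_text)
```

The parser name can also be passed as the second positional argument, as in bs4.BeautifulSoup(html, "html.parser").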

Examples of CSS selectors. The select() method returns a list of Tag objects, one for each element that matches.

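Selector passed to the select() method	Will match...
soup.select('div')	All elements named <div>
soup.select('#author')	The element with an id attribute of author
soup.select('.notice')	All elements whose CSS class attribute includes notice
soup.select('div span')	All <span> elements anywhere inside a <div> element
soup.select('div > span')	All <span> elements directly inside a <div> element, with no other element in between
soup.select('input[name]')	All <input> elements that have a name attribute, whatever its value
soup.select('input[type="button"]')	All <input> elements that have a type attribute whose value is button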
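The selectors in the table can be checked against a small document. This is a sketch under stated assumptions: the HTML fragment, its ids, classes and text are invented for the example, and only the select() calls come from the table.

```python
import bs4

# Hypothetical HTML fragment built to exercise each selector in the table.
html = """
<div id="main">
  <span>direct child</span>
  <p><span>nested deeper</span></p>
  <p class="notice">Please read</p>
  <p id="author">Al</p>
  <input name="q" type="text">
  <input type="button" value="Go">
</div>
"""
soup = bs4.BeautifulSoup(html, features="html.parser")

divs = soup.select('div')                      # every <div> element
author = soup.select('#author')                # element with id="author"
notices = soup.select('.notice')               # elements with class "notice"
all_spans = soup.select('div span')            # <span> at any depth inside a <div>
direct_spans = soup.select('div > span')       # <span> that is a direct child of a <div>
named_inputs = soup.select('input[name]')      # <input> with a name attribute
buttons = soup.select('input[type="button"]')  # <input> with type="button"

# Each result is a list of Tag objects, so index into it before reading data.
print(author[0].get_text())
print(len(all_spans), len(direct_spans))
print(buttons[0].get('value'))
```

Because select() always returns a list, a selector that matches nothing gives an empty list rather than None, so it is worth checking the length before indexing.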
