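First, let's check the type of the value returned by the requests library, so that we know what type of argument the BeautifulSoup constructor expects:

requests return type: <class 'str'>

  The value we get back from requests (r.text) is a str. This means we can first convert the bs4.element.ResultSet to a str, and then pass that str to the BeautifulSoup constructor to get a BeautifulSoup object again. After that we can keep calling find_all(). The code is as follows: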

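    data = getHtmlText(url=url)  # the return value here is actually r.text from requests
    print('requests return type:', type(data))

    soup = BeautifulSoup(data, "html.parser")
    print('BeautifulSoup type:', type(soup))
    page = soup.find_all('div', class_='more-page')  # returns a bs4.element.ResultSet
    data2 = str(page)  # convert the ResultSet to a str

    soup2 = BeautifulSoup(data2, "html.parser")  # re-parse the str so it can be searched again
    page_count = soup2.script.string  # text inside the first <script> tag
    # print(page_count)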
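
  The round trip above can be tried without any network access. The following is a minimal sketch, assuming a made-up HTML fragment (SAMPLE_HTML and its pageCount variable are hypothetical stand-ins for the real page). It also shows that each item in a ResultSet is itself a Tag that can be searched directly, which gives the same result here:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<html><body>
  <div class="more-page"><script>var pageCount = 12;</script></div>
  <div class="other">ignored</div>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
matches = soup.find_all("div", class_="more-page")  # a bs4.element.ResultSet

# Round trip described in the text: ResultSet -> str -> BeautifulSoup.
soup2 = BeautifulSoup(str(matches), "html.parser")
via_round_trip = soup2.script.string

# Alternative: search the first matching Tag directly, without re-parsing.
direct = matches[0].find("script").string

print(type(matches).__name__)
print(via_round_trip == direct)
```

  Both routes reach the same script text in this sample; the direct search simply skips the second parse.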

  The getHtmlText function is as follows:

import requests


def getHtmlText(url):
    # Browser-like request headers
    headers = {
        'Accept': '*/*',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Connection': 'keep-alive',
        'Cookie': 'widget_dz_id=54511; widget_dz_cityValues=,; timeerror=1; defaultCityID=54511; defaultCityName=%u5317%u4EAC; Hm_lvt_a3f2879f6b3620a363bec646b7a8bcdd=1516245199; Hm_lpvt_a3f2879f6b3620a363bec646b7a8bcdd=1516245199; addFavorite=clicked',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3236.0 Safari/537.36'
    }
    try:
        r = requests.get(url, timeout=30, headers=headers)
        r.raise_for_status()  # raises HTTPError for 4xx/5xx responses
        r.encoding = r.apparent_encoding  # use the encoding detected from the content
        return r.text  # the decoded response body (a str)
    except requests.RequestException:
        return "An exception occurred"
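
  One thing to watch with getHtmlText: on failure it returns an ordinary string, and BeautifulSoup will parse that string as if it were the page. A possible variation, sketched below under the assumption that returning None on failure is acceptable to the caller, makes the failure easy to detect. The name fetch_html is hypothetical and not part of the original code:

```python
from typing import Optional

import requests


def fetch_html(url: str, timeout: float = 30) -> Optional[str]:
    """Return the decoded page body, or None if the request fails."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()  # raises HTTPError for 4xx/5xx responses
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        # Covers connection errors, timeouts, bad URLs and HTTP errors.
        return None


# A URL without a scheme is rejected by requests before any connection is made.
html = fetch_html("not-a-url")
if html is None:
    print("request failed, skip parsing")
```

  The caller can then check for None before handing the text to BeautifulSoup.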