Python web scraping: how to parse JSON files, extract data from them, and use JSONPath
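This JSON file was captured with a packet-capture tool. Pasting its content into an online JSON parser displays the parsed structure in a formatted view, which makes it easy to see where the data lives.
JSON file URL: url="http://www.lagou.com/lbs/getAllCitySearchLabels.json"
We use Python to parse it and extract the city names.
The code is as follows: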
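```python
import json                 # JSON parsing library; plays the role lxml plays for HTML
import urllib.request       # Python 3 replacement for urllib2

import jsonpath             # JSONPath query syntax, the counterpart of XPath (third-party: pip install jsonpath)

url = "http://www.lagou.com/lbs/getAllCitySearchLabels.json"
header = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0"}
request = urllib.request.Request(url, headers=header)
response = urllib.request.urlopen(request)
# Read the body of the JSON file; read() returns bytes, so decode it into a str
html = response.read().decode("utf-8")
# Convert the JSON-formatted string into Python objects (dicts and lists)
data = json.loads(html)
# Collect every "name" value at any depth; jsonpath returns a Python list, or False when nothing matches
city_list = jsonpath.jsonpath(data, "$..name") or []
# Print each city
for city in city_list:
    print(city)
# json.dumps() escapes Chinese and other non-ASCII characters by default, because ensure_ascii defaults to True
# With ensure_ascii=False the returned str keeps the characters as they are
array = json.dumps(city_list, ensure_ascii=False)
# Write the result to lagouCity.json
with open("lagouCity.json", "w", encoding="utf-8") as f:
    f.write(array)
```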
[Figure: printed result, one city name per line]
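In case the lagou.com endpoint is unavailable, here is a minimal, stdlib-only sketch of the same pipeline that runs offline. The JSON text is a hypothetical stand-in, not the real response, and `find_values` is a hypothetical helper that approximates what `$..name` selects by walking every dict and list recursively (the jsonpath library's result order may differ). The last lines show the effect of `ensure_ascii` on `json.dumps`.

```python
import json

# Hypothetical JSON text standing in for the real response (its actual layout is not shown in this article)
raw = '''
{"cities": {
    "A": [{"id": 1, "name": "安庆"}],
    "B": [{"id": 2, "name": "北京"}, {"id": 3, "name": "保定"}]
}}
'''


def find_values(node, key):
    """Hypothetical helper: yield every value stored under `key` at any depth, like $..key."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: matching keys may also appear inside nested values
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


data = json.loads(raw)
city_list = list(find_values(data, "name"))

# Default ensure_ascii=True escapes every non-ASCII character as \uXXXX
escaped = json.dumps(city_list)
# ensure_ascii=False keeps the Chinese characters readable
readable = json.dumps(city_list, ensure_ascii=False)

print(city_list)
print(escaped)
print(readable)
```

Both strings decode back to the same list; `ensure_ascii=False` only changes how the characters are written out, which is why the file in the script above is opened with `encoding="utf-8"`.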
---
Here is another simple end-to-end example:
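```python
import json

import jsonpath
import requests

url = 'http://baijiajiekuan.oss-cn-shanghai.aliyuncs.com/mongo/risk/original/data/20180206/04b94dac3ed84922b6d53c85514e700c.txt'
response = requests.get(url)
# Print the encoding guessed from the response content
# print(response.apparent_encoding)
# Tell requests to decode the body as UTF-8
response.encoding = 'utf-8'
# Read the decoded response body as a str
html = response.text
# print(html)
# Convert the JSON-formatted string into a Python object
data = json.loads(html)
# print(data)
# Collect the data under every "score" node, at any depth
scores = jsonpath.jsonpath(data, '$..score')
print(scores)
```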
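Setting `response.encoding` only controls how requests decodes `response.text`. The stdlib-only sketch below uses a hypothetical byte string in place of a real response body to show two equivalent routes into `json.loads`: decoding explicitly first, or passing the bytes directly, which Python 3.6+ accepts and auto-detects as UTF-8/16/32. It also shows that a wrong codec silently garbles Chinese text instead of raising an error.

```python
import json

# Hypothetical response body as raw UTF-8 bytes (roughly what response.content holds)
raw = '{"name": "北京", "score": 88}'.encode("utf-8")

# Route 1: decode explicitly, roughly what response.encoding = 'utf-8' plus response.text does
data_from_text = json.loads(raw.decode("utf-8"))

# Route 2: pass the bytes straight to json.loads (Python 3.6+ detects UTF-8/16/32 itself)
data_from_bytes = json.loads(raw)

# Decoding with the wrong codec does not raise here; it silently garbles the Chinese text
garbled = json.loads(raw.decode("latin-1"))

print(data_from_text == data_from_bytes)   # True
print(garbled["name"] == "北京")            # False
```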
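JSONPath vs. XPath syntax:
JSON has a clear structure, is highly readable and simple, and is therefore very easy to match against. The table below lists each JSONPath construct next to its XPath equivalent.
| XPath | JSONPath | Description |
|---|---|---|
| / | $ | Root node |
| . | @ | Current node |
| / | . or [] | Child operator (selects child nodes) |
| .. | n/a | Parent operator; not supported by JSONPath |
| // | .. | Recursive descent: selects every matching node regardless of its position |
| * | * | Wildcard: matches all elements |
| @ | n/a | Attribute access; not supported in JSON and not needed, because JSON is a recursive key-value structure |
| [] | [] | Subscript operator; supports simple iteration inside, such as array indices or selecting values by content |
| \| | [,] | Union: selects several items inside a subscript |
| [] | ?() | Filter expression |
| n/a | () | Script expression (computed value) |
| () | n/a | Grouping; not supported by JSONPath |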
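To make the table concrete, here is a plain-Python sketch of what several JSONPath expressions select. The sample document is hypothetical, and each comment gives the JSONPath expression in Goessner's original notation; the code itself is ordinary dict/list indexing, not a call into a JSONPath library. Support for filter and script expressions varies between JSONPath implementations, so check the library you use.

```python
import json

# Hypothetical sample document, shaped like the bookstore example often used to explain JSONPath
doc = json.loads("""
{"store": {
    "book": [
        {"title": "Book A", "price": 8.95},
        {"title": "Book B", "price": 8.99},
        {"title": "Book C", "price": 22.99}
    ],
    "bicycle": {"color": "red", "price": 19.95}
}}
""")

books = doc["store"]["book"]

# $.store.book[0].title -> child operator "." plus subscript "[]"
first_title = books[0]["title"]

# $.store.book[*].title -> wildcard "*" over every array element
all_titles = [book["title"] for book in books]

# $.store.book[0,2].title -> union "[,]" picks several indices at once
picked_titles = [books[i]["title"] for i in (0, 2)]

# $.store.book[?(@.price < 10)].title -> filter "?()", where "@" is the book being tested
cheap_titles = [book["title"] for book in books if book["price"] < 10]

# $.store.book[(@.length-1)].title -> script expression "()" computes the index of the last book
last_title = books[len(books) - 1]["title"]

print(first_title, all_titles, picked_titles, cheap_titles, last_title)
```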