Hive中对json处理
json
适用于现代 C++ 的 JSON。
项目地址:https://gitcode.com/gh_mirrors/js/json
免费下载资源
·
应用场景:使用Hive对日志信息进行查询分解,log里面记录的是json形式的数据:
{"logid":"5d40e1af-19f7-4aad-af8f-c7247e322e5c","souc":"4","devi":"OPPO R7sm","sys":"22,5.1.1","dname":"Dalvik/2.1.0 (Linux; U;
Android 5.1.1; OPPO R7sm Build/LMY47V)","chan":"canary","vers":"638.1","mac":"dc:6d:cd:16:46:0f","imei":"869410022554506","ifa":"","city_id":"384",
"reso":"1800*1080","euid":"a341c57d6abcdd969b6dc4c7a564a15f","dpi":"3.0","host":"192.168.1.212:80","ip":"223.100.138.62","uid":"9587355",
"qtype":"view_caipu_detail","qtype_sub":"","obj":"861920","uri":"recipe/detail","qnum":"1","refer":"","lat":"39.68861","lon":"122.978162",
"srctype":"2800","agentid":"2066b417879f467164fd1fb91b9d04c0",
"ext":"{\"query\":{\"kw\":\"培根披萨\",\"src\":\"2801\",\"idx\":\"3\",\"type\":\"13\",\"id\":\"861920\"}}","page_num":"","android_id":"","pseudo_id":""}
在这部分日志里面,对于:
"ext":"{\"query\":{\"kw\":\"培根披萨\",\"src\":\"2801\",\"idx\":\"3\",\"type\":\"13\",\"id\":\"861920\"}}","
部分,是json里面又包含了2重json。
接下来,我们使用Hive来取得kw、src、idx、type等信息:
写法一:
select tmp.kw,tmp.src,tmp.idx,count(*) as num
from (
select get_json_object(tt.query,'$.kw') as kw,get_json_object(tt.query,'$.src') as src,get_json_object(tt.query,'$.idx') as idx
from (
select get_json_object(t.ext,'$.query') as query
from (
select req["ext"] as ext
from dh_server_log where p_day=20170213 and req["qtype"]='view_caipu_detail' and req["ext"] <> ''
) t
) tt
) tmp group by tmp.kw,tmp.src,tmp.idx
这种写法使用了Hive内置函数get_json_object,而且使用了多层次来分解,可以实现对json的分解。
写法二:
select json_tuple(get_json_object(req["ext"],"$.query"),"kw","id","idx","type","src")
from dh_server_log where p_day=20170213 and req["qtype"]='view_caipu_detail' and req["ext"] <> ''
limit 20
这种写法使用了json_tuple函数配合get_json_object来取得json里的数据,取得了''query'':'' ''这第二层json的数据,但是如果对于"kw","id","idx","type","src"这个几个字段只需要其中几个的话,则需要使用lateral view来处理了:
select nt.a,count(nt.a) as num from (
select req from dh_server_log where p_day=20170213 and req["qtype"]='view_caipu_detail' and req["ext"] <> '' limit 10
) dsl lateral view json_tuple(get_json_object(req["ext"],"$.query"),"kw","id","idx","type","src") nt as a,b,c,d,e
GitHub 加速计划 / js / json
18
5
下载
适用于现代 C++ 的 JSON。
最近提交(Master分支:3 个月前 )
2d42229f
* Support BSON uint64 de/serialization
Signed-off-by: Michael Valladolid <mikevalladolid@gmail.com>
* Treat 0x11 as uint64 and not timestamp specific
Signed-off-by: Michael Valladolid <mikevalladolid@gmail.com>
---------
Signed-off-by: Michael Valladolid <mikevalladolid@gmail.com> 5 天前
1809b3d8
Signed-off-by: Niels Lohmann <mail@nlohmann.me> 5 天前
更多推荐
已为社区贡献5条内容
所有评论(0)