Big Data with Hive: A Summary of Row/Column Transformations

浊酒南街


浊酒南街 · published 2021-08-22 19:51:44

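1. Rows to Columns (Part 1)

Main functions used:

CONCAT(string A/col, string B/col…): returns the concatenation of its input strings; any number of arguments is accepted.
CONCAT_WS(separator, str1, str2,…): a special form of CONCAT() whose first argument is the separator placed between the remaining arguments. It also accepts an array<string>, which is what makes concat_ws('&', collect_set(name)) work.
COLLECT_SET(col): de-duplicates the values of a column and aggregates them into an array. In the Hive version used here it accepts only primitive types, and the order of elements in the resulting array is not guaranteed.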
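To make the three functions concrete, here is a small plain-Python model of their behavior. It is a sketch, not Hive code: the function names only mirror the Hive ones, the English sample values are stand-ins, and keeping first-seen order in collect_set is a choice of this model (Hive makes no ordering promise).

```python
from typing import Hashable, Iterable, List, Union


def concat(*parts: str) -> str:
    """Model of CONCAT: join every argument with no separator."""
    return "".join(parts)


def concat_ws(sep: str, *parts: Union[str, List[str]]) -> str:
    """Model of CONCAT_WS: place `sep` between the remaining arguments.

    A list argument is flattened first, mirroring how Hive's concat_ws
    accepts an array<string> such as the result of collect_set(col).
    """
    flat: List[str] = []
    for part in parts:
        if isinstance(part, list):
            flat.extend(part)
        else:
            flat.append(part)
    return sep.join(flat)


def collect_set(values: Iterable[Hashable]) -> list:
    """Model of COLLECT_SET: drop duplicates and return the values as a list (array).

    First-seen order is kept here for readability; Hive does not guarantee order.
    """
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


print(concat("Aries", ",", "A"))     # Aries,A
print(concat_ws(",", "Aries", "A"))  # Aries,A
print(concat_ws("&", collect_set(["Song Jiang", "Pan Jinlian", "Song Jiang"])))  # Song Jiang&Pan Jinlian
```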

Example

Source data:

name	constellation	blood_type
宋江	白羊座	A
鲁智深	射手座	A
武松	白羊座	B
潘金莲	白羊座	A
西门庆	射手座	A

Expected output:

constell_blood	name_list
射手座,A	鲁智深&西门庆
白羊座,A	宋江&潘金莲
白羊座,B	武松

Implementation:

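select
    t1.constell_blood,
    -- gather the distinct names per key and join them with '&'
    concat_ws('&', collect_set(t1.name)) name_list
from
    -- inner query: build the grouping key "constellation,blood_type"
    (select
        name,
        concat(constellation, ",", blood_type) constell_blood
    from
        person_info) t1
group by
    t1.constell_blood;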
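The whole query can be traced in plain Python as a sketch of what Hive computes. The English rows below are hypothetical stand-ins for person_info (Aries for 白羊座, Sagittarius for 射手座, transliterated names), and the name order inside each group follows first appearance only because this model chooses to; Hive's collect_set may return names in any order.

```python
from collections import defaultdict

# Hypothetical English stand-ins for person_info: (name, constellation, blood_type)
person_info = [
    ("Song Jiang", "Aries", "A"),
    ("Lu Zhishen", "Sagittarius", "A"),
    ("Wu Song", "Aries", "B"),
    ("Pan Jinlian", "Aries", "A"),
    ("Ximen Qing", "Sagittarius", "A"),
]


def rows_to_column(rows):
    """Model of: GROUP BY constell_blood, concat_ws('&', collect_set(name))."""
    groups = defaultdict(list)  # constell_blood -> distinct names in first-seen order
    for name, constellation, blood_type in rows:
        key = ",".join([constellation, blood_type])  # concat(constellation, ",", blood_type)
        if name not in groups[key]:                  # collect_set drops duplicates
            groups[key].append(name)
    return {key: "&".join(names) for key, names in groups.items()}


for key, name_list in sorted(rows_to_column(person_info).items()):
    print(key, name_list, sep="\t")
```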

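2. Columns to Rows (Part 1)

Main functions used:
EXPLODE(col): splits a complex array or map column in Hive into multiple rows, one per array element (or one per key-value pair of a map).
LATERAL VIEW
Usage: LATERAL VIEW udtf(expression) tableAlias AS columnAlias
Explanation: used together with a UDTF such as explode (often fed by split, which turns a string into an array). It joins each source row with the rows the UDTF generates from it, so the other columns of the source row stay available and the resulting rows can be aggregated further.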
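The join behavior of LATERAL VIEW is easy to misread, so here is a minimal plain-Python model of it, assuming rows are dicts and the UDTF is an explode-like function. The names explode and lateral_view are illustrative only. The model also reflects one real Hive behavior: a source row whose array is empty produces no output rows (Hive's LATERAL VIEW OUTER variant keeps such rows).

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def explode(value: Any) -> List[Tuple[Any, ...]]:
    """Model of EXPLODE: one row per array element, or one (key, value) row per map entry."""
    if value is None:
        return []  # a NULL input produces no rows
    if isinstance(value, dict):
        return [(k, v) for k, v in value.items()]
    return [(v,) for v in value]


def lateral_view(
    rows: Iterable[Dict[str, Any]],
    udtf: Callable[[Any], List[Tuple[Any, ...]]],
    column: str,
    aliases: List[str],
) -> List[Dict[str, Any]]:
    """Model of LATERAL VIEW: join each source row with every row the UDTF produces from it."""
    result = []
    for row in rows:
        for generated in udtf(row[column]):
            new_row = dict(row)                      # keep the source row's columns
            new_row.update(zip(aliases, generated))  # add the generated columns
            result.append(new_row)
    return result


rows = [{"id": 1, "tags": ["x", "y"]}, {"id": 2, "tags": []}]
for r in lateral_view(rows, explode, "tags", ["tag"]):
    print(r["id"], r["tag"])
# id 2 has an empty array, so it contributes no output rows
```

For a map column the same call takes two aliases, for example lateral_view(rows, explode, "kv", ["k", "v"]), matching the "AS key, value" form.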

Example

Source data:

movie	category
《疑犯追踪》	悬疑&动作
《Lie to me》	悬疑&警匪

Expected output:

movie	category_name
《疑犯追踪》	悬疑
《疑犯追踪》	动作
《Lie to me》	悬疑
《Lie to me》	警匪

Implementation:

select
    movie,
    category_name
from 
 movie_info 
lateral view explode(split(category, "\\&")) table_tmp as category_name;
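A plain-Python trace of this query, as a sketch with hypothetical English stand-ins for movie_info (Suspense, Action, Crime for 悬疑, 动作, 警匪). Hive's split takes a regular expression, which is why the query writes "\\&"; since & is not a regex metacharacter, the escape is harmless but not strictly required, and Python's re module treats r"\&" the same way.

```python
import re

# Hypothetical English stand-ins for movie_info: (movie, category)
movie_info = [
    ("Person of Interest", "Suspense&Action"),
    ("Lie to Me", "Suspense&Crime"),
]


def explode_categories(rows):
    """Model of: lateral view explode(split(category, "\\&")) table_tmp as category_name."""
    result = []
    for movie, category in rows:
        # split() yields an array; explode + lateral view pairs each element with its movie
        for category_name in re.split(r"\&", category):
            result.append((movie, category_name))
    return result


for movie, category_name in explode_categories(movie_info):
    print(movie, category_name, sep="\t")
```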

3. Rows to Columns (Part 2)

Main technique used: sum(case when ...)

Example

Source data:

stu_id	name	course	score
01	zhangsan	math	90
01	zhangsan	chinese	88
01	zhangsan	english	88
02	lisi	math	66
02	lisi	chinese	77
02	lisi	english	80

Expected output:

stu_id	name	mat_score	chi_score	eng_score
01	zhangsan	90	88	88
02	lisi	66	77	80

Implementation 1: conditional aggregation with sum(if(...)) or sum(case when ...)

select 
stu_id,
name,
sum(case when course='math' then score else 0 end) mat_score,
sum(case when course='chinese' then score else 0 end) chi_score,
sum(case when course='english' then score else 0 end) eng_score
from stu001 
group by stu_id,name
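Conditional aggregation can be modeled in a few lines of Python, as a sketch of the idea rather than Hive's execution: every input row contributes its score to exactly one course column and 0 to the others, and the per-group sums then collapse each student's rows into one.

```python
# The stu001 rows above as (stu_id, name, course, score)
stu001 = [
    ("01", "zhangsan", "math", 90),
    ("01", "zhangsan", "chinese", 88),
    ("01", "zhangsan", "english", 88),
    ("02", "lisi", "math", 66),
    ("02", "lisi", "chinese", 77),
    ("02", "lisi", "english", 80),
]

COURSES = ["math", "chinese", "english"]


def pivot_sum_case(rows):
    """Model of GROUP BY stu_id, name with one sum(case when ...) per course."""
    totals = {}  # (stu_id, name) -> {course: running sum}; dicts keep insertion order
    for stu_id, name, course, score in rows:
        acc = totals.setdefault((stu_id, name), {c: 0 for c in COURSES})
        for c in COURSES:
            # case when course = c then score else 0 end
            acc[c] += score if course == c else 0
    return [
        (stu_id, name, acc["math"], acc["chinese"], acc["english"])
        for (stu_id, name), acc in totals.items()
    ]


for row in pivot_sum_case(stu001):
    print(*row, sep="\t")
```

Because of the else 0 branch, a student with no row for some course gets 0 in that column; without else 0, the CASE yields NULL for every row and the SQL sum of only NULLs is NULL.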

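Implementation 2: the map approach: first assemble each student's courses and scores into a map, then look up each course by key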

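select 
stu_id
,name
,course_score_map['math'] as mat_score
,course_score_map['chinese'] as chi_score
,course_score_map['english'] as eng_score
from 
(
select 
stu_id
,name
-- build "course:score" items, join them with ',' and parse the result with str_to_map;
-- str_to_map's default delimiters are ',' between pairs and ':' inside a pair
,str_to_map(concat_ws(',',collect_set(concat_ws(':',course,cast(score as string))))) as course_score_map
from  stu001 
group by 
stu_id
,name
) tt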
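Here is a plain-Python sketch of the map approach, assuming well-formed "course:score" items. The str_to_map helper below only models Hive's str_to_map with its default delimiters and is not the real implementation. Note that the map values are strings (the query casts score to string), and looking up a missing course returns None here, matching the NULL Hive returns for a missing map key.

```python
# The stu001 rows above as (stu_id, name, course, score)
stu001 = [
    ("01", "zhangsan", "math", 90),
    ("01", "zhangsan", "chinese", 88),
    ("01", "zhangsan", "english", 88),
    ("02", "lisi", "math", 66),
    ("02", "lisi", "chinese", 77),
    ("02", "lisi", "english", 80),
]


def str_to_map(text, pair_delim=",", kv_delim=":"):
    """Simplified model of str_to_map(text) with its default ',' and ':' delimiters."""
    result = {}
    for pair in text.split(pair_delim):
        key, _, value = pair.partition(kv_delim)
        result[key] = value
    return result


def pivot_map(rows):
    """Model of collect_set(concat_ws(':', ...)) -> concat_ws(',') -> str_to_map -> key lookups."""
    items_by_student = {}
    for stu_id, name, course, score in rows:
        item = ":".join([course, str(score)])  # concat_ws(':', course, cast(score as string))
        bucket = items_by_student.setdefault((stu_id, name), [])
        if item not in bucket:                 # collect_set removes duplicates
            bucket.append(item)
    result = []
    for (stu_id, name), items in items_by_student.items():
        course_score_map = str_to_map(",".join(items))
        result.append((
            stu_id,
            name,
            course_score_map.get("math"),
            course_score_map.get("chinese"),
            course_score_map.get("english"),
        ))
    return result


for row in pivot_map(stu001):
    print(*row, sep="\t")
```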

4. Columns to Rows (Part 2)

Main technique used: UNION ALL

Example

Source data:

stu_id	name	mat_score	chi_score	eng_score
01	zhangsan	90	88	88
02	lisi	66	77	80

Expected output:

stu_id	name	course	score
01	zhangsan	math	90
01	zhangsan	chinese	88
01	zhangsan	english	88
02	lisi	math	66
02	lisi	chinese	77
02	lisi	english	80

Implementation 1: UNION ALL

select  
stu_id
,name
,'math' as course
,mat_score as score
from  stu002 

union all 
select  
stu_id
,name
,'chinese' as course
,chi_score as score
from  stu002 

union all 
select  
stu_id
,name
,'english' as course
,eng_score as score
from  stu002
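A plain-Python sketch of the UNION ALL unpivot: each branch emits one (stu_id, name, course, score) row per source row for a single course column, and the branches are simply concatenated with no de-duplication (that is the "ALL").

```python
# The stu002 rows above as (stu_id, name, mat_score, chi_score, eng_score)
stu002 = [
    ("01", "zhangsan", 90, 88, 88),
    ("02", "lisi", 66, 77, 80),
]


def unpivot_union_all(rows):
    """Model of three SELECT branches (math, chinese, english) combined with UNION ALL."""
    branches = [("math", 2), ("chinese", 3), ("english", 4)]  # (course literal, column index)
    result = []
    for course, idx in branches:  # each branch reads the whole table
        for row in rows:
            result.append((row[0], row[1], course, row[idx]))
    return result


for row in sorted(unpivot_union_all(stu002)):
    print(*row, sep="\t")
```

Since every branch is its own SELECT over stu002, the table is typically read once per course, whereas the explode-based implementation touches each row once. Hive also does not promise any particular order for the combined rows, so add an ORDER BY if the order matters.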

Implementation 2: explode, then split each string

select 
stu_id
,name
,split(course_score,'\\:')[0] as course
,split(course_score,'\\:')[1] as score
from (
select  
stu_id
,name 
,concat('math',':',mat_score,'##','chinese',':',chi_score,'##','english',':',eng_score) as course_score_list
from stu002 
) tt 
lateral view explode(split(course_score_list, "\\##")) table_tmp as course_score;
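As a sketch of what this query does step by step, the Python below packs each student's scores into one "course:score##course:score##..." string, splits on "##" (the explode step), then splits each piece on ":". As in the Hive version, the resulting score is a string; casting it back to int would be an extra step the original query does not perform.

```python
import re

# The stu002 rows above as (stu_id, name, mat_score, chi_score, eng_score)
stu002 = [
    ("01", "zhangsan", 90, 88, 88),
    ("02", "lisi", 66, 77, 80),
]


def unpivot_explode(rows):
    """Model of concat(...) + lateral view explode(split(..., '##')) + split(..., ':')."""
    result = []
    for stu_id, name, mat_score, chi_score, eng_score in rows:
        # concat('math',':',mat_score,'##','chinese',':',chi_score,'##','english',':',eng_score)
        course_score_list = "##".join(
            [f"math:{mat_score}", f"chinese:{chi_score}", f"english:{eng_score}"]
        )
        # explode(split(course_score_list, "\\##")): one row per "course:score" piece
        for course_score in re.split("##", course_score_list):
            parts = re.split(":", course_score)  # split(course_score, '\\:')
            result.append((stu_id, name, parts[0], parts[1]))
    return result


for row in unpivot_explode(stu002):
    print(*row, sep="\t")
```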

sync.sh里面的参数需要改变，ip/username/password/database/tablesync.sh#!/bin/sh# Please change the IP and password of the data source db.# Then change the table name.filename=/home/nington/db/$(date +%Y-%m