玖叶教程网

前端编程开发入门

Hive中的排序函数row_number()、rank()

Hive 0.11.0以后增加了分析函数row_number()、rank()、dense_rank(),说明:

row_number() over ([partition col1] [order by col2])

rank() over ([partition col1] [order by col2])

dense_rank() over ([partition col1] [order by col2])

三个函数都是根据col1字段分组,然后对col2字段进行排序,对排序后的每行生成一个行号,这个行号从1开始递增col1、col2都可以是多个字段,用','分隔

三个函数的区别是:

row_number() 是没有重复值的排序(即使记录相等也是不重复的),可以利用它来实现分页; dense_rank() 是连续排序,排名相等会在名次中不会留下空位,两个第二名仍然跟着第三名;rank()是跳跃排序,排名相等会在名次中留下空位,如两个第二名下来就是第四名。

实例:

  1. hive> create table t(name string, sub string, score int) row format delimited fields terminated by ‘\t‘;

数据在a.txt文件中

  1. a chinese 98

  2. a english 90

  3. d chinese 88

  4. c english 82

  5. c math 98

  6. b math 89

  7. b chinese 79

  8. z english 90

  9. z math 89

  10. z chinese 80

  11. e math 99

  12. e english 87

  13. d english 90

1row_number

  1. hive (test)> select *, row_number() over (partition by sub order by score) as od from t;

2rank

  1. hive (test)> select *, rank() over (partition by sub order by score) as od from t;

3dense_ran

  1. hive (test)> select *, dense_rank() over (partition by sub order by score desc) od from t;

业务实例:

统计每个学科的前三名

hive> select * from (select *, row_number() over (partition by sub order by score desc) as od from t ) tt where od<=3;

语文成绩是80分的排名是多少

hive> select od from (select *, row_number() over (partition by sub order by score desc) as od from t ) t where sub=‘chinese‘ and score=80;

分页查询

hive> select * from (select *, row_number() over () as rn from t) t1 where rn between 1 and 5;

select *, row_number() over () as rn from t;

发表评论:

控制面板
您好,欢迎到访网站!
  查看权限
网站分类
最新留言