Hive中除了支持和传统数据库中一样的内关联、左关联、右关联、全关联,还支持LEFT SEMI JOIN和CROSS JOIN,但这两种JOIN类型也可以用前面的代替。
如何实现join?下面展示实操记录。
两张表的数据:
1)内关联(JOIN) 只返回能关联上的结果。
SELECT a.id, a.name,b.age FROM student a INNER JOIN studentssb b ON (a.id = b.id);
(INNER可以省略)
0: jdbc:hive2://hadoop1012:10000> SELECT a.id, a.name,b.age FROM student a INNER JOIN studentssb b ON (a.id = b.id);
+-------+------------+--------+
| a.id | a.name | b.age |
+-------+------------+--------+
| 1017 | student17 | 14 |
| 1018 | student18 | 18 |
| 1019 | student19 | 24 |
| 1001 | student1 | 12 |
| 1002 | student2 | 14 |
| 1003 | student3 | 18 |
| 1004 | student4 | 14 |
| 1005 | student5 | 18 |
| 1006 | student6 | 24 |
| 1007 | student7 | 15 |
| 1008 | student8 | 17 |
| 1009 | student9 | 18 |
| 1010 | student10 | 14 |
| 1011 | student11 | 12 |
| 1013 | student13 | 14 |
| 1014 | student14 | 15 |
| 1015 | student15 | 17 |
| 1016 | student16 | 18 |
+-------+------------+--------+
2) 左外关联(LEFT [OUTER] JOIN) 以LEFT [OUTER] JOIN关键字前面的表作为主表,和其他表进行关联,返回记录和主表的记录数一致,关联不上的字段置为NULL。是否指定OUTER关键字,貌似对查询结果无影响。
select a.id,a.name,b.age from student a left join studentssb b ON (a.id=b.id);
0: jdbc:hive2://hadoop1012:10000> select a.id,a.name,b.age from student a left join studentssb b ON (a.id=b.id);
+-------+------------+--------+
| a.id | a.name | b.age |
+-------+------------+--------+
| 1001 | student1 | 12 |
| 1002 | student2 | 14 |
| 1003 | student3 | 18 |
| 1004 | student4 | NULL |
| 1005 | student5 | NULL |
| 1006 | student6 | NULL |
| 1007 | student7 | NULL |
| 1008 | student8 | NULL |
| 1009 | student9 | NULL |
| 1010 | student10 | NULL |
| 1011 | student11 | NULL |
| 1012 | student12 | NULL |
| 1013 | student13 | NULL |
| 1014 | student14 | NULL |
| 1015 | student15 | NULL |
| 1016 | student16 | NULL |
+-------+------------+--------+
3)右外关联(RIGHT [OUTER] JOIN) 和左外关联相反,以RIGTH [OUTER] JOIN关键词后面的表作为主表,和前面的表做关联,返回记录数和主表一致,关联不上的字段为NULL。是否指定OUTER关键字,貌似对查询结果无影响。
SELECT a.id, a.name, b.age FROM student a RIGHT OUTER JOIN studentssb b ON (a.id = b.id);
0: jdbc:hive2://hadoop1012:10000> SELECT a.id, a.name, b.age FROM student a RIGHT OUTER JOIN studentssb b ON (a.id = b.id);
+-------+------------+--------+
| a.id | a.name | b.age |
+-------+------------+--------+
| 1001 | student1 | 12 |
| 1002 | student2 | 14 |
| 1003 | student3 | 18 |
| 1004 | student4 | 14 |
| 1005 | student5 | 18 |
| 1006 | student6 | 24 |
| 1007 | student7 | 15 |
| 1008 | student8 | 17 |
| 1009 | student9 | 18 |
| 1010 | student10 | 14 |
| 1011 | student11 | 12 |
| 1013 | student13 | 14 |
| 1014 | student14 | 15 |
| 1015 | student15 | 17 |
| 1016 | student16 | 18 |
| 1017 | student17 | 14 |
| 1018 | student18 | 18 |
| 1019 | student19 | 24 |
| NULL | NULL | 15 |
| NULL | NULL | 17 |
+-------+------------+--------+
4) 全外关联(FULL [OUTER] JOIN) 以两个表的记录为基准,返回两个表的记录去重之和,关联不上的字段为NULL。是否指定OUTER关键字,貌似对查询结果无影响。
注意:FULL JOIN时候,Hive不会使用MapJoin来优化。
SELECT a.id, a.name, b.age FROM student a FULL OUTER JOIN studentssb b ON (a.id = b.id);
0: jdbc:hive2://hadoop1012:10000> SELECT a.id, a.name, b.age FROM student a FULL OUTER JOIN studentssb b ON (a.id = b.id);
+-------+------------+--------+
| a.id | a.name | b.age |
+-------+------------+--------+
| 1001 | student1 | 12 |
| 1002 | student2 | 14 |
| 1003 | student3 | 18 |
| 1004 | student4 | 14 |
| 1005 | student5 | 18 |
| 1006 | student6 | 24 |
| 1007 | student7 | 15 |
| 1008 | student8 | 17 |
| 1009 | student9 | 18 |
| 1010 | student10 | 14 |
| 1011 | student11 | 12 |
| 1012 | student12 | NULL |
| 1013 | student13 | 14 |
| 1014 | student14 | 15 |
| 1015 | student15 | 17 |
| 1016 | student16 | 18 |
| 1017 | student17 | 14 |
| 1018 | student18 | 18 |
| 1019 | student19 | 24 |
| NULL | NULL | 15 |
| NULL | NULL | 17 |
| 1022 | student22 | NULL |
+-------+------------+--------+
5) LEFT SEMI JOIN 以LEFT SEMI JOIN关键字前面的表为主表,返回主表的KEY也在副表中的记录。
SELECT a.id, a.name FROM student a LEFT SEMI JOIN studentssb b ON (a.id = b.id);
0: jdbc:hive2://hadoop1012:10000> SELECT a.id, a.name FROM student a LEFT SEMI JOIN studentssb b ON (a.id = b.id);
+-------+------------+
| a.id | a.name |
+-------+------------+
| 1017 | student17 |
| 1018 | student18 |
| 1019 | student19 |
| 1001 | student1 |
| 1002 | student2 |
| 1003 | student3 |
| 1004 | student4 |
| 1005 | student5 |
| 1006 | student6 |
| 1007 | student7 |
| 1008 | student8 |
| 1009 | student9 |
| 1010 | student10 |
| 1011 | student11 |
| 1013 | student13 |
| 1014 | student14 |
| 1015 | student15 |
| 1016 | student16 |
+-------+------------+
6) 笛卡尔积关联(CROSS JOIN) 返回两个表的笛卡尔积结果,不需要指定关联键。
SELECT a.id, a.name, b.age FROM student a CROSS JOIN studentssb b;