玖叶教程网

前端编程开发入门

Mysql连接方法之嵌套循环和块嵌套循环

1、嵌套循环-nested loop

当两个结果集做合并,需要找出两个结果集中都满足同一个或多个条件的数据时,就需要使用合适的方法来找出满足条件的行。

嵌套循环是集合连接中常见的一种连接方法。通常分为外部循环和内部循环,外部循环的集合称为驱动集合,内部循环的集合称为被驱动集合。

嵌套循环伪代码:

for each row in t1 matching range {  
    for each row in t2 matching reference key {  
        if row satisfies join conditions, send to client  
    }  
} 

工作过程:

  1. 优化器确定驱动集合并将其指定为外部循环。

外部循环生成一组用于驱动连接条件的行。集合可以是使用索引扫描、全表扫描或任何其他生成行的操作访问的表。

内部循环的迭代次数取决于外部循环中检索到的行数。例如,如果从外部表中检索了10行,则数据库必须在内部表中执行10次查找。如果从外部表检索了10,000,000行,那么数据库必须在内部表中执行10,000,000次查找。

  1. 优化器将另一个集合指定为内部循环。
  2. 对于每一个来自客户端的取回请求,基本流程如下:

1.从外层行源获取一行

2.探测内部行源以查找与谓词条件匹配的行

3.重复上述步骤,直到获取请求获得所有行

示例:

mysql> show indexes from emps;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| emps  |          0 | PRIMARY  |            1 | id          | A         |      299246 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
1 row in set (0.00 sec)

mysql> show indexes from salaries;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table    | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| salaries |          0 | PRIMARY  |            1 | emp_no      | A         |      293263 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| salaries |          0 | PRIMARY  |            2 | from_date   | A         |     2838426 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2 rows in set (0.02 sec)


mysql> desc format=tree select * from emps e join salaries s where e.id<=10 and e.emp_no=s.emp_no and s.to_date>'2001-06-22';
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                         |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop inner join  (cost=22.8 rows=32.3)
    -> Filter: (e.id <= 10)  (cost=3.02 rows=10)
        -> Index range scan on e using PRIMARY over (id <= 10)  (cost=3.02 rows=10)
    -> Filter: (s.to_date > DATE'2001-06-22')  (cost=1.04 rows=3.23)
        -> Index lookup on s using PRIMARY (emp_no=e.emp_no)  (cost=1.04 rows=9.68)
 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)


#上述查询的工作过程:
#1.表emps通过索引范围扫描获取满足条件id<=10的行形成一个集合  
#2.选择步骤1中的集合作为驱动集合,从集合中获取第一条记录  
#3.在表salaries上通过主键获取emp_no=上一步张的记录的emp_no的记录  
#4.如果s上还有其他条件,过滤其他条件
#5.将满足条件的行返回给客户端  

被驱动表salaries2的连接条件上没有索引的执行计划如下:

mysql> desc analyze select * from emps e join salaries2 s where e.id<=10 and e.emp_no=s.emp_no and s.to_date>'2001-06-22';
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop inner join  (cost=4.75e+6 rows=945810) (actual time=0.215..5971 rows=18 loops=1)
    -> Filter: (e.id <= 10)  (cost=3.02 rows=10) (actual time=0.0754..0.134 rows=10 loops=1)
        -> Index range scan on e using PRIMARY over (id <= 10)  (cost=3.02 rows=10) (actual time=0.0386..0.0875 rows=10 loops=1)
    -> Filter: ((s.emp_no = e.emp_no) and (s.to_date > DATE'2001-06-22'))  (cost=191778 rows=94581) (actual time=60.8..597 rows=1.8 loops=10)
        -> Table scan on s  (cost=191778 rows=2.84e+6) (actual time=0.0233..419 rows=2.84e+6 loops=10)
 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (5.97 sec)

#可以看到如果被驱动的表上连接字段没有索引,被驱动表会走全表扫描。  
#且全表扫描的次数等于驱动集合的记录数。

总结:

  1. 嵌套循环通常选择结果集小的那个表作为驱动表。
  2. 嵌套循环的被驱动集合的表上连接字段一定要有索引,否则会执行全表扫描。
  3. 被驱动集合的循环次数等于驱动集合的记录数。
  4. 嵌套循环通常适合于一大一小两个集合,且被驱动的集合的表的连接字段上有索引。

2、块嵌套循环-Block Nested-Loop

块嵌套循环是嵌套循环的一个变体。块嵌套循环在执行外层循环时,并不会取一行记录就去执行内存循环,而是取一行记录就会存放到join_buffer中,当join_buffer满了才开始执行内层循环。

内层循环取一条记录就会跟外层循环存放在join_buffer中的每一条记录做比较。当做完一轮比较后,清空join_buffer,执行外层循环的下一批操作,直到所有的记录都join完。

块嵌套循环大大减少了内层循环的次数。

伪代码:

for each row in t1 matching range {
    store used columns from t1 in join buffer
    if buffer is full {
        for each row in t2 matching reference key {
            if row satisfies join conditions, send to client
        }
        empty join buffer
    }
}

块嵌套循环的执行计划如下:

mysql> desc select * from emps e join sals s where e.id<=10 and e.emp_no=s.emp_no and s.to_date>'2001-06-22';
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra                                              |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+----------------------------------------------------+
|  1 | SIMPLE      | e     | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |      10 |   100.00 | Using where                                        |
|  1 | SIMPLE      | s     | NULL       | ALL   | NULL          | NULL    | NULL    | NULL | 2837430 |     3.33 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+----------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

join_buffer_size控制缓冲区的大小。每个连接都会分配一个join_buffer,一个查询中可能会分配多个join_buffer。

8.0.18以后块嵌套循环被hash join取代,之前用于块嵌套循环的场景会使用hash join。

块嵌套循环可以由优化器参数block_nested_loop控制开启和关闭。

mysql> set optimizer_switch='block_nested_loop=on|off';

发表评论:

控制面板
您好,欢迎到访网站!
  查看权限
网站分类
最新留言