生活随笔
收集整理的这篇文章主要介绍了
一个通过添加本地分区索引提高SQL性能的案例
小编觉得挺不错的,现在分享给大家,帮大家做个参考.
今天接到同事求助,说有一个select query,在Oracle上要跑一分多钟,他希望能在5s内出结果,该sql如下:
[sql] view plain copy print ?
Select /*+ parallel(src, 8) */ distinct src.systemname as systemname , src.databasename as databasename , src.tablename as tablename , src.username as username from <strong>meta_dbql_table_usage_exp_hst</strong> src inner <strong> join DR_QRY_LOG_EXP_HST</strong> rl on <strong>src.acctstringdate = rl.acctstringdate and src.queryid = rl.queryid</strong> And Src.Systemname = Rl.Systemname and src.acctstringdate > sysdate - 30 And Rl.Acctstringdate > Sysdate - 30 inner join <strong>meta_dr_qry_log_tgt_all_hst </strong>tgt on upper (tgt.systemname) = upper ( 'MOZART' ) And Upper (tgt.Databasename) = Upper ( 'GDW_TABLES' ) And Upper (tgt.Tablename) = Upper ( 'SSA_SLNG_LSTG_MTRC_SD' ) <strong>AND src.acctstringdate = tgt.acctstringdate and rl.statement_id = tgt.statement_id</strong> and rl.systemname = tgt.systemname And Tgt.Acctstringdate > Sysdate - 30 And Not ( Upper (Tgt.Systemname)= Upper (src.systemname) And Upper (Tgt.Databasename) = Upper (Src.Databasename) And Upper (Tgt.Tablename) = Upper (Src.Tablename) ) And tgt.Systemname is not null And tgt.Databasename Is Not Null And tgt.tablename is not null ;
SQL的简单分析
总得来看,这个SQL就是三个表(meta_dbql_table_usage_exp_hst,DR_QRY_LOG_EXP_HST,meta_dr_qry_log_tgt_all_hst)的INNER JOIN,这三个表数据量都在百万级别,且都是分区表(以acctstringdate为分区键),执行计划如下:
[sql] view plain copy print ?
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop | | 0 | SELECT STATEMENT | | 1 | 159 | 8654 | | | | 1 | PX COORDINATOR | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10002 | 1 | 159 | 8654 | | | | 3 | SORT UNIQUE | | 1 | 159 | 8654 | | | | 4 | PX RECEIVE | | 1 | 36 | 3 | | | | 5 | PX SEND HASH | :TQ10001 | 1 | 36 | 3 | | | |* 6 | TABLE ACCESS BY LOCAL INDEX ROWID| DR_QRY_LOG_EXP_HST | 1 | 36 | 3 | | | | 7 | NESTED LOOPS | | 1 | 159 | 8633 | | | | 8 | NESTED LOOPS | | 8959 | 1076K| 4900 | | | | 9 | BUFFER SORT | | | | | | | | 10 | PX RECEIVE | | | | | | | | 11 | PX SEND BROADCAST | :TQ10000 | | | | | | | 12 | PARTITION RANGE ITERATOR | | 1 | 56 | 4746 | KEY | 14 | |* 13 | TABLE ACCESS FULL | META_DR_QRY_LOG_TGT_ALL_HST | 1 | 56 | 4746 | KEY | 14 | | 14 | PX BLOCK ITERATOR | | 8959 | 586K| 154 | KEY | KEY | |* 15 | TABLE ACCESS FULL | META_DBQL_TABLE_USAGE_EXP_HST | 8959 | 586K| 154 | KEY | KEY | | 16 | PARTITION RANGE ITERATOR | | 1 | | 2 | KEY | KEY | |* 17 | INDEX RANGE SCAN | DR_QRY_LOG_EXP_HST_IDX | 1 | | 2 | KEY | KEY | Predicate Information (identified by operation id): 6 - filter("RL" . "STATEMENT_ID" = "TGT" . "STATEMENT_ID" AND "RL" . "SYSTEMNAME" = "TGT" . "SYSTEMNAME" AND "SRC" . "SYSTEMNAME" = "RL" . "SYSTEMNAME" ) 13 - filter(UPPER ( "TGT" . "SYSTEMNAME" )= 'MOZART' AND UPPER ( "TGT" . "DATABASENAME" )= 'GDW_TABLES' AND UPPER ( "TGT" . "TABLENAME" )= 'SSA_SLNG_LSTG_MTRC_SD' AND "TGT" . "ACCTSTRINGDATE" >SYSDATE@!-30 AND "TGT" . "SYSTEMNAME" IS NOT NULL "TGT" . "DATABASENAME" IS NOT NULL AND "TGT" . "TABLENAME" IS NOT NULL ) 15 - filter("SRC" . "ACCTSTRINGDATE" = "TGT" . "ACCTSTRINGDATE" AND ( UPPER ( "TGT" . "SYSTEMNAME" )<> UPPER ( "SRC" . "SYSTEMNAME" ) OR UPPER ( "TGT" . "DATABASENAME" )<> UPPER ( "SRC" . "DATABASENAME" ) OR UPPER ( "TGT" . "TABLENAME" )<> UPPER ( "SRC" . "TABLENAME" )) AND "SRC" . "ACCTSTRINGDATE" >SYSDATE@!-30) 17 - access("SRC" . "QUERYID" = "RL" . "QUERYID" AND "SRC" . "ACCTSTRINGDATE" = "RL" . "ACCTSTRINGDATE" ) filter("RL" . "ACCTSTRINGDATE" >SYSDATE@!-30)
定位问题
从上面执行计划中的表连接方式可以知道,这三个表之间进行了两次NESTED LOOP,问题出现在最里层的NESTED LOOP(对两个表都做了TABLE FULL SCAN),因为表都是百万级别的(即时过滤后的数据量也不小),性能问题就出现在内表(即被驱动表)META_DBQL_TABLE_USAGE_EXP_HST做了太多次的全表扫描。如果能把全表扫描转换成索引,则性能可以大幅度提高。
下面是NESTED LOOP的介绍:
嵌套连接把要处理的数据集分为外部循环(驱动数据源)和内部循环(被驱动数据源),外部循环只执行一次,内部循环执行的次数等于外部循环执行返回的数据个数。 这种连接的好处是内存使用非常少。 如果驱动数据源有限,且被驱动表在连接列上有相应的索引,则这种连接方式才是高效的。
下面是这三个表上索引的情况:
[sql] view plain copy print ?
SQL> select index_name, table_name from user_indexes where table_name in ( 'DR_QRY_LOG_EXP_HST' , upper ( 'meta_dbql_table_usage_exp_hst' ), upper ('meta_dr_qry_log_tgt_all_hs INDEX_NAME TABLE_NAME META_DR_QRY_LOG_TGT_ALL_IDX META_DR_QRY_LOG_TGT_ALL_HST META_DBQL_TUSAGE_EHST_IDX META_DBQL_TABLE_USAGE_EXP_HST DR_QRY_LOG_EXP_HST_IDX DR_QRY_LOG_EXP_HST CREATE INDEX "GV" . "META_DR_QRY_LOG_TGT_ALL_IDX" ON "GV" . "META_DR_QRY_LOG_TGT_ALL_HST" ( "STATEMENT_ID" , "ACCTSTRINGDATE" ) CREATE INDEX "GV" . "META_DBQL_TUSAGE_EHST_IDX" ON "GV" . "META_DBQL_TABLE_USAGE_EXP_HST" ( "QUERYID" , "ACCTSTRINGDATE" ) CREATE INDEX "GV" . "DR_QRY_LOG_EXP_HST_IDX" ON "GV" . "DR_QRY_LOG_EXP_HST" ( "QUERYID" , "ACCTSTRINGDATE" )
这三个索引都是本地分区索引(都包含分区键acctstringdate),很显然,DR_QRY_LOG_EXP_HST表少了个索引,因为它与表meta_dr_qry_log_tgt_all_hst 在statement_id上做join,因此应该在它的statement_id上也创建本地分区索引如下:
[sql] view plain copy print ?
create index DR_QRY_LOG_EXP_HST_IDX2 on gv.DR_QRY_LOG_EXP_HST (statement_id,ACCTSTRINGDATE) local ; 性能对比
新的执行计划如下:
[sql] view plain copy print ?
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop | | 0 | SELECT STATEMENT | | 1 | 159 | 4838 | | | | 1 | SORT UNIQUE | | 1 | 159 | 4838 | | | |* 2 | TABLE ACCESS BY LOCAL INDEX ROWID | META_DBQL_TABLE_USAGE_EXP_HST | 1 | 67 | 3 | | | | 3 | NESTED LOOPS | | 1 | 159 | 4816 | | | | 4 | NESTED LOOPS | | 18 | 1656 | 4762 | | | | 5 | PARTITION RANGE ITERATOR | | 1 | 56 | 4746 | KEY | 14 | |* 6 | TABLE ACCESS FULL | META_DR_QRY_LOG_TGT_ALL_HST | 1 | 56 | 4746 | KEY | 14 | | 7 | PARTITION RANGE ITERATOR | | 18 | 648 | 16 | KEY | 14 | |* 8 | TABLE ACCESS BY LOCAL INDEX ROWID| DR_QRY_LOG_EXP_HST | 18 | 648 | 16 | KEY | 14 | |* 9 | <strong>INDEX RANGE SCAN | DR_QRY_LOG_EXP_HST_IDX2</strong> | 31 | | 15 | KEY | 14 | | 10 | PARTITION RANGE ITERATOR | | 1 | | 2 | KEY | KEY | |* 11 | INDEX RANGE SCAN | META_DBQL_TUSAGE_EHST_IDX | 1 | | 2 | KEY | KEY | Predicate Information (identified by operation id): 2 - filter((UPPER ( "TGT" . "SYSTEMNAME" )<> UPPER ( "SRC" . "SYSTEMNAME" ) OR UPPER ( "TGT" . "DATABASENAME" )<> UPPER ( "SRC" . "DATABASENAME" ) OR UPPER ( "TGT" . "TABLENAME" )<> UPPER ( "SRC" . "TABLENAME" )) AND "SRC" . "SYSTEMNAME" = "RL" . "SYSTEMNAME" ) 6 - filter(UPPER ( "TGT" . "SYSTEMNAME" )= 'MOZART' AND UPPER ( "TGT" . "DATABASENAME" )= 'GDW_TABLES' AND UPPER ( "TGT" . "TABLENAME" )= 'SSA_SLNG_LSTG_MTRC_SD' AND "TGT" . "ACCTSTRINGDATE" >SYSDATE@!-30 AND "TGT" . "SYSTEMNAME" IS NOT NULL AND "TGT" . "DATABASENAME" IS NOT NULL AND "TGT" . "TABLENAME" IS NOT NULL ) 8 - filter("RL" . "SYSTEMNAME" = "TGT" . "SYSTEMNAME" ) 9 - access("RL" . "STATEMENT_ID" = "TGT" . "STATEMENT_ID" AND "RL" . "ACCTSTRINGDATE" >SYSDATE@!-30 AND "RL" . "ACCTSTRINGDATE" IS NOT NULL ) 11 - access("SRC" . "QUERYID" = "RL" . "QUERYID" AND "SRC" . "ACCTSTRINGDATE" = "RL" . "ACCTSTRINGDATE" ) filter("SRC" . "ACCTSTRINGDATE" = "TGT" . "ACCTSTRINGDATE" AND "SRC" . "ACCTSTRINGDATE" >SYSDATE@!-30) 从新的的执行计划可以看出,它的第一个NESTED LOOP果然用了最新创建的索引。
下面是执行时间:
[plain] view plain copy print ?
已用时间: 00: 00: 02.16
两秒种搞定,远远超出他期望的5s :)
方法总结
NESTED LOOP高效的条件:
驱动数据源有限,且被驱动表在连接列上有相应的索引。
总结
以上是生活随笔 为你收集整理的一个通过添加本地分区索引提高SQL性能的案例 的全部内容,希望文章能够帮你解决所遇到的问题。
如果觉得生活随笔 网站内容还不错,欢迎将生活随笔 推荐给好友。