hot block的诊断案例

王朝other·作者佚名  2008-05-31
宽屏版  字体: |||超大  

诊断因为inefficent SQL导致严重Cache buffers chain / Cache buffer lru chain Latch,引起的CPU紧张和用户Batch缓慢

数据库817,有人叫batch太慢了,开始检查。

CPU使用率一直很高,且有进程IO等待,但iostat没有显示有wt

vmstat 1 10

procs memory page disk faults cpu

r b w swap free re mf pi po fr de sr s0 s1 s2 s3 in sy cs us sy id

3 7 0 39824280 41916960 0 0 0 0 0 0 0 0 6 6 6 4294967196 0 0 -810 -1696 -1165

21 28 0 39444944 40480648 3093 12 16512 0 0 0 0 0 1 5 3 65720 205520 53041 70 29 0

28 6 0 39444944 40480784 3307 4 14392 0 0 0 0 0 3 3 3 58991 161877 43637 74 26 0

25 115 0 39445008 40480696 1776 13 16032 0 0 0 0 0 0 1 2 53622 170834 40450 71 29 0

46 10 0 39446648 40482024 1054 2 12976 0 0 0 0 0 2 3 2 49978 163920 36885 72 28 0

17 10 0 39446648 40482000 1366 7 14664 0 0 0 0 0 0 0 2 44678 187017 36723 71 28 0

14 5 0 39445952 40481536 1142 696 9904 0 0 0 0 0 5 2 0 54537 109587 30931 71 29 0

46 121 0 39445376 40480744 875 26 11464 0 0 0 0 0 3 5 3 75939 197783 44311 69 31 0

40 43 0 39451008 40485880 1364 1384 14368 0 0 0 0 0 2 5 4 75546 180180 47494 70 30 0

16 15 0 39450232 40485128 1782 994 18408 0 0 0 0 0 0 3 1 61110 138910 38253 72 28 0

iostat 1 5

tty sd0 sd1 sd2 sd3 cpu

tin tout kps tps serv kps tps serv kps tps serv kps tps serv us sy wt id

23 1227 0 0 0 105 6 7 105 6 7 103 6 7 45 95 21 44

0 234 0 0 0 0 0 0 8 1 1 8 1 11 71 29 0 0

0 80 0 0 0 32 4 8 32 4 10 8 1 3 68 32 0 0

0 80 0 0 0 0 0 0 24 3 12 8 1 9 70 30 0 0

0 80 0 0 0 8 1 7 16 2 1 8 1 6 69 31 0 0

使用top/prstat,没有发现使用非凡多cpu资源的进程

察看数据库是否有并行进程,v$px_session返回空行。

检查v$session_wait,发现很多latch free,怀疑latch free导致cpu 紧张,db file sequential/scattered read导致有进程被IO阻塞。

EVENT COUNT(*)

---------------------------------------------------------------- ---------

PL/SQL lock timer 21

SQL*Net message from client 362

SQL*Net message to client 2

SQL*Net more data to client 2

db file scattered read 6

db file sequential read 25

latch free 10

log file sync 3

pmon timer 1

rdbms ipc message 6

smon timer 1

察看是那些latch,发现很多cache buffers chains,cache buffers lru chain怀疑不高效率的SQL导致hot block

SQL> select v$session.sid,name latch_name from

v$session,v$latch,v$session_wait s where v$session.sid=s.sid and s.event='latch free' and s.p2=v$latch.latch#;

SID LATCH_NAME

---------- ---------------------------------------------------------------

181 cache buffers chains

287 cache buffers chains

416 cache buffers lru chain

485 cache buffers lru chain

502 cache buffers lru chain

586 cache buffers lru chain

想察看哪个segment引起hot block, 但如下sql 几分钟内无返回,故cancel。

Select distinct owner,segment_name,partition_name,segment_type from dba_extents a ,

(Select obj, dbarfil, dbablk from x$bh where Hladdr in (Select addr from

v$latch_children where name='cache buffers chains' and addr in (select p1raw from v$session_wait

where event='latch free') )) b where

b.dbarfil=a. RELATIVE_FNO and b.dbablk >= a.block_id and b.dbablk<( a.block_id+a.blocks);

跟踪进程,察看进程执行什么SQL

********************************************************************************

SELECT SEVT.SCHD_EV_ID,ATEV.ROW_SEQ

FROM

FT_T_SEVT SEVT,FT_T_ATEV ATEV WHERE SEVT.SCHD_POST_TMS<=:b1 AND SEVT.ACTG_TRN_EV_ID=ATEV.ACTG_TRN_EV_ID AND SEVT.ACCT_GRP_OID=:b2 AND SEVT.ACCT_ID BETWEEN:b3 AND:b4 ORDER BY ATEV.ROW_SEQ

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 0 0.00 0.00 0 0 0 0

Execute 20 0.00 0.00 0 0 0 0

Fetch 19 57.42 90.34 461156 517860 3344 0

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 39 57.42 90.34 461156 517860 3344 0

********************************************************************************

SQL> @$Oracle_HOME/rdbms/admin/utlXPls

Plan Table

--------------------------------------------------------------------------------

Operation Name Rows Bytes Cost Pstart Pstop

--------------------------------------------------------------------------------

SELECT STATEMENT 20 1K 1724

SORT ORDER BY 20 1K 1724

NESTED LOOPS 20 1K 1722

TABLE Access FULL FT_T_SEVT 20 1K 1662

TABLE ACCESS BY INDEX RFT_T_ATEV 415M 6G 3

INDEX UNIQUE SCAN FT_T_ATEV 415M 2

--------------------------------------------------------------------------------

9 rows selected.

明显表FT_T_SEVT 缺少索引,导致full table scan. 还trace了其他几个session,发现也是执行如上同样的SQL语句。检查v$sql

SQL>selectusers_opening,EXECUTIONS,BUFFER_GETS,DISK_READS from v$sql where address='B83AC5F0';

USERS_OPENING

USERS_OPENING 表示有60个session在执行同样的SQL语句,导致表FT_T_SEVT 非常热点,严重的cache buffer chain/cache buffer lru chain 使CPU使用率较高;Full table scan也给IO带来等待。

EXECUTIONS BUFFER_GETS DISK_READS 60 49784 -1.604E+09 1963650435

 
 
 
免责声明:本文为网络用户发布,其观点仅代表作者个人观点,与本站无关,本站仅提供信息存储服务。文中陈述内容未经本站证实,其真实性、完整性、及时性本站不作任何保证或承诺,请读者仅作参考,并请自行核实相关内容。
© 2005- 王朝网络 版权所有