三门峡做网站公司关键词代发排名
移动音乐平台故障-0216
第一部分:故障现象描述
2.15日下午15:20左右开始出现日志积压情况,经查看日志是由于T_PE2_ITNF_IBUSLOG表的入库速度大幅降低,导致整个日志线程的入库能力大幅降低,导致日志在内存中积压,最终会涨爆jvm内存,使应用服务器宕机,经人工干预(将内存高的机器从F5上下下来),坚持到19:30左右,由于业务高峰的慢慢到来,日志积压的速度越来越快,最终采取了清理jvm内存的操作,清理后恢复正常.
第二部分:故障排查
2.1:故障点连续statpack检查后,发现TOP等待时间又明显的问题
在数据库中出现了大量的Buffer busy wait,通过语句进行SQL抓取
select sysdate ,a.SQL_TEXT,s.MACHINE
from v$sqlarea a, v$session s
where a.HASH_VALUE = s.SQL_HASH_VALUE
and s.SID in (select sid
from v$session_wait w
where w.EVENT = 'buffer busy
waits')
发现大量的t_pe2_intf_ibuslog入库语句产生了Buffer busy wait等待,结合以前statpack对比
2.1.1 (statpack)
2月15日15:00-16:00statpack
2月6日15:00-16:00statpack
从上面两个对比,不难看出,这个简单的insert into的入库语句的性能产生了极大的变化。
第三部分:故障分析
一个简单的入库语句为什么会造成这么大的性能影响,那这些等待又是怎么发生的呢?我们首先来看看。。什么是Buffer busy wait Event
解释:
buffer busy waits常常是由于很频繁的insert ,需要重建,或者没有充足的回滚段引起的发生条件:block正被读入缓冲区或者缓冲区正被其他session使用, 当缓冲区以一种非共享方式或者如正在被读入到缓冲时,就会出现该等待.该值不应该大于1%
细想以后,发现还有个并发的可能的等待read by other session
因为10G以前,把一个等待事件read by other
session也划入到了buffer busy waits里,read by other
session的意思是,多个session并发请求相同的数据块,但因该数据块不在buffer_cache中而必须从磁盘读取,处理这种情况,oracle会只让其中一个sesion进行磁盘读取,此时其它session等待块从磁盘上读取进buffer_cache而抛出read by other
session等待事件。10G以后,read by other
session被单独了出来,buffer busy waits变的纯粹,就是上面我说的那种情况了。
再进一步查询相关视图
3.1
select * from v$waitstat
3.2
SELECT 'segment Header' CLASS, a.Segment_Type, a.Segment_Name,
a.Partition_Name
FROM Dba_Segments a, V$session_Wait
b
WHERE a.Header_File = b.P1
AND a.Header_Block = b.P2
AND b.Event = 'buffer busy waits'
UNION
SELECT 'freelist Groups' CLASS, a.Segment_Type, a.Segment_Name,
a.Partition_Name
FROM Dba_Segments a, V$session_Wait
b
WHERE b.P2 BETWEEN a.Header_Block + 1 AND (a.Header_Block +
a.Freelist_Groups)
AND a.Header_File = b.P1
AND a.Freelist_Groups
> 1
AND b.Event = 'buffer busy waits'
UNION
SELECT a.Segment_Type || ' Block' CLASS, a.Segment_Type, a.Segment_Name,
a.Partition_Name
FROM Dba_Extents a, V$session_Wait
b
WHERE b.P2 BETWEEN a.Block_Id AND a.Block_Id + a.Blocks -
1
AND a.File_Id = b.P1
AND b.Event = 'buffer busy waits'
AND NOT EXISTS (SELECT 1
FROM Dba_Segments
WHERE Header_File = b.P1
AND Header_Block = b.P2);
第四部分:问题结论
从目前收集到的信息来看,statspack中主要的等待事件是buffer busy wait,数据块发生严重的争用
根据Buffer wait Statistics for
DB这一部分的统计来看,发生争用的主要是data block和segment header,数据量与正常情况下相比要高出若干数量级
由于现场进行恢复操作后,我发的第二个SQL没有查到数据,无法准确列出争用涉及的具体对象名,但是根据逻辑IO较多SQL来推断应该跟t_pe2_intf_ibuslog表以及它的索引有关
对于data block的争用,如果在问题当时查看v$session_wait视图,应该可以看到P3字段的值很多应该是220,表明是大量并发的DML引起的争用,DML操作也就对应到t_pe2_intf_ibuslog表的大量insert操作。解决这种争用一般的办法是降低并发和分散数据。
对于segment header的争用,从发回的dba_segments视图看,t_pe2_intf_ibuslog表应该是手动段空间管理,所以默认freelist为1,像insert这样的操作都是要修改freelist的,这就产生了大量争用。解决的办法一是降低并发,二是增大freelist,再有就是考虑采用自动段空间管理Automatic Segment Space
Management ,这样就不会使用freelist了。
附录:
P1 = file# (Absolute File# in Oracle8
onwards)
P2 = block#
P3 = id (Reason Code)原因代码:A block is being read
===== 100 We want to NEW the block but the block is
currently being read by another session (most likely for
undo). 200 We want to NEW the block but someone else has is
using the current copy so we have to wait for them to
finish. 230 Trying to get a buffer in CR/CRX mode , but a
modification has started on the buffer that has not yet been
completed. - A modification is happening on
a SCUR or XCUR buffer, but has not yet
completed (dup.)
231 CR/CRX scan found the
CURRENT block, but a modification has started on the buffer that
has not yet been
completed. 130 Block is being read by another session and no
other suitable block image was found, so we wait until the read is
completed. This may also occur after a buffer cache assumed
deadlock. The kernel can't get a buffer in a certain amount of time
and assumes a deadlock. Therefor it will read the CR version of the
block. 110 We want the CURRENT block either shared or
exclusive but the Block is being read into cache by another
session, so we have to wait until their read() is
completed. (duplicate) 120 We
want to get the block in current mode but someone else is currently
reading it into the cache. Wait for them to complete the read. This
occurs during buffer
lookup. 210 The session wants the block in SCUR or XCUR
mode. If this is a buffer exchange or the session is in discrete TX
mode, the session waits for the first time and the second time
escalates the block as a deadlock and so does not show up as
waiting very long. In this case the statistic: "exchange deadlocks"
is incremented and we yield the CPU for the "buffer deadlock" wait
event. (duplicate) 220 During
buffer lookup for a CURRENT copy of a buffer we have found the
buffer but someone holds it in an incompatible mode so we have to
wait. 1.
SELECT kcbwhdes, why0+why1+why2 "Gets", "OTHER_WAIT"
FROM x$kcbsw s, x$kcbwh
w
WHERE s.indx=w.indx
and
s."OTHER_WAIT">0
ORDER BY 3
;
2. SELECT count, file#, name
FROM x$kcbfwait,
v$datafile
WHERE indx + 1 = file#
ORDER BY count
3.SELECT distinct owner, segment_name, segment_type
FROM dba_extents
WHERE file_id=
&FILE_ID
;
4.SELECT p1 "File", p2 "Block", p3 "Reason"
FROM v$session_wait
WHERE event='buffer busy
waits'
;相关解决办法详解================
This document discusses a rare and difficult to diagnose database
performance problem characterized by extremely high buffer busy waits that
occur at seemingly random times. The
problem persists even after traditional
buffer busy wait tuning practices are followed (typically, increasing the
number of freelists for an object). SCOPE & APPLICATION
-------------------
This document is intended for support analysts and
customers. It applies
to both Unix and Windows-based systems, although the examples here
will be particular to a Unix-based (Solaris) system.
In addition to addressing a specific buffer busy wait performance
problem, in section II, this document presents various techniques to
diagnose and resolve this problem by using detailed data from a real-world
example. The techniques illustrated here may be used to diagnose other I/O and
performance problems.
RESOLVING INTENSE AND "RANDOM" BUFFER BUSY WAIT PERFORMANCE
PROBLEMS
--------------------------------------------------------------------
This document is composed of two sections; a summary section that
broadly discusses the problem and its resolution, and a detailed
diagnostics section that shows how to collect and analyze various database and
operating system diagnostics related to this
problem. The detailed
diagnostics section is provided to help educate the reader with techniques that may be
useful in
other situations.
I. Summary
~~~~~~~~~~~
1. Problem Description
~~~~~~~~~~~~~~~~~~~~~~~
At seemingly random times without regard to overall load on the
database, the following symptoms may be witnessed:
- Slow response
times on an instance-wide level
- long wait
times for "buffer busy waits" in Bstat/Estat or Statpack
reports
- large numbers
of sessions waiting on buffer busy waits for a group
of objects
(identified in v$session_wait)
Some tuning effort may have been spent in identifying the
segments involved in the buffer busy waits and rebuilding those segments
with a higher number of freelists or freelist groups (from 8.1.6 on one can
dynamically add process freelists; segments only need to be rebuilt if changing
freelist groups). Even after adding
freelists, the problem continues and is
not diminished in any way (although regular, concurrency-based buffer
busy waits may be reduced).