当前位置:网站首页>一次备库的坏块的修复过程
一次备库的坏块的修复过程
2022-07-25 18:12:00 【文档搬运工】
参考文档:
Exadata - Database reports corrupted blocks errors where content of blocks have pattern BADFDAT (Doc ID 1579291.1)
ABMR: How to test Automatic Block Recover Feature (Doc ID 1266059.1)
https://oracledocs.blog.csdn.net/article/details/114386906
RDBMS 11.2.0.4 ,一套RAC主库,带多套RAC备库,其中一套RAC备库出现坏块
坏块记录,最严重的时候,坏块导致MRP进程停止,随后启动MRP进程,数据同步正常,但是此时没有考虑去查询坏块。
Reread of blocknum=2811006, file=+DATA_ABC/prod/datafile/AAAts_01.291.1107879227. found same corrupt data
Fri Jul 22 07:06:39 2022
Automatic block media recovery requested for (file# 94, block# 2810890)
Fri Jul 22 07:07:40 2022
Automatic block media recovery requested for (file# 94, block# 2810893)
Fri Jul 22 07:08:41 2022
Errors in file /u01/PROD/oracle/diag/rdbms/prod/PROD1/trace/PROD1_ora_289384.trc (incident=3912730):
Ÿå (æ–‡ä»¶å· 94, åå· 2810890)
åæ
ORA-01578: ORACLE æ•°æ
•å…ƒ: 9952
å
ORA-27616: ASM 分é…
文件 94: '+DATA_AAA/prod/datafile/AAAts_01.291.1107879227'
ORA-01110: æ•°æ
Incident details in: /u01/PROD/oracle/diag/rdbms/prod/PROD1/incident/incdir_3912730/PROD1_ora_289384_i3912730.trc
Fri Jul 22 07:08:42 2022
Dumping diagnostic data in directory=[cdmp_20220722070842], requested by (instance=1, osid=289384), summary=[incident=3912730].
Fri Jul 22 07:08:44 2022
Sweep [inc][3912730]: completed
Sweep [inc2][3912730]: completed
Sweep [inc2][3912729]: completed
Fri Jul 22 07:09:07 2022
Automatic block media recovery requested for (file# 94, block# 2810880)
Fri Jul 22 07:10:08 2022
Automatic block media recovery requested for (file# 94, block# 2810892)
Fri Jul 22 07:11:09 2022
Automatic block media recovery requested for (file# 94, block# 2810883)
Fri Jul 22 07:12:09 2022
Automatic block media recovery requested for (file# 94, block# 2810886)
Fri Jul 22 07:13:10 2022
Errors in file /u01/PROD/oracle/diag/rdbms/prod/PROD1/trace/PROD1_ora_289384.trc (incident=3912731):
Ÿå (æ–‡ä»¶å· 94, åå· 2810880)
åæ
ORA-01578: ORACLE æ•°æ
文件 94: '+DATA_AAA/prod/datafile/AAAts_01.291.1107879227'
Incident details in: /u01/PROD/oracle/diag/rdbms/prod/PROD1/incident/incdir_3912731/PROD1_ora_289384_i3912731.trc
Reread of blocknum=2810989, file=+DATA_AAA/prod/datafile/AAAts_01.291.1107879227. found same corrupt data
Reread of blocknum=2810989, file=+DATA_AAA/prod/datafile/AAAts_01.291.1107879227. found same corrupt data
Reread of blocknum=2810989, file=+DATA_AAA/prod/datafile/AAAts_01.291.1107879227. found same corrupt data
Reread of blocknum=2810989, file=+DATA_AAA/prod/datafile/AAAts_01.291.1107879227. found same corrupt data
Reread of blocknum=2810989, file=+DATA_AAA/prod/datafile/AAAts_01.291.1107879227. found same corrupt data
Hex dump of (file 94, block 2810992) in trace file /u01/PROD/oracle/diag/rdbms/prod/PROD1/incident/incdir_3912729/PROD1_m000_71496_i3912729_a.trc
Corrupt block relative dba: 0x17aae470 (file 94, block 2810992)
Bad header found during validation在ASM alert中出现的告警,这个告警,可以参考这个文档 :Exadata - Database reports corrupted blocks errors where content of blocks have pattern BADFDAT (Doc ID 1579291.1)
Thu Jul 07 08:10:31 2022
NOTE: starting media scrubbing repair for group 1/0x6fc765b6 (DATA_AAA) in R000
WARNING: group 1, file 285, extent 1983: filling extent with BADFDA7A
NOTE: completed media scrubbing repair for group 1/0x6fc765b6 (DATA_AAA) (relocated 1 extents)
Thu Jul 14 08:10:13 2022
NOTE: starting media scrubbing repair for group 1/0x6fc765b6 (DATA_AAA) in R000
WARNING: group 1, file 287, extent 3809: filling extent with BADFDA7A
NOTE: completed media scrubbing repair for group 1/0x6fc765b6 (DATA_AAA) (relocated 1 extents)
Thu Jul 21 08:09:10 2022
NOTE: starting media scrubbing repair for group 1/0x6fc765b6 (DATA_AAA) in R000
WARNING: group 1, file 283, extent 6766: filling extent with BADFDA7A
NOTE: completed media scrubbing repair for group 1/0x6fc765b6 (DATA_AAA) (relocated 1 extents)
受上面文档的启发,查看坏块,果然有坏块 ,后面发现,在没停止mrp进程的时候,修复坏块,居然还有别的文件的坏块生成。
[email protected]> select * from V$DATABASE_BLOCK_CORRUPTION;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
94 2810880 98 0 CORRUPT
94 2810980 1 0 CORRUPT
94 2810992 1 0 CORRUPT
94 2811002 1 0 CORRUPT
48 1015296 50 0 CORRUPT
48 1015347 4 0 CORRUPT
48 1015352 13 0 CORRUPT
48 1015367 7 0 CORRUPT
48 1015375 49 0 CORRUPT
94 2810982 1 0 CORRUPT
94 2810984 1 0 CORRUPT
94 2810987 3 0 CORRUPT
94 2810998 1 0 CORRUPT
94 2811005 2 0 CORRUPT
14 rows selected.尝试修复坏块,失败,因为没有备份 ,此时也没有停掉mrp进程,结果花费了大量的时间,还没有成功。
RMAN> RECOVER CORRUPTION LIST;
Starting recover at 22-JUL-22
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=5272 instance=PROD1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=6024 instance=PROD1 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=8281 instance=PROD1 device type=DISK
finished primary search, recovered 1 blocks
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 07/22/2022 15:13:32
RMAN-06026: some targets not found - aborting restore
RMAN-06023: no backup or copy of datafile 94 found to restore
RMAN-06023: no backup or copy of datafile 48 found to restore同时查看alert log中的日志 ,会看到大量的 “Automatic block media recovery requested for”,同时会发现RECOVER CORRUPTION LIST很慢
Automatic block media recovery requested for (file# 48, block# 1015297)
Fri Jul 22 11:47:10 2022
Hex dump of (file 94, block 2811005) in trace file /u01/PROD/oracle/diag/rdbms/prod/PROD1/trace/PROD1_pr02_87305.trc
Corrupt block relative dba: 0x17aae47d (file 94, block 2811005)
Bad header found during media recovery
Data in bad block:
type: 122 format: 2 rdba: 0xbadfda7a
last change scn: 0xda7a.badfda7a seq: 0xdf flg: 0xba
spare1: 0xdf spare2: 0xba spare3: 0xbadf
consistency value in tail: 0xbadfda7a
check value in block header: 0xda7a
block checksum disabled
Reading datafile '+DATA_AAA/prod/datafile/AAAts_01.291.1107879227' for corruption at rdba: 0x17aae47d (file 94, block 2811005)
Reread (file 94, block 2811005) found same corrupt data (no logical check)
Automatic block media recovery requested for (file# 94, block# 2811005)
Automatic block media recovery successful for (file# 94, block# 2811005)
Fri Jul 22 11:47:46 2022
Automatic block media recovery requested for (file# 48, block# 1015298)
Fri Jul 22 11:48:47 2022
Automatic block media recovery requested for (file# 48, block# 1015299)
Fri Jul 22 11:49:47 2022
Automatic block media recovery requested for (file# 48, block# 1015300)
Fri Jul 22 11:50:47 2022
Automatic block media recovery requested for (file# 48, block# 1015301)
Fri Jul 22 11:51:47 2022
Automatic block media recovery requested for (file# 48, block# 1015303)
Fri Jul 22 11:52:47 2022
Automatic block media recovery requested for (file# 48, block# 1015304)
Fri Jul 22 11:53:48 2022
Automatic block media recovery requested for (file# 48, block# 1015305)查看提示个别自动修复成功,但是大部分都没有自动修复成功,因为备库上没有备份
Automatic block media recovery requested for (file# 90, block# 1950310)
Automatic block media recovery successful for (file# 90, block# 1950310)
Thu Jul 21 15:53:11 2022
Hex dump of (file 46, block 3464276) in trace file /u01/PROD/oracle/diag/rdbms/prod/PROD1/trace/PROD1_pr0e_87329.trc
Corrupt block relative dba: 0x0bb4dc54 (file 46, block 3464276)
Bad header found during media recovery
Data in bad block:
type: 122 format: 2 rdba: 0xbadfda7a
last change scn: 0xda7a.badfda7a seq: 0xdf flg: 0xba
spare1: 0xdf spare2: 0xba spare3: 0xbadf
consistency value in tail: 0xbadfda7a
check value in block header: 0xda7a
block checksum disabled
Reading datafile '+DATA_AAA/prod/datafile/AAAtsx.283.1107879225' for corruption at rdba: 0x0bb4dc54 (file 46, block 3464276)
Reread (file 46, block 3464276) found same corrupt data (no logical check)
Automatic block media recovery requested for (file# 46, block# 3464276)
Automatic block media recovery successful for (file# 46, block# 3464276)
Thu Jul 21 16:28:47 2022尝试恢复某一个块,成功了,但是很耗费时间(因为此时没有关闭MRP),使用的是7天前的全备的备份集进行恢复的
RMAN> recover datafile 94 block 2810982 2> ;
Starting recover at 22-JUL-22
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=2261 instance=PROD1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=3014 instance=PROD1 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=3767 instance=PROD1 device type=DISK
channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00094
channel ORA_DISK_1: reading from backup piece /mntbackupdata/rman/AAA_bak/PROD_incr0_116327_20220717
channel ORA_DISK_1: piece handle=/mntbackupdata/rman/AAA_bak/PROD_incr0_116327_20220717 tag=DB_INCR0_BAK
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 01:10:25
starting media recovery
media recovery failed
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 07/22/2022 16:38:51
ORA-00283: recovery session canceled due to errors
ORA-01153: an incompatible media recovery is active
RMAN>停止掉MRP,使用命令backup validate database 重新验证数据库中所有的坏块(过程略), 然后全部修复。
停掉mrp进程,重新恢复快块(相对于打开MRP的情况下,恢复块,很快)
RMAN> BLOCKRECOVER CORRUPTION LIST;
Starting recover at 23-JUL-22
using channel ORA_DISK_1
using channel ORA_DISK_2
using channel ORA_DISK_3
channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00048
channel ORA_DISK_1: reading from backup piece /mntbackupdata/rman/AAA_bak/PROD_incr0_116319_20220717
channel ORA_DISK_1: piece handle=/mntbackupdata/rman/AAA_bak/PROD_incr0_116319_20220717 tag=DB_INCR0_BAK
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:40:05
channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00090
channel ORA_DISK_1: reading from backup piece /mntbackupdata/rman/AAA_bak/PROD_incr0_116323_20220717
channel ORA_DISK_1: piece handle=/mntbackupdata/rman/AAA_bak/PROD_incr0_116323_20220717 tag=DB_INCR0_BAK
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 01:07:45
channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00094
channel ORA_DISK_1: reading from backup piece /mntbackupdata/rman/AAA_bak/PROD_incr0_116327_20220717
channel ORA_DISK_1: piece handle=/mntbackupdata/rman/AAA_bak/PROD_incr0_116327_20220717 tag=DB_INCR0_BAK
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 01:11:46
channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00046
channel ORA_DISK_1: reading from backup piece /mntbackupdata/rman/AAA_bak/PROD_incr0_116328_20220717
channel ORA_DISK_1: piece handle=/mntbackupdata/rman/AAA_bak/PROD_incr0_116328_20220717 tag=DB_INCR0_BAK
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 01:04:05
starting media recovery
media recovery failed
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 07/23/2022 04:53:40
ORA-00283: recovery session canceled due to errors
ORA-01122: database file 46 failed verification check
ORA-01110: data file 46: '+DATA_AAA/prod/datafile/AAAtsx.283.1107879225'
ORA-01207: file is more recent than control file - old control file
RMAN>--日志,alert log中没有 Automatic block media recovery requested for 之类的信息
-- 开启mrp进程,同步正常
-- 发现备库有gap,,以及主库的日志上有一下信息
Wed Jul 20 02:28:04 2022
Errors in file /u01/PROD/oracle/diag/rdbms/x5AAAdb/AAAdb2/trace/AAAdb2_nsa4_143985.trc:
ORA-00338: log 19 of thread 2 is more recent than control file
ORA-00312: online log 19 thread 2: '+DATA_mnt/x5AAAdb/onlinelog/group_19.842.1103275941'
ORA-00338: log 19 of thread 2 is more recent than control file
ORA-00312: online log 19 thread 2: '+DATA_mnt/x5AAAdb/onlinelog/group_19.848.1103275941'
Errors in file /u01/PROD/oracle/diag/rdbms/x5AAAdb/AAAdb2/trace/AAAdb2_nsa4_143985.trc:
ORA-00338: log 19 of thread 2 is more recent than control file
ORA-00312: online log 19 thread 2: '+DATA_mnt/x5AAAdb/onlinelog/group_19.842.1103275941'
ORA-00338: log 19 of thread 2 is more recent than control file
ORA-00312: online log 19 thread 2: '+DATA_mnt/x5AAAdb/onlinelog/group_19.848.1103275941'
Archived Log entry 72662 added for thread 2 sequence 128054 ID 0x1ea2d4be dest 1:
LNS: Standby redo logfile selected for thread 2 sequence 128055 for destination LOG_ARCHIVE_DEST_4
LNS: Standby redo logfile selected for thread 2 sequence 128055 for destination LOG_ARCHIVE_DEST_5
RFS[1]: Selected log 8 for thread 1 sequence 139132 dbid 123456789 branch 832687336
Sat Jul 23 00:16:52 2022
Archived Log entry 5819 added for thread 1 sequence 139131 ID 0x1ea2d4be dest 1:
Sat Jul 23 00:17:02 2022
Fetching gap sequence in thread 2, gap sequence 128365-128365
Sat Jul 23 00:19:01 2022
FAL[client]: Failed to request gap sequence
GAP - thread 2 sequence 128365-128365
DBID 123456789 branch 832687336
FAL[client]: All defined FAL servers have been attempted.原因:当前的归档日志已经到了128369,但是下一个归档的归档日志是128364 。同时,查看asm磁盘组上的归档日志发现只到128363,很明显,日志切换太频繁。控制文件中认为当前的归档日志是128369,
认为128364、128365已经生成了,实际上还没有生成。一段时间等待后,gap消失。archive log list正常 。
SQL> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination +RECO_mnt
Oldest online log sequence 128364
Next log sequence to archive 128364
Current log sequence 128369
SQL>再次检查同步情况,坏块情况。坏块消失。
另一个问题。在使用srvctl 命令重启db的时候,关闭掉了db,但是打开的时候提示这个错误
[email protected]> startup
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA_ABC/PROD/spfilePROD.ora'
ORA-17503: ksfdopn:10 Failed to open file +DATA_ABC/PROD/spfilePROD.ora
ORA-15056: additional error message
ORA-15173: entry 'spfileprod.ora' does not exist in directory 'prod'
ORA-06512: at line 4
[email protected]>参数文件没有了,查看asm磁盘组上,有一个参数文件,但是不敢确定,还有好几个控制文件。
如果是主库的话,直接自己造个pfile就启动了(只要直到数据库名),但是这个是备库,好多参数要设置,而且控制文件还好几个。
最终,在alert log中,查看数据库的启动过程,可以看到是使用那个spfile启动的。随后修改initPRODX.ora文件,启动正常。
查看initPROD1.ora中的参数,发现是这样的,initPRODX.ora被srvctl命令修改了 ?看下面的# line added by Agent 应该是被srvctl命令修改了。
[[email protected] dbs]$ more initPROD1.ora
SPFILE='+DATA_ABC/PROD/spfilePROD.ora' # line added by Agent
[[email protected] dbs]$
[[email protected] dbs]$ more initPROD2.ora
#SPFILE='+DATA_ABC/PROD/spfileprod.ora' # line added by Agent
#SPFILE='+data_ABC/prod/PARAMETERFILE/spfile.256.1107452321'
SPFILE='+DATA_ABC/PROD/spfilePROD.ora' # line added by Agent
[[email protected] dbs]$查看了下集群中db的配置,果然,集群中的spfile是 spfile<SID>.ora。 原来做备库的时候,恢复出来的是spfile.256.1107452321
[[email protected] ~]$ srvctl config database -d prod
Database unique name: PROD
Database name: PROD
Oracle home: /u01/PROD/oracle/product/11.2.0.4
Oracle user: oracle
Spfile: +DATA_ABC/PROD/spfilePROD.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: PROD
Database instances: PROD1,PROD2
Disk Groups: DATA_ABC,RECO_ABC
Mount point paths:
Services:
Type: RAC
Database is administrator managed
[[email protected] ~]$
END。
边栏推荐
- Why is the index in [mysql] database implemented by b+ tree? Is hash table / red black tree /b tree feasible?
- Mock服务moco系列(二)- Json格式、File文件、Header、Cookie、解决中文乱码
- 推荐一个沁恒的蓝牙的参考博客
- Related operations of binary tree
- Update 3dcat real time cloud rendering V2.1.2 release
- Drawing PDF tables (I) drawing excel table styles in PDF and downloading them through iText (supporting Chinese fonts)
- "Jargon" | what kind of experience is it to efficiently deliver games with Devops?
- UnitTest框架应用
- NPDP多少分通过?如何高分通过?
- Mock service Moco series (I) - introduction, first demo, get request, post request
猜你喜欢

CVE-2022-33891 Apache spark shell 命令注入漏洞复现

Drawing PDF form (II) drawing excel form style in PDF through iText, setting Chinese font, watermark, logo, header and page number

Cloud VR: the next step of virtual reality specialization

STM8S003F3 内部flash调试

408 Chapter 2 linear table

imx6 RTL8189FTV移植

实时云渲染有哪些优势

tkinter GUI版通信录管理系统

Connect to MySQL using sqldeveloper

Pan domain name configuration method
随机推荐
MATLAB中join函数使用
Imx6 rtl8189ftv migration
图的相关操作
网易严选库存中心设计实践
Unittest framework application
mysql的小数number类型select之后丢失了前面的0
Basic operation of bidirectional linked list
"Jargon" | what kind of experience is it to efficiently deliver games with Devops?
“Deprecated Gradle features were used in this build, making it incompatible with Gradle 6.0”问题解决
Drawing PDF tables (I) drawing excel table styles in PDF and downloading them through iText (supporting Chinese fonts)
推荐一个沁恒的蓝牙的参考博客
文件基础知识
Why the future of digitalization depends on 3D real-time rendering
Linux启动mysql报错
Mock服务moco系列(一)- 简介、第一个Demo、Get请求、Post请求
Is there any inconspicuous but profitable sideline?
Joseph Ring problem
Mock service Moco series (III) - redirection, regular expression, delay, template, event, sub module design
Installation and operation instructions of SVN client (TortoiseSVN)
Three control methods of TestNG execution sequence