Analyzing CRS-1714: Unable to discover any voting files on an 11.2.0.3 RAC
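Conclusions:
1. The dependencies among RAC processes keep evolving across Oracle releases, 11.2.0.3 included, so it is essential to understand how the RAC processes depend on one another.
2. CRS-1714: Unable to discover any voting files is only a symptom. It does not necessarily mean the voting disk is damaged; analyze the corresponding logs to find the real cause.
3. If the profile.xml configuration file (OLR) used by a RAC node's GPNPD process is damaged or inconsistent, the damaged node may have to be rebuilt.
4. Read the official documentation carefully before deleting or adding a RAC node, because the procedure differs from case to case.
5. Most importantly, when log analysis stalls or the problem is new to you, search MOS by keyword, such as GPNP PROFILE in this case.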
Analysis:
1. On a two-node 11.2.0.4 RAC running Red Hat 6.4, the CRSD process did not start. The cluster alert log showed that the voting disk could not be found:
2015-09-16 16:53:36.138
[cssd(25059)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/grid/11.2.0.4/log/jingfa1/cssd/ocssd.log
2015-09-16 16:53:51.176
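The alert-log entry above names the log file that holds the details. As a minimal sketch (the function name `details_path` is made up for illustration and is not part of any Oracle tool), that path can be pulled out of a CRS alert line like this:

```shell
# Print the file path that follows "Details at (...) in " on a CRS alert line.
# Reads alert-log text on stdin and prints one path per matching line.
details_path() {
  sed -n 's/.*Details at ([^)]*) in \([^ ]*\).*/\1/p'
}

# Sample line copied from the alert log shown in step 1.
sample='[cssd(25059)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/grid/11.2.0.4/log/jingfa1/cssd/ocssd.log'
printf '%s\n' "$sample" | details_path
```

On a live node the same function could be fed from the cluster alert log itself, and the printed path opened with a pager.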
2. Run the following command to stop all Oracle processes on both nodes
/u01/grid/11.2.0.4/bin/crsctl stop crs
3. Confirm that all Oracle processes on both nodes have stopped
ps -ef|grep d.bin
root 1077 24425 0 09:00 pts/1 00:00:00 grep d.bin
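The check above still lists the grep process itself, which is easy to misread. A small sketch, assuming the daemons of interest all end in `d.bin` as in this environment (the helper name `count_daemons` is hypothetical), counts only the real daemons:

```shell
# Count daemons whose command contains "d.bin" in `ps -ef` output read from
# stdin, ignoring the grep process itself. Always prints a number.
count_daemons() {
  awk '/d\.bin/ && !/grep/ { n++ } END { print n + 0 }'
}

# Sample: the only line left after "crsctl stop crs" in step 3.
printf '%s\n' 'root 1077 24425 0 09:00 pts/1 00:00:00 grep d.bin' | count_daemons   # prints 0

# Live usage on a node would be:  ps -ef | count_daemons
```

A result of 0 on both nodes means the stack is fully down and it is safe to continue.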
4. Start CRS in exclusive mode on node 1
/u01/grid/11.2.0.4/bin/crsctl start crs -excl -nocrs
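5. On node 1, check whether the ASM processes have started
6. On node 1, check whether the cluster processes started in exclusive mode
7. On node 1, check whether the OCR is healthy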
/u01/grid/11.2.0.4/bin/ocrcheck
8. If the OCR is not healthy and a backup exists, restore the OCR from the backup
/u01/grid/11.2.0.4/bin/ocrconfig -showbackup
/u01/grid/11.2.0.4/bin/ocrconfig -restore <ocr_backup_file>
9. On node 1, as the grid user, check whether the disk group holding the OCR and voting disk exists. It does.
1* select disk_number,path from v$asm_disk
SQL> /
DISK_NUMBER PATH
----------- --------------------------------------------------
0 /dev/ocr_vote
0 /dev/data
SQL>
SQL>
SQL> show parameter disk_
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups string DATA
asm_diskstring string /dev/*
SQL> select name,sector_size,block_size,allocation_unit_size/1024/1024 as au_mb from v$asm_diskgroup;
NAME SECTOR_SIZE BLOCK_SIZE AU_MB
------------------------------ ----------- ---------- ----------
DATA 512 4096 2
OCRVOTE 512 4096 2
10. On node 1, check whether the voting disk is working. It indeed cannot be found.
/u01/grid/11.2.0.4/bin/crsctl query css votedisk
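11. The asm_diskgroups value in step 9 shows that only the DATA disk group is mounted automatically, not OCRVOTE. So I changed the parameter so that the ASM instance mounts both OCRVOTE and DATA at startup, expecting the voting disk disk group to be mounted automatically when ASM starts.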
alter system set asm_diskgroups=data,ocrvote sid='*';
show parameter disk_
12. Stop the CRS cluster processes on node 1
/u01/grid/11.2.0.4/bin/crsctl stop crs
13. Restart the cluster processes on both nodes and check whether CRSD is healthy. The problem persists: the voting disk still cannot be found.
/u01/grid/11.2.0.4/bin/crsctl start crs
14. Stop the cluster processes on both nodes, then start them in exclusive mode on node 1
/u01/grid/11.2.0.4/bin/crsctl stop crs
/u01/grid/11.2.0.4/bin/crsctl start crs -excl -nocrs
15. On node 1, replace the voting disk onto the ocrvote disk group directly to repair it
/u01/grid/11.2.0.4/bin/crsctl replace votedisk +ocrvote
16. On node 1, check whether the voting disk is now healthy
/u01/grid/11.2.0.4/bin/crsctl query css votedisk
17. Stop the cluster processes on the node, then restart them on both nodes
/u01/grid/11.2.0.4/bin/crsctl stop crs
/u01/grid/11.2.0.4/bin/crsctl start crs
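18. On both nodes, confirm that the voting disk works (in my experience the command below returns output only after CRSD has started, otherwise it is empty, and CRSD is the last of the cluster processes to start). Node 1 is now fine, but CRSD on node 2 still does not start.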
/u01/grid/11.2.0.4/bin/crsctl query css votedisk
19. The grid user's trace file on node 2 shows that the cluster GUID stored in the voting disk does not match the one in the GPnP profile, which is why node 2 cannot discover the voting disk
2015-09-16 17:58:51.847: [ CSSD][1851041536]clssnmvDiskVerify: discovered a potential voting file
2015-09-16 17:58:51.847: [ SKGFD][1851041536]Handle 0x7fd95808f980 from lib :UFS:: for disk :/dev/ocr_vote:
---Here CSSD finds that the cluster GUID in the voting disk differs from the cluster GUID in the GPnP profile
2015-09-16 17:58:51.965: [ CSSD][1851041536]clssnmvDiskCreate: Cluster guid 0acef774f25dcfb0bf3d0c7b3db02abe found in voting disk /dev/ocr_vote does not match with the
cluster guid 7d8026436ade6fe0ff597a0f6df497e1 obtained from the GPnP profile
--The voting disk is removed
2015-09-16 17:58:51.965: [ CSSD][1851041536]clssnmvDiskDestroy: removing the voting disk /dev/ocr_vote
2015-09-16 17:58:51.965: [ SKGFD][1851041536]Lib :UFS:: closing handle 0x7fd95808f980 for disk :/dev/ocr_vote:
--No voting disk is found
2015-09-16 17:58:51.965: [ CSSD][1851041536]clssnmvDiskVerify: Successful discovery of 0 disks
2015-09-16 17:58:51.965: [ CSSD][1851041536]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2015-09-16 17:58:51.965: [ CSSD][1851041536]clssnmvFindInitialConfigs: No voting files found
2015-09-16 17:58:51.965: [ CSSD][1851041536](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
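The key line in the trace above is the clssnmvDiskCreate message carrying both GUIDs. A minimal sketch for extracting and comparing them follows; the function name `guid_pair` is hypothetical, and the pattern assumes the message wording shown in this log (it also tolerates the message being wrapped across two lines, as it is here):

```shell
# Read ocssd.log text on stdin and print the two cluster GUIDs from the
# clssnmvDiskCreate mismatch message: voting-disk GUID first, GPnP GUID second.
guid_pair() {
  # Join wrapped lines into one line, then capture both 32-hex-digit GUIDs.
  { tr '\n' ' '; echo; } |
    sed -n 's/.*Cluster guid \([0-9a-f]\{32\}\) found in voting disk.*cluster guid \([0-9a-f]\{32\}\) obtained from the GPnP profile.*/\1 \2/p'
}

# Sample copied from the node 2 trace in step 19.
sample='2015-09-16 17:58:51.965: [ CSSD][1851041536]clssnmvDiskCreate: Cluster guid 0acef774f25dcfb0bf3d0c7b3db02abe found in voting disk /dev/ocr_vote does not match with the
cluster guid 7d8026436ade6fe0ff597a0f6df497e1 obtained from the GPnP profile'

pair=$(printf '%s\n' "$sample" | guid_pair)
disk_guid=${pair% *}   # part before the space
gpnp_guid=${pair#* }   # part after the space
if [ "$disk_guid" != "$gpnp_guid" ]; then
  echo "mismatch: voting disk=$disk_guid gpnp profile=$gpnp_guid"
fi
```

Seeing two different values here points at the GPnP profile on that node rather than at the voting disk itself.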
21. On node 2, let's see what the GPnP process is
[grid@jingfa2 jingfa2]$ ps -ef|grep -i gpnp
grid 5238 32255 0 10:02 pts/1 00:00:00 grep -i gpnp
grid 18060 1 0 09:45 ? 00:00:01 /u01/grid/11.2.0.4/bin/gpnpd.bin
22. On node 2, locate the GPnP profile file
[grid@jingfa2 gpnpd]$ locate gpnp|grep -i --color profile
/u01/grid/11.2.0.4/gpnp/profiles
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/pending.xml
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.old
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml --probably this file
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile_orig.xml
/u01/grid/11.2.0.4/gpnp/profiles/peer
/u01/grid/11.2.0.4/gpnp/profiles/peer/profile.xml
/u01/grid/11.2.0.4/gpnp/profiles/peer/profile_orig.xml
23. Look at the content of the GPnP profile on node 2. The GUID 7d8026436ade6fe0ff597a0f6df497e1 appears in /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml, so this is the file.
I also compared it with the same file on node 1, where 0acef774f25dcfb0bf3d0c7b3db02abe can be found. So I tried to update the GUID by hand, replacing 7d8026436ade6fe0ff597a0f6df497e1 with 0acef774f25dcfb0bf3d0c7b3db02abe
0acef774f25dcfb0bf3d0c7b3db02abe
[grid@jingfa2 gpnpd]$ more /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml|grep -i --color 7d8026436ade6fe0ff597a0f6df497e1
<?xml version="1.0" encoding="UTF-8"?><gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile"
xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile
gpnp-profile.xsd" ProfileSequence="7" ClusterUId="7d8026436ade6fe0ff597a0f6df497e1" ClusterName="jingfa-scan" PALocation=""><gpnp:Network-Profile><gpnp:HostNetwork id="gen"
HostName="*"><gpnp:Network id="net1" IP="192.168.0.0" Adapter="eth0" Use="public"/><gpnp:Network id="net2" IP="10.0.0.0" Adapter="eth1"
Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="+asm"
LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="/dev/ocr*"
SPFile="+OCRVOTE/jingfa-scan/asmparameterfile/registry.253.849167179"/><ds:Signature
xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
<ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><ds:Reference URI=""><ds:Transforms>
<ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#">
<InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl xsi"/></ds:Transform></ds:Transforms>
<ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>cPtosOiD17nSId/92MTAPaQ+dLU=</ds:DigestValue></ds:Reference>
</ds:SignedInfo><ds:SignatureValue>Ca56sx6DgsCSxrRqPz2ReOzhkf9eYiqVYuj2XLadwuBURX2PL+nYD7LhLFFj27EpuSIx0SfGVhOPm/i016ws7tWATeSKBJDVyTAELgBEYPsMumW4vKm7rVXs
SbVJolycA3pFHtGqZ7FZjzSXxdj5Xq4LlBLGVWR3gYKnqxuRGv0=</ds:SignatureValue>
</ds:Signature></gpnp:GPnP-Profile>
[grid@jingfa2 gpnpd]$
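Grepping the whole profile dumps the entire XML, as seen above. A short sketch that prints only the ClusterUId attribute makes the two nodes easier to compare; `cluster_uid` is a made-up helper name, and it assumes the attribute is written exactly as in this profile:

```shell
# Print the value of the ClusterUId attribute from GPnP profile XML on stdin.
cluster_uid() {
  sed -n 's/.*ClusterUId="\([0-9a-f]*\)".*/\1/p' | head -n 1
}

# Sample fragment copied from node 2's profile.xml in step 23.
printf '%s\n' 'gpnp-profile.xsd" ProfileSequence="7" ClusterUId="7d8026436ade6fe0ff597a0f6df497e1" ClusterName="jingfa-scan" PALocation="">' | cluster_uid

# Live usage on a node would be:
#   cluster_uid < /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml
```

Running it against the profile on each node and comparing the two values shows at a glance whether the nodes agree on the cluster GUID.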
24. Back up the file on node 2 before changing it
cp /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml.20150917bak
vi /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml
:%s/7d8026436ade6fe0ff597a0f6df497e1/0acef774f25dcfb0bf3d0c7b3db02abe/g
Save the file.
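25. After I restarted the cluster processes on node 2, the cluster processes on node 1 restarted as well, and strangely the change from step 24 had been reverted. I forced the change again and restarted the cluster processes on node 2.
Repeated attempts showed that the GPnP process restores this file, so editing it by hand has no effect.
26. Since the approach above does not work, try another one: check how the agent processes differ between the two nodes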
[root@jingfa1 ~]# ps -ef|grep agent|grep grid|grep -v grep
grid 3647 1 0 09:44 ? 00:00:10 /u01/grid/11.2.0.4/bin/oraagent.bin
root 3660 1 0 09:44 ? 00:00:36 /u01/grid/11.2.0.4/bin/orarootagent.bin
grid 5793 1 0 09:45 ? 00:00:01 /u01/grid/11.2.0.4/bin/scriptagent.bin
oracle 5938 1 0 09:45 ? 00:00:20 /u01/grid/11.2.0.4/bin/oraagent.bin
grid 23427 1 0 09:43 ? 00:00:16 /u01/grid/11.2.0.4/bin/oraagent.bin
root 23656 1 0 09:43 ? 00:00:39 /u01/grid/11.2.0.4/bin/orarootagent.bin
root 23818 1 0 09:43 ? 00:00:19 /u01/grid/11.2.0.4/bin/cssdagent
[grid@jingfa2 ctssd]$ ps -ef|grep agent|grep grid|grep -v grep
root 17274 1 0 11:31 ? 00:00:01 /u01/grid/11.2.0.4/bin/cssdagent
grid 31975 1 0 11:21 ? 00:00:01 /u01/grid/11.2.0.4/bin/oraagent.bin
root 32064 1 0 11:21 ? 00:00:00 /u01/grid/11.2.0.4/bin/orarootagent.bin
[grid@jingfa2 ctssd]$
27. I found an article via Baidu and decided to copy profile.xml from node 1 to node 2 as a replacement
--Stop the cluster processes on both nodes
/u01/grid/11.2.0.4/bin/crsctl stop crs
--Back up profile.xml on node 2
cd /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/
cp profile.xml profile.xml.20150918bak
--Copy profile.xml from node 1 to node 2 as a replacement
rm profile.xml
cd /u01/grid/11.2.0.4/gpnp/jingfa1/profiles/peer
scp profile.xml grid@192.168.0.31:/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer
28. Start the cluster processes on node 2. The error persists.
/u01/grid/11.2.0.4/bin/crsctl start crs
29. gpnp.log on node 2 shows that profile.xml is populated from the local OLR. I tried deleting node 2's local OLR, expecting the GPnP process to fetch profile.xml from node 1.
/u01/grid/11.2.0.4/bin/ocrcheck -local
Status of Oracle Local Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2508
Available space (kbytes) : 259612
ID : 1774858304
Device/File Name : /u01/grid/11.2.0.4/cdata/jingfa2.olr
Device/File integrity check succeeded
Local registry integrity check succeeded
Logical corruption check succeeded
--Back up the OLR file on node 2
cp /u01/grid/11.2.0.4/cdata/jingfa2.olr /u01/grid/11.2.0.4/cdata/jingfa2.olr.20150918bak
--Delete the OLR file on node 2
rm -rf /u01/grid/11.2.0.4/cdata/jingfa2.olr
/u01/grid/11.2.0.4/bin/crsctl stop crs
--Start the cluster processes on node 2
/u01/grid/11.2.0.4/bin/crsctl start crs
---With node 2's OLR file removed, the entire cluster stack on node 2 fails to start
[ohasd(6836)]CRS-0704:Oracle High Availability Service aborted due to Oracle Local Registry error [PROCL-26: Error while accessing the
physical storage Operating System error [No such file or directory] [2]]. Details at (:OHAS00106:) in /u01/grid/11.2.0.4/log/jingfa2/ohasd/ohasd.log.
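Because removing the OLR makes OHASD abort, as the error above shows, a copy of the file is the only way back. The sketch below only locates the OLR from `ocrcheck -local` output so that it can be copied before any experiment. The helper name `olr_path` is hypothetical, and the pattern assumes the "Device/File Name" label shown in the output of step 29:

```shell
# Print the OLR file path from `ocrcheck -local` output read on stdin.
olr_path() {
  sed -n 's|^[[:space:]]*Device/File Name[[:space:]]*:[[:space:]]*||p'
}

# Sample line copied from the ocrcheck -local output in step 29.
printf '%s\n' 'Device/File Name : /u01/grid/11.2.0.4/cdata/jingfa2.olr' | olr_path

# Live usage on a node (as root) might look like:
#   olr=$(/u01/grid/11.2.0.4/bin/ocrcheck -local | olr_path)
#   cp -p "$olr" "$olr.$(date +%Y%m%d)bak"
```

Deriving the path from the tool's output avoids backing up the wrong file when the Grid home or host name differs from the one typed by hand.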
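30. The official documentation on the OLR shows that each RAC node stores only the cluster resource information relevant to that node, so the OLR file differs from node to node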
Clusterware Administration and Deployment Guide
3 Managing Oracle Cluster Registry and Voting Disks
In Oracle Clusterware 11g release 2 (11.2), each node in a cluster has a local registry for node-specific resources, called an Oracle Local Registry (OLR),
that is installed and configured when Oracle Clusterware installs OCR
31. According to MOS, the only option appears to be rebuilding node 2
Cluster guid found in voting disk does not match with the cluster guid obtained from the GPnP profile (Doc ID 1281791.1)
32. Delete node 2 from node 1. The command reports that node 2 cannot be found.
[root@jingfa1 ~]# /u01/grid/11.2.0.4/bin/crsctl delete node -n jingfa2
CRS-4660: Could not find node jingfa2 to delete.
CRS-4000: Command Delete failed, or completed with errors.
33. On node 2, update the node list for the node that is being deleted
[grid@jingfa2 ~]$ /u01/grid/11.2.0.4/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=jingfa2" crs=true -silent -local
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB. Actual 4094 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /home/grid/oraInventory
'UpdateNodeList' was successful.
34. On node 2, deinstall the cluster installation and configuration
[grid@jingfa2 ~]$ /u01/grid/11.2.0.4/deinstall/deinstall -local
Checking for required files and bootstrapping ...
Please wait ...
Location of logs /home/grid/oraInventory/logs/
############ ORACLE DEINSTALL & DECONFIG TOOL START ############
######################### CHECK OPERATION START #########################
## [START] Install check configuration ##
Checking for existence of the Oracle home location /u01/grid/11.2.0.4
Oracle Home type selected for deinstall is: Oracle Grid Infrastructure for a Cluster
Oracle Base selected for deinstall is: /u01/app/grid
Checking for existence of central inventory location /home/grid/oraInventory
Checking for existence of the Oracle Grid Infrastructure home /u01/grid/11.2.0.4
The following nodes are part of this cluster: jingfa2
Checking for sufficient temp space availability on node(s) : 'jingfa2'
## [END] Install check configuration ##
Traces log file: /home/grid/oraInventory/logs//crsdc.log
Enter an address or the name of the virtual IP used on node "jingfa2"[jingfa2-vip]
>
The following information can be collected by running "/sbin/ifconfig -a" on node "jingfa2"
Enter the IP netmask of Virtual IP "192.168.0.23" on node "jingfa2"[255.255.255.0]
>
Enter the network interface name on which the virtual IP address "192.168.0.23" is active
>
Enter an address or the name of the virtual IP[]
>
Network Configuration check config START
Network de-configuration trace file location: /home/grid/oraInventory/logs/netdc_check2015-09-19_08-30-44-AM.log
Specify all RAC listeners (do not include SCAN listener) that are to be de-configured [LISTENER,LISTENER_SCAN1]:
Network Configuration check config END
Asm Check Configuration START
ASM de-configuration trace file location: /home/grid/oraInventory/logs/asmcadc_check2015-09-19_08-31-13-AM.log
######################### CHECK OPERATION END #########################
####################### CHECK OPERATION SUMMARY #######################
Oracle Grid Infrastructure Home is: /u01/grid/11.2.0.4
The cluster node(s) on which the Oracle home deinstallation will be performed are:jingfa2
Since -local option has been specified, the Oracle home will be deinstalled only on the local node, 'jingfa2', and the global configuration will be removed.
Oracle Home selected for deinstall is: /u01/grid/11.2.0.4
Inventory Location where the Oracle home registered is: /home/grid/oraInventory
Following RAC listener(s) will be de-configured: LISTENER,LISTENER_SCAN1
Option -local will not modify any ASM configuration.
Do you want to continue (y - yes, n - no)? [n]: y
A log of this session will be written to: '/home/grid/oraInventory/logs/deinstall_deconfig2015-09-19_08-26-31-AM.out'
Any error messages from this session will be written to: '/home/grid/oraInventory/logs/deinstall_deconfig2015-09-19_08-26-31-AM.err'
######################## CLEAN OPERATION START ########################
ASM de-configuration trace file location: /home/grid/oraInventory/logs/asmcadc_clean2015-09-19_08-31-36-AM.log
ASM Clean Configuration END
Network Configuration clean config START
Network de-configuration trace file location: /home/grid/oraInventory/logs/netdc_clean2015-09-19_08-31-36-AM.log
De-configuring RAC listener(s): LISTENER,LISTENER_SCAN1
De-configuring listener: LISTENER
Stopping listener on node "jingfa2": LISTENER
Warning: Failed to stop listener. Listener may not be running.
Listener de-configured successfully.
De-configuring listener: LISTENER_SCAN1
Stopping listener on node "jingfa2": LISTENER_SCAN1
Warning: Failed to stop listener. Listener may not be running.
Listener de-configured successfully.
De-configuring backup files...
Backup files de-configured successfully.
The network configuration has been cleaned up successfully.
Network Configuration clean config END
---------------------------------------->
At this point the tool asks you to run the following script as root on node 2
The deconfig command below can be executed in parallel on all the remote nodes. Execute the command on the local node after the execution completes on all the remote nodes.
Run the following command as the root user or the administrator on node "jingfa2".
/tmp/deinstall2015-09-19_08-24-22AM/perl/bin/perl -I/tmp/deinstall2015-09-19_08-24-22AM/perl/lib -I/tmp/deinstall2015-09-19_08-24-22AM/crs/install /tmp/deinstall2015-09-19_08-24-22AM/crs/install/rootcrs.pl -force -deconfig -paramfile "/tmp/deinstall2015-09-19_08-24-22AM/response/deinstall_Ora11g_gridinfrahome1.rsp"
Press Enter after you finish running the above commands
<----------------------------------------
35. On node 2, run the script from the prompt above as root
[root@jingfa2 ~]# /tmp/deinstall2015-09-19_08-24-22AM/perl/bin/perl -I/tmp/deinstall2015-09-19_08-24-22AM/perl/lib -I/tmp/deinstall2015-09-19_08-24-22AM/crs/install /tmp/deinstall2015-09-19_08-24-22AM/crs/install/rootcrs.pl -force -deconfig -paramfile "/tmp/deinstall2015-09-19_08-24-22AM/response/deinstall_Ora11g_gridinfrahome1.rsp"
Using configuration parameter file: /tmp/deinstall2015-09-19_08-24-22AM/response/deinstall_Ora11g_gridinfrahome1.rsp
PRCR-1119 : Failed to look up CRS resources of ora.cluster_vip_net1.type type
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'jingfa2'
CRS-2673: Attempting to stop 'ora.crf' on 'jingfa2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'jingfa2'
CRS-2677: Stop of 'ora.crf' on 'jingfa2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'jingfa2'
CRS-2677: Stop of 'ora.mdnsd' on 'jingfa2' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'jingfa2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'jingfa2'
CRS-2677: Stop of 'ora.gpnpd' on 'jingfa2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'jingfa2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle clusterware stack on this node
36. Back on node 2, let the deinstall from step 34 run to completion (press Enter)
Remove the directory: /tmp/deinstall2015-09-19_08-24-22AM on node:
Setting the force flag to false
Setting the force flag to cleanup the Oracle Base
Oracle Universal Installer clean START
Detach Oracle home '/u01/grid/11.2.0.4' from the central inventory on the local node : Done
Delete directory '/u01/grid/11.2.0.4' on the local node : Done
Failed to delete the directory '/u01/app/grid'. The directory is in use.
Delete directory '/u01/app/grid' on the local node : Failed <<<<
Oracle Universal Installer cleanup completed with errors.
Oracle Universal Installer clean END
## [START] Oracle install clean ##
Clean install operation removing temporary directory '/tmp/deinstall2015-09-19_08-24-22AM' on node 'jingfa2'
## [END] Oracle install clean ##
######################### CLEAN OPERATION END #########################
####################### CLEAN OPERATION SUMMARY #######################
Following RAC listener(s) were de-configured successfully: LISTENER,LISTENER_SCAN1
Oracle Clusterware is stopped and successfully de-configured on node "jingfa2"
Oracle Clusterware is stopped and de-configured successfully.
Successfully detached Oracle home '/u01/grid/11.2.0.4' from the central inventory on the local node.
Successfully deleted directory '/u01/grid/11.2.0.4' on the local node.
Failed to delete directory '/u01/app/grid' on the local node.
Oracle Universal Installer cleanup completed with errors.
Oracle deinstall tool successfully cleaned up temporary directories.
#######################################################################
############# ORACLE DEINSTALL & DECONFIG TOOL END #############
[grid@jingfa2 ~]$
37. On node 1, update the cluster node list
[grid@jingfa1 ~]$ /u01/grid/11.2.0.4/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=jingfa1" crs=true -silent
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB. Actual 4094 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /home/grid/oraInventory
'UpdateNodeList' was successful.
38. On node 1, update the cluster node list as the oracle user
[oracle@jingfa1 ~]$ /u01/app/oracle/product/11.2.0.4/db_1/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=jingfa1"
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB. Actual 4094 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /home/grid/oraInventory
'UpdateNodeList' was successful.
39. Confirm that node 2 has been removed from the cluster
[grid@jingfa1 ~]$ /u01/grid/11.2.0.4/bin/cluvfy stage -post nodedel -n jingfa2
Performing post-checks for node removal
Checking CRS integrity...
Clusterware version consistency passed
CRS integrity check passed
Node removal check passed
Post-check for node removal was successful.
40. Prepare to add node 2 back to the cluster from node 1. Verify that node 2 can be added by running the following command on node 1
/u01/grid/11.2.0.4/bin/cluvfy stage -pre nodeadd -n jingfa2
If the following error appears, re-establish SSH user equivalence between the nodes
ERROR:
PRVF-7610 : Cannot verify user equivalence/reachability on existing cluster nodes
Verification cannot proceed
ERROR:
PRVF-7617 : Node connectivity between "jingfa1 : 192.168.0.21" and "jingfa1 : 192.168.0.22" failed
TCP connectivity check failed for subnet "192.168.0.0"
Cause: the firewall on Linux had not been disabled
Stop the firewall:
service iptables save
service iptables stop
chkconfig iptables off
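41. On node 1, verify again whether node 2 can be added to the cluster. This really checks whether node 2's hardware and software environment meets the cluster requirements.
It checks node user equivalence, installed packages, NTP, DNS, multicast, and much more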
[grid@jingfa1 ~]$ /u01/grid/11.2.0.4/bin/cluvfy stage -pre nodeadd -n jingfa2
Performing pre-checks for node addition
Checking node reachability...
Node reachability check passed from node "jingfa1"
Checking user equivalence...
User equivalence check passed for user "grid"
Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"
TCP connectivity check passed for subnet "192.168.0.0"
Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.0.0".
Subnet mask consistency check passed.
Node connectivity check passed
Checking multicast communication...
Checking subnet "192.168.0.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.0.0" for multicast communication with multicast group "230.0.1.0" passed.
Check of multicast communication passed.
Checking CRS integrity...
Clusterware version consistency passed
CRS integrity check passed
Checking shared resources...
Checking CRS home location...
"/u01/grid/11.2.0.4" is shared
Shared resources check for node addition passed
Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"
TCP connectivity check passed for subnet "192.168.0.0"
Check: Node connectivity for interface "eth1"
Node connectivity passed for interface "eth1"
TCP connectivity check passed for subnet "10.0.0.0"
Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.0.0".
Subnet mask consistency check passed for subnet "10.0.0.0".
Subnet mask consistency check passed.
Node connectivity check passed
Checking multicast communication...
Checking subnet "192.168.0.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.0.0" for multicast communication with multicast group "230.0.1.0" passed.
Checking subnet "10.0.0.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "10.0.0.0" for multicast communication with multicast group "230.0.1.0" passed.
Check of multicast communication passed.
Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "jingfa2:/u01/grid/11.2.0.4,jingfa2:/tmp"
Free disk space check failed for "jingfa1:/u01/grid/11.2.0.4,jingfa1:/tmp"
Check failed on nodes:
jingfa1
Check for multiple users with UID value 1101 passed
User existence check passed for "grid"
Run level check passed
Hard limits check passed for "maximum open file descriptors"
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed
Kernel version check passed
Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "make"
Package existence check passed for "binutils"
Package existence check passed for "gcc(x86_64)"
Package existence check passed for "libaio(x86_64)"
Package existence check passed for "glibc(x86_64)"
Package existence check passed for "compat-libstdc++-33(x86_64)"
Package existence check passed for "elfutils-libelf(x86_64)"
Package existence check passed for "elfutils-libelf-devel"
Package existence check passed for "glibc-common"
Package existence check passed for "glibc-devel(x86_64)"
Package existence check passed for "glibc-headers"
Package existence check passed for "gcc-c++(x86_64)"
Package existence check passed for "libaio-devel(x86_64)"
Package existence check passed for "libgcc(x86_64)"
Package existence check passed for "libstdc++(x86_64)"
Package existence check passed for "libstdc++-devel(x86_64)"
Package existence check passed for "sysstat"
Package existence check failed for "pdksh"
Check failed on nodes:
jingfa2,jingfa1
Package existence check passed for "expat(x86_64)"
Check for multiple users with UID value 0 passed
Current group ID check passed
Starting check for consistency of primary group of root user
Check for consistency of root user's primary group passed
Checking OCR integrity...
OCR integrity check passed
Checking Oracle Cluster Voting Disk configuration...
Oracle Cluster Voting Disk configuration check passed
Time zone consistency check passed
Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
NTP Configuration file check passed
No NTP Daemons or Services were found to be running
PRVF-5507 : NTP daemon or service is not running on any node but NTP configuration file exists on the following node(s):
jingfa2,jingfa1
Clock synchronization check using Network Time Protocol(NTP) failed
User "grid" is not part of "root" group. Check passed
Checking consistency of file "/etc/resolv.conf" across nodes
File "/etc/resolv.conf" does not have both domain and search entries defined
domain entry in file "/etc/resolv.conf" is consistent across nodes
search entry in file "/etc/resolv.conf" is consistent across nodes
PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: jingfa1,jingfa2
File "/etc/resolv.conf" is not consistent across nodes
Pre-check for node addition was unsuccessful on all the nodes.
[grid@jingfa1 ~]$
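The pre-check output above is long, and the few failures are easy to miss among the passed checks. A small filter, offered as a sketch (the name `failed_checks` is hypothetical, and the patterns are based only on the message wording seen in this output), keeps just the failed checks and the PRVF error lines:

```shell
# Keep only failed checks and PRVF-nnnn error lines from cluvfy output on stdin.
failed_checks() {
  grep -E 'check failed for|failed[[:space:]]*$|^PRVF-[0-9]+'
}

# Sample lines copied from the cluvfy output in step 41.
printf '%s\n' \
  'Swap space check passed' \
  'Free disk space check failed for "jingfa1:/u01/grid/11.2.0.4,jingfa1:/tmp"' \
  'Package existence check failed for "pdksh"' \
  'Time zone consistency check passed' \
  'PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: jingfa1,jingfa2' |
  failed_checks
```

With the sample input it prints the two failed checks and the PRVF-5636 line. On a live system the cluvfy command from step 41 would be piped into the function instead.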
42. Add node 2 from node 1
[grid@jingfa1 ~]$ /u01/grid/11.2.0.4/oui/bin/addNode.sh "CLUSTER_NEW_NODES={jingfa2}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={jingfa2-vip}"
(similar output omitted)
ERROR:
PRVF-7617 : Node connectivity between "jingfa1 : 192.168.0.21" and "jingfa1 : 192.168.0.24" failed
ERROR:
PRVF-7617 : Node connectivity between "jingfa1 : 192.168.0.21" and "jingfa1 : 192.168.0.23" failed
TCP connectivity check failed for subnet "192.168.0.0"
Checking VIP configuration.
Checking VIP Subnet configuration.
Check for VIP Subnet configuration passed.
Checking VIP reachability
PRVF-10209 : VIPs "jingfa2-vip" are active before Clusterware installation
43. As the output of step 42 indicates, stop node 2's VIP resource and remove it
[root@jingfa1 ~]# /u01/grid/11.2.0.4/bin/crsctl stop res ora.jingfa2.vip
CRS-2673: Attempting to stop 'ora.jingfa2.vip' on 'jingfa1'
CRS-2677: Stop of 'ora.jingfa2.vip' on 'jingfa1' succeeded
[root@jingfa1 ~]# /u01/grid/11.2.0.4/bin/crsctl delete res ora.jingfa2.vip
Confirm that node 2's VIP resource has been cleaned up
[root@jingfa1 ~]# /u01/grid/11.2.0.4/bin/crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE jingfa1
(unrelated output omitted)
ora.jingfa1.vip
1 ONLINE ONLINE jingfa1
ora.oc4j
1 ONLINE ONLINE jingfa1
ora.scan1.vip
1 ONLINE ONLINE jingfa1
44. Skip the pre-add-node checks and add node 2 from node 1
[grid@jingfa1 ~]$ export IGNORE_PREADDNODE_CHECKS=Y
[grid@jingfa1 ~]$ /u01/grid/11.2.0.4/oui/bin/addNode.sh "CLUSTER_NEW_NODES={jingfa2}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={jingfa2-vip}"
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB. Actual 4067 MB Passed
Oracle Universal Installer, Version 11.2.0.3.0 Production
Copyright (C) 1999, 2011, Oracle. All rights reserved.
Performing tests to see whether nodes jingfa2 are available
............................................................... 100% Done.
.
-----------------------------------------------------------------------------
Cluster Node Addition Summary
Global Settings
Source: /u01/grid/11.2.0.4
New Nodes
Space Requirements
New Nodes
jingfa2
/: Required 5.01GB : Available 8.51GB
Installed Products
Product Names
Oracle Grid Infrastructure 11.2.0.3.0
Sun JDK 1.5.0.30.03
Installer SDK Component 11.2.0.3.0
(similar output omitted)
SQL*Plus 11.2.0.3.0
Oracle Netca Client 11.2.0.3.0
Oracle Net 11.2.0.3.0
Oracle JVM 11.2.0.3.0
Oracle Internet Directory Client 11.2.0.3.0
Oracle Net Listener 11.2.0.3.0
Cluster Ready Services Files 11.2.0.3.0
Oracle Database 11g 11.2.0.3.0
-----------------------------------------------------------------------------
Instantiating scripts for add node (Saturday, September 19, 2015 10:53:28 AM GMT+08:00)
. 1% Done.
Instantiation of add node scripts complete
Copying to remote nodes (Saturday, September 19, 2015 10:53:33 AM GMT+08:00)
............................................................................................... 96% Done.
Home copied to new nodes
Saving inventory on nodes (Saturday, September 19, 2015 11:05:15 AM GMT+08:00)
. 100% Done.
Save inventory complete
WARNING:A new inventory has been created on one or more nodes in this session. However, it has not yet been registered as the central inventory of this system.
To register the new inventory please run the script at '/home/grid/oraInventory/orainstRoot.sh' with root privileges on nodes 'jingfa2'.
If you do not register the inventory, you may not be able to update or patch the products you installed.
The following configuration scripts need to be executed as the "root" user in each new cluster node. Each script in the list below is followed by a list of nodes.
/home/grid/oraInventory/orainstRoot.sh #On nodes jingfa2
/u01/grid/11.2.0.4/root.sh #On nodes jingfa2
To execute the configuration scripts:
1. Open a terminal window
2. Log in as "root"
3. Run the scripts in each cluster node
The Cluster Node Addition of /u01/grid/11.2.0.4 was successful.
Please check '/tmp/silentInstall.log' for more details.
[grid@jingfa1 ~]$
45. As root on node 2, run the scripts listed in the output of step 44
[root@jingfa2 ~]# /home/grid/oraInventory/orainstRoot.sh
Changing permissions of /home/grid/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.
Changing groupname of /home/grid/oraInventory to oinstall.
The execution of the script is complete.
[root@jingfa2 ~]# /u01/grid/11.2.0.4/root.sh
Performing root user operation for Oracle 11g
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/grid/11.2.0.4
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/grid/11.2.0.4/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation
OLR initialization - successful
Adding Clusterware entries to upstart
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node jingfa1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
[root@jingfa2 ~]#
Source: ITPUB blog (blog name: wisdomone1), http://blog.itpub.net/9240380/viewspace-1804149/. If you repost this article, please credit the source; otherwise legal action may be taken.