DMS Timeout and Kernel Panic during Failover
Written by Frank Schalude   
Tuesday, 30 March 2010
During cluster tests at several customer sites, I detected a problem with the dead man switch (DMS). Crashing one node for a failover test led to a DMS timeout on the surviving node, which resulted in a kernel panic and a crash of that node, too. The problem is related to a bug in the RSCT fileset. This is the IBM APAR description for this problem:


IZ66741: SERIAL NETWORK THREAD MONITORING ERROR CAN CAUSE DMS TIMEOUT DURING HACMP FAILOVER

APAR status: Closed as program error.

Error description: There is a weakness in the thread liveness monitoring logic of two of the Topology Services modules which could cause the subsystem to perceive the loss of a neighbor to be caused at least in part by a local process hang. As a result, the subsystem will deliberately allow the DMS to expire to prevent data corruption caused by a sundered network.

The affected network modules are Disk Heartbeating (which includes Multi-Node Disk Heartbeating) and rs232 (also known as TTY).

The thread monitoring error only exists at RSCT level 2.5.4.0 (or higher) on AIX 6.1, or 2.4.12.0 (or higher) on AIX 5.3 -- it should be noted that these levels are the minimum required to run PowerHA 6.1.

This problem might be seen during failover tests when a node is halted or powered off, depending on the Failure Detection Rate (FDR) on the affected network, as well as the value of the DMS timeout. It is somewhat more likely to be seen with the Disk Heartbeating network than with rs232.
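The level condition in the APAR text can be checked mechanically. The following is only a minimal sketch: the function names and the lookup table are hypothetical, the RSCT level is assumed to be a dotted string such as `2.5.4.0`, and the APAR text quoted here does not name the level that contains the fix. A positive result therefore only means "within the range the APAR names", not "definitely unfixed".

```python
# Minimum RSCT levels at which, per the APAR text, the thread monitoring
# error exists. The table layout and names are assumptions for this sketch.
MIN_AFFECTED_RSCT = {
    "6.1": (2, 5, 4, 0),   # AIX 6.1: RSCT 2.5.4.0 or higher
    "5.3": (2, 4, 12, 0),  # AIX 5.3: RSCT 2.4.12.0 or higher
}


def parse_level(level):
    """Turn a dotted level string like '2.4.12.0' into a tuple of ints."""
    return tuple(int(part) for part in level.strip().split("."))


def in_affected_range(aix_release, rsct_level):
    """Return True if the RSCT level is at or above the minimum named in
    the APAR text for this AIX release.

    Releases the APAR text does not mention return False. The fix level is
    not part of the quoted text, so no upper bound is applied here.
    """
    minimum = MIN_AFFECTED_RSCT.get(aix_release)
    if minimum is None:
        return False
    # Tuples compare field by field, so 2.4.12.0 sorts above 2.4.9.0,
    # which a plain string comparison would get wrong.
    return parse_level(rsct_level) >= minimum


if __name__ == "__main__":
    for release, level in [("6.1", "2.5.4.0"), ("5.3", "2.4.9.0")]:
        print(release, level, in_affected_range(release, level))
```

Comparing the levels as integer tuples instead of strings is the main design choice here, since the individual fields can have more than one digit.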

 
Copyright © 2008 www.isarlab.de