TOPIC: Re:Double VIO, paths failing
#480
darkman (User)
Double VIO, paths failing 7 Months, 2 Weeks ago  
Hello,

I'm encountering the following problem. On a server, having rootvg only, paths are constantly
being marked as failed:

# lspath
Enabled hdisk0 vscsi1
Enabled hdisk1 vscsi1
Failed hdisk0 vscsi0
Failed hdisk1 vscsi0

The paths then return to normal and, after a while, fail again.
This is a dual VIO configuration; the storage is a DS5100.
I've checked the errpt on both VIOs: VIO2 has its last error entry from
the 3rd of Sep., while VIO1 has current errors, several times per minute:

IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
B6267342 1004164611 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164511 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164511 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164511 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164511 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164511 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164511 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164411 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164411 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164411 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164411 P H hdisk2 DISK OPERATION ERROR
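
To see at a glance how often each disk logs errors per minute, a summary listing like the one above can be tallied with awk. This is only a sketch: `errpt_tally` is a hypothetical helper name, and it assumes the default summary layout shown above, with the timestamp (MMDDhhmmYY) in column 2 and the resource name in column 5.

```shell
# Hypothetical helper: count errpt summary entries per minute and resource.
# Reads the summary listing on stdin and skips the header line.
errpt_tally() {
    awk '$1 != "IDENTIFIER" && NF >= 5 { n[$2 " " $5]++ }
         END { for (k in n) print k, n[k] }' | sort
}

# Usage on the VIO server:
#   errpt | errpt_tally
```

Each output line has the form "timestamp resource count", so a disk that keeps logging errors minute after minute stands out immediately.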

In detail, the errors look like this:

LABEL: SC_DISK_ERR5
IDENTIFIER: 00B984B3

Date/Time: Tue Oct 4 17:26:09 CUT 2011
Sequence Number: 980854
Machine Id: 00F6A5AA4C00
Node Id: sngp750vio1
Class: H
Type: UNKN
WPAR: Global
Resource Name: hdisk3
Resource Class:
Resource Type:
Location:
VPD:
Manufacturer................IBM
Machine Type and Model......1818 FAStT
ROS Level and ID............30373330
Serial Number...............
Device Specific.(Z0)........0000053245005032
Device Specific.(Z1)........

Description
UNDETERMINED ERROR

Probable Causes
DASD DEVICE
MEDIA
ADAPTER

Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
PATH ID
0
SENSE DATA
0A00 2A00 013B 6000 0000 0804 0000 0000 0000 0000 0000 0000 0000 0000 0000 1000
0000 1000 0600 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 001F 000C 0240 0000 0000 0000 0000 0000 0000 0000 0093 0000
0000 0038 0017


I suspect that the connection to the storage is being lost, but my
very specific question is:

Is VIO2 no longer affected by that problem, so that only VIO1 is problematic now?
Or does an LPAR by design not query the second VIO unless a request to the
first VIO times out or passes a specific error threshold (which in this
case obviously doesn't happen)? Is there something like a trunk priority, as
there is for SEA?


Thanks a lot in advance!
 
#481
Claus (Moderator)
Re:Double VIO, paths failing 7 Months, 2 Weeks ago  
Hello,

With virtual SCSI, the MPIO driver works in failover mode only. The reason is to preserve the order of the packets arriving at the storage. The first path discovered by the system is used. If that path fails, the second path is used, and it stays in use even if the first path recovers.

To change this behaviour two attributes are available:

1) The disk attribute hcheck_interval tells the system to check the paths at a regular interval (in seconds) and to reintegrate recovered paths.

# chdev -l hdiskx -a hcheck_interval=60

2) The path priority allows the administrator to select which path will be used, as long as it is available. If the primary path (lower priority value) recovers after an outage it will be used again. To define a path as backup use the following command:

# chpath -l hdiskx -p vscsiy -a priority=2
# lspath -El hdiskx -p vscsiy
priority 2 Priority True
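
As a rough sketch, both settings can be prepared for several disks at once. The helper names `hcheck_cmds` and `backup_path_cmds` are made up for illustration, and both only print the commands so they can be reviewed before being run. Two assumptions go beyond what is described above: the `-P` flag, which stores the change in the ODM only so it takes effect at the next reboot (usually needed for disks that are in use, such as rootvg disks), and `lspath` output in the three-column form shown in the first post (status, disk, parent adapter).

```shell
# Hypothetical helper: print (do not run) the chdev commands that set
# hcheck_interval on each disk named as an argument.
# Usage: hcheck_cmds INTERVAL DISK...
hcheck_cmds() {
    interval=$1
    shift
    for disk in "$@"; do
        # -P defers the change to the next reboot (assumption: disk is busy)
        echo "chdev -l $disk -a hcheck_interval=$interval -P"
    done
}

# Hypothetical helper: read lspath output (status name parent) on stdin
# and print a chpath command marking every path via ADAPTER as backup.
# Usage: lspath | backup_path_cmds ADAPTER
backup_path_cmds() {
    awk -v adapter="$1" '$3 == adapter {
        print "chpath -l " $2 " -p " $3 " -a priority=2"
    }'
}

# Example:
#   hcheck_cmds 60 hdisk0 hdisk1
#   lspath | backup_path_cmds vscsi1
```

Printing instead of executing is a deliberate choice here: which adapter should be the backup depends on which VIO server is meant to carry the load, and that is worth checking by eye before anything is changed.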

Have a nice day
Claus
 
Last Edit: 2011/10/07 09:14 By Claus.
#482
Frank (Moderator)
Re:Double VIO, paths failing 7 Months ago  
Normally this should not happen inside the partition, even if health checking is not configured. Health checking only helps to fall back to the primary path automatically if it has a higher priority. Path failover works anyway, but is only triggered if the current path fails.

The error messages you have submitted show some problems inside the VIO servers. What AIX level (TL and SP) do you have installed? There may be a problem with the device driver. In addition, it might be advisable to set up a timeout for the vscsi client adapters, because some errors are not handled correctly inside the VIO server partition. In this case, some errors on the VIO server will NOT LEAD to a path failover. Check your settings with lsattr -El vscsiX
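
A sketch of such a check, assuming the timeout meant here is the `vscsi_path_to` attribute of the client adapter. The attribute name is not given above, so treat it as an assumption, along with the convention that a value of 0 means the timeout is disabled. `vscsi_timeout_state` is a hypothetical helper that reads `lsattr -El vscsiX` output on stdin.

```shell
# Hypothetical helper: report whether the vscsi path timeout is set.
# Assumes lsattr -El prints "attribute value description user_settable".
vscsi_timeout_state() {
    awk '$1 == "vscsi_path_to" {
             found = 1
             if ($2 == 0) print "disabled"; else print "enabled (" $2 "s)"
         }
         END { if (!found) print "attribute not found" }'
}

# Usage on the client LPAR:
#   lsattr -El vscsi0 | vscsi_timeout_state
# To enable it (30 is an assumed value; -P defers the change to the next reboot):
#   chdev -l vscsi0 -a vscsi_path_to=30 -P
```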

Frank
 
Copyright © 2008 www.isarlab.de