TOPIC: Re:Double VIO, paths failing
|
darkman (User)
Junior Boarder
Posts: 35
|
Double VIO, paths failing 7 Months, 2 Weeks ago
|
Hello,
I'm encountering the following problem. On a server that has only rootvg, paths are
repeatedly being marked as failed:
# lspath
Enabled hdisk0 vscsi1
Enabled hdisk1 vscsi1
Failed hdisk0 vscsi0
Failed hdisk1 vscsi0
The paths then return to normal, and after a while they fail again.
This is a dual VIO server configuration; the storage is a DS5100.
I've checked the errpt output of both VIOs. VIO2 has its last error entry from
the 3rd of September, while VIO1 is logging new errors several times per minute:
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
B6267342 1004164611 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164611 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164511 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164511 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164511 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164511 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164511 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164511 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164411 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164411 P H hdisk2 DISK OPERATION ERROR
B6267342 1004164411 P H hdisk3 DISK OPERATION ERROR
B6267342 1004164411 P H hdisk2 DISK OPERATION ERROR
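To put a number on "several times per minute", the summary lines can be tallied per disk and timestamp. This is only a minimal sketch: the function name tally_errpt is made up for illustration, and it assumes the column layout shown above, where the second column is the timestamp (MMDDhhmmYY, so already at minute resolution) and the fifth is the resource name.

```shell
#!/bin/sh
# Tally errpt summary lines per resource and timestamp.
# Reads errpt summary output on stdin; the IDENTIFIER header line is skipped.
# tally_errpt is a hypothetical helper name, not an AIX command.
tally_errpt() {
    awk '$1 != "IDENTIFIER" && NF >= 5 { count[$5 " " $2]++ }
         END { for (k in count) print k, count[k] }' | sort
}

# Possible use on the VIO server (root shell): errpt | tally_errpt
```

Each output line has the form "resource timestamp count", which makes it easy to see whether the error rate on VIO1 is steady or comes in bursts.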
In detail, the errors look like this:
LABEL: SC_DISK_ERR5
IDENTIFIER: 00B984B3
Date/Time: Tue Oct 4 17:26:09 CUT 2011
Sequence Number: 980854
Machine Id: 00F6A5AA4C00
Node Id: sngp750vio1
Class: H
Type: UNKN
WPAR: Global
Resource Name: hdisk3
Resource Class:
Resource Type:
Location:
VPD:
Manufacturer................IBM
Machine Type and Model......1818 FAStT
ROS Level and ID............30373330
Serial Number...............
Device Specific.(Z0)........0000053245005032
Device Specific.(Z1)........
Description
UNDETERMINED ERROR
Probable Causes
DASD DEVICE
MEDIA
ADAPTER
Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
PATH ID
0
SENSE DATA
0A00 2A00 013B 6000 0000 0804 0000 0000 0000 0000 0000 0000 0000 0000 0000 1000
0000 1000 0600 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 001F 000C 0240 0000 0000 0000 0000 0000 0000 0000 0093 0000
0000 0038 0017
I suspect that the connection to the storage is being lost, but my
very specific question is:
Is VIO2 no longer affected by that problem, so that only VIO1 is problematic now?
Or does an LPAR by design not query the second VIO unless a request to the
first VIO times out or exceeds a specific error threshold (which in this
case obviously doesn't happen)? Is there something like a trunk priority, as
there is for SEA?
Thanks a lot in advance!
|
Claus (Moderator)
Moderator
Posts: 29
|
Re:Double VIO, paths failing 7 Months, 2 Weeks ago
|
Hello,
With virtual SCSI the MPIO driver works in failover mode only. The reason is to preserve the order of the packets arriving at the storage. The first path discovered by the system will be used. If that path fails, the second path is used, and it stays in use even if the first path recovers.
To change this behaviour two attributes are available:
1) The disk attribute hcheck_interval tells the system to check the paths at a regular interval and to reintegrate recovered paths.
# chdev -l hdiskx -a hcheck_interval=60
2) The path priority allows the administrator to select which path will be used, as long as it is available. If the primary path (lower priority value) recovers after an outage it will be used again. To define a path as backup use the following command:
# chpath -l hdiskx -p vscsiy -a priority=2
# lspath -El hdiskx -p vscsiy
priority 2 Priority True
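As a sketch of how the two settings above could be applied to several disks at once, the following dry run only prints the commands instead of executing them. The function name print_mpio_cmds is made up for illustration, and the choice of vscsi0 as primary and vscsi1 as backup is an assumption; swap them to match your own layout. A disk that is in use (such as a rootvg disk) may refuse the chdev change while open, in which case the change probably has to be deferred to the next reboot with chdev's -P flag.

```shell
#!/bin/sh
# Dry run: print (do not execute) the chdev/chpath commands for each disk.
# Arguments: primary adapter, backup adapter, then one or more disk names.
# print_mpio_cmds is a hypothetical helper name, not an AIX command.
print_mpio_cmds() {
    primary=$1; backup=$2; shift 2
    for disk in "$@"; do
        # Health check every 60 seconds so recovered paths are reintegrated.
        echo "chdev -l $disk -a hcheck_interval=60"
        # Lower priority value = preferred path.
        echo "chpath -l $disk -p $primary -a priority=1"
        echo "chpath -l $disk -p $backup -a priority=2"
    done
}

# Assumed layout taken from the lspath output in the question.
print_mpio_cmds vscsi0 vscsi1 hdisk0 hdisk1
```

Review the printed commands first, then run them by hand or pipe them to sh once they look right.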
Have a nice day
Claus
Last Edit: 2011/10/07 09:14 By Claus.
|
Frank (Moderator)
Moderator
Posts: 109
|
Re:Double VIO, paths failing 7 Months ago
|
Normally this should not happen inside the partition, even if health checking is not configured. Health checking only helps to fall back to the primary path automatically if it has a higher priority. Path failover works anyway, but it is triggered only if the current path fails.
The error messages you have submitted show some problems inside the VIO servers. What AIX level (TL and SP) do you have installed? There may be a problem with the device driver. In addition, it might be advisable to set up a timeout for the vscsi client adapters, because some errors are not handled correctly inside the VIO server partition. In that case, some errors on the VIO server will NOT lead to a path failover. Check your settings with lsattr -El vscsiX
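The post does not name the timeout attribute. On AIX client partitions the vscsi adapter usually lists vscsi_path_to (path timeout in seconds, with 0 meaning disabled) alongside vscsi_err_recov; treat both names as assumptions and confirm them against your own lsattr output. Under that assumption, a minimal sketch for reading the setting out of the lsattr output could look like this (check_vscsi_timeout is a made-up helper name):

```shell
#!/bin/sh
# Report the vscsi path timeout from "lsattr -El vscsiX" output read on stdin.
# Assumes the attribute is called vscsi_path_to and that 0 means "disabled".
# check_vscsi_timeout is a hypothetical helper name, not an AIX command.
check_vscsi_timeout() {
    awk '$1 == "vscsi_path_to" {
             found = 1
             if ($2 == 0) print "path timeout disabled"
             else         print "path timeout " $2 "s"
         }
         END { if (!found) print "vscsi_path_to not listed" }'
}

# Possible use on the client LPAR: lsattr -El vscsi0 | check_vscsi_timeout
```

If the attribute is not listed at all, the installed AIX level may predate it, which ties back to the TL and SP question above.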
Frank