
Tuesday, October 4, 2011

Slow restart of an ESX/ESXi server in a cluster: the passive node with RDM virtual machines takes a long time to boot

After an incident at my workplace where we had to restart an ESX 4.1 server hosting clustered virtual machines with mapped disks (RDM), I noticed that the server's service startup took far too long. I am therefore documenting this issue, which originates directly in the ESX operating system.
The procedure below should be applied to every server in the VMware cluster, to avoid future problems if we want to create more virtual machines under the same conditions. It applies to the following versions:
  • ESX/ESXi 4.0
  • ESX/ESXi 4.1
  • ESX/ESXi 5.0
Symptoms
On an ESX/ESXi system with virtual machines in an MSCS cluster, where one of the cluster nodes is passive (whichever one it is), the physical server takes roughly 10 to 30 minutes to start after a reboot.
While the ESX/ESXi server boots, it displays the following message:
Loading module multiextent
After this, the services associated with our VMware cluster start. The service that takes a long time to come up is Storage-drivers …, which is where SCSI device claiming and discovery begin. As a result, after boot the Starting Path Claiming and SCSI Device Discovery service tries to start and takes approximately 30 minutes to do so (as shown in the image), depending on the number of virtual machines with RDMs.

Solution
Below is the solution taken from the VMware KB articles, along with their URLs, which give more detail on the fix for each platform, and the corresponding links to download the updates that resolve this issue.
ESXi 5.0
ESXi 5.0 uses a different technique to determine whether Raw Device Mapped (RDM) LUNs are used as MSCS cluster devices: it introduces a configuration flag that marks each device participating in an MSCS cluster as "perennially reserved". During boot, the storage mid-layer attempts to discover all devices presented to the ESXi system during the device claiming phase. However, MSCS LUNs that hold a permanent SCSI reservation lengthen the boot process, because the ESXi host cannot interrogate a LUN while an active MSCS node hosted on another ESXi host holds a persistent SCSI reservation on it.
Configuring a device as perennially reserved is local to each ESXi system and must be performed on every ESXi 5.0 system that has visibility to each device participating in an MSCS cluster. This improves the boot time of all ESXi hosts that have visibility to the device(s).
There is no support for applying this setting through vSphere host profiles. As such, ESXi 5.0 systems deployed with vSphere Auto Deploy cannot take advantage of this feature.
Note: The advanced option Scsi.CRTimeoutDuringBoot is no longer valid on ESXi 5.0.
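Because the flag is local to each host, every ESXi 5.0 host that can see an MSCS LUN needs its own esxcli call. The Python sketch below only builds the per-host list of commands to run; the host names, the shortened LUN IDs and the visibility map are hypothetical placeholders you would fill in from your own environment.

```python
from typing import Dict, List, Set


def build_reservation_plan(
    visibility: Dict[str, Set[str]], mscs_luns: Set[str]
) -> Dict[str, List[str]]:
    """Return, per host, the esxcli commands that flag its visible MSCS LUNs.

    visibility: hypothetical map of ESXi host name -> naa IDs that host can see.
    mscs_luns:  naa IDs of the RDM LUNs that belong to the MSCS cluster.
    """
    plan: Dict[str, List[str]] = {}
    for host in sorted(visibility):
        # The flag is per host, so every host that sees an MSCS LUN needs its own command.
        targets = sorted(visibility[host] & mscs_luns)
        plan[host] = [
            f"esxcli storage core device setconfig -d {naa} --perennially-reserved=true"
            for naa in targets
        ]
    return plan


if __name__ == "__main__":
    # Hypothetical hosts and shortened LUN IDs, for illustration only.
    visibility = {
        "esx01": {"naa.600a0001", "naa.600a0002", "naa.600a0003"},
        "esx02": {"naa.600a0001", "naa.600a0002"},
    }
    mscs = {"naa.600a0001", "naa.600a0002"}
    for host, commands in build_reservation_plan(visibility, mscs).items():
        print(f"# run on {host}")
        for command in commands:
            print(command)
```

Non-MSCS LUNs (naa.600a0003 in the example) are left alone, since only devices with a persistent reservation slow down the claiming phase.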


Upgrading to ESXi 5.0
- Prior to upgrading, unpresent all MSCS RDMs from the host
1.     Determine the RDM LUNs that are part of an MSCS cluster.
2.     From the vSphere Client, select a virtual machine that has a mapping to the MSCS cluster RDM devices.
3.     Edit your Virtual Machine settings and navigate to your Mapped RAW LUNs.
4.     Select Manage Paths to display the device properties of the Mapped RAW LUN and the device identifier (that is, the naa ID).
5.     Take note of the naa ID, which is a globally unique identifier for your shared device.
6.     Unpresent all MSCS RDM devices from the hosts.
- Upgrade the hosts to ESXi 5.0. See Methods of upgrading to ESXi 5.0 (2004501)
- Following the reboot, use the esxcli command to mark each device as "perennially reserved". (This works even if the LUNs are not currently presented to the host.)

                      esxcli storage core device setconfig -d <naa.id> --perennially-reserved=true
-       Re-present the MSCS RDM devices to the hosts and rescan.
-       Subsequent reboots of the hosts should no longer be delayed by the MSCS devices.
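The naa IDs noted in step 5 are copied by hand, so it can help to normalize and de-duplicate them before building the post-upgrade commands. This is a minimal sketch that assumes an NAA ID is "naa." followed by 16 or 32 lowercase hex digits (matching the naa.50060160c46036df50060160c46036df example in the PowerCLI section); other identifier formats are deliberately rejected.

```python
import re
from typing import Iterable, List

# Assumption: an NAA device ID is "naa." followed by 16 or 32 hex digits.
NAA_PATTERN = re.compile(r"naa\.(?:[0-9a-f]{16}|[0-9a-f]{32})")


def normalize_naa(raw: str) -> str:
    """Strip whitespace and lowercase an ID noted from Manage Paths; reject anything else."""
    naa = raw.strip().lower()
    if NAA_PATTERN.fullmatch(naa) is None:
        raise ValueError(f"not an naa device ID: {raw!r}")
    return naa


def post_upgrade_commands(noted_ids: Iterable[str]) -> List[str]:
    """One setconfig command per distinct MSCS LUN, keeping the order the IDs were noted in."""
    unique: List[str] = []
    for raw in noted_ids:
        naa = normalize_naa(raw)
        if naa not in unique:
            unique.append(naa)
    # No presence check: per the procedure, the flag can be set before the LUNs are re-presented.
    return [
        f"esxcli storage core device setconfig -d {naa} --perennially-reserved=true"
        for naa in unique
    ]
```

The same LUN noted twice (for example from two VMs sharing it) produces a single command.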

Upgraded ESXi 5.0
To mark the MSCS LUNs as perennially reserved on an ESXi 5.0 host that has already been upgraded, run the same esxcli command as above; all subsequent rescans and boots will then run at normal speed.

1.     Determine the RDM LUNs that are part of an MSCS cluster.
2.     From the vSphere Client, select a virtual machine that has a mapping to the MSCS cluster RDM devices.
3.     Edit your Virtual Machine settings and navigate to your Mapped RAW LUNs.
4.     Select Manage Paths to display the device properties of the Mapped RAW LUN and the device identifier (that is, the naa ID).
5.     Take note of the naa ID, which is a globally unique identifier for your shared device.
6.     Using esxcli, mark the device as "perennially reserved" with the command:
      esxcli storage core device setconfig -d <naa.id> --perennially-reserved=true
7.     To verify if the device is perennially reserved, run the command:  

8.     Repeat the procedure for each Mapped RAW LUN that is participating in the MSCS cluster.  
Note: The configuration is permanently stored with the ESXi host and persists across reboots.
          To remove the perennially reserved flag, run the command:

      esxcli storage core device setconfig -d <naa.id> --perennially-reserved=false
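Step 7 above leaves the verification command blank. Assuming it is `esxcli storage core device list -d <naa.id>` and that the output contains an `Is Perennially Reserved: true` line (the esxcli counterpart of the `IsPerenniallyReserved` field in the PowerCLI section), a small Python check could read saved output instead of scanning it by eye. The block layout parsed here is an assumption, not a documented format.

```python
from typing import Dict, Optional


def parse_device_list(output: str) -> Dict[str, Dict[str, str]]:
    """Split esxcli-style output into {device ID: {attribute: value}}.

    Assumed layout: each device starts with an unindented ID line, followed by
    indented "Attribute Name: value" lines.
    """
    devices: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines between device blocks
        if not line[0].isspace():
            current = devices.setdefault(line.strip(), {})
        elif current is not None and ":" in line:
            # Split on the first colon only; values such as paths may contain more.
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return devices


def is_perennially_reserved(output: str, naa: str) -> bool:
    """True only if the block for `naa` reports "Is Perennially Reserved: true"."""
    attributes = parse_device_list(output).get(naa, {})
    return attributes.get("Is Perennially Reserved", "").lower() == "true"
```

A device missing from the output is reported as not reserved, so a typo in the naa ID shows up as a failed check rather than a false pass.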
  
PowerCLI 5.0
To mark the MSCS LUNs as perennially reserved through PowerCLI: esxcli functionality is available directly from PowerCLI. All you have to do is retrieve an EsxCli instance and then invoke its methods.
For additional information see VMware vSphere PowerCLI Blog.
Connect-VIServer -Server xxx.xxx.xxx.xxx -User xxxxx -Password xxxxx

- Set the EsxCli instance.
$myesxcli= get-esxcli -VMHost ESXhost
- List the Devices.
$myesxcli.storage.core.device.list()   
- Determine the PowerCLI parameters.
$myesxcli.storage.core.device.setconfig
TypeNameOfValue     : VMware.VimAutomation.ViCore.Util10Ps.EsxCliExtensionMethod
OverloadDefinitions : {void setconfig(boolean detached, string device, boolean perenniallyreserved)}
MemberType          : CodeMethod
Value               : void setconfig(boolean detached, string device, boolean perenniallyreserved)
Name                : setconfig
IsInstance          : True

- List details by device naa id.
$myesxcli.storage.core.device.list("naa.50060160c46036df50060160c46036df")
AttachedFilters        :
DevfsPath              : /vmfs/devices/disks/naa.50060160c46036df50060160c46036df
Device                 : naa.50060160c46036df50060160c46036df

IsPerenniallyReserved  : false
IsPseudo               : true


- Set the device as "perennially reserved" with the command:
$myesxcli.storage.core.device.setconfig($false, "naa.50060160c46036df50060160c46036df", $true)
- Verify the parameter updates.
$myesxcli.storage.core.device.list("naa.50060160c46036df50060160c46036df")
AttachedFilters        :
DevfsPath              : /vmfs/devices/disks/naa.50060160c46036df50060160c46036df
Device                 : naa.50060160c46036df50060160c46036df

IsPerenniallyReserved  : true
IsPseudo               : true

- To remove the perennially reserved flag, run the command:
$myesxcli.storage.core.device.setconfig($false, "naa.50060160c46036df50060160c46036df", $false)
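If the list() output shown above is saved to a text file, a short Python sketch can read the IsPerenniallyReserved field from it. It assumes the plain "Name : Value" layout shown in this post; longer values that PowerShell wraps onto a second line are not handled.

```python
from typing import Dict


def parse_powercli_list(output: str) -> Dict[str, str]:
    """Turn PowerShell list-style "Name : Value" lines into a dict.

    Lines without a colon (or with an empty name) are ignored; values may be empty,
    as with AttachedFilters in the sample output.
    """
    fields: Dict[str, str] = {}
    for line in output.splitlines():
        # Split on the first colon only, so paths in values stay intact.
        name, sep, value = line.partition(":")
        if sep and name.strip():
            fields[name.strip()] = value.strip()
    return fields


def reserved_flag(output: str) -> bool:
    """Read IsPerenniallyReserved from one device's list() output."""
    return parse_powercli_list(output).get("IsPerenniallyReserved", "").lower() == "true"
```

Run it once before and once after the setconfig call to confirm the flag changed from false to true.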

ESX/ESXi 4.x
This issue is resolved by the VMware ESX/ESXi 4.1 patch released 2011-07-28.
To work around this issue, modify an advanced configuration option on the affected ESX/ESXi hosts to speed up the boot process. For more information on changing advanced configuration options, see Configuring advanced options for ESX/ESXi (1038578).
  • For ESX/ESXi 4.0: Change the advanced option Scsi.UWConflictRetries to 80.
  • For ESX/ESXi 4.1: Change the advanced option Scsi.CRTimeoutDuringBoot to 1.
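Each 4.x release maps to exactly one advanced option and value. As an alternative to the vSphere Client, the sketch below builds a matching command line, assuming the service-console syntax `esxcfg-advcfg -s <value> /<Group>/<Name>`; check it against KB 1038578 before running anything on a host.

```python
# Workaround values listed in this post for ESX/ESXi 4.x hosts.
WORKAROUNDS = {
    "4.0": ("Scsi.UWConflictRetries", 80),
    "4.1": ("Scsi.CRTimeoutDuringBoot", 1),
}


def advcfg_command(version: str) -> str:
    """Return an esxcfg-advcfg command for a 4.x version string such as "4.1.0"."""
    major_minor = ".".join(version.split(".")[:2])
    if major_minor not in WORKAROUNDS:
        # ESXi 5.0 uses the perennially reserved flag instead of these options.
        raise ValueError(f"no 4.x workaround listed for version {version!r}")
    option, value = WORKAROUNDS[major_minor]
    # esxcfg-advcfg addresses options as /Group/Name rather than Group.Name.
    path = "/" + option.replace(".", "/")
    return f"esxcfg-advcfg -s {value} {path}"
```

For example, advcfg_command("4.1.0") yields the command for Scsi.CRTimeoutDuringBoot set to 1, while a 5.0 version string raises an error, matching the note that this option is no longer valid on ESXi 5.0.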


If you have any questions, don't hesitate to leave a comment.
Regards,
Claudio Pérez Q.



