Enabling Verbose Logging on Linux with Emulex Host Bus Adapters

Where did my disks go?

So no and then you may run into an issue which cannot be explained properly by just looking at the standard events that show up in “/var/log/messages“.

Issues such as

Oct 7 18:24:20 centos8 kernel: lpfc 0000:81:00.0: 0:1305 Link Down Event xc received Data: xc x20 x800110 x0 x0
Oct 7 18:24:24 centos8 kernel: rport-11:0-4: blocked FC remote port time out: removing target and saving binding
Oct 7 18:24:24 centos8 kernel: lpfc 0000:81:00.0: 0:(0):0203 Devloss timeout on WWPN 50:06:0e:80:07:c3:70:00 NPort x01ee40 Data: x0 x8 x2

are fairly common and the above simply shows a Link Down event. These are the most easy to troubleshoot when you remote switchlog tell you

18:26:59.565715 SCN Port Offline;rsn=0x10004,g=0x12 A2,P0 A2,P0 93 NA 
18:26:59.565721 *Removing all nodes from port A2,P0 A2,P0 93 NA 
18:28:07.998318 SCN LR_PORT(0);g=0x12 A2,P0 A2,P0 93 NA 
18:28:08.006029 SCN Port Online; g=0x12,isolated=0 A2,P0 A2,P1 93 NA 
18:28:08.007307 Port Elp engaged A2,P1 A2,P0 93 NA 
18:28:08.007331 *Removing all nodes from port A2,P0 A2,P0 93 NA 
18:28:08.007594 SCN Port F_PORT A2,P1 A2,P0 93 NA 
18:28:08.099107 SCN LR_PORT(0);g=0x12 LR_IN A2,P0 A2,P0 93 NA 
18:28:20.669283 SCN Port Offline;rsn=0x10004,g=0x14 A2,P0 A2,P0 93 NA 
18:28:20.669288 *Removing all nodes from port A2,P0 A2,P0 93 NA

as a result of

Wed Oct 7 18:28:07 2020 admin, FID 43, 10.75.27.192, portenable 4/29
Wed Oct 7 18:28:20 2020 admin, FID 43, 10.75.27.192, portdisable 4/29

Diagnostics becomes more problematic when is just the events that show the links bounce but show no further information. Obtaining extended information from the HBA drivers may then be very helpful.

Update Drivers and Firmware

As you know I’m very picky when it comes to maintenance. If I see cases where System and/or Storage administrators have basically been slacking for a long time the chances are very high that I will tell you that and commence diagnosing issues as soon as these things are all up to date. You wouldn’t know the sheer amount of issues that have been resolved in firmware and drivers over any given time-period.

That being said going to the Linux side of the Emulex (or Broadcom) drivers for the LP31000/LP32000 cards which are very popular in many form-factors.

The driver will show as an lpfc module and is by default compiled into a ramfs image when installed. This will allow the card to be used in a boot-from-san variation if needed. The module will load as such and register with the scsi-subsystem

lpfc 978944 81
nvmet_fc 32768 1 lpfc
nvme_fc 45056 1 lpfc
scsi_transport_fc 69632 1 lpfc

With the most recent versions of the driver it will also provide an NVMe_oF initiator and target so that NVM equipment can be utilized when attached to a FC fabric.

Logging

Loggin with an Emulex card can be done on the driver level as well as the HBA firmware. Unless you get some instructions to do so leave the firmware logging as is. Mainly because changing these parameters will require a reload of the driver that basically instructs the firmware logging facility to capture data in some host memory region. Obviously that will involve some engineering efforts to diagnose anyway so that will not be very helpful to yourself or your OEM support-organisation unless it needs escalating to Emulex.

Changing the logging verbosity of the driver itself is much easier but may also incur some performance impact so don’t just flick on the “0xFFFFFFFF debug” button. The driver logging facility is a bitmap value based on the below table:

LOG Message Verbose Mask Definition Verbose Bit Verbose Description
LOG_ELS 0x00000001 ELS events
LOG_DISCOVERY 0x00000002 Link discovery events
LOG_MBOX 0x00000004 Mailbox events
LOG_INIT 0x00000008 Initialization events
LOG_LINK_EVENT 0x00000010 Link events
LOG_IP 0x00000020 IP traffic history
LOG_FCP 0x00000040 FCP traffic history
LOG_NODE 0x00000080 Node table events
LOG_TEMP 0x00000100 Temperature sensor events
LOG_BG 0x00000200 BlockGuard events
LOG_MISC 0x00000400 Miscellaneous events
LOG_SLI 0x00000800 SLI events
LOG_FCP_ERROR 0x00001000 Log errors, not underruns
LOG_LIBDFC 0x00002000 Libdfc events
LOG_VPORT 0x00004000 NPIV events
LOG_SECURITY 0x00008000 Security events
LOG_EVENT 0x00010000 CT,TEMP,DUMP, logging
LOG_FIP 0x00020000 FIP events
LOG_FCP_UNDER 0x00040000 FCP underruns errors
LOG_SCSI_CMD 0x00080000 ALL SCSI commands
LOG_NVME 0x00100000 NVME general events
LOG_NVME_DISC 0x00200000 NVME discovery/connect events
LOG_NVME_ABTS 0x00400000 NVME ABTS events
LOG_NVME_IOERR 0x00800000 NVME I/O Error events
LOG_EDIF 0x01000000 External DIF events
LOG_AUTH 0x02000000 Authentication events

If you don’t know what these mean, or have no clue on how to interpret the output, it’s not much use mucking around with these. The output will only confuse you and if you don’t know what the commands and responses should be it’s only a bunch of hex values.

The values as displayed above can be summed depending on which verbose logging needs to be enabled. For instance if your OEM asks you for Link events, ELS and Initialiation events you may get asked to enable verbose logging with either the “hbacmd” or via “sysfs”. The value of the parameter will than be “0x19”

hbacmd or sysfs

If you have hbacmd installed any change done in the logging preferences also automatically kicks of dracut and builds a new boot image. The command has a few additional parameters

hbacmd setdriverparam 10:00:00:90:fa:c7:cd:f9 G P log-verbose 0x135661

The first three a fairly obvious. Command setting driver parameters for PWWN 10:xxxxxx. The G stands for Global basically meaning it is valid for all adapters and the P stands for Permant. That ensures the parameter that follows is also applied after reboots. The log-verbose parameter is basically the configuration what we’re adjusting. The 0x135661 is a combination of values obtained via the table above.

The value can also dynamically be applied via sysfs in the “/sys/class/scsi_host/host<X>” (where <X> is the adapter ID) directory. The LPFC driver will create the system file as appropriate in that folder and one of which is indeed the “lpfc_log_verbose” file. The 0x<123456> value can be echoed to that file and the driver will dynamically pick this up.

[[email protected] host11]# cat lpfc_log_verbose 
0x0
[[email protected] host11]# echo 0x135661 > lpfc_log_verbose

The change is immediate logged

Oct 8 17:03:15 centos8 kernel: lpfc 0000:81:00.0: 0:(0):3053 lpfc_log_verbose changed from 0 (x0) to 1267297 (x135661)

When you change all of them with

[[email protected] scsi_host]# echo 0x135661 > host11/lpfc_log_verbose 
[[email protected] scsi_host]# echo 0x135661 > host12/lpfc_log_verbose 
[[email protected] scsi_host]# echo 0x135661 > host13/lpfc_log_verbose 
[[email protected] scsi_host]# echo 0x135661 > host14/lpfc_log_verbose

The messagelog will show something similar like this

Oct 8 17:28:28 centos8 kernel: lpfc 0000:81:00.0: 0:(0):3053 lpfc_log_verbose changed from -1 (xffffffff) to 1267297 (x135661)
Oct 8 17:28:50 centos8 kernel: lpfc 0000:81:00.1: 1:(0):3053 lpfc_log_verbose changed from 1267297 (x135661) to 1267297 (x135661)
Oct 8 17:28:58 centos8 kernel: lpfc 0000:83:00.0: 2:(0):3053 lpfc_log_verbose changed from 1267297 (x135661) to 1267297 (x135661)
Oct 8 17:29:04 centos8 kernel: lpfc 0000:83:00.1: 3:(0):3053 lpfc_log_verbose changed from 1267297 (x135661) to 1267297 (x135661)

The interesting past is that the paths to the adapter entries are used here. This is reflected in the “0000:81:00.0:”, 0000:83:00:0:” etc entries.

Remember that in normal circumstances you would not need to change these values. The basics are logged anyway and only in specific circumstances you would need to adjust that. Also be aware that using a debug value of 0xFFFFFFFF can incurr a significant performance overhead on busy systems as a lot needs to be logged.

Another thing that I get often queried about is which HBA port belongs to which SCSI number.

Identifcation of the respective HBA’s can be done by looking at the adapter entries int he eventlog as mentioned above. In this case the 81 and 83 values are a reflection of the PCI id and the 00:.0 and 00.1 are the individual ports on those adapters.

81:00.0 Fibre Channel: Emulex Corporation LPe31000/LPe32000 Series 16Gb/32Gb Fibre Channel Adapter (rev 01)
81:00.1 Fibre Channel: Emulex Corporation LPe31000/LPe32000 Series 16Gb/32Gb Fibre Channel Adapter (rev 01)
83:00.0 Fibre Channel: Emulex Corporation LPe31000/LPe32000 Series 16Gb/32Gb Fibre Channel Adapter (rev 01)
83:00.1 Fibre Channel: Emulex Corporation LPe31000/LPe32000 Series 16Gb/32Gb Fibre Channel Adapter (rev 01)

You can see these entries coming back in the /sys/class/fc_host directory where logical links to the PCI devices are created

lrwxrwxrwx. 1 root root 0 Sep 4 15:08 host11 -> ../../devices/pci0000:80/0000:80:03.0/0000:81:00.0/host11/fc_host/host11
lrwxrwxrwx. 1 root root 0 Sep 4 15:08 host12 -> ../../devices/pci0000:80/0000:80:03.0/0000:81:00.1/host12/fc_host/host12
lrwxrwxrwx. 1 root root 0 Sep 4 15:08 host13 -> ../../devices/pci0000:80/0000:80:03.2/0000:83:00.0/host13/fc_host/host13
lrwxrwxrwx. 1 root root 0 Sep 4 15:08 host14 -> ../../devices/pci0000:80/0000:80:03.2/0000:83:00.1/host14/fc_host/host14

As soon as you know this you can associate the respective WWN of the adapter to the one you see on the switch:

[[email protected] ~]# cat /sys/class/fc_host/host11/port_name
0x10000090fac7cde8
Sydney_ILAB_X6_43_TEST:FID43:admin> switchshow
switchName: Sydney_ILAB_X6_43_TEST
switchType: 165.0
<snip>

Index Slot Port Address Media Speed State Proto
============================================================
66 4 2 01ef40 id N8 Online FC F-Port 50:06:0e:80:10:13:b5:b8 
<snip>
93 4 29 010000 id 16G Online FC F-Port 10:00:00:90:fa:c7:cd:e8

The above shows you when you see errors happening as part of a SAN attached disk where to look and how to assocaite the Emulex adapters to the respective WWN’s on your SAN.

From there on you can also identify which disks are presented to that adapter. As you’ve seen above the PCI subsystem creates a host interface per FC port. In my case these are host11 to host14.

A simple way to check is to just do an “ls” on /sys/class/scsi_disk/device/block tree.

[[email protected] scsi_disk]# ls */device/block/
<snip>

'11:0:0:0/device/block/':
sdm

<snip>

'11:0:0:8/device/block/':
sdu

As you can see the 11:xxxxx entries will list the respetive “/dev/sd*” entries that is being used for mounting volumes, MPIO listings etc.

Obviously there are heaps of tools available to ease your troubleshooting efforts. I would advise to install the Emulex OCMananger tools that are provided as a free separate package. It can be installed as an agent and agent-less feature. Other tools like “lsblk”, “blockdev”, sg-tools package and a few more are there to make your life a bit easier so you don’t have to crawl thru the sysfs tree yourself.

Let me know if this was helpful. You feedback is much appreciated.

Regards,

Erwin

Print Friendly, PDF & Email

About Erwin van Londen

Master Technical Analyst at Hitachi Data Systems
General Info, Linux, Storage , , , ,