A lesser-known "host-can’t-see-storage" problem caused by FC-SP authentication failures.

Many support cases open with the line "Host can't see storage", which puts most of them in the "configuration" queue. My assumption is that if a host can't "see" storage, it has never worked before, so there must be some kind of configuration problem.

Once in a while, though, a case pops up that involves a less commonly used feature of the FC protocol: FC authentication (part of the FC-SP protocol). This piece of the FC stack provides an authentication mechanism between two ports on a link or across an end-to-end connection.
The FC-SP protocol allows a security check to be set up to prevent unauthorized access to fabric services. Basically it means: if you don't know my authentication method and passwords, I do not trust you and I will not allow you access to the fabric.
The fabric security architecture is built upon five components, namely:
  • Authentication Infrastructure
  • Authentication
  • Security Associations
  • Cryptographic Integrity and Confidentiality
  • and last but not least Authorization
I will not dive into each and every one of them since that's beyond the scope of this post. The authentication part is fairly widely adopted by HBA and switch vendors, whilst the authorization and crypto services are only seen in highly security-conscious environments.
Many initiators (mainly HBAs) and targets (mainly storage devices) support the link authentication option, but I haven't come across an HBA vendor that also supports the end-to-end authentication option.
The difference is depicted below.
The individual (green) N_Port-to-N_Port connections show the ports participating in link authentication, whilst the (red) end-to-end ports have end-to-end authentication configured. These are not mutually exclusive and can be configured individually (if supported, of course).
"So what does this have to do with my host not seeing storage?" you might ask.
Well, very simple. If you have configured anything in this entire setup incorrectly, each port may refuse access and therefore you will not see any targets or LUNs.
It's fairly easy to make a mistake in this configuration, so let's have a look on the wire to see what the switch and HBA do when this option is turned on.
As an example I use an Emulex HBA, but the HBAs from QLogic, Brocade etc. have a similar setup.
When an HBA tries to obtain access to a fabric it first sends out a FLOGI (Fabric Login). In this FLOGI it tells the fabric its ID (WWN), which capabilities it has, and which services and classes it wants to use.
One of the parameters of these services is "Security services", which is identified by a single bit in the "Common Services" parameter of this FLOGI.
The switch in turn checks for this bit and either returns an ELS accept or, in case authentication is not configured on the switch, an LS reject.
In the picture below you see the accept. This doesn't yet mean the ports have authenticated each other, merely that they have let each other know they want to start the authentication process.
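To make the FLOGI check a bit more concrete, here is a minimal sketch of how the security bit could be tested. Bit 21 is the position named further down in this post; that bits are counted from the least significant bit, and the names `wants_authentication` and `switch_reply`, are assumptions made for illustration only. It models just the one case described above and is not how any particular switch firmware decides.

```python
# Bit position taken from the post; counting from bit 0 = least
# significant bit of a 32-bit word is an assumption of this sketch.
SECURITY_BIT = 21


def wants_authentication(common_service_word: int) -> bool:
    """Return True if the security bit is set in the given 32-bit word."""
    return bool((common_service_word >> SECURITY_BIT) & 1)


def switch_reply(common_service_word: int, switch_auth_configured: bool) -> str:
    """Model the switch's answer to a FLOGI as described in the text.

    Only the case from the post is covered: the HBA asks for
    authentication but the switch has none configured.
    """
    if wants_authentication(common_service_word) and not switch_auth_configured:
        return "LS_RJT"
    return "LS_ACC"


flogi_word = 1 << SECURITY_BIT  # hypothetical word with only the security bit set
print(switch_reply(flogi_word, switch_auth_configured=False))
```

A switch may of course apply further policy (for example rejecting a login that does not request authentication); that is outside what this sketch tries to show.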
Given that the FC-SP authentication infrastructure supports three methods of authentication (DH-CHAP, FCPAP and FCAP), we then need to establish which one we want to use. This is done with an ELS command coded 0x90.
As you can see, the HBA offers only one protocol: it wants to authenticate via DH-CHAP, it can use MD5 and SHA1 as hash functions, and it can use five DH groups.
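The negotiation step boils down to finding common ground between what the HBA offers and what the switch is configured to accept. A rough sketch, assuming the peer simply picks the first offered item it also accepts (the real selection rules are in the FC-SP standard); the dictionary layout, the function names and the DH group labels are all hypothetical:

```python
from typing import Optional, Sequence


def pick_first_common(offered: Sequence[str], accepted: Sequence[str]) -> Optional[str]:
    """Return the first offered item the peer also accepts, or None."""
    for item in offered:
        if item in accepted:
            return item
    return None


def negotiate(hba: dict, switch: dict) -> Optional[dict]:
    """Choose protocol, hash and DH group; None means no common ground."""
    choice = {}
    for key in ("protocols", "hashes", "dh_groups"):
        picked = pick_first_common(hba[key], switch[key])
        if picked is None:
            return None  # a mismatch here means authentication cannot proceed
        choice[key] = picked
    return choice


# What the HBA in the trace offers: one protocol, two hashes, five DH groups.
hba_offer = {
    "protocols": ["DH-CHAP"],
    "hashes": ["MD5", "SHA1"],
    "dh_groups": ["NULL", "1024", "1280", "1536", "2048"],  # labels are assumed
}
# Hypothetical switch policy for this example.
switch_policy = {
    "protocols": ["DH-CHAP"],
    "hashes": ["MD5"],
    "dh_groups": ["NULL"],
}
print(negotiate(hba_offer, switch_policy))
```

If either side is limited to a hash or DH group the other does not offer, `negotiate` returns `None`, which is the kind of configuration mismatch that ends with a host not seeing its storage.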
The accept from the switch is pretty straightforward:
As of this point the stage is set for the actual authentication. The ports have shown that they support the FC-SP protocol and they have agreed on the DH-CHAP parameters they are going to use, so now the only thing left is the actual exchange of shared secrets.
One of the interesting things is that as of this point the class of service changes to Class 2. As you can see in the trace the "Start-of-Frame" now initiates a class 2 exchange.
Since Class 2 requires frames to be acknowledged, we also see an ACK_1 being returned for each frame sent. Here the ACK_1 (C0) is sent from the switch to the HBA for sequence ID 0x08 of exchange ID 0x50BD:
The DH-Challenge frame is then sent with the options the switch wants to use:
The command code is 0x90, with a 0x10 AUTH message code, meaning this is a DH-CHAP challenge. The DHCHAP payload contains the parameters used in the challenge, basically meaning: I am WWN 10:00:00:00:05:1E:52:AF:00, I use the MD5 hash with DH group NULL, and the challenge has a value length of 16 bytes, containing the following value.
This is still a Class-2 frame so an ACK_1 for acknowledgement is sent back prior to either an ELS Accept or reject.
If the HBA determines that these parameters are correct it will send an Accept back.
We are now at a point where the HBA trusts the switch, but the switch has no clue whether it is talking to the correct HBA, so a DHCHAP reply from the HBA to the switch also has to be sent.
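For those wondering how a challenge turns into a reply without the secret ever crossing the wire, here is a sketch modelled on classic CHAP (RFC 1994), where the response is a hash over an identifier byte, the shared secret and the challenge. Whether FC-SP's DH-CHAP with the NULL DH group uses exactly this field order is an assumption on my part; check the standard before relying on it. The secret and transaction ID below are made up.

```python
import hashlib
import hmac
import os


def chap_response(transaction_id: int, secret: bytes, challenge: bytes) -> bytes:
    """CHAP-style response: MD5(id byte || secret || challenge).

    The field order follows RFC 1994 and is assumed, not taken from FC-SP.
    """
    id_byte = bytes([transaction_id & 0xFF])
    return hashlib.md5(id_byte + secret + challenge).digest()


def verify(transaction_id: int, secret: bytes, challenge: bytes, received: bytes) -> bool:
    """Recompute the response locally and compare in constant time."""
    expected = chap_response(transaction_id, secret, challenge)
    return hmac.compare_digest(expected, received)


challenge = os.urandom(16)          # 16 bytes, like the value length in the trace
shared_secret = b"example-secret"   # hypothetical; configured on both ports
reply = chap_response(0x01, shared_secret, challenge)

print(len(reply))                                      # MD5 digests are 16 bytes
print(verify(0x01, shared_secret, challenge, reply))   # same secret on both sides
print(verify(0x01, b"typo-secret", challenge, reply))  # mistyped secret
```

The last line is the interesting one from a support point of view: a single mistyped secret on either side makes the verification fail, and the login never completes.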
(What I did forget to highlight is that from a switch perspective it is the F-Port controller with Well-Known-Address 0xfffffe that takes care of the authentication and not the F-port itself. Just an FYI)
If the switch now determines these parameters are also correct an Accept is sent back and we have an authenticated session between these two ports.
The switch now sends back a DHCHAP Success frame, which confirms the authentication status. This is acknowledged by the HBA, after which the rest of the usual FC handshaking takes place (PLOGI, PRLI etc.).
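Putting the whole exchange in order gives a simple checklist you can hold against a trace. The sketch below walks an observed list of frame names and reports the first step of the handshake that never showed up. The step labels are my own shorthand for the frames discussed above, and `first_missing_step` is a hypothetical helper, not a feature of any analyzer.

```python
from typing import List, Optional

# Order of the handshake as walked through in this post; labels are shorthand.
EXPECTED_STEPS = [
    "FLOGI",
    "AUTH_Negotiate",
    "DHCHAP_Challenge",
    "DHCHAP_Reply",
    "DHCHAP_Success",
    "PLOGI",
    "PRLI",
]


def first_missing_step(observed: List[str]) -> Optional[str]:
    """Return the first expected step not seen in order, or None if all are present."""
    position = 0
    for step in EXPECTED_STEPS:
        try:
            # Search only after the previous match so the order is enforced.
            position = observed.index(step, position) + 1
        except ValueError:
            return step
    return None


# Hypothetical trace in which the HBA rejects the switch's challenge.
trace = ["FLOGI", "LS_ACC", "AUTH_Negotiate", "LS_ACC", "DHCHAP_Challenge", "LS_RJT"]
print(first_missing_step(trace))
```

Accepts, rejects and ACK_1 frames are simply skipped over here; the point is only to see how far the authentication got before it stalled.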
The support problem.
Now the problem is that when we in support run into a "host cannot see storage" situation and the usual suspects like zoning and LUN masking have been excluded, it becomes very hard to check for situations like these. From a Brocade perspective especially, the switch doesn't provide much info besides the error messages, which are not very useful beyond telling you that the error occurred and when it happened.
The second problem is that if this authentication is used on neither the HBA nor the switch, but the target requires it, there is no indication at all.
As an example the Hitachi arrays do support both link-level as well as end-to-end authentication.
The first is set on the individual physical ports and the second on the Host-Storage domains, in the Authentication utility via Storage Navigator. If you have an HBA with drivers that support end-to-end authentication you can set this up. If the HBA drivers do not support it, the security bit in the PLOGI (which is used to log in to the array) is set to 0, and the array will not honour this frame and sends a reject back.
From a troubleshooting perspective this is fairly hard to diagnose because the switch logs do not show initiator-to-target frames. We would need to set up separate frame monitors and/or enable debug flags. This is a rather cumbersome way of doing things, so my suggestion is to check whether both link-level and end-to-end authentication are supported and whether they are properly configured.
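That check can be written down as a small decision list. The sketch below only encodes the failure cases mentioned in this post; the `AuthSetup` fields and the `likely_problems` function are hypothetical names, and the real values have to come from your HBA, switch and array configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AuthSetup:
    """Hypothetical summary of the authentication settings along the path."""
    hba_link_auth: bool               # HBA sets the security bit in its FLOGI
    switch_link_auth: bool            # authentication configured on the switch
    link_secrets_match: bool          # same shared secret on HBA and switch
    target_requires_end_to_end: bool  # array demands end-to-end authentication
    hba_supports_end_to_end: bool     # HBA driver can authenticate to a target


def likely_problems(setup: AuthSetup) -> List[str]:
    """List the misconfigurations from the post that apply to this setup."""
    problems = []
    if setup.hba_link_auth and not setup.switch_link_auth:
        problems.append("HBA requests authentication but the switch has none "
                        "configured: expect an LS reject on the FLOGI")
    if setup.hba_link_auth and setup.switch_link_auth and not setup.link_secrets_match:
        problems.append("link authentication is enabled on both sides but the "
                        "secrets differ: the DH-CHAP exchange will fail")
    if setup.target_requires_end_to_end and not setup.hba_supports_end_to_end:
        problems.append("array requires end-to-end authentication the HBA driver "
                        "cannot do: PLOGI is rejected and switch logs stay silent")
    return problems


for line in likely_problems(AuthSetup(True, True, False, True, False)):
    print(line)
```

An empty list does not prove the setup is fine; it only means none of the cases described here apply.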
As a tip: on Emulex you can check in OC Manager whether your HBA and driver support this. If you can switch between "Destination: Fabric (Switch)" and "Destination: Target", the driver supports end-to-end authentication. If not, only link-level authentication is supported.
The option to enable FC-Authentication (and hence turn on the bit 21 in the common services parameter in the FLOGI) is in the "Driver Parameters" tab:
The HBA needs to be reset after configuring DH-CHAP to trigger a new FLOGI.
To all HBA/CNA/switch/array/tape (and anything else you can connect to an FC port) firmware engineers: I would like to ask you to provide simple on/off flags or commands on ports to enable or disable full frame capture. The first word of a payload is obviously not sufficient to troubleshoot these kinds of issues.
Hope this is of any help.
Cheers,
Erwin van Londen

About Erwin van Londen

Master Technical Analyst at Hitachi Data Systems