
A lesser-known “host can’t see storage” problem caused by FC-SP authentication failures.

Many support cases open with the line “Host can’t see storage”, which puts most of these cases in the “configuration” queue. My assumption is that if a host can’t “see” storage, it has never worked before, so there must be some kind of configuration problem.

Once in a while, though, a case pops up that involves a less commonly used feature of the FC protocol: FC authentication, which is part of the FC-SP standard. This piece of the FC stack provides an authentication mechanism between two ports on a link or between the two ends of a connection.
The FC-SP protocol allows a security check to be set up to prevent unauthorized access to fabric services. Basically it means: if you don’t know my authentication method and access passwords, I do not trust you and I will not allow you access to the fabric.
The fabric security architecture is built upon five components, namely:
  • Authentication Infrastructure
  • Authentication
  • Security Associations
  • Cryptographic Integrity and Confidentiality
  • and last but not least Authorization
I will not dive into each and every one of them since that’s beyond the scope of this post. The authentication part is fairly widely adopted by HBA and switch vendors, whilst the authorization and cryptographic services are only seen in very security-conscious environments.
Many initiators (mainly HBAs) and targets (mainly storage devices) support the link authentication option, but I haven’t come across an HBA vendor that also supports the end-to-end authentication option.
The difference is depicted below.
The individual (green) N_Port-to-N_Port links are obviously the ports participating in link authentication, whilst the end-to-end (red) ports have end-to-end authentication configured. The two are not mutually exclusive and can be configured individually (if supported, of course).
“So what does this have to do with my host not seeing storage?” you might ask.
Well, very simple. If you have configured anything in this entire setup incorrectly, each port may refuse access and therefore you will not see any targets or LUNs.
It’s fairly easy to make a mistake in this configuration, so let’s have a look on the wire to see what the switch and HBA do when this option is turned on.
As an example I use an Emulex HBA, but the HBAs from QLogic, Brocade etc. have a similar setup.
When an HBA tries to obtain access to a fabric it first sends out a FLOGI (fabric login). In this FLOGI it tells the fabric its ID (WWN), which capabilities it has and what services plus classes it wants to use.
One of the parameters of these services is “Security services”, which is identified by a single bit in the “Common Services” parameters of this FLOGI.
The switch in turn checks for this bit and either returns an ELS Accept or, in case authentication is not configured on the switch, an LS reject.
In the picture below you see the accept. This doesn’t yet mean the ports have authenticated each other, but merely that they have let each other know they want to start the authentication process.
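As a rough illustration of what the switch checks, the sketch below tests the security bit in the first common service parameter word of a FLOGI. The bit position (21) is the one referred to later in this post; the example word values and the helper name are made up for illustration.

# Minimal sketch: does a FLOGI advertise FC-SP security services?
SECURITY_SERVICES_BIT = 1 << 21    # bit 21 of the common service parameters

def wants_authentication(common_svc_word: int) -> bool:
    return bool(common_svc_word & SECURITY_SERVICES_BIT)

print(wants_authentication(0x00200000))  # True:  the switch answers with an ELS Accept (if FC-SP is configured)
print(wants_authentication(0x00000000))  # False: no authentication requested by this port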
Given that the FC-SP authentication infrastructure supports three methods of authentication (DH-CHAP, FCPAP and FCAP), we then need to establish which one we want to use. This is done with an ELS command with code 0x90.
As you can see, the HBA can only use one protocol and it wants to authenticate via DH-CHAP; it can use MD5 and SHA-1 as hash algorithms and it supports 5 DH groups.
The accept from the switch is pretty straightforward:
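To make that negotiation a bit more tangible, here is a simplistic representation of the HBA’s proposal and the selection the switch makes when it accepts. The field names and the group identifiers are illustrative only; the real AUTH_Negotiate payload is a binary structure defined in FC-SP.

# Illustrative only: the HBA's proposal from the trace above, expressed as a dict.
hba_proposal = {
    "protocol": "DH-CHAP",                     # the only protocol this HBA offers
    "hash_algorithms": ["MD5", "SHA-1"],       # in order of preference
    "dh_groups": ["NULL", "1", "2", "3", "4"], # placeholder names for the 5 supported DH groups
}

def pick_parameters(proposal, switch_hashes=("MD5",), switch_groups=("NULL",)):
    # Hypothetical selection logic: take the first proposed value the switch also supports.
    hash_alg = next(h for h in proposal["hash_algorithms"] if h in switch_hashes)
    dh_group = next(g for g in proposal["dh_groups"] if g in switch_groups)
    return proposal["protocol"], hash_alg, dh_group

print(pick_parameters(hba_proposal))   # ('DH-CHAP', 'MD5', 'NULL'), matching the challenge later on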
As of this point the stage is set for the actual authentication. The ports have shown they support the FC-SP protocol and they have agreed on the DH-CHAP parameters they are going to use, so the only thing that’s left is the actual challenge/response exchange that proves both sides know the shared secrets.
One of the interesting things is that as of this point the class of service changes to Class 2. As you can see in the trace, the “Start-of-Frame” now initiates a Class 2 exchange.
Given that Class 2 means we now require acknowledgement of frames, we also see an ACK_1 being returned for each frame sent. Here the ACK_1 (C0) is sent from the switch to the HBA for sequence ID 0x08 of exchange ID 0x50BD:
The DH-Challenge frame is then sent with the options the switch wants to use:
The command code is 0x90, with a 0x10 AUTH message code, meaning “I want to use DH-CHAP”. The DHCHAP payload contains the parameters used in the challenge, basically saying: I am WWN 10:00:00:00:05:1E:52:AF:00, I use the MD5 hash with DH-group NULL, and the challenge has a value length of 16 bytes containing the following value.
This is still a Class 2 frame, so an ACK_1 acknowledgement is sent back prior to either an ELS Accept or Reject.
If the HBA determines that these parameters are correct it will send an Accept back.
We are now at a point where the HBA trusts the switch, but the switch has no clue whether it authenticated against the correct HBA, so a DHCHAP reply from the HBA to the switch also has to be sent.
(What I forgot to highlight is that, from a switch perspective, it is the F_Port controller with well-known address 0xFFFFFE that takes care of the authentication and not the F_Port itself. Just an FYI.)
If the switch now determines these parameters are also correct, an Accept is sent back and we have an authenticated session between these two ports.
The switch now sends back a DHCHAP Success frame, which confirms the authentication status. This is acknowledged by the HBA, after which the rest of the usual FC handshaking (PLOGI, PRLI, etc.) takes place.
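For the DH-group NULL case seen in this trace, the exchange degenerates to a plain CHAP-style challenge/response, which can be sketched roughly as below. The concatenation order inside the hash is an assumption for illustration (the exact layout is defined in the FC-SP standard), and the shared secret and transaction ID are made up.

import hashlib, os

def make_challenge(length=16):
    # 16-byte challenge value, as seen in the trace
    return os.urandom(length)

def chap_response(secret: bytes, challenge: bytes, transaction_id: int) -> bytes:
    # Assumed ordering: transaction id + shared secret + challenge, hashed with MD5.
    # FC-SP defines the real field order; this only shows the principle.
    return hashlib.md5(transaction_id.to_bytes(4, "big") + secret + challenge).digest()

secret = b"my-dhchap-secret"        # shared secret configured on both the switch and the HBA
tid = 0x50BD                        # made-up transaction id (borrowed from the exchange id above)
challenge = make_challenge()        # switch issues the challenge
response = chap_response(secret, challenge, tid)           # HBA proves it knows the secret
print(response == chap_response(secret, challenge, tid))   # switch recomputes and compares: True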
The support problem.
Now the problem is that when we in support run into a “host cannot see storage” situation and the usual suspects like zoning and LUN masking have been excluded, it becomes very hard to check for situations like these. Especially from a Brocade perspective there isn’t much info besides the error messages, which are not very useful beyond telling you that this error occurred and when it happened.
The second problem is that if this authentication is not used on either the HBA or the switch but the target requires it, then there is no indication at all.
As an example the Hitachi arrays do support both link-level as well as end-to-end authentication.
The first one is set on the individual physical ports and the second one on the Host Storage Domains in the Authentication utility via Storage Navigator. If you have an HBA with drivers that support this end-to-end authentication you can set this up, but if the HBA drivers do not support it the security bit in the PLOGI (which is used to log in to the array) is set to 0, and the array will just ignore this frame and send a reject back.
From a troubleshooting perspective this is fairly hard to diagnose because the switch logs do not show initiator-to-target frames. We would need to set up separate frame monitors and/or enable debug flags, which is a rather cumbersome way of doing things, so my suggestion is to check whether both link-level and end-to-end authentication are supported and whether they are properly configured.
As a tip: on Emulex you can check in OC Manager whether your HBA and driver support this. If you can switch between “Destination: Fabric (Switch)” and “Destination: Target”, the driver does support end-to-end; if not, only link-level authentication is supported.
The option to enable FC authentication (and hence turn on bit 21 in the common services parameter in the FLOGI) is in the “Driver Parameters” tab:
The HBA needs to be reset after configuring DH-CHAP to trigger a new FLOGI.
To all HBA/CNA/switch/array/tape (and anything else you can connect to an FC port) firmware engineers: I would like to ask you to provide simple on/off flags or commands on ports to enable or disable full frame capture. The first word of a payload is obviously not sufficient to troubleshoot these kinds of issues.
Hope this is of any help.
Cheers,
Erwin van Londen

One rotten apple spoils the bunch – 1

Last week I had another one. A rotten apple that spoiled the bunch or, in storage terms, a slow drain device causing havoc in a fabric.

This time it was a blade-center server with a dubious HBA connection to the blade-center switch, which caused link errors and thus corrupt frames, encoding errors and credit depletion. Because this was a blade connected to a blade switch, the credit depletion also propagated back into the overall SAN fabric, and thus the entire fabric suffered significantly from this single problem device.

“Now how does this work?” you’ll say. Well, it has everything to do with the flow-control methodology used in FC fabrics. In contrast to the Ethernet and TCP/IP world, we, the storage guys, expect a device to behave correctly, as gentlemen usually do. That being said, as with everything in life, there are always moments in time when nasty things happen, and in the case of the “rotten apple” one storage device, be it an HBA, tape drive or storage array, may be doing those nasty things.

Let’s take a look how this normally should work.

FC devices run on a buffer-to-buffer credit model. This means a device reserves a certain number of buffers on the FC port itself. This number of buffers is then communicated to the remote device as credits: basically a device gives the remote device permission to use X amount of credits. Each credit represents one full-size frame of around 2112 bytes (a full 2K data payload plus frame header and footer).

The number of credits each device can handle is “negotiated” during fabric login (FLOGI). On the left is a snippet from a FLOGI frame where you can see the number of credits in hex.

So what happens after the FLOGI? As an example we use a connection that has negotiated 8 credits either way. If the HBA sends a frame (e.g. a SCSI read request) it knows it only has 7 credits left. As soon as the switch port receives the frame it has to make a decision where to send this frame to. It does this based on routing tables, zoning configuration and some other rules, and if everything is correct it will route the frame to the next destination. Meanwhile it simultaneously sends back a so-called R_RDY primitive. This R_RDY tells the HBA that it can increase its credit counter by one again. So if the current credit counter was 5 it can now bump it back up to 6. (A “primitive” lives only between two directly connected ports and as such it will never traverse a switch or router. A frame can, and will, be switched/routed over one or more links.)

Below is a very simplistic overview of two ports on a FC link. On the left we have an HBA and on the right we have a switch port. The blue lines represent the data frames and the red lines the R_RDY primitives.

As I said, it’s pretty simplistic. In theory the HBA on the left could send up to 8 frames before it has to wait for an R_RDY to be returned.
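Here is a minimal sketch of that bookkeeping, purely to illustrate the counting rules (this is of course not how an HBA is actually implemented): sending a frame costs one credit, and every R_RDY received gives one back.

class BBCreditPort:
    def __init__(self, credits_negotiated=8):    # value "negotiated" during FLOGI
        self.credits = credits_negotiated

    def send_frame(self) -> bool:
        # A frame may only be sent when at least one credit is available.
        if self.credits == 0:
            return False                         # zero credits: transmission stalls
        self.credits -= 1
        return True

    def receive_r_rdy(self):
        # The remote port freed a buffer and returned an R_RDY primitive.
        self.credits += 1

hba = BBCreditPort(8)
sent = sum(hba.send_frame() for _ in range(10))
print(sent, hba.credits)                         # 8 frames go out, then the port sits at 0 credits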

So far all looks good, but what if the path from the switch back to the device is broken? Either due to a crack in the cable, unclean connectors, broken lasers etc. The first problem we often see is that bits get flipped on a link, which in turn causes encoding errors. FC up to 8G uses an 8b/10b encoding/decoding mechanism. According to this algorithm the normal 8 data bits are converted into a so-called 10-bit word or transmission character. These 10 bits are the ones that actually travel over the wire. The remote side uses the same algorithm to revert the 10 bits back into the original 8 data bits. This assures bit-level integrity and DC balance on a link. However, when a link has a problem as described above, chances are that one or more of these 10 bits flip from a 0 to 1 or vice versa. The recipient detects this problem, but since it is unaware of which bit got corrupted it will discard the entire transmission character. This means that if such a corruption is detected, an entire primitive is discarded, or, if the corrupted piece was part of a data frame, that entire frame is dropped.

A primitive (including the R_RDY) consists of four transmission characters (4 × 10 bits, i.e. one transmission word). The first character is always a control character (K28.5) and it is followed by three data characters (Dxx.x). The R_RDY looks like this on the wire:

0011111010 1010100010 0101010101 0101010101 (-K28.5 +D21.4  D10.2  D10.2 )

I will not go further into this since it’s beyond the scope of this article.

So if this R_RDY is discarded, the HBA does not know that the switch port has indeed freed up the buffer and still thinks it can only send N-1 frames. The picture below depicts such a scenario:

As you can see, when R_RDYs are lost the credit counter will at some point reach 0, meaning the HBA is unable to send any frames. When this happens an error recovery mechanism kicks in which basically resets the link, clearing all buffers on both sides of that link and starting from scratch. The upper layers of the FC protocol stack (SCSI-FCP, IPFC etc.) have to make sure that any outstanding frames are either retransmitted or the entire IO is aborted, in which case the IO in its entirety needs to be re-executed. As you can see, this causes a problem on this link since a lot of things are going on other than actually making sure your data frames are transmitted. If you think this will not have such an impact, be aware that the above sequence might run in less than one tenth of a second and thus credit depletion can be reached within less than a second. So how does this influence the rest of the fabric, since this all seems to be pretty confined within the space of this particular link?
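To get a feel for how quickly this happens, here is a toy simulation under assumed numbers (8 credits, a continuously busy link, a 2% chance that any given R_RDY gets corrupted and discarded); none of these values come from a real trace.

import random

def frames_until_starved(credits=8, r_rdy_loss_probability=0.02, seed=1):
    random.seed(seed)
    sent = 0
    while credits > 0:
        credits -= 1                  # frame goes out, one credit consumed
        sent += 1
        if random.random() > r_rdy_loss_probability:
            credits += 1              # the R_RDY made it back, credit restored
        # else: the R_RDY was corrupted and discarded, so the credit is never returned
    return sent

print(frames_until_starved())         # typically a few hundred frames, i.e. well under a second on a busy link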

Let’s broaden the scope a bit from an architectural perspective. Below you see a relatively simple, though architecturally often implemented, core-edge fabric.

Each HBA stands for one server (Green, Blue, Red and Orange), each mapped to a port on a storage array.
Now let’s say server Red is a slow-drain device or has a problem with its direct link to the switch: it is returning credits only very intermittently due to the encoding errors explained above, or it is very slow in returning credits due to a driver/firmware timing issue. The HBA sends a read request for an IO of 64K of data. This means that 32 data frames (normally FC uses a 2K frame size) will be sent back from the array to the Red server. Meanwhile the other three servers and the two storage arrays are also sending and receiving data. If the number of credits negotiated between the HBAs and the switches is 8, you can see that only the first 16K of that 64K request will be sent to the Red server; the remaining 48K is either still in transit from the array to the HBA or still sitting in some outbound queue in the array. Since the edge switch (on the left) is unable to send frames to the Red server, the remaining data frames (from ALL SERVERS) will stack up on the incoming ISL port (bright red). This in turn causes the outbound ISL port on the core switch (the one on the right) to deplete its credits, which means that at some point in time no frames are able to traverse the ISL, therefore causing most traffic to come to a standstill.
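The arithmetic of that example in a few lines (the 2K frame size and 8 credits are the assumed values from above):

frame_kb, io_kb, credits = 2, 64, 8
frames_needed = io_kb // frame_kb              # 32 frames for the 64K read
in_flight_kb = credits * frame_kb              # only 16K can be outstanding towards the Red server
stuck_kb = io_kb - in_flight_kb                # the other 48K queues up behind the stalled port
print(frames_needed, in_flight_kb, stuck_kb)   # 32 16 48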

You’ll probably ask “So how do we recover from this?”. Well, basically the port on the edge switch to the Red server will send an LR (Link Reset) after the agreed “hold time” has expired. The hold time is a calculated period during which the switch will hold frames in its buffers; in most fabrics this is 500 ms. So if the switch has had zero credits available during the entire hold period and it has had at least one frame in its output buffer, it will send an LR to the HBA. This causes both the switch and HBA buffers to be cleared, and the number of credits returns to the value that was negotiated during FLOGI.
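Expressed as a small sketch (500 ms is the typical hold time mentioned above; the function and parameter names are made up for illustration):

HOLD_TIME_MS = 500

def should_send_link_reset(ms_at_zero_credit: float, frames_in_output_buffer: int) -> bool:
    # Send an LR when the port has spent the whole hold period at zero transmit
    # credits while at least one frame is still waiting to be transmitted.
    return ms_at_zero_credit >= HOLD_TIME_MS and frames_in_output_buffer > 0

print(should_send_link_reset(520, 12))   # True:  link reset, buffers cleared, credits restored
print(should_send_link_reset(130, 12))   # False: keep waiting for an R_RDY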

If you don’t fix the underlying problem this process will go on forever and, as you’ve seen, will severely impact your entire storage environment.

“OK, so the problem is clear, how do I fix it?”

There are two ways to tackle the problem, the good and the bad way.

The good way is to monitor and manage your fabrics and links for such behaviour. If you see any error counter increasing, verify all connections, cables, SFPs, patch panels and other hardware sitting in between the two devices. Clean connectors, replace cables and make sure these hardware problems do not resurface. My advice: if you see any link behaving like this, DISABLE IT IMMEDIATELY!!! No questions asked.

The bad way is to stick your head in the sand and hope for it to go away. I’ve seen many such issues cripple entire fabrics, and due to strictly enforced change control severe outages occurred and prolonged recovery (very often multiple days) was needed to get things back to normal again. Make sure you implement emergency procedures which allow you to bypass these operational guidelines. It will save you a lot of problems.

Regards,
Erwin van Londen

Maintenance

Why do I keep wondering why companies don’t maintain their infrastructure? It seems to be the exception rather than the rule to come across software and firmware that is newer than six months old. True, I admit it’s a beefy task to keep this all lined up, but then again you know this in advance, so why isn’t there a plan to get this sorted every six months or even more frequently?


In my day-to-day job I see a lot of information from many customers around the world. Sometimes during implementation phases there is a lot of focus on interoperability and software certification stacks. Does server A with HBA B work with switch Y on storage platform Z? This is very often a rubber-stamping exercise which goes out the door with the project team. The moment a project has been done and dusted, this very important piece is very often neglected for many reasons; most of them are time constraints, risk aversion, change freezes, fear that something goes wrong, etc. However, the risk businesses take by not properly maintaining their software is like walking a tightrope over the Grand Canyon in wind gusts of over 10 Beaufort. You might get a long way, but sooner or later you will fall.

Vendors do actually fix things, although some people might think otherwise. Remember, a storage array contains around 800,000 pieces of hardware and a couple of million lines of software which make this thing run and store data for you. If you compared that to a car and ran the same maintenance schedule, you would be requiring the car to run for 120 years non-stop without changing the oil, filters, tyres, exhaust, etc. So would you trust your children in such a car after two years, or even after six months? I wouldn’t, but businesses still seem to take that chance.

So is it fair for vendors to ask (or demand) that you maintain your storage environment? I think it is. Back in the days when I had my feet wet in the data centers (figuratively speaking, that is) I managed a storage environment of around 250 servers, 18 FC switches and 12 arrays, a pretty beefy environment in those days. I set myself a threshold that firmware and drivers would not exceed a four-month lifetime. That meant that if newer code came out from a particular vendor, it was impossible for that code not to be implemented before those four months were over.
I spent around two days every quarter generating the paperwork with change requests, vendor engineer appointments etc., and two days getting it all implemented. Voilà, done. The more experienced you become with this process, the better and faster it gets.

Another problem is with storage service providers, or service providers in general. They also depend on their customers to get all the approvals stamped, signed and delivered, which is very often seen as a huge burden, so they just let it go and eat the risk of something going wrong. The problem is that during RFP/RFI processes the customers do not ask for this and the sales people are not interested, since it might become a show-stopper for them, and as such nothing around ongoing infrastructure maintenance is documented or contractually agreed in delivery or sales documents.

As a storage or service provider I would turn this “obligation” to my advantage and say: “Dear customer, this is how we maintain our infrastructure environment, so that we, and you, are assured of immediate and accurate vendor support and have an up-to-date infrastructure that minimizes the risk of anything going downhill at some point in time.”

I’ve seen it happen where high-severity issues with massive outages were logged with vendors, and those vendors came back and said “Oh yes sir, we know of this problem and we fixed it a year ago, but you haven’t implemented that level of firmware; yours is far outdated”.

How would that look to your customers if you’re a storage/service provider, or a bank whose online banking system got some coverage in the newspapers?

Anyway, the moral is “Keep it working by maintaining your environment”.

BTW, for SNW USA Spring 2011 I wrote a SNIA tutorial, “Get Ready for Support”, which outlines some of the steps you need to take before contacting your vendors’ support organisation. Just log on to www.snia.org/tutorials; it will be posted there after the event.

Cheers,
Erwin van Londen