Will FCoE bring you more headaches?

Yes, it will!

A bit of a blunt statement, but here’s why.

When you look at the presentations that all connectivity vendors (Brocade, Cisco, Emulex etc.) give you, they pitch FCoE as the best thing since sliced bread. Reductions in cost, cooling, cabling and complexity will solve all of today’s problems! But is this really true?

Let’s start with costs. Are the cost savings really as big as they promise? These days a 1G Ethernet port on a server sits on the motherboard and is more or less a freebie. The expectation is that the additional cost of 10GbE will be added to a server’s cost of goods (COG), but as usual it will decline over time. Most servers come with several of these ports. On average a CNA is 2 times more expensive than 2 GE ports + 2 HBAs, so that’s not a reason to jump to FCoE. Each vendor has a different price list, so that’s something you need to figure out yourself. The CAPEX is the easy part.

An FCoE-capable switch (CEE or FCF) is significantly more expensive than an Ethernet switch + an FC switch. Be aware that these are data center switches and the current port count on an FCoE switch is not sufficient to deploy large-scale infrastructures.

Then there is the so-called power and cooling benefit. (?!?!?) I searched my butt off to find the power requirements of HBAs and CNAs, but no vendor publishes these. I can’t imagine an FC HBA chip eats more than 5 watts. A CNA, however, will probably use more given that it runs at a higher clock speed, and for redundancy reasons you need two of them anyway. So in general I think these will equate to the same power requirements, or an Ethernet + HBA combination is even more efficient than CNAs. Now let’s compare a Brocade 5000 (32-port FC switch) with a Brocade 8000 FCoE switch from a BTU and power rating perspective. I used the specs from their own data sheets, so if I made a mistake don’t blame me.

A Brocade 5000 uses a maximum of 56 watts and has a BTU rating of 239 at 80% efficiency. An 8000 FCoE switch uses 206 watts when idle and 306 watts when in use, and its heat dissipation is 1044.11 BTU per hour. I struggled to find any benefit here. Now you can say that you also need an Ethernet switch, but even if that has the same ratings as a 5000 you still save a hell of a lot of power and cooling by using separate switches. I haven’t checked out the Cisco, Emulex and QLogic equipment, but I assume I’m not far off on those either.
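As a rough sanity check on these numbers, the comparison can be sketched in a few lines of Python. This is only an illustration: it assumes the usual conversion of 1 watt to about 3.412 BTU per hour, uses the wattages quoted above, and takes a hypothetical Ethernet switch with the same rating as the 5000, as argued in the paragraph.

```python
# Rough power/heat comparison using the data-sheet figures quoted in the text.
# Assumption: 1 watt of load dissipates about 3.412142 BTU per hour.
WATT_TO_BTU_PER_HOUR = 3.412142


def btu_per_hour(watts, psu_efficiency=1.0):
    """Heat output in BTU/h for a load in watts, corrected for PSU efficiency."""
    return watts / psu_efficiency * WATT_TO_BTU_PER_HOUR


# Figures from the text: Brocade 5000 at 56 W max, Brocade 8000 at 306 W in use.
fc_switch_watts = 56
fcoe_switch_watts = 306

# Hypothetical: an Ethernet switch with the same rating as the 5000.
eth_switch_watts = fc_switch_watts
separate_watts = fc_switch_watts + eth_switch_watts

print(f"Brocade 5000 at 80% efficiency: {btu_per_hour(fc_switch_watts, 0.8):.0f} BTU/h")
print(f"Brocade 8000 in use: {btu_per_hour(fcoe_switch_watts):.1f} BTU/h")
print(f"FC + Ethernet switch: {separate_watts} W, FCoE switch: {fcoe_switch_watts} W")
print(f"FCoE switch draws {fcoe_switch_watts / separate_watts:.1f}x the power")
```

Under those assumptions the two separate switches together draw 112 watts against 306 watts for the single FCoE switch, and the conversion lands within rounding of both BTU figures from the data sheets.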

Now, hang on, all vendors say there is a “huge benefit” in FCoE-based infrastructures. Yes, there is: you can reduce your cabling plant, but even there you’ll find a snag. You need very high quality cables, so an OM1 or OM2 cabling plant will not do. As a minimum you need OM3, but OM4 is preferred. Do you have this already? If so, good, you need less cabling. If not, buy a completely new plant.

Then there is complexity. Also an FCoE sales pitch. “Everything is much easier and simpler to configure if you go with FCoE”. Is it??? Where is the reduction in complexity when the only benefit is that you can get rid of cabling? Once a cabling plant is in place you only need to administer the changes, and there is some extremely good and free software to do that. So even if you consider this a huge benefit, what do you get in return? A famous Dutch football player once said “Elk voordeel heb z’n nadeel” (that’s Dutch with an Amsterdam dialect spelling :-)), which more or less means that every benefit has its disadvantage, i.e. there is a snag with each benefit.

The snag here is that you get all the nice features like CEE, DCBX, LLDP, ETS, PFC, FIP, FPMA and a lot more new terminology introduced into your storage and network environment. (Say what???) This more or less means that each of these abbreviations needs to be learned by your storage administrators as well as your network administrators, which means additional training requirements (and associated costs). This does not replace your current training and knowledge; it comes on top of it.
Also, these settings are not a one-time setup that can be configured centrally on a switch; they need to be configured and managed per interface.
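To give a feel for what “per interface” means in practice, here is a minimal Python sketch of auditing such settings port by port. Everything in it is hypothetical: the `PortDcbConfig` class, its field names and the port names are made up for illustration and do not correspond to any vendor’s CLI or API. The only rules assumed are that PFC works on the eight 802.1p priorities (0 to 7) and that ETS bandwidth shares on a port should add up to 100%.

```python
from dataclasses import dataclass


@dataclass
class PortDcbConfig:
    """Hypothetical per-port DCB settings; names are illustrative only."""
    name: str
    pfc_priorities: frozenset   # 802.1p priorities (0-7) with PFC enabled
    ets_bandwidth: dict         # priority group id -> bandwidth percentage
    dcbx_enabled: bool = True


def validate(port):
    """Return a list of problems found in one port's settings."""
    problems = []
    if any(p not in range(8) for p in port.pfc_priorities):
        problems.append(f"{port.name}: PFC priority outside 0-7")
    if sum(port.ets_bandwidth.values()) != 100:
        problems.append(f"{port.name}: ETS bandwidth does not add up to 100%")
    if not port.dcbx_enabled:
        problems.append(f"{port.name}: DCBX disabled, settings are not exchanged with the peer")
    return problems


def audit(ports):
    """Check every port on its own; nothing is inherited from a switch-wide default."""
    return [problem for port in ports for problem in validate(port)]


# Three made-up ports; priority 3 for storage traffic is an assumption.
ports = [
    PortDcbConfig("port1", frozenset({3}), {0: 60, 1: 40}),
    PortDcbConfig("port2", frozenset({3}), {0: 60, 1: 30}),   # shares only reach 90%
    PortDcbConfig("port3", frozenset({3}), {0: 60, 1: 40}, dcbx_enabled=False),
]
for problem in audit(ports):
    print(problem)
```

The point of the sketch is the loop in `audit`: every port carries its own copy of the settings, so every port is a place where a mistake can hide.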

In my previous article I also mentioned the complete organizational overhaul you need to do between the storage and networking departments. From a technology standpoint these two “cultures” have a different mindset. Storage people need to know exactly what is going to hit their arrays from an application perspective, as well as from operating systems, firmware, drivers etc. Network people don’t care. They have a horizontal view and they transport IP packets from A to B irrespective of the content of that packet. If the pipe from A to B is not big enough they create a bigger pipe and there we go. In the storage world, as described before, it doesn’t work like this.

Then there is the support side of the fence. Let’s assume you’ve adopted FCoE in your environment. Do you have everything in place to solve a problem when it occurs? (Mind the term “when”, not “if”.) Do you know exactly what it takes to troubleshoot a problem? Do you know how to collect logs the correct way? Have you ever seen a Fibre Channel trace captured by an analyzer? If so, were you able to make sense of it, actually pinpoint an issue if there was one and, more importantly, work out how to solve it? Did you ever look at fabric/switch/port statistics on a switch to verify whether something is wrong? For SNIA I wrote a tutorial (over here) in which I describe the overall issues support organisations face when a customer calls in for support, and also what to do about them. The thing is that network and storage environments are very complex. By combining them and adding all the 3 and 4 letter acronyms mentioned above, the complexity will increase 5-fold if not more. It therefore takes much, much longer to pinpoint an issue and advise on how to solve it.

I work in one of those support centers of a particular vendor and I see FC problems every day. Very often they are due to administrator errors, but far more often they are caused by a problem with software or hardware. These can be very obvious, like a cable problem, but in most cases the issue is not so clear and it takes a lot of skill, knowledge, technical information AND TIME to sort it out. Adding complexity just means it takes more time to collect and analyze the information and advise on resolution paths. I’m not saying it becomes impossible, but it just takes more time. Are you prepared and willing to give your vendor this time to sort out issues?

Now, you probably think I must hold a major grudge against FCoE. On the contrary: I think FCoE is a great technology, but it’s been created for technology’s sake and not to help you as a customer and administrator to really solve a problem. The entire storage industry is stacking protocols upon protocols to circumvent the very hard issue that they screwed up a long time ago. (Huhhhh, why’s that?)

Be reminded that today’s storage infrastructure is still running on a three-decade-old protocol called SCSI (or SBCCS for z/OS, which is even older). Nothing wrong with that, but it implies that the shortcomings of this protocol need to be circumvented. SCSI originally ran on a parallel bus which was 8 bits wide and hit performance limitations pretty quickly. So they created “wide SCSI”, which ran on a 16-bit-wide bus. By increasing the clock frequencies they pumped up the speed; however, the problem of distance limitations became more pressing, and so they invented Fibre Channel. By disassociating the SCSI command set from the physical layer, the T10 committee came up with SCSI-3, which allowed the SCSI protocol to be transported over a serialized interface like FC, with a multitude of benefits like speed, distance and connectivity. The same thing happened with ESCON in the mainframe world. Both the ESCON command set (SBCCS, now known as FICON) and SCSI (on FC known as FCP) are now able to run on the FC-4 layer. Since Ethernet back then was extremely lossy, it was no option for a strict lossless channel protocol with low latency requirements. Now that they have fixed up Ethernet a bit to allow for lossless transport over a relatively fast interface, they map the entire stack into a mini-jumbo frame: the FCP-4 SCSI command and data sit in an FC frame, which in turn sits in an Ethernet frame. (I still can’t find the reduction in complexity; if you can, please let me know.)
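The stacking described above, and why it needs a mini-jumbo frame, can be made concrete by adding up the layers. The following Python sketch uses the field sizes commonly quoted for Fibre Channel and FCoE encapsulation; treat them as assumptions and check them against the FC-BB-5 standard before relying on them. The names `LAYERS` and `fcoe_frame_size` are made up for this example.

```python
# Layers of a maximum-size FCoE frame, outermost first, sizes in bytes.
# Assumption: commonly quoted field sizes; verify against the standards.
LAYERS = [
    ("Ethernet header (MACs + EtherType)", 14),
    ("802.1Q VLAN tag (carries the priority)", 4),
    ("FCoE header (version, reserved, SOF)", 14),
    ("FC frame header", 24),
    ("FC payload (FCP: SCSI command or data)", 2112),
    ("FC CRC", 4),
    ("FCoE trailer (EOF + reserved)", 4),
    ("Ethernet FCS", 4),
]

STANDARD_ETH_FRAME = 1518  # classic untagged Ethernet maximum, including FCS


def fcoe_frame_size(layers=LAYERS):
    """Total size on the wire of one fully loaded FCoE frame."""
    return sum(size for _, size in layers)


for name, size in LAYERS:
    print(f"{size:5d}  {name}")
print(f"{fcoe_frame_size():5d}  total")
print(f"Exceeds a standard Ethernet frame by {fcoe_frame_size() - STANDARD_ETH_FRAME} bytes")
```

With these sizes a full FC frame no longer fits into a standard Ethernet frame, which is why every device in the path has to support the larger frame size.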

What should have been done, instead of introducing a fixer-upper like FCoE, is that the industry should have come up with an entirely new concept of managing, transporting and storing data. This should have been based on today’s requirements, which include security (like authentication and authorization), retention, (de-)duplication, removal of awareness of locality etc. Your data should reside in a container which is a unique entity on all levels, from the application to the storage and every mechanism in between. This container should be treated according to the policy requirements encapsulated in it, and those policies are based on the content residing in there. This then allows a multitude of properties to be applied to the container, as described above, and allows for far more effective transport.

Now this may sound like trying to boil the ocean, but try to think 10 years ahead. What will be beyond FCoE? Are we creating FCoEoXYZ? Five years ago I wrote a little piece called “The Future of Storage” which more or less introduced this concept. Since then nothing has happened in the industry to really solve the data growth issue. Instead the industry is stacking patch upon patch to circumvent current limitations (if any) or trying to generate a new revenue stream with something like the introduction of FCoE.

Again, I don’t hold anything against FCoE from a technology perspective, and I respect and admire Silvano Gai and the others at T11 for what they’ve accomplished in little over three years, but I think it’s a major step in the wrong direction. It had the wrong starting point and it tries to answer a question nobody asked.

For all the above reasons I still do not advise adopting FCoE, and I urge you to push your vendors and their engineering teams to come up with something that will really help you run your business rather than patching up “issues” you might not even have.

Constructive comments are welcome.

Kind regards,
Erwin van Londen