Why Fibre-Channel has to improve

Many of you have used and managed Fibre-Channel based storage networks over the years. It comes as no surprise that a network protocol developed primarily to handle extremely time-sensitive operations is built with extreme demands on hardware and software quality, and with clear guidelines on how these communications should proceed. This is why Fibre-Channel has become the dominant storage protocol in datacenters.

Major players like Brocade, Cisco, Emulex and QLogic have delivered hardware, software and solutions over the past 2+ decades which provide a variety of options to manage these storage networks. Then take into account the vast ecosystem of storage providers like Hitachi Data Systems, EMC, HP, IBM, NetApp etc. which leverage these solutions and provide additional value, so that an end-to-end managed storage infrastructure can be designed and deployed based upon your business needs.

There are two major problems with Fibre-Channel. First, since it was such a niche market from the beginning, the price has always been too high (IMHO). This has prevented adoption beyond the major datacenters, for example in branch offices. For me an FC port should have been as standard on a PC as an Ethernet port, let alone on a server. The second problem is a consequence of the first: because of this high bar, some vendors started to develop alternatives to close the gap, and these arose in the form of iSCSI and Ethernet storage. Most notably Dell (via EqualLogic and Compellent) and Coraid started, and they were soon followed by start-ups like LeftHand (now HP) and many others.

From a technical standpoint the biggest issue with Fibre-Channel is the lack of transparency towards host operating systems and applications. On this level everything “talks” regular SCSI, and thus when the IOs are mapped onto the FC-4 layer (FCP) these hosts have no further information at their disposal to direct or influence the traffic behaviour. This also means that error recovery sits on this level, and errors are most often seen only when it is too late. IO and SCSI errors can lead to many nasty issues including performance problems and data corruption.

To compensate for this lack of information I’m developing a new ELS (Extended Link Service) frame which gives operating systems and applications both insight into the SAN infrastructure and a statistical overview of the reliability of that infrastructure. I’ve outlined this in my previous two posts (here and here).

If this is ratified by T11 the development of products and software can be fairly quick. The new proposal is very open and every vendor can use it.

Another reason why Fibre-Channel has to improve is that the overly hyped Software Defined “Everything” is providing overlays onto storage networks. At VMworld 2013 VMware announced their vSAN technology. This may be great if you want a fairly large distributed storage network; however, applications that rely on highly transaction-oriented data transfers with very low latency requirements do need a fair bit of power. If a data request from an application is distributed via the SDS overlay, you will see that significant latency is introduced. To accommodate this requirement the overlay needs deeper interaction with the storage infrastructure so it can make rational decisions on where to place data and how to retrieve it in the fastest possible way. The only way operating systems and hypervisors can currently do this is by measuring response times and acting accordingly. If however there are changes in the infrastructure, the performance balance may flip completely and you’ll see the hypervisor orchestration software try to re-balance all this. This may also introduce negative side effects on the remaining infrastructure, like networks etc.
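
To make the "measure response times and act accordingly" approach a bit more concrete, here is a minimal Python sketch. It is not how any particular hypervisor or multipath driver works; the names PathStats and pick_path, the smoothing factor and the sample values are all made up for illustration. It simply keeps a smoothed latency per path and picks the lowest one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PathStats:
    """Smoothed response time for one host-to-storage path (hypothetical model)."""
    name: str
    ewma_ms: Optional[float] = None  # None until the first sample arrives

    def record(self, latency_ms: float, alpha: float = 0.2) -> None:
        # Exponentially weighted moving average: recent samples weigh more.
        # alpha = 0.2 is an arbitrary choice for this sketch.
        if self.ewma_ms is None:
            self.ewma_ms = latency_ms
        else:
            self.ewma_ms = alpha * latency_ms + (1 - alpha) * self.ewma_ms


def pick_path(paths: List[PathStats]) -> PathStats:
    """Return the path with the lowest smoothed latency.

    Paths without any samples sort first so that they get probed.
    """
    return min(paths, key=lambda p: -1.0 if p.ewma_ms is None else p.ewma_ms)


if __name__ == "__main__":
    a, b = PathStats("path_a"), PathStats("path_b")
    for sample in (1.0, 1.0, 1.0):
        a.record(sample)
    for sample in (2.0, 2.0, 2.0):
        b.record(sample)
    print("preferred before change:", pick_path([a, b]).name)

    # A single slow IO on path_a (say, after a fabric change) is enough
    # to move the preference to the other path.
    a.record(50.0)
    print("preferred after change:", pick_path([a, b]).name)
```

Note that the host only ever sees the symptom (a slower response), never the cause in the fabric, which is exactly the limitation described above.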

I’ll keep you informed about the progress.

Cheers,

Erwin

EvL Consulting
