Fabric design, the good, the bad and the ugly.

For many years entire bibles have been filled with storage design concepts, pros, cons, benefits, cost structures on port counts vs. performance and so on. However, whenever I get to see a fabric overview of what is connected to what, and how traffic goes back and forth between initiators and targets, it almost always (around 99% of the time) looks like this.

standard_fabric

Obviously this is a so-called core-edge fabric. I'll ask you the following question: why is this a bad design? On second thought, I'll spare you making a list of flaws and turn the question around: why is this a good design? There is only one answer, and this is why:

Basically this design is only good for one thing: it looks great on a MS Visio sheet and nothing more. The reasoning behind such a design is most often related to physical datacentre layout restrictions, where the core switches sit in a somewhat central location and the edge switches are TOR switches (whether in NPV/AG mode or not) or embedded blade switches, which are then connected via ISLs to the cores. The issue with this design is the multitude of physical touch-points between initiator and target and vice versa. The fact of the matter is that the majority of datacentres have a predefined layout with regards to rack aisles and cabling infrastructure, which limits the physical options for connecting servers, switches and storage. Although the above picture shows a pretty simple layout, you'll see that it already comprises 16 physical connection points from HBA to array and back. When you take into account that the majority of issues (around 95%; I've described a lot of them here, here, here and here) are related to physical parts in the infrastructure, you'll see why you want to keep the path of an FC frame as short as possible.
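To make that touch-point count concrete, here's a small Python sketch. The hop list and the two-connectors-per-cable assumption are mine, purely for illustration of the layout above:

```python
# Illustrative sketch: counting the physical connection points an FC frame
# crosses in a simple core-edge fabric. Each cable has a connector/SFP at
# both ends, so every hop adds two touch-points where a marginal connection
# can corrupt frames.

hops_one_way = [
    ("HBA",         "edge switch"),   # server to top-of-rack / embedded edge
    ("edge switch", "core switch"),   # ISL up to the core
    ("core switch", "edge switch"),   # ISL down to the storage-side edge
    ("edge switch", "array port"),    # edge switch to the storage array
]

points_per_hop = 2                        # SFP/connector on each end of the cable
one_way = len(hops_one_way) * points_per_hop
round_trip = one_way * 2                  # initiator -> target and back again

print(f"one way:    {one_way} connection points")    # 8
print(f"round trip: {round_trip} connection points")  # 16, as in the picture above
```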

Now, the edge-core-edge design is not bad if the data path is kept to the local switch, and preferably even the local ASIC. Even with a single director-class switch you might see errors popping up on the back-plane of the switch. The reality is that, although we think in digital terms of 0 and 1, the physical side is still very much an analogue world. Electrical signals are still singing and dancing on copper circuitry and signal interference happens all over the place, so keeping initiators close to targets is a good idea. The more you reduce the distance and the number of interconnects between source and destination, the more you reduce the chance of frame corruption and of back-pressure due to a lack of buffer-credits (described here), and you'll see that performance in general improves.
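To illustrate why every extra interconnect matters, here's a hypothetical back-of-the-envelope calculation; the per-connection error probability is a number I picked purely for illustration, not a measured figure:

```python
# Hypothetical numbers for illustration only: if each physical connection
# point independently has a small chance p of corrupting a frame, the odds
# of a frame arriving clean shrink as the number of touch-points grows.

def clean_arrival_probability(points: int, p: float = 1e-6) -> float:
    """Probability a frame crosses `points` connection points without error."""
    return (1.0 - p) ** points

for points in (2, 8, 16):
    prob = clean_arrival_probability(points)
    print(f"{points:2d} touch-points -> {prob:.8f} clean, "
          f"{(1 - prob) * 1e6:.2f} errors per million frames")
```

Roughly speaking, with a small per-connection error rate the expected number of damaged frames grows in step with the number of touch-points, which is exactly why a local switch (or local ASIC) path beats a trip across the fabric.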

Another thing is ISL connectivity. When planning connectivity you REALLY need to know the hardware layout of your switches. If your throughput requirement calls for 4 ISLs, it is better to split them over two different ASICs on the same blade than to select the first 4 ports of the blade. If you do use those first 4 ports, all traffic from ports on the other ASIC will first need to go to the core-blade and then come back in order to traverse the ISLs. (Obviously this depends on your type of blade and whether you're using ICLs or not.)
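As an illustration, here's a small sketch that spreads ISL ports round-robin over two ASICs instead of simply taking the first four ports of the blade. The 16-ports-per-ASIC mapping is an assumption for the example only; the real port-to-ASIC layout varies per blade model, so check your vendor's hardware manual:

```python
# Hypothetical port-to-ASIC mapping for illustration; real mappings differ
# per blade model and must be taken from the vendor documentation.
PORTS_PER_ASIC = 16
ASIC_OF_PORT = {port: port // PORTS_PER_ASIC for port in range(32)}  # 0-15 -> ASIC 0, 16-31 -> ASIC 1

def spread_isls(num_isls, ports_per_asic=PORTS_PER_ASIC, asics=2):
    """Pick ISL ports round-robin across ASICs instead of filling one ASIC first."""
    ports = []
    for i in range(num_isls):
        asic = i % asics          # alternate between ASICs
        offset = i // asics       # next free port on that ASIC
        ports.append(asic * ports_per_asic + offset)
    return ports

naive = list(range(4))            # first four ports of the blade: all on ASIC 0
spread = spread_isls(4)           # two ports on ASIC 0, two on ASIC 1

print("naive :", naive,  "-> ASICs", sorted({ASIC_OF_PORT[p] for p in naive}))
print("spread:", spread, "-> ASICs", sorted({ASIC_OF_PORT[p] for p in spread}))
```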

The second bad design is asymmetric fabrics. While this might seem obvious, having a difference in the number of switches and connected devices, or a very unbalanced workload between fabrics, is not only a burden to maintain but also a huge strain on HA and DR planning. Applications might have a fairly predictable workload pattern, but when you combine multiple of them you'll see that the amplification of all those variables has a huge impact on IO and throughput patterns. The accumulation of these factors will often grow by a factor of two or more depending on how these applications behave. Mapping and planning these on an asymmetric storage infrastructure is a huge task if not impossible, and the end result is likely only valid for a few weeks (if not days) depending on change rates in provisioning, locality of systems and so on. In DR scenarios you will most likely end up with very unpredictable behaviour of your applications. It's very rare that DR environments are 100% equal to their production counterparts. Even when you have symmetric fabrics and storage it's hard enough to plan for DR, let alone when the designs differ from an architectural and physical perspective.
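A simple way to catch this drift early is to regularly compare the two fabrics against each other. The sketch below uses a made-up inventory format and tolerance purely to show the idea:

```python
# A minimal sketch (assumed inventory format and threshold) that flags
# asymmetry between the two fabrics of a dual-fabric SAN before it turns
# into an HA/DR surprise.

FABRIC_A = {"switches": 6, "isls": 8, "initiators": 120, "targets": 16}
FABRIC_B = {"switches": 6, "isls": 6, "initiators": 134, "targets": 16}

def symmetry_report(a, b, tolerance=0.10):
    """Print every metric where fabrics A and B differ by more than `tolerance`."""
    for metric in a:
        high, low = max(a[metric], b[metric]), min(a[metric], b[metric])
        if high and (high - low) / high > tolerance:
            print(f"asymmetry in {metric}: A={a[metric]} B={b[metric]}")

symmetry_report(FABRIC_A, FABRIC_B)   # -> flags isls and initiators
```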

In short there are a couple of Rules of Thumb:

  1. Keep it simple. Make sure your layout is understandable both from a physical and a logical perspective.
  2. Keep the number of physical, and therefore logical, hops to an absolute minimum. This keeps end-to-end frame-flow issues to a minimum.
  3. Try to achieve symmetry in your design to prevent an unbalanced, and therefore unpredictable, environment.
  4. In a greenfield situation re-think your cabling strategy to be able to achieve the above.

You can probably think of some more, so any suggestions are welcome.

Regards,

Erwin
