Category Archives: Troubleshooting

Cross-fabric collateral damage

Ever since the dawn of time storage administrators have been indoctrinated with redundancy. You have to have everything at least twice in order to maintain uptime and achieve an availability level of around 99.999%. This holds on many occasions, but there are exceptions: even dual fabrics (i.e. physically separated ones) share components such as hosts, arrays or tapes. If a physical issue in one fabric impacts such a shared component, the resulting performance and host I/O problems may even affect totally unrelated equipment on the other fabric and spin out of control.
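To put that availability figure in perspective, here is a small sketch that converts an availability percentage into allowed downtime per year. The function name is made up for illustration and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the downtime budget (minutes per year) for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        print(f"{pct}% -> {downtime_minutes_per_year(pct):.2f} minutes/year")
```

At "five nines" the budget is only a handful of minutes per year, which is why a single shared component that drags both fabrics down matters so much.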

Continue reading

Short stroking disk drives to improve performance

I was reading a post from Hans DeLeenheer (VEEAM) which ramped up quite a bit, including responses from Calvin Zito (HP), Alex McDonald (NetApp) and Nigel Poulton. The discussion started with a comment that X-IO had “special” firmware which improved IO performance. Immediately the term “short-stroking” came up, which leads one to believe X-IO is cheap-skating on their technologies. I was under the same impression at first, right until the moment I saw that Richard Lary is (more or less) the head of tech at X-IO, together with Clark Lubbers and Bill Pagano, who also come out of the same DEC stable. For those of you who don’t know Richie, he’s the one who ramped up Digital StorageWorks back in the late 70s/early 80s and also stood at the cradle of VAX-VMS. (Yeah yeah, I’m getting old, google it if you don’t know what I’m talking about.)

Continue reading

FCIP configuration, pitfalls and troubleshooting

FCIP has been around for quite a while. The fine engineers of CNT/McData/Brocade/Rhapsody/Vixel (you name them) saw early on that a method was needed to overcome the distance limitation of, back then, around 10 km. This was not due to a limitation in the FC protocol itself but rather to the fact that the hardware of the early 2000s was not up to scratch to push FC frames over longer distances. Another drawback is that FC by nature is not routable (not taking into account FC-IFR, which came later and was developed between 2004 and 2008). That makes it difficult to adopt in existing infrastructures where no native FC extenders or other equipment like DWDM/CWDM is available to bridge the distance between two native FC pieces of equipment.

Contrary to popular belief, the FCIP protocol was not developed under the T11 ANSI (now INCITS) umbrella; it was actually the IETF who took on this task. The standard is published as RFC 3821.

Continue reading

Queue-depth

I was recently involved in a discussion around QD settings and vendor comparisons. It almost got to a point where the QD capabilities were inherently linked to the quality and capabilities of the arrays. Total and absolute nonsense of course. Let me make one thing clear: “QUEUING IS BAD”. In the general sense, that is. Nobody wants to wait in line, and neither does an application.

Whenever an application requests data or writes the results of a certain process, the I/O goes down the I/O stack. In the storage world there are multiple locations where such a data portion can get queued. When looking at the specifics from a host level, the queue-depth is set on two levels.

(p.s. I use the terms device, port, array interchangeably but they all refer to the device receiving commands from hosts.)
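As a rough illustration of why queue depth and latency cannot be judged separately, the sketch below applies Little's Law (outstanding I/Os = throughput × latency). This is a general queuing relation, not something specific to any array or HBA; the function names and the sample numbers are made up for illustration.

```python
def required_outstanding_ios(target_iops: float, avg_latency_ms: float) -> float:
    """Little's Law: I/Os in flight needed to sustain target_iops at a given latency."""
    return target_iops * avg_latency_ms / 1000.0


def max_iops(queue_depth: int, avg_latency_ms: float) -> float:
    """Upper bound on IOPS when at most queue_depth I/Os can be outstanding."""
    return queue_depth * 1000.0 / avg_latency_ms


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 IOPS at 2 ms average latency.
    print(required_outstanding_ios(20_000, 2.0))
    # Hypothetical numbers: queue depth of 32 at 1 ms average latency.
    print(max_iops(32, 1.0))
```

A deeper queue only raises the ceiling on outstanding commands. It does nothing for the latency of each one, which is why a larger QD number by itself says little about the quality of an array.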

Continue reading

Signal quality and link stability

I really think I should stop with fillword discussions but here is one more. What happens if you have set the correct fillword, have made sure all hardware is in tip-top shape, and still the encoding errors fly around like a swarm of hornets? Then ISI (inter-symbol interference) might be the problem.

The main issue is still that the receiving side is unable to distinguish between a 0 and a 1. The so-called eye pattern is so narrow or distorted that the receiver just sees gibberish.

Continue reading

Fillwords IDLE vs ARBff (one last time)

I’ve written about fillwords a lot (see here, here, and here) but I didn’t show you much about the different symptoms an incorrect fillword setting may incur.

As you’ve seen, fillwords are a very nifty way of maintaining bit and word sync on a serial transmission link when no actual frames are sent. Furthermore, they can be replaced with other primitive signals (like R_RDY, VC_RDY etc.) to provide a very simple instruction method between two ports without interfering with actual frames. That means that fillwords are ALWAYS squeezed in between frames.

Continue reading

Speed mismatch is the death-trap for shared storage

I’ve been focusing a lot on the implications of physical issues in my posts over the last ~2 years. What I haven’t touched on are the logical performance boundaries, which also cause extreme grief in many storage infrastructures and lead to performance problems, IO errors, data corruption and other nasty stuff you do not want to see in your storage network.

Continue reading

Performance expectations with ISL compression

So this week I had an interesting case. As you know, the Hitachi arrays have a replication functionality called HUR (Hitachi Universal Replicator), an advanced asynchronous replication solution offered for Mainframe and OpenSystems environments. HUR does not use a primary-to-secondary push method; instead the target system issues reads to the primary array, which then sends the required data. This optimizes the traffic pattern by using batches of data. From a connectivity perspective you will mostly see full 2K FC frames, which means that on long-distance connections (30 km to 100 km) you can very effectively keep a fairly low number of buffer-credits on both sides whilst still maintaining optimum link utilization.
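To see why full-size frames keep the buffer-credit requirement low, here is a back-of-the-envelope sketch. It estimates how many frames must be in flight to keep a link busy for one round trip. The assumptions are mine, not from the case itself: roughly 5 µs of propagation delay per km of fibre, a full frame of 2148 bytes on the wire, a nominal 800 MB/s (8G FC) link, and no allowance for inter-frame fillwords or device latency. The function name is hypothetical.

```python
import math

PROPAGATION_US_PER_KM = 5.0   # assumption: light in fibre travels ~200,000 km/s
FULL_FRAME_BYTES = 2148       # assumption: maximum-size FC frame incl. SOF/header/CRC/EOF


def estimate_bb_credits(distance_km: float,
                        link_mb_per_s: float = 800.0,
                        frame_bytes: int = FULL_FRAME_BYTES) -> int:
    """Estimate the buffer credits needed to keep the link full over a round trip."""
    round_trip_us = 2 * distance_km * PROPAGATION_US_PER_KM
    # 1 MB/s equals 1 byte per microsecond, so bytes / (MB/s) gives microseconds.
    frame_time_us = frame_bytes / link_mb_per_s
    return math.ceil(round_trip_us / frame_time_us)


if __name__ == "__main__":
    for km in (30, 100):
        full = estimate_bb_credits(km)
        small = estimate_bb_credits(km, frame_bytes=512)
        print(f"{km} km: ~{full} credits with full frames, ~{small} with 512-byte frames")
```

The smaller the average frame, the more credits the same distance needs, so a traffic pattern dominated by full frames is the friendliest case for long-distance links.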

Continue reading