If you’ve read my articles over the last decade or so you’ve seen I’m keen on maintenance. From both a physical hardware and a software perspective, a storage environment needs to be kept in tip-top shape at all times.
Unfortunately, what we see in support is that when businesses decide to outsource their operations, no clauses are built into the contracts for scheduled checkups on either of these. Unless there are tangible risks to business operations (most often driven by security teams) there is no incentive for outsourcing parties to do this, most often as a result of two factors.
- It costs time. This time is not catered for in contracts and as such any activity of this sort is on a time/material basis. As both cut into the bottom line of either the client or the outsourcing party it is often neglected.
- Risk. In too many cases the adage “If it isn’t broken don’t fix it” is applied. True, in a few cases a software upgrade fails and one or more corrective actions need to be taken. This does not warrant the behaviour of not maintaining your environment.
We’ve seen over the years that a lack of pro-active maintenance will at some point catch up with every company. Software and firmware incompatibilities between older and newer equipment become detrimental to proper storage operations. Distributed network databases become corrupted due to upgrade or change restrictions. Scalability limitations are imposed due to hardware or software restrictions, and so on.
Then there are bugs. When you download a few release notes for FOS (or any other network operating system for that matter) you will see that a large number of bugs/defects are either fixed or given a non-disruptive workaround. If the versions installed on the equipment are out of date, the majority of issues have already been fixed in newer versions.
Outages and Recovery
The number of cases that get opened with OEMs or Brocade related to existing software defects already resolved in newer versions is growing exponentially. The issues observed are often so severe that ongoing outages are prolonged significantly, as many more factors in a recovery scenario need to be taken into account. One resolution path may also require new outages because inter-dependencies need to be resolved, and as such an ongoing snowball effect of outages and downtime will occur.
Mix & Match
Do not grow your storage network beyond the boundaries of capacity and interoperability. We’ve seen cases where brand new Gen 5 directors were plugged into a fabric that still contained Brocade 24000s (yes, 24000 from back in 2003) and this went horribly wrong, to the extent that a 4-day outage of large parts of the SAN infrastructure caused major havoc in business operations. The sheer discrepancy in hardware platforms and software versions resulted in irreparable damage which could only be resolved by physically isolating large parts of the older infrastructure and almost starting from scratch with the new parts.
There are no excuses. Time, Money, Knowledge, Vacations, Personnel rotations etc. I’ve heard them all. There is no excuse for having firmware older than 1 year sitting on a business-critical system. Time equals scheduling and project management, Money equates to budgeting, Knowledge is training and allowing staff to study the ever ongoing developments in this industry, Vacations come back to Time management, and if excessive personnel rotations hamper the maintenance of your business it may be wise to have a look at your HR department, company policies or compensation package.
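To make the "no firmware older than 1 year" rule of thumb something you can actually check, here is a minimal sketch in Python. It assumes you keep some kind of inventory that maps each switch to the release date of the firmware it runs. The inventory, the switch names, the dates and the function name `stale_firmware` are all made up for illustration and do not come from any Brocade tool.

```python
from datetime import date

# Hypothetical inventory: switch name -> release date of the installed firmware.
# Names and dates are invented for illustration only.
INVENTORY = {
    "core-dir-01": date(2015, 3, 10),
    "edge-sw-07": date(2012, 11, 2),
}

MAX_AGE_DAYS = 365  # the "no older than 1 year" rule of thumb


def stale_firmware(inventory, today, max_age_days=MAX_AGE_DAYS):
    """Return the names of switches whose firmware release is older than max_age_days."""
    return sorted(
        name
        for name, released in inventory.items()
        if (today - released).days > max_age_days
    )


if __name__ == "__main__":
    for name in stale_firmware(INVENTORY, date.today()):
        print(f"{name}: firmware older than {MAX_AGE_DAYS} days, schedule an upgrade")
```

Run from a scheduler, something like this turns the upgrade into a planned item on the calendar instead of a surprise in the middle of an outage.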
Brocade has recently adopted a new policy where support engagement will also depend on the firmware versions installed on your systems. This does not mean that support will be denied; however, an up-to-date firmware regime will be required. In practice you will either need to have up-to-date firmware installed already, or the new firmware will have to be installed before any subsequent action items are analyzed and executed. The reason is that this prevents all of the issues I mentioned above. (Except your process, planning and personnel issues of course :-))
This will not only ensure you have a supported environment but also significantly reduce the risk of being exposed to software issues resulting in massive outages subsequently impacting your core business.
Target path versions
Brocade will release a document every 3 months or so (if newer code is released) indicating the target path selection for the respective platform generations. It is then up to the OEMs to follow that guideline and use the versions for their own interoperability testing. The version selected for the different switch platforms depends on a variety of criteria: number and criticality of defects, applicability on certain platforms, general support for the switch generation itself (EoL/EoS), backward and forward compatibility, and a few more things. All in all it will give any administrator about 6 months to a year to upgrade to the latest supported versions with a fallback of another year or so. If you’re not on these versions be aware that you may be put lower on the priority list.
Basically it is expected from administrators, customers, service providers, system integrators, etc. that if you want to be taken seriously from a support perspective you should also take your own environment seriously. Not maintaining it is a recipe for disaster, and the time it takes to analyze issues, create and execute recovery plans and carry out post-maintenance activities will cost you more than you bargained for.
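Checking an installed version against a target path version can be sketched in a few lines of Python. This is only an illustration under a stated assumption: that version strings look like `v7.4.1d` (major, minor, patch, optional letter) and that they order in that sequence. The function names and the example versions are hypothetical placeholders, not real target path data, and strings with extra suffixes would need more handling.

```python
import re

# Assumed format: optional "v", three dot-separated numbers, optional single letter.
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)([a-z]?)$")


def parse_version(text):
    """Turn a string such as 'v7.4.1d' into a comparable tuple like (7, 4, 1, 'd')."""
    match = _VERSION_RE.match(text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch, letter = match.groups()
    return (int(major), int(minor), int(patch), letter)


def below_target(installed, target):
    """Return True when the installed version sorts before the target version."""
    return parse_version(installed) < parse_version(target)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    installed, target = "v7.3.0c", "v7.4.1d"
    if below_target(installed, target):
        print(f"{installed} is below target {target}, plan an upgrade")
```

Comparing tuples rather than raw strings avoids the classic trap where "v7.10.0" sorts before "v7.4.0" in a plain text comparison.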