As you may have read in my previous posts, I’m not really a fan of marketing-driven terminology whereby existing technology is re-branded over and over again in order to obfuscate the underlying technology and make things more complex than they really are. The FC Gen-X nonsense is one example. With Brocade Fabric Vision it took me a while, but I now see where Brocade is going with this and, more importantly, where it is coming from.
Brocade has had many different technologies, features and functions in their operating systems FOS and NOS which were developed to address a particular issue. I’ve written articles around Fabric Watch before and mentioned I’ve been a fan of this feature since day 1. For me it has been almost compulsory for sales teams to include this license in their offerings. Over the last couple of years additional features like bottleneck monitoring and frame monitoring have been included to be able to spot latency and congestion issues, in addition to being able to track certain frame types across individual links.
Now, remember that each of these features is targeted either at bringing insight into a possible existing issue or at monitoring certain components and configurations for certain events. You had to commence on a serious journey to 1: learn and be able to identify what to look for, 2: learn how to configure the correct settings via either CLI, Webtools or BNA (of which the latter two were restricted in functionality), 3: learn how to interpret the data and 4: learn how to come up with tangible actions to resolve the issue or plan changes in the infrastructure.
Let’s run through these points and have a short look at what this might mean.
1. If you run into a problem where a host/application is observing IO errors you need to know where these come from. In a storage network this might be anywhere between the application itself and the spindle where the data actually resides. We need to take into account the application itself, the filesystem, the volume manager, the SCSI driver, the FCP driver (the piece that maps the SCSI command reference onto fibre-channel), the HBA driver, HBA firmware, connectors, cables, switches (plus related configuration and firmware) and the storage array, both from a configuration and capabilities perspective and with regard to any hardware-related problem. Then take into account that the entire environment is a shared one, which basically means that any rogue application can have a very serious adverse impact on other parts of the infrastructure. This means you need to take a very methodical approach to troubleshooting in order to find the culprit.
2. Over various FOS versions Brocade has added numerous functions to be able to identify and address the most common problems. Think of commands like fwconfigure (for Fabric Watch), bottleneckmon (to address latency and congestion issues in addition to credit-recovery functions), fmmonitor (to check certain frame types from or to certain sources and destinations), thmonitor, portfencing, portthconfig etc. All these commands have over various FOS generations either seen modifications in functionality, and therefore a changed CLI parameter set, or have been deprecated. As an example, the Fabric Watch “fwconfigure” command has been deprecated with FOS 7 in favor of the non-interactive thconfig and portthconfig commands, and fwshow has been declared obsolete as well. So taking this into account you really need to know what you’re doing here.
3. In case you’ve been able to figure out 1 and 2, challenge number 3 is to combine these and correlate the events, symptoms and diagnostic results into a flowchart which, eventually, should lead you to number 4.
4. Depending on how big the impact of your findings is, you may need to do a very simple config change which could take 5 minutes, or you might at some stage be pulled into a major re-design project which keeps you awake for a couple of weekends.
Most of the time, though, a single issue, little or big, is responsible for a particular problem, and the challenge is to find it quickly in order to resolve the problem itself and prevent it from having further impact on the rest of the storage network.
Given the fact that troubleshooting time absorbs the lion’s share of “time to repair”, Brocade has now provided a consolidated package which they call Fabric Vision.
Fabric Vision came to life with FOS 7.2 and Brocade Network Advisor 12 and consists of two individual “frameworks”: Flow Vision and MAPS.
Fabric Watch was replaced by MAPS (Monitoring and Alerting Policy Suite). MAPS allows you to set thresholds on various aspects of a fibre-channel fabric both from a physical perspective (like environmental issues) as well as from an operational side (like frame corruption, latency and congestion). It also can hook into Flow Vision to be able to alert on certain thresholds configured via pre-defined flows.
The most heard issue with Fabric Watch has always been that there wasn’t a set of pre-defined values available in a configuration set on the switches. With MAPS Brocade has changed this. MAPS has three different kinds of value sets (Aggressive, Moderate and Conservative), each having different threshold values and associated actions. These values are based upon quite a few years of consulting and support in many demanding customer environments. This still doesn’t provide a guarantee that they will be appropriate for your environment, however it does give you a very good start in using MAPS and combining this with the overall management capabilities of BNA.
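To make the idea of pre-defined value sets a bit more tangible, here is a minimal Python sketch of how such a threshold policy could be modelled. The policy names mirror the three MAPS sets, but everything else is an assumption for illustration: the counter name `crc_errors`, the threshold numbers, the action names and the notion that the aggressive set carries the tightest thresholds are hypothetical and not taken from any Brocade default policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    counter: str      # hypothetical counter name
    threshold: int    # events per minute above which the rule fires
    actions: tuple    # hypothetical action names


# Illustrative values only, NOT the real MAPS defaults.
POLICIES = {
    "aggressive":   [Rule("crc_errors", 2,  ("log", "email", "fence"))],
    "moderate":     [Rule("crc_errors", 10, ("log", "email"))],
    "conservative": [Rule("crc_errors", 40, ("log",))],
}


def evaluate(policy_name, samples):
    """Return (counter, value, actions) for every rule whose threshold is exceeded."""
    triggered = []
    for rule in POLICIES[policy_name]:
        value = samples.get(rule.counter, 0)
        if value > rule.threshold:
            triggered.append((rule.counter, value, rule.actions))
    return triggered


if __name__ == "__main__":
    sample = {"crc_errors": 5}  # made-up measurement for one port
    for name in POLICIES:
        print(name, evaluate(name, sample))
```

The point of the sketch is only that the same measurement can be harmless under one value set and actionable under another, which is why picking the set that fits your environment matters.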
Flow Vision contains three features that can either monitor, generate or mirror FC flows from a source to a destination, even based on selection criteria on each field of an FC frame or SCSI CDB. This allows you to home in on very specific traffic patterns and gives you the ability to identify issues which you normally might not be able to detect.
The Flow Monitor section replaces individual features in the form of Top-Talkers, Frame Monitor and E-to-E-monitor. The monitor section obviously configures a monitoring set of values on certain ports. This policy can be anything as previously mentioned: source/destination, NPIV ports, SCSI reads/writes, SCSI reserves, aborts etc. It also lets you configure flows in so-called “Learning Mode”. This way it detects traffic flows between two F-ports and you’re able to determine various statistics on these flows. IOPS, throughput, frame count etc. are all available at a very granular level.
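As a rough illustration of what per-flow statistics boil down to, the following Python sketch aggregates counters per source/destination pair and derives frame count, throughput and IOPS over a sampling interval. The record layout `(src, dst, frames, bytes, ios)` and the function name are hypothetical; this is not the output format of any FOS command, just a way to show the arithmetic.

```python
from collections import defaultdict


def summarize_flows(records, interval_s):
    """Aggregate hypothetical per-flow counters collected over interval_s seconds.

    records: iterable of (src_id, dst_id, frames, nbytes, ios) tuples.
    Returns a dict keyed by (src_id, dst_id).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")

    totals = defaultdict(lambda: {"frames": 0, "bytes": 0, "ios": 0})
    for src, dst, frames, nbytes, ios in records:
        t = totals[(src, dst)]
        t["frames"] += frames
        t["bytes"] += nbytes
        t["ios"] += ios

    return {
        flow: {
            "frame_count": t["frames"],
            "throughput_MBps": t["bytes"] / interval_s / 1e6,
            "iops": t["ios"] / interval_s,
        }
        for flow, t in totals.items()
    }


if __name__ == "__main__":
    # Made-up samples for two flows, using 24-bit FC addresses as identifiers.
    samples = [
        (0x010200, 0x030400, 5000, 10_000_000, 400),
        (0x010200, 0x030400, 5000, 10_000_000, 600),
        (0x010300, 0x030400, 100, 200_000, 10),
    ]
    for flow, stats in summarize_flows(samples, interval_s=10).items():
        print([hex(x) for x in flow], stats)
```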
Flow Generator gives you the ability to create traffic patterns between a source and destination. This allows you to check for any sort of link characteristics including performance.
Last but not least, Flow Mirror. Flow Mirror lets you copy a frame flow from a port to the CPU of the switch. This then gives you the ability to analyze traffic patterns from certain hosts and applications. Flow Mirror can capture the first 64 bytes of an FC frame, which includes the frame header, the SCSI CDB (Command Descriptor Block) and SCSI status messages. This provides an invaluable insight into the FC traffic patterns and creates a great addition to the troubleshooting tools already available in FOS.
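To show why 64 bytes is enough to be useful, here is a small Python sketch that decodes the 24-byte FC frame header and, for an FCP command frame, the LUN, the read/write flags and the CDB that follow it. The field layout follows the standard FC frame header and FCP_CMND structure with a 16-byte CDB. The assumptions are mine: that the captured buffer starts at the first header byte (no SOF delimiter), and the function name and dictionary keys, which are hypothetical and not part of any Brocade tool.

```python
import struct

FC_HEADER_LEN = 24       # six 32-bit words
FCP_CMND_MIN_LEN = 28    # LUN (8) + control bytes (4) + 16-byte CDB
R_CTL_FCP_CMND = 0x06    # unsolicited command
FC_TYPE_FCP = 0x08       # SCSI-FCP


def parse_mirrored_frame(buf):
    """Decode the start of an FC frame; buf is assumed to begin at the header."""
    if len(buf) < FC_HEADER_LEN:
        raise ValueError("need at least the 24-byte FC frame header")

    (w0, w1, w2, seq_id, df_ctl, seq_cnt,
     ox_id, rx_id, param) = struct.unpack(">IIIBBHHHI", buf[:FC_HEADER_LEN])

    frame = {
        "r_ctl": w0 >> 24, "d_id": w0 & 0xFFFFFF,
        "cs_ctl": w1 >> 24, "s_id": w1 & 0xFFFFFF,
        "type": w2 >> 24, "f_ctl": w2 & 0xFFFFFF,
        "seq_id": seq_id, "df_ctl": df_ctl, "seq_cnt": seq_cnt,
        "ox_id": ox_id, "rx_id": rx_id, "parameter": param,
    }

    payload = buf[FC_HEADER_LEN:]
    is_fcp_cmnd = (frame["type"] == FC_TYPE_FCP
                   and frame["r_ctl"] == R_CTL_FCP_CMND)
    if is_fcp_cmnd and len(payload) >= FCP_CMND_MIN_LEN:
        flags = payload[11]          # RDDATA / WRDATA live in the low two bits
        cdb = payload[12:28]
        frame["fcp_cmnd"] = {
            "lun": payload[0:8].hex(),
            "read": bool(flags & 0x02),
            "write": bool(flags & 0x01),
            "opcode": cdb[0],        # first CDB byte is the SCSI operation code
            "cdb": cdb.hex(),
        }
    return frame
```

A header plus the FCP command fields up to and including the CDB add up to 52 bytes, so a 64-byte capture comfortably covers who is talking to whom and which SCSI command was issued.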
Remember that both MAPS and Flow Vision are still a first cut of what can become a great asset in SAN operations, overall management and troubleshooting. Brocade does need to take care that these features do not wander off in various directions, which inevitably will cause admins to lose track of them. The CLI is still very powerful and needs to be retained, but simplification may contribute to early adoption of both MAPS and Fabric Vision among administrators.
I think that with each newer generation of ASIC the capabilities of these two technologies might at some point match the ones shown in JDSU Xgig FC analysis equipment, but Brocade would need to ramp up the power of the Fabric Vision tools and ASICs significantly. Maybe in 3 to 4 years from now we will see products ship which show a great advance in the RAS technologies.
Keep up the good work Brocade.
Erwin van Londen