Brocade MAPS “non-management”

OK, this may seem like me having a go at Brocade's successor to Fabric Watch, but it isn't. This week I ran into a couple of cases where switches were upgraded to FOS 7.2.x and Fabric Watch was converted to MAPS. Nothing wrong with that, but it seems that many administrators blindly kick off some rule-sets via BNA and leave it at that. Whilst I applaud the move to the latest and greatest code levels (mainly because the majority of known bugs are fixed), it also means that updated and/or new functionality needs to be reviewed and actively managed.

The example below shows the lack of active management on one of these switches. It's an abbreviated listing of the error log from a Brocade switch.

2014/04/08-07:49:58, [MAPS-1003], 33463, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

2014/04/08-08:01:58, [MAPS-1003], 33466, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

2014/04/08-08:13:58, [MAPS-1003], 33469, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

2014/04/08-08:31:58, [MAPS-1003], 33473, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

2014/04/08-08:43:58, [MAPS-1003], 33476, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

2014/04/08-08:49:58, [MAPS-1003], 33477, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

2014/04/08-09:01:58, [MAPS-1003], 33480, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

2014/04/08-09:13:58, [MAPS-1003], 33485, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

2014/04/08-09:19:58, [MAPS-1003], 33486, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

<snip>

2014/04/11-04:07:58, [MAPS-1003], 34501, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

2014/04/11-04:25:58, [MAPS-1003], 34505, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

2014/04/11-04:37:58, [MAPS-1003], 34508, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

2014/04/11-04:49:58, [MAPS-1003], 34510, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

2014/04/11-04:55:58, [MAPS-1003], 34512, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

2014/04/11-05:07:58, [MAPS-1003], 34515, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

2014/04/11-05:25:58, [MAPS-1003], 34519, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,28.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

2014/04/11-05:37:58, [MAPS-1003], 34524, SLOT 7 | FID 128, WARNING, CORE_CHASSIS, Chassis, Condition=CHASSIS(MEMORY_USAGE>0), Current Value:[MEMORY_USAGE,29.00 %], RuleName=fw_MemRuleTh3_0, Dashboard Category=Switch Resource .

As the above shows, there is a MAPS rule-set active which triggers an event being logged whenever memory usage is above 0% (?!?!). I don't know who came up with that idea, but the moment you switch on the box that condition obviously evaluates to true. Irrespective of this rule being total nonsense, it should be turned off or replaced with a different setting.
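As a rough sketch of what I would expect an administrator to do the moment these events start flooding in (the exact flags can vary a little between FOS releases, so verify against the MAPS administrator's guide or the command help before copying this), first check which policy is active and what the offending rule actually contains:

mapspolicy --show -summary
mapsrule --show fw_MemRuleTh3_0

and then either fall back to one of the sensible default policies, for example:

mapspolicy --enable -policy dflt_moderate_policy

or clone a default policy and adjust the memory rule to a proper threshold, as one of the comments below also suggests.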

Given that these events occur over a long time period, I assume these switches are not actively managed. The above shows that these events are logged three to four times per hour, yet nobody looks at them, questions their validity, or makes any adjustments. Not only is this rule useless, it also obfuscates real problems that might be logged sporadically. Such an event could simply be lost because the event log wraps after a certain number of entries. Troubleshooting issues that happen infrequently becomes very difficult this way.
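For the record, spotting this sort of noise does not take much effort. On FOS 7.2.x and later a quick periodic look at the MAPS dashboard and the raw error log is enough:

mapsdb --show
errdump

The dashboard gives you a per-category summary of rule violations, and errdump shows the raw error log that the listing above was taken from.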

In short: even though the code is up to date, that does not relieve you from actively managing the device and taking appropriate action based on the events you see. Not only will you be able to identify and differentiate between garbage and useful information, it also provides much more evidence in case a serious problem occurs. This is not only true for the example above but obviously applies to every piece of equipment and software out there. Remember, your business relies on it.

Regards,

Erwin


5 responses on “Brocade MAPS “non-management””

  1. Bacil

    Hi, we want to use the MAPS features in the new FOS. However, a few of our switches still run FOS below 7.2 and, due to hardware limitations on those switches, we can't put a MAPS-capable FOS on them, so we have to live with them for some more time. My question is: can we have, in the same fabric, MAPS on all switches and two or three switches still on Fabric Watch? Okay, while converting, Network Advisor will complain about them, but is there any other drawback apart from that? I mean, since MAPS or FW is configured on a per-switch basis, at most we will be unable to manage those switches with MAPS in Network Advisor. Apart from that, what other issues do you see in this scenario?
    I also found this –
    “Fabric capability is based on the least capable switch participating in the fabric. If a fabric has products participating that are operating with an older version of Brocade FOS, the limits of the fabric must not exceed the maximum limits of that older version of Brocade FOS.”
    But then again, I thought those switches simply won't be managed by MAPS, that's it. We will manage them via the CLI, and we will still receive the Fabric Watch alerts anyway via the email alerting set up on those individual switches.

    Thanks,
    Bacil.

  2. Andre

    While I am not certain why you were seeing a rule to alert on anything more than 0% Memory Usage, the default in BNA is to alert above a threshold of 75% memory utilization. I believe you will agree this is a reasonable default threshold value.

    1. Erwin van Londen Post author

      Hello Andre,

      I don't know this either. I assume the administrator has been playing around with MAPS and inadvertently enabled an incorrect policy, or created one of his own with incorrect settings. That is, however, not what I wanted to emphasize with the article. What I wanted to highlight is that even when such a large amount of events is being logged, for obvious reasons, no one seems inclined to adjust the rule-set. Adjusting it is, for me, part of taking care of business and actively managing such an environment. Somebody can make a mistake by creating and enabling such a weird rule, but as soon as the event log starts to fill up he, or she, should correct it.

      Thanks for your response.

      Regards,
      Erwin

  3. gcharriere

    Hi Erwin,
    I discovered as well some unexpected behavior with the conversion tool from FW to MAPS. I would advise cloning one of the default policies and then making your modifications in that customized policy:
    mapspolicy --clone dflt_moderate_policy -name my_policy

    You can see below the default log thresholds for CPU (80%) and memory (75%) that come with the default moderate policy:
    mapspolicy --show my_policy
    defCHASSISMEMORY_USAGE_75 RASLOG,SNMP,EMAIL CHASSIS(MEMORY_USAGE/NONE>=75)
    defCHASSISCPU_80 RASLOG,SNMP,EMAIL CHASSIS(CPU/NONE>=80)

    If you decide to build your customized policy from one of the default ones, please pay attention to the fencing thresholds. Fencing is activated for almost all error counters in the default policies. This is a Brocade best practice that I would advise as well; however, a lot of customers are afraid of such behavior, so it is at least worth being aware of.
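    One more thing to check, if I remember the syntax correctly (please verify with the mapsconfig help on your FOS level), is that the FENCE action is allowed at the switch level at all, because MAPS only executes actions that appear both in the rule and in the global action list:
    mapsconfig --show
    mapsconfig --actions RASLOG,SNMP,EMAIL,FENCE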

    Regards,
    Gael

    1. Erwin van Londen Post author

      Hello Gael,

      Yes, port-fencing is one of the best features that were included in Fabric Watch a while ago; I even made a short video on it. The thing that staggers me is that very few customers are utilizing this excellent feature. As you've seen in my “Rotten apples” series, a single broken link can have a devastating effect on an entire fabric, and preventing it from having further ramifications is a massive gain in reliability. I still don't understand why admins don't use this feature. Last week I had an example where an entire fabric went haywire because of an ISL port having synchronization issues. The constant re-configuring almost brought the entire fabric to a stand-still and normal traffic was stopped. A simple MAPS or Fabric Watch rule could have prevented this.
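      Just as a sketch (the exact group, monitor and flag names may differ a bit between FOS releases, so check the MAPS administrator's guide rather than copying this verbatim), a rule along these lines would take a misbehaving ISL out of the fabric before it drags everything else down:

      mapsrule --create Fence_ISL_CRC -group ALL_E_PORTS -monitor CRC -timebase MIN -op ge -value 20 -action RASLOG,FENCE -policy my_policy

      The FENCE action then disables just that one port once it exceeds 20 CRC errors per minute, instead of letting the entire fabric suffer from the constant re-configurations.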

      Thanks for your feedback.

      Regards,
      Erwin