1.1 – MAPS – Know what’s going on.

I’ve written about Fabric Watch quite a lot and I have always stressed the usefulness of this licensed add-on feature in FOS. This post outlines the major characteristics of MAPS and why you should migrate now. As of FOS 7.2 there has been a transition from Fabric Watch to MAPS (Monitoring and Alerting Policy Suite), and over the past few FOS versions it has seen a huge improvement in overall RAS (Reliability, Availability and Serviceability) monitoring features. As of FOS 7.4 Fabric Watch is no longer incorporated in FOS, so MAPS is the only option you have if you want this kind of monitoring. MAPS is one part of a two-part suite called Fabric Vision, together with its performance companion “Flow Vision”. MAPS can interact with Flow Vision based on criteria you specify and monitor/alert on performance-related events.

Transition from FabricWatch to MAPS

The move from Fabric Watch to MAPS is fairly straightforward, but be aware that it is ONE WAY. There is no turning back. If you have used and configured Fabric Watch in a previous FOS version you need to convert the FW rulebase to MAPS. The command for that is

mapsconfig --fwconvert

This will then create three rule-sets:

fw_default_policy

fw_custom_policy

fw_active_policy

These three are the rulesets converted from Fabric Watch, and they lack many of the rules MAPS has at your disposal. If you have not used FW before, or have only used the default FW settings, I would suggest starting off with one of the new rulesets that come with MAPS. Brocade creates three default rulesets when you enable MAPS.

Enabling MAPS

Before FOS 7.4, MAPS is not enabled by default. You have to enable it manually with the command

mapsconfig --enable -policy <policyname>

where the <policyname> obviously is the one you wish to use.

When you have enabled MAPS the FW commands no longer work. For example, if you execute “thconfig --show” you will see the message “MAPS is enabled, Fabric Watch is disabled. Please use MAPS for monitoring and execute mapsHelp command for available MAPS commands.” The MAPS commands are more straightforward than the FW CLI which, I must say, had become a bit of a sprawl with all sorts of options and parameters. (You can compare the FW and MAPS CLIs by entering the fwhelp or mapshelp commands.)

Licensing


In FOS 7.4 MAPS is broken down into two parts. The free part lets you view the basics in MAPS: FRU statuses, the MAPS dashboard and basically the elements that show overall switch health. It does not allow you to configure separate rules and policies.

The Fabric Vision license enables all MAPS and Flow Vision features for the entire chassis. It is not limited to a certain number of elements, nor bound to specific slots.

If you already had Fabric Watch and Advanced Performance Monitor licensed for the switch in a previous FOS version, these will enable all MAPS and Flow Vision features as well. Many Brocade OEMs ship a so-called Enterprise Bundle which includes “Adaptive Networking, FabricWatch, Advanced Performance Monitor and Trunking”. Recently this bundle has been extended with the Fabric Vision license, so irrespective of whether you purchased the “old” or “new” Enterprise Bundle you should be good to go.

Getting started

As mentioned, Brocade ships three default rulesets:

  • dflt_conservative_policy
  • dflt_moderate_policy
  • dflt_aggressive_policy

The main difference between these rulesets is the thresholds: the more aggressive the policy, the sooner the actions tied to certain events will cause the switch to turn off a port (any port).

For example, with the dflt_aggressive_policy an E-port will be fenced and decommissioned on more than 2 CRC errors per minute, whereas the same action is executed after more than 20 or 40 with the dflt_moderate_policy and dflt_conservative_policy respectively.

Aggressive:

defALL_E_PORTSCRC_2  FENCE,DECOM,SNMP,EMAIL ALL_E_PORTS(CRC/MIN>2)

Moderate:

defALL_E_PORTSCRC_20 FENCE,DECOM,SNMP,EMAIL ALL_E_PORTS(CRC/MIN>20)

Conservative:

defALL_E_PORTSCRC_40 FENCE,DECOM,SNMP,EMAIL ALL_E_PORTS(CRC/MIN>40)
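To make the structure of these rule lines explicit, here is a minimal Python sketch that splits them into rule name, actions, group, metric, timebase and threshold. The layout (NAME ACTIONS GROUP(METRIC/TIMEBASE>VALUE)) is assumed purely from the three lines above; the `MapsRule` type and `parse_rule` function are hypothetical helper names, not part of FOS.

```python
import re
from typing import List, NamedTuple


class MapsRule(NamedTuple):
    name: str
    actions: List[str]
    group: str
    metric: str
    timebase: str
    op: str
    threshold: float


# Layout assumed from the sample lines: NAME ACTIONS GROUP(METRIC/TIMEBASE>VALUE)
RULE_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<actions>\S+)\s+"
    r"(?P<group>\w+)\((?P<metric>\w+)/(?P<timebase>\w+)"
    r"(?P<op>>=|<=|==|>|<)(?P<value>[\d.]+)\)$"
)


def parse_rule(line: str) -> MapsRule:
    """Split one rule line into its parts; raise ValueError if it does not fit."""
    m = RULE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised rule line: {line!r}")
    return MapsRule(
        name=m["name"],
        actions=m["actions"].split(","),
        group=m["group"],
        metric=m["metric"],
        timebase=m["timebase"],
        op=m["op"],
        threshold=float(m["value"]),
    )


LINES = [
    "defALL_E_PORTSCRC_2  FENCE,DECOM,SNMP,EMAIL ALL_E_PORTS(CRC/MIN>2)",
    "defALL_E_PORTSCRC_20 FENCE,DECOM,SNMP,EMAIL ALL_E_PORTS(CRC/MIN>20)",
    "defALL_E_PORTSCRC_40 FENCE,DECOM,SNMP,EMAIL ALL_E_PORTS(CRC/MIN>40)",
]

for rule in map(parse_rule, LINES):
    print(rule.name, rule.metric, rule.timebase, rule.op, rule.threshold)
```

Parsed this way, it is easy to see that the three policies share the same group, metric and actions and differ only in the threshold.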

ETA

MAPS is based on an ETA methodology:

Events -> Threshold -> Action

Each of these can be mapped to a single element type (E-ports, F-ports, CPU, memory etc.) or to a group of elements (F-ports for Unix, Windows, Linux etc.). This allows for a very granular set of options to monitor each of these elements or groups of elements for certain events and, if thresholds are met or crossed, execute the configured actions.

The elements MAPS is constructed around are:

  1. Category – A group of elements with the same characteristics.
  2. Condition – The state or value an element shows
  3. Element – the individual component that is monitored
  4. Group – The logical grouping of elements
  5. Rule – A definition of one or more values which causes an action to be triggered
  6. Policy – A group of rules.
  7. Action – What should be done when thresholds are reached
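How these building blocks relate can be sketched as a small Python data model. This is only an illustration of the concepts in the list, not how FOS stores them internally; the class names, the policy name `demo_policy` and the "greater than" comparison are assumptions made for the example. The rule values are taken from the moderate CRC rule shown earlier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    name: str
    group: str           # logical group of elements, e.g. "ALL_E_PORTS"
    metric: str          # monitored condition, e.g. "CRC"
    threshold: float
    actions: List[str]   # what to do when the threshold is crossed

    def triggered(self, value: float) -> bool:
        # Assumption for this sketch: every rule is a "greater than" rule.
        return value > self.threshold


@dataclass
class Policy:
    name: str
    rules: List[Rule] = field(default_factory=list)

    def evaluate(self, group: str, metric: str, value: float) -> List[str]:
        """Return the actions of every rule in this policy that the sample violates."""
        fired: List[str] = []
        for rule in self.rules:
            if rule.group == group and rule.metric == metric and rule.triggered(value):
                fired.extend(rule.actions)
        return fired


# Hypothetical policy holding one rule with the values of the moderate CRC rule.
policy = Policy(
    name="demo_policy",
    rules=[Rule("defALL_E_PORTSCRC_20", "ALL_E_PORTS", "CRC", 20,
                ["FENCE", "DECOM", "SNMP", "EMAIL"])],
)

print(policy.evaluate("ALL_E_PORTS", "CRC", 25))  # threshold crossed
print(policy.evaluate("ALL_E_PORTS", "CRC", 5))   # below threshold
```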

The category is a group of elements which is monitored as a whole and as such depicted in the maps dashboard. (I’ll get to that later)

Port Health            CRC, ITW, LOSS_SYNC, LF, LOSS_SIGNAL, PE, LR, C3TXTO, STATE_CHG, CURRENT, RXP, TXP, VOLTAGE, SFP_TEMP, PWR_HRS

FCIP Health            CIR_STATE, CIR_UTIL, CIR_PKTLOSS, RTT, JITTER, STATE_CHG, UTIL, PKTLOSS

Traffic Performance    RX, TX, UTIL, TX_FCNT, RX_FCNT, TX_THPUT, RX_THPUT, IO_RD, IO_WR, IO_RD_BYTES, IO_WR_BYTES

Security Health        SEC_DCC, SEC_HTTP, SEC_CMD, SEC_IDB, SEC_LV, SEC_CERT, SEC_FCS, SEC_SCC, SEC_AUTH_FAIL, SEC_TELNET , SEC_TS

Fabric State Change    DID_CHG, FLOGI, FAB_CFG, EPORT_DOWN, FAB_SEG, ZONE_CHG, L2_DEVCNT_PER, LSAN_DEVCNT_PER, ZONE_CFGSZ_PER, BB_FCR_CNT

Switch Resource        TEMP, FLASH_USAGE, CPU, MEMORY_USAGE, ETH_MGMT_PORT_STATE

Switch Status Policy   BAD_PWR, BAD_TEMP, BAD_FAN, FLASH_USAGE, MARG_PORTS, FAULTY_PORTS, MISSING_SFP, ERR_PORTS, WWN_DOWN, DOWN_CORE, FAULTY_BLADE, HA_SYNC

FRU Health             PS_STATE, FAN_STATE, SFP_STATE, BLADE_STATE, WWN

As you can see each category thus contains measurable components which can be polled per time interval or a certain state.

Ranges, State and Timebase

As shown in the categories, some monitored elements have a specific state (On/Off, Good/Bad) while others contain a numeric value. An element that is monitored on state cannot be monitored on a timebase. For example, you cannot say “if a power supply is off 10 times per day, do XYZ”. The state is monitored per occurrence, which means that if a switch determines that a power supply is off where its previous state was on, it will launch the action that is configured in that particular rule. Other elements like port errors or utilisation can be monitored over a certain time-frame. This means you can configure thresholds based on a number of occurrences per time-frame. Fabric Watch before FOS 7.0 had second, minute, hour and day timebases. As of FOS 7 the “second” timebase was dropped as it had no real operational value.
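The difference between the two kinds of monitoring can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the class names are made up, and the counter uses a sliding window of seconds, which is not necessarily how FOS samples its timebases.

```python
from collections import deque


class StateMonitor:
    """Fires once per transition into the watched state; there is no timebase."""

    def __init__(self, watched_state: str, initial_state: str):
        self.watched = watched_state
        self.previous = initial_state

    def update(self, state: str) -> bool:
        fired = state == self.watched and self.previous != self.watched
        self.previous = state
        return fired


class CounterMonitor:
    """Counts events inside a sliding window of `window_s` seconds (an assumption)."""

    def __init__(self, threshold: int, window_s: float):
        self.threshold = threshold
        self.window_s = window_s
        self.events = deque()

    def record(self, timestamp: float) -> bool:
        self.events.append(timestamp)
        # Drop events that have aged out of the window.
        while self.events and self.events[0] <= timestamp - self.window_s:
            self.events.popleft()
        return len(self.events) > self.threshold


psu = StateMonitor(watched_state="OFF", initial_state="ON")
print([psu.update(s) for s in ["ON", "OFF", "OFF", "ON", "OFF"]])

crc = CounterMonitor(threshold=2, window_s=60)
print([crc.record(t) for t in [0, 10, 20, 100]])
```

The power supply example fires only on the ON to OFF transition, while the CRC example fires once more than two errors fall inside the same minute.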

The state a certain element can be in is ON, OFF, IN, OUT or FAULTY. These can often be seen in the slotshow or psshow output.

The range of a monitored unit has a lower and/or upper boundary. This is most often related to temperature values, which may fluctuate depending on environmental conditions.

Values and operands

For elements that are monitored on counters you specify a value as the threshold in a particular rule. The measured value is compared to that threshold using the operand in the rule, and if the threshold is hit the configured action is executed.

The operands are fairly well known: “L” for less than, “LE” for less than or equal, “G” for greater than, “GE” for greater than or equal and “EQ” for equal.
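As an illustration, the five operands map directly onto ordinary comparison functions. The Python sketch below shows the comparison logic only; `threshold_hit` is a hypothetical helper name and not a FOS command.

```python
import operator

# The five operands from the text, mapped to Python's comparison functions.
OPERANDS = {
    "L": operator.lt,
    "LE": operator.le,
    "G": operator.gt,
    "GE": operator.ge,
    "EQ": operator.eq,
}


def threshold_hit(value: float, operand: str, threshold: float) -> bool:
    """Return True when `value <operand> threshold` holds."""
    try:
        compare = OPERANDS[operand.upper()]
    except KeyError:
        raise ValueError(f"unknown operand: {operand!r}") from None
    return compare(value, threshold)


print(threshold_hit(3, "G", 2))   # 3 CRC errors against a "greater than 2" rule
print(threshold_hit(2, "G", 2))   # exactly 2 does not cross a "greater than 2" rule
print(threshold_hit(2, "GE", 2))  # but it does hit "greater than or equal"
```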

Actions

These speak more or less for themselves.

  • RASLOG – Logs the event in the eventlog
  • EMAIL – If email notification is configured it will send an email
  • SNMP – If configured, SNMP traps will be sent to the management system.
  • NONE – Do nothing. (Duhh)
  • SW_MARGINAL – Puts the switch in a marginal state. This is reflected on the MAPS dashboard as well as in management software like Brocade Network Advisor or similar.
  • SW_CRITICAL – Puts the switch in a CRITICAL state and, as with MARGINAL, it will also be reflected in management software.
  • SFP_MARGINAL – Marks an SFP as marginal if it shows certain issues like low light or power levels.
  • FENCE – Will turn off that particular port – hard (I’ll get back to this later)
  • DECOM – Will more gracefully turn off the port. (E-ports only)

The FENCE and DECOM actions should normally be configured together for E-ports. Port fencing basically shuts down the port by moving it out of the active state. This action is immediate and does not take into account any frames that may be buffered or under way in any sense. The DECOM action, which needs to be configured together with the FENCE action, precedes the FENCE action by first holding off all R_RDY or VC_RDY primitive signals on the RX side, thereby making sure that the TX side will not send any more frames. As soon as the credit counter hits zero the port is shut down. This prevents frame loss as much as possible. The behaviour is similar to what happens when the “portdecom” command is used. Make sure that lossless DLS is enabled on both sides of the link in case of an E-port.
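The two constraints mentioned here (DECOM needs FENCE, and DECOM is for E-ports only) are easy to check before a rule is created. The following Python sketch is a hypothetical pre-flight check for your own tooling; it is not something FOS provides, and the function name and messages are made up.

```python
from typing import Iterable, List


def check_actions(actions: Iterable[str], is_e_port_group: bool) -> List[str]:
    """Return a list of problems with an action list; an empty list means it is fine."""
    acts = {a.strip().upper() for a in actions}
    problems = []
    if "DECOM" in acts and "FENCE" not in acts:
        problems.append("DECOM must be configured together with FENCE")
    if "DECOM" in acts and not is_e_port_group:
        problems.append("DECOM applies to E-ports only")
    return problems


print(check_actions(["FENCE", "DECOM", "SNMP", "EMAIL"], is_e_port_group=True))
print(check_actions(["DECOM", "EMAIL"], is_e_port_group=True))
print(check_actions(["FENCE", "DECOM"], is_e_port_group=False))
```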

Performance

This topic warrants a separate chapter/post but for completeness I’ll touch on it briefly.

MAPS uses a different performance category called FPI, Fabric Performance Impact. This category is also displayed on the MAPS dashboard and can provide very useful insight into performance issues. As of FOS 7.3 MAPS FPI can replace legacy bottleneck monitoring and, in fact, will tell you when you want to enable it. Legacy bottleneck monitoring and MAPS FPI are mutually exclusive.

MAPS monitors certain counters and values to determine overall performance indicators. These counters and values are based on throughput, congestion and latency.

A followup post will do a deep-dive on performance monitoring with MAPS and FPI.

Resource and Scalability limits

In most environments you will not run into any of these. In some extreme installations, however, you may need to keep an eye on them, as some of these limits push the boundaries of what a flat layer 2 Fibre Channel network can do.

  • L2 device monitoring. Trigger when a certain percentage of the maximum number of connected devices is logged in to the fabric. (Currently, with FOS 7.4, this maximum is 6000.)
  • Number of LSAN mapped devices in a backbone fabric.
  • Number of total routers in a backbone fabric (excluding flat L2 switches).
  • Zoning limitations. Monitor the size of the zoning database.
  • Number of NPIV logins on a port. Keep an eye on the number of virtual machines occupying port resources.
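These limits are expressed as a percentage of a maximum. As a quick worked example for the L2 device count, using the FOS 7.4 maximum of 6000 quoted above (the 70 and 80 percent thresholds and the device count are illustrative values, not MAPS defaults):

```python
MAX_L2_DEVICES = 6000  # maximum quoted in the text for FOS 7.4


def percent_of_limit(current: int, maximum: int = MAX_L2_DEVICES) -> float:
    """Express the current count as a percentage of the supported maximum."""
    return 100.0 * current / maximum


def over_threshold(current: int, threshold_pct: float,
                   maximum: int = MAX_L2_DEVICES) -> bool:
    return percent_of_limit(current, maximum) > threshold_pct


devices_logged_in = 4500  # illustrative value
print(percent_of_limit(devices_logged_in))    # 75.0
print(over_threshold(devices_logged_in, 70))  # True
print(over_threshold(devices_logged_in, 80))  # False
```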

Overall physical switch status

MAPS has a separate module keeping an eye out on physical and virtual ports, flash drive usage, memory and cpu. These can be obtained via the “mapssam” command.

DCX-4S_LS128:FID128:admin> mapssam --show

                   Total        Total        Down         Total
Port       Type    Up Time      Down Time    Occurrence   Offline Time
                   (Percent)    (Percent)    (Times)      (Percent)
=======================================================================
1/0        T       100.00       0.00         0            0.00
1/1        F       100.00       0.00         0            0.00
1/2        F       100.00       0.00         0            0.00
1/3        F       100.00       0.00         0            0.00
1/4        DP      0.00         0.00         0            100.00
1/5        U       0.08         0.00         0            99.92

DCX-4S_LS128:FID128:admin> mapssam --show cpu
Showing Cpu Usage:
CPU Usage : 28.0%

DCX-4S_LS128:FID128:admin> mapssam --show memory
Showing Memory Usage:
Memory Usage : 37.0%
Used Memory : 689379k
Free Memory : 1173809k
Total Memory : 1863188k

DCX-4S_LS128:FID128:admin> mapssam --show flash
Showing Flash Usage:
Flash Usage : 57%
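If you collect this output with a script, the "key : value" lines are simple to parse. The Python sketch below works on the memory output captured above; the function name is hypothetical and the line format is assumed only from this sample, so other FOS versions may print it differently.

```python
import re

# Memory output copied from the mapssam session shown above.
SAMPLE = """\
Showing Memory Usage:
Memory Usage : 37.0%
Used Memory : 689379k
Free Memory : 1173809k
Total Memory : 1863188k
"""

LINE_RE = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*([\d.]+)\s*[%k]?\s*$")


def parse_memory(text: str) -> dict:
    """Return the 'key : value' lines as a dict of floats (unit suffix dropped)."""
    fields = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            fields[m.group(1)] = float(m.group(2))
    return fields


mem = parse_memory(SAMPLE)
# Cross-check the reported percentage against the raw counters.
computed = round(100 * mem["Used Memory"] / mem["Total Memory"], 1)
print(mem["Memory Usage"], computed)
```

Recomputing the percentage from the used and total counters gives the same 37.0 the switch reports, which is a cheap sanity check on the parsing.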

FCIP monitoring

MAPS can monitor performance and state characteristics on FCIP tunnels and circuits on the 7800 and 7840 FCIP routers and the FX8-24 blade. These include circuit-level QoS, packet loss and throughput. On the newer 7840 it can also monitor tunnel-level QoS, which gives a more complete overview when a tunnel consists of more than one circuit spanned over multiple GE ports.

Circuit-level monitoring is especially useful when configured to alert on disruptions like packet loss, high jitter values and overall circuit state. This will notify you when a disruption occurs somewhere between the two circuit end-points (most often in a WAN environment). Such disruptions have a devastating effect on replication environments, which propagates into overall performance degradation on hosts as well. If you see a circuit experiencing state changes or high packet loss you may be better off fencing it and directing traffic over another circuit in the tunnel.
