Tag Archives: FOS

Upgrade FOS leads to fabric segmentation.

I ran into this in my lab when upgrading a switch which caused a fabric-segmentation and obviously the release notes of a previous version show:

Other Important Notes and Recommendations
Management Server Platform Capability support changes in FOS v6.4
FOS v6.4 no longer automatically enables the Management Server (MS) Platform capability when a switch attempts to join a fabric that has these services enabled. This prevents a FOS v6.4 switch from joining such a fabric, and ISL will be disabled with a RAS log message. To allow a FOS v6.4 switch to join such fabrics msPlMgmtActivate command should be used to enable the Management Server platform services explicitly.

Continue reading

Management Security in Brocade FOS

If you’re in my business of looking at logfiles to be able to determine what’s going on or what has happened one of the most annoying, and frightening, things to see is a sheer amount of failed login attempts. In most cases these are simply genuine mistakes where a lingering management application or forgotten script is still trying to login to obtain one piece of information or another out of the switch/fabric. The SAN switches are often well inside the datacentre management firewalls so attacks from outside the company are less likely to occur however when looking at security statistics over the last decade or so it turns out that threats are more likely to originate from inside the company boundaries. Employees mucking around with tools like nmap, MITD software like cane & able, or even an entire Kali Linux distro hooked up to the network “just to see what it does because a mate of mine recommended it”. In 99.999% of all install bases I looked at the normal embedded username/password mechanism is used for authentication and authorisation. This also means that if security management is not configured on these switches, a not so good-Samaritan is able to use significant brute force tactics to try and obtain access to these switches without anyone knowing. When using an external authentication mechanism like LDAP or TACACS+ chances are there are some monitoring procedures in place which monitor and alert on these kind of symptoms however the main issue is that the attack has already occurred and there is no mechanism to prevent these sorts of attacks on a level that really protects the switch. It is fairly simple to overload a switch with authentication attempts by firing off thousands of ssh,telnet and http(s) sessions (which is easily done from any reasonable priced laptop these days with a Linux distro like Kali installed) and therefore crippling the poor e500 CPU on the CP. This can have significant ramifications on overall fabric services in that switch which can bring down a storage network. Now, obviously there is a mechanism to try and prevent this via iptables however there are a number of back-draws.

Continue reading

8 – Quality of Service

Historically the need to segregate fibre-channel traffic and have the option to prioritize frames and flows has not been high on design agenda of most of the companies I had under my eyes. Most often if the need is there to differentiate between different levels of importance between the various business applications you’ll very often see that additional equipment is purchased and the topologies are adjusted as needed. This obviously works well however when the ratio of capex vs opex is out of balance but the business is still retaining the need for your applications to be separated in order of criticality, you need to consider other options. As in the IP networking world Fibre-Channel has a similar functionality which has been in the FC standards for a long time but only recently has been introduced by some vendors.

Continue reading

7 – Fabric Security

This topic is hardly ever touched when fabric designs are developed and discussed among storage engineers but for me this always sits on my TODO list before hooking up any HBA or array port. It is as important in the storage world as it has been in the IP networking sector for decades. Historically the reasoning to not pay attention to this topic was that the SAN was always deeply embedded in tightly controlled data-centres with strict access policies. Additionally the use of fibre-optics and relatively complex architectures to the storage un-inaugurated even more, unfairly, devalued the necessity of implementing security policies.

Let me make one thing clear: Being able to gain access to a storage infrastructures is like finding the holy grail for archaeologists. If no storage infrastructure security is implemented it will allow you to obtain ALL data for good or bad purposes but even worse it also allows the non-invited guest to corrupt and destroy it. With this chapter I will outline some of the procedures I consider a MUST and some which you REALLY should take a good look at and if possible implement them.

Continue reading

Brocade FOS 7.1 and the cool features

After a very busy couple of weeks I’ve spent some time to dissect the release notes of Brocade FOS 7.1 and I must say there are some really nice features in there but also some that I REALLY think should be removed right away.

It may come to no surprise that I always look very critical to whatever come to the table from Brocade, Cisco and others w.r.t. storage networking. Especially the troubleshooting side and therefore the RAS capabilities of the hardware and software have a special place in my heart so if somebody screws up I’ll let them know via this platform. 🙂
So first of all some generics. FOS 7 is supported on the 8 and 16G platforms which cover the Goldeneye2,Condor2 and Condor 3 ASICs plus the AP blades for encryption, SAN extension and FCoE. (cough, cough)….Be aware that it doesn’t support the blades based on the older architecture such as the FR4-18i and FC10-6 (which I think was never bought by anyone.)  Most importantly this is the first version to support the new 6520 switch so if you ever think of buying one it will come shipped with this version installed. 
Software
As for the software features Brocade really cranked up the RAS features. I especially do like the broadening of the scope for D-ports (diagnostics port) to include ICL ports but also between Brocade HBA’s and switch ports. One thing they should be paying attention to though is that they should sell a lot more of these. :-). Also the characteristics of the test patterns such as test duration, frame-sizes and number of frames can now be specified. Also FEC (Forward Error Correction) has been extended to access gateways and long distance ports which should increase stability w.r.t. frame flow. (It still doesn’t improve on signal levels but that is a hardware problem which cannot be fixed by software).
There are some security enhancements for authentication such as extended LDAP and TACACS+ support.
The 7800 can now be used with VF albeit not having XISL functionality. 
Finally the E_D_TOV FC timer value is propagated onto the FCIP complex. What this basically means that previously even though an FC frame had long timed-out according to FC specs (in general 2 seconds) it could still exist on the IP network in a FCIP packet. The remote FC side would discard that frame anyway thus wasting valuable resources. With FOS 7.1 the FCIP complex on the sending side will discard the frame after E_D_TOV has expired.
One of the most underutilised features (besides Fabric Watch) is FDMI (Fabric Device Management Interface). This is a separate FC service (part of the new FC-GS-6 standard) which can hold a huge treasure box of info w.r.t. connected devices. As an example:
FDMI entru
————————————————-
        switch:admin> fdmishow
        Local HBA database contains:
          10:00:8c:7c:ff:01:eb:00
          Ports: 1
            10:00:8c:7c:ff:01:eb:00
              Port attributes:
                FC4 Types: 0x0000010000000000000000000000000000000000000000000000000000000000
                Supported Speed: 0x0000003a
                Port Speed: 0x00000020
                Frame Size: 0x00000840
                Device Name: bfa
                Host Name: X3650050014
                Node Name: 20:00:8c:7c:ff:01:eb:00
                Port Name: 10:00:8c:7c:ff:01:eb:00
                Port Type: 0x0
                Port Symb Name: port2
                Class of Service: 0x08000000
                Fabric Name: 10:00:00:05:1e:e5:e8:00
                FC4 Active Type: 0x0000010000000000000000000000000000000000000000000000000000000000
                Port State: 0x00000005
                Discovered Ports: 0x00000002
                Port Identifier: 0x00030200
          HBA attributes:
            Node Name: 20:00:8c:7c:ff:01:eb:00
            Manufacturer: Brocade
            Serial Number: BUK0406G041
            Model: Brocade-1860-2p
            Model Description: Brocade-1860-2p
            Hardware Version: Rev-A
            Driver Version: 3.2.0.0705
            Option ROM Version: 3.2.0.0_alpha_bld02_20120831_0705
            Firmware Version: 3.2.0.0_alpha_bld02_20120831_0705
            OS Name and Version: Windows Server 2008 R2 Standard | N/A
            Max CT Payload Length: 0x00000840
            Symbolic Name: Brocade-1860-2p | 3.2.0.0705 | X3650050014 |
            Number of Ports: 2
            Fabric Name: 10:00:00:05:1e:e5:e8:00
            Bios Version: 3.2.0.0_alpha_bld02_20120831_0705
            Bios State: TRUE
            Vendor Identifier: BROCADE
            Vendor Info: 0x31000000
———————————————-
and as you can see this shows a lot more than the fairly basic nameserver entries:
——————————————-
N    8f9200;      3;21:00:00:1b:32:1f:c8:3d;20:00:00:1b:32:1f:c8:3d; na
    FC4s: FCP 
    NodeSymb: [41] “QLA2462 FW:v4.04.09 DVR:v8.02.01-k1-vmw38”
    Fabric Port Name: 20:92:00:05:1e:52:af:00 
    Permanent Port Name: 21:00:00:1b:32:1f:c8:3d
    Port Index: 146
    Share Area: No
    Device Shared in Other AD: No
    Redirect: No 
    Partial: No
    LSAN: No
——————————————
Obviously the end-device needs to support this and it has to be enabled. (PLEASE DO !!!!!!!!) It’s invaluable for troubleshooters like me….
One thing that has bitten me a few times was the SFP problem. There has long been a problem that when a port was disabled and a new SFP was plugged in the switch didn’t detect that until the port was enabled and it had polled for up-to-date information. In the mean time you could get old/cached info of the old SFP including temperatures, db values, current, voltage etc.. This seems to be fixed now so thats one less thing to take into account.
Some CLI improvements have been made on various commands with some new parameters which lets you filter and select for certain errors etc.
The biggest idiocracy that has been made with this version is to allow the administrator change the severity level of event-codes. This means that if you have a filter in BNA (or whatever management software you have) to exclude INFO level messages but certain ERROR or CRITICAL messages start to annoy you you could change the severity to INFO and thus they don’t show up anymore. This doesn’t mean th problem is less critical so instead of just fixing the issue we now just pretend it’s not there. From a troubleshooting perspective this is disastrous since we look at a fair chuck of sup-saves each day and if we can’t rely on consistency in a log file it’s useless to have a look in the first place. Another one of those is the difference in deskew values on trunks when FEC is enabled. Due to a coding problem these values can differ up to 40 therefore normally depicting a massive difference in cable length. Only by executing a d-port analysis you can determine if that is really the case or not. My take is that they should fix the coding problem ASAP.  
A similar thing that has pissed me off was the change in sfpshow output. Since the invention of the wheel this has been the worst output in the brocade logs so many people have scripted their ass off to make it more readable.
Normally it looks like this:
=============
Slot  1/Port  0:
=============
Identifier:  3    SFP
Connector:   7    LC
Transceiver: 540c404000000000 2,4,8_Gbps M5,M6 sw Short_dist
Encoding:    1    8B10B
Baud Rate:   85   (units 100 megabaud)
Length 9u:   0    (units km)
Length 9u:   0    (units 100 meters)
Length 50u:  5    (units 10 meters)
Length 62.5u:2    (units 10 meters)
Length Cu:   0    (units 1 meter)
Vendor Name: BROCADE         
Vendor OUI:  00:05:1e
Vendor PN:   57-1000012-01   
Vendor Rev:  A   
Wavelength:  850  (units nm)
Options:     003a Loss_of_Sig,Tx_Fault,Tx_Disable
BR Max:      0   
BR Min:      0   
Serial No:   UAF11051000039A 
Date Code:   101212  
DD Type:     0x68
Enh Options: 0xfa
Status/Ctrl: 0xb0
Alarm flags[0,1] = 0x0, 0x0
Warn Flags[0,1] = 0x0, 0x0
                                          Alarm                  Warn
                                   low        high       low         high
Temperature: 31      Centigrade    -10         90         -5          85
Current:     6.616   mAmps          1.000      17.000     2.000       14.000 
Voltage:     3273.4  mVolts         2900.0      3700.0    3000.0       3600.0 
RX Power:    -2.8    dBm (530.6uW) 10.0   uW 1258.9 uW   15.8   uW  1000.0 uW
TX Power:    -3.3    dBm (465.9 uW)125.9  uW   631.0  uW  158.5  uW   562.3  uW
and that is for every port which basically makes you nuts.
So with some bash,awk,sed magic I scripted the output to look like this:
Port  Speed   Long  Short  Vendor     Serial            Wave   Temp   Current  Voltage   RX-Pwr   TX-Pwr
wave wave number Length
1/0 8G NA 50 m BROCADE UAF11051000039A 850 31 6.616 3273.4 -2.8 -3.3
1/1 8G NA 50 m BROCADE UAF110510000387 850 32 7.760 3268.8 -3.6 -3.3
1/2 8G NA 50 m BROCADE UAF1105100003A3 850 30 7.450 3270.7 -3.3 -3.3
etc....
From a troubleshooting perspective this is so much easier since you can spot issues right away.
Now with FOS 7.1.x the FOS engineers screwed up the SFPshow output which inherently screwed up my script which necessitates a load more work/code/lines to get this back into shape. The same thing goes for the output on the number of credits on virtual channels.
Pre-FOS 7.1 it looks like this:
C:—— blade port 64: E_port ——————————————
C:0xca682400: bbc_trc                 0004 0000 002a 0000 0000 0000 0001 0001 
With FOS 7.1 it looks like this:
bbc registers
=============
0xd0982800: bbc_trc                 20   0    0    0    0    0    0    0    
(Yes, hair pulling stuff, aaarrrcchhhh)
Some more good things. The fabriclog now contains the direction of link resets. Previously we could only see an LR had occurred but we didn’t see who initiated it. Now we can and have the option to figure out in which direction credit issues might have been happening. (phew..)
The CLI history is now also saved after reboots and firmware-upgrades. Its been always a PITA to figure out who had done what at a certain point-in-time. This should help to try and find out.
One other very useful thing that has been added and it a major plus in this release is the addition of the remote WWNN of a switch in the switchshow and islshow output even when the ISL has segmented for whatever reason. This is massively helpful because normally you didn’t have a clue what was connected so you also needed to go through quite some hassle and check cabling or start digging through the portlogdump with some debug flags enabled. Always a troublesome exercise. 
The bonus points from for this release is the addition of the fabretrystats command. This gives us troubleshooters a great overview of statistics of fabric events and commands. 
SW_ILS
————————————————————————————————————–
E/D_Port ELP  EFP  HA_EFP DIA  RDI  BF   FWD  EMT  ETP  RAID GAID ELP_TMR  GRE  ECP  ESC  EFMD ESA  DIAG_CMD 
————————————————————————————————————–
0        0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
69       0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
71       0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
79       0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
131      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
140      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
141      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
148      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
149      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
168      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
169      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
174      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
175      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
This release also fixes a gazillion defects so its highly advisable to get to this level better sooner than later. Check with your vendor for the latest supported release.
So all in all good stuff but some things should be reverted, NOW!!!. and PLEASE BROCADE: don’t screw up more output in such a way it breaks existing analysis scripts etc…
Cheers
Erwin

Brocade just got Bigger and Better

A couple of months ago Brocade invited me to come to San Jose for their “next-gen/new/great/ what-have-ya” big box. Unfortunately I had something else on the agenda (yes, my family still comes first) and I had to decline. (they didn’t want to shift the launch of the product because I couldn’t make it. Duhhhh.)

So what is new? Well, it’s not really a surprise that at some point in time they had to come out with a director class piece of iron to extend the VDX portfolio towards the high-end systems.  I’m not going to bore you with feeds and speeds and other spec-sheet material since you can download that from their web-site yourself.

What is interesting is that the VDX 8770 looks, smells and feels like a Fibre-Channel DCX8510-8 box. I still can’t prove physically but it seems that many restriction on the L2 side have a fair chunk of resemblance with the fibre-channel specs. As I mentioned in one of my previous posts, flat fabrics are not new to Brocade. They have been building this since the beginning of time in the Fibre-Channel era so they do have quite some experience in scalable flat networks and distribution models. One of the biggest benefits is that you can have multiple distributed locations and provide the same distributed network without having to worry about broadcast domains, Ethernet segments, spanning-tree configurations and other nasty legacy Ethernet problems.

Now I’m not tempted to go deep into the Ethernet and IP side of the fence. People like Ivan Pepelnjack and Greg Ferro are far better in this. (Check here and here )

With the launch of the VDX Brocade once again proves that when they set themselves to get stuff done they really come out with a bang. The specifications far outreach any competing product currently available in the market. Again they run on the bleeding edge of what the standards bodies like IEEE, IETF and INCITS have published lately. Not to mention that Brocade has contributed in that space makes them frontrunners once again.

So what are the draw-backs. As with all new products you can expect some issues. If I recall some high-end car manufacturer had to call-in an entire model world-wide to have something fixed in the brake-system so its not new or isolated to the IT space. Also with the introduction of the VDX an fair chuck of new functionality has gone into the software. It’s funny to see that something we’ve taken for granted in the FC space like layer 1 trunking is new in the networking space.

Nevertheless NOS 3.0 is likely to see some updates and patch releases in the near future. Although I don’t deny some significant Q&A has gone into this release its a fact that by having new equipment with new ASICS and functionality always brings some sort of headaches with them.

Interoperability is certified with the MLX series as well as the majority of the somewhat newer Fibre-Channel kit. Still bear in mind the require code levels since this is always a problem on supportcalls. 🙂

I can’t wait to get my hand on one of these systems and am eager to find out more. If I have I’let you know and do some more write-up here.

Till next time.

Cheers,
Erwin

DISCLAIMER : Brocade had no influence in my view depicted above.