4 – Standards in operation

Consistency is one of the most important things in a computing environment. There are abundant examples where configuration inconsistencies have led to disastrous events: dropped databases, disk mirrors set up in the wrong direction (i.e. overwriting a perfectly good partition with an empty one), operational mistakes due to wrong or unclear naming conventions, you name it. Wherever people work, mistakes will be made, so to prevent these kinds of mistakes strict standards in operations need to be followed. The majority of businesses around the world follow some sort of framework for processes and changes, such as ITIL or one of its alternatives.

The problem with these frameworks is often that they are implemented in a fashion which allows little or no flexibility in day-to-day operations. I’ve even seen it come to the point where a SAN port was causing massive problems in the fabric but any corrective action was subject to change control via a CAB (Change Advisory Board). The issue remained for TWO DAYS (!!!!) and IO errors, file-system corruption and data loss were the result of this strict policy. Obviously the implementation of the change control structure was too strict. Revisions were made and decision capabilities were delegated to lower levels in the organization, which basically meant that in the case of problems changes were documented on the go and registered in the change control structure afterwards.

What frameworks do not cover

As an administrator, implementer and/or services person you still have a lot of freedom and a huge playing field to make mistakes. Management frameworks do not dictate how you design your environment, which topologies you choose, how you set up zoning, which naming convention you use and which disk pools you pick your volumes from. Nor do they stop you from connecting a cable to the wrong switch, after which nasty things might happen. So you still have to apply your own common sense.

Naming conventions

Which naming convention you choose is up to the person who needs to administer the environment. It should be concise, meaningful, referable, extensible, simple and most of all consistent. If you name an object “ABC”, it should be called “ABC” in the ENTIRE environment and not “DEF” somewhere else. Although you might think that naming an object differently in another context is helpful, in effect it isn’t. Not only does it require additional administration, it also introduces risk.

As an example, I’ve seen fabrics where a server called server1 was connected to a port, the port on the switch was called “vendor3-hba1-target5” and the zone was called “application4-array15-port6”. Try to troubleshoot that. What I’ve learned from this, from a troubleshooting perspective, is to ignore everything administrators have come up with and only use machine-generated information like port numbers, FCIDs and WWNs. As an administrator this doesn’t work however. You don’t memorize your contacts by their phone number, nor do you keep a memory map of all the IP addresses in your network. That’s why we invented DNS.

So in short: keep your names simple to remember and apply them with rigorous consistency across the board.

Examples:

Let’s say I have a host with 4 dual-port HBAs, each of which is connected to two different fabrics. You could use something like “H001_PCID” where “H001” is the hostname and PCID is, obviously, the PCI id used on that particular host.
Use this name in every piece of the fabric including port names, FDMI info, zoning aliases, zone names and any other place in the storage environment where this should be used.
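
As a rough illustration of the “H001_PCID” idea, the sketch below builds one name per HBA and reports every place where a different name crept in. The function names, the PCI id format (“03:00.0”) and the rule that only letters, digits and underscores are used are my own assumptions for the example, so adjust them to whatever your environment accepts.

```python
import re


def fabric_name(hostname: str, pci_id: str) -> str:
    """Build a single name for an HBA from the host name and its PCI id."""
    # Assumption: stick to letters, digits and underscores so the same
    # string can be reused for port names, aliases and zone names.
    safe_pci = re.sub(r"[^0-9A-Za-z]+", "_", pci_id).strip("_")
    return f"{hostname.upper()}_{safe_pci.upper()}"


def inconsistent_names(expected: str, used: dict[str, str]) -> list[str]:
    """Return the places (sorted) where a name other than `expected` is used."""
    return sorted(place for place, name in used.items() if name != expected)


if __name__ == "__main__":
    name = fabric_name("h001", "03:00.0")
    print(name)
    # Hypothetical inventory of where the object is named in the environment.
    in_use = {
        "portname": "vendor3-hba1-target5",
        "alias": name,
        "zone": "application4-array15-port6",
    }
    print(inconsistent_names(name, in_use))
```

Anything the second function returns is a spot where the name needs to be brought back in line.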

Hardware preparations

Before you lay your hands on a keyboard to configure a switch it needs to be physically installed first. Unfortunately there have been cases where this was not done according to the manufacturer’s specifications and numerous issues occurred afterwards. These included incorrect cabling, wrong power provisioning, no static shielding etc. As you might expect these are disasters in the making. So make sure the exact physical installation requirements are met for each type of switch and/or director irrespective of vendor. Do NOT take shortcuts or take some sections with a grain of salt. It’s exactly this salt which will hurt most when the failure wound is opened due to these shortcuts.

Configuration preparations

By now I assume you have powered on the three switches in the reference architecture. There is no ISL connectivity (yet) and each switch operates as an individual entity.

The first thing to do is to configure the IP connectivity of the three switches as I’ve outlined. Connect a serial cable to the “1010101” port with 9600-8-N-1 settings. Log in with “admin” and “password” as first credentials and change these immediately to the strong passwords you generated. On blade-based switches, make sure the three IP addresses fall in the same subnet, otherwise you might run into trouble later on with the virtual IP address (which hangs off a network bond device) on the chassis. In my example everything sits in the same subnet so that should be good to go. The command to use is “ipaddrset”. Depending on the switch model you need to use different parameters.

In my example I would use:

SW01:>ipaddrset -ipv4 -add 10.10.10.50 -ethmask 255.255.255.0 -gwip 10.10.10.1 -dhcp OFF

SW03:>ipaddrset -ipv4 -add 10.10.10.54 -ethmask 255.255.255.0 -gwip 10.10.10.1 -dhcp OFF

SW02:>ipaddrset -chassis -add 10.10.10.51 -ethmask 255.255.255.0

SW02:>ipaddrset -cp 0 -add -ipv4 10.10.10.52 -ethmask 255.255.255.0 -gwip 10.10.10.1

SW02:>ipaddrset -cp 1 -add -ipv4 10.10.10.53 -ethmask 255.255.255.0 -gwip 10.10.10.1
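
The same-subnet requirement for the chassis and CP addresses is easy to verify before you type the commands. The sketch below is only an illustration using Python’s standard ipaddress module; the function name is my own and the addresses are the ones from my example.

```python
import ipaddress


def all_in_same_subnet(addresses, netmask):
    """Return True when every address falls in one network for the given mask."""
    networks = {
        ipaddress.ip_interface(f"{address}/{netmask}").network
        for address in addresses
    }
    return len(networks) == 1


if __name__ == "__main__":
    # Chassis, CP0 and CP1 addresses of SW02 from the example above.
    sw02 = ["10.10.10.51", "10.10.10.52", "10.10.10.53"]
    print(all_in_same_subnet(sw02, "255.255.255.0"))
```

If this prints False for your own plan, fix the addressing first and configure the switch afterwards.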

Later on you will see that if you have created virtual switches you can assign separate IP addresses to each logical switch. This can provide you some additional options from both a security and administrative perspective.

The next thing to do is upgrade the firmware to the latest version advised by your vendor. A very basic rule of thumb is to use only .1 releases or later, or at least a patched release. Especially with major versions this could be a life-saver.

In my case I will use Brocade FOS version 7.3.0 as it is the latest release at the time of this writing. I also assume that the firmware already loaded on the switch is 7.2.0 or later. If the firmware is older than that I will first need to upgrade to 7.2.x as per Brocade’s one-level upgrade policy.
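
To make the one-level rule concrete, here is a small sketch that lists the release levels you have to pass through, one step at a time. The list of levels is an illustrative assumption on my part and the function name is made up; always take the supported upgrade paths from the release notes of the target version.

```python
# Assumption: an ordered list of release levels, oldest first. Illustrative
# only; the vendor release notes are the authority on supported paths.
LEVELS = ["7.0", "7.1", "7.2", "7.3"]


def upgrade_path(current, target, levels=LEVELS):
    """Return the levels to install, in order, moving one level at a time."""
    start = levels.index(current)
    end = levels.index(target)
    if end < start:
        raise ValueError("target is older than the current level")
    return levels[start + 1:end + 1]


if __name__ == "__main__":
    # A switch on 7.1 has to go through 7.2 before it can move to 7.3.
    print(upgrade_path("7.1", "7.3"))
    # A switch already on 7.2 can go straight to 7.3.
    print(upgrade_path("7.2", "7.3"))
```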

As per the setup of the ftp server use the following command:

SW0X:>firmwaredownload -p sftp 10.10.10.11,user,/fos/v7.3.0,password

This downloads the firmware to each of the partitions on the respective switches and activates the firmware image upon HA reboot. Each firmware upgrade is, or should be, NON-disruptive for Fibre Channel traffic. If other hardware is used, for instance FCIP blades, you will see that the traffic on the FCIP link halts briefly. This is due to the renewed firmware images being applied to the FPGAs on those blades.

When all switches have been upgraded successfully it is time to start configuring.

Below I’ve described some default settings which I’ve always recommended.

  1. Enable Virtual Fabrics. It doesn’t require a license, but enabling it later on is disruptive. (We’ll get back to this feature later)
  2. Don’t use the default switch. Leave this one disabled and use it as a holding switch for unused ports.
  3. Create a base switch. You’ll need this for FCR routing purposes.
  4. Create one or more logical switches depending on your requirements. Only use THESE logical switches to connect equipment as needed.
  5. Set a fixed fabric principal with a high priority, preferably a new model with the highest firmware possible. In our example the DCX8510 in the core will become fabric principal. This prevents any other switch from becoming fabric principal in case another (older) switch is added to the fabric. Otherwise surprises may await you when your fabric grows.
  6. Use Fabric Watch or MAPS. Buy this license as it will save you headaches in the future.
  7. Enable firmware based credit recovery modes. (We’ll get back to this later.)
  8. More to come in separate sections

 
