FCIP configuration, pitfalls and troubleshooting

FCIP has been around for quite a while. The fine engineers of CNT/McData/Brocade/Rhapsody/Vixel (you name them) saw early on that a method was needed to overcome the distance limitation of, back then, around 10 km. This was not due to a limitation in the FC protocol itself but rather because the hardware of the early 2000s was not up to scratch to push FC frames over longer distances. Another drawback is that FC by nature is not routable (not taking into account FC-IFR, which came later and was developed between 2004 and 2008). That, by definition, makes it difficult to adopt FC into existing infrastructures where no native FC extenders or other equipment like DWDM/CWDM is available to bridge the distance between two native FC devices.

Contrary to popular belief the FCIP protocol was not developed under the ANSI T11 (now INCITS) umbrella; it was actually the IETF that took on this task. The standard is published as RFC 3821.

One thing that is paramount when designing an FCIP solution is that all the rules and regulations of storage I/O channels still need to be adhered to. This means you cannot treat an FCIP link as an ordinary IP connection, where applications and operating systems have many ways to provide very flexible correction and recovery mechanisms. You need to see an FCIP link as an extended channel from host to disk. In addition, you also need to take into account the behavioural differences between FC and IP.

Protocol clashes

One of these differences is delivery order. Because of the fairly flexible timers in IP and the inherent dynamics of WAN behaviour, the traffic flow on each side of the FC encapsulation chip is massively different. Flow control on an FCIP link is handled by TCP, as is delivery order. The FC protocol does allow for out-of-order delivery between FC exchanges, but delivery to the SCSI stack will always be in order. If the WAN infrastructure allows traffic to take many paths from ingress to egress points, you can be sure it will do so. Although routing protocols like OSPF will determine the shortest path possible, there will be circumstances where IP packets arrive out of order. It is then up to the receiving side to observe its timers and wait for all packets to arrive before it can re-assemble them, remove the IP encapsulation headers and push the frames to the FC chip for further switching to the destination.

IP disturbances

As I've mentioned above, storage protocols like FC and SCSI are far more strict, and their timers are often relatively short compared to IP. Any disturbance in IP behaviour will have an exponentially larger negative effect on the storage side. So what do you need to take into account when looking for IP irregularities on an FCIP tunnel or circuit?

Out-of-Order deliveries

I use a Brocade example in this post, but the TCP-related counters are similar on Cisco SSN equipment.

Each FCIP tunnel consists of one or more circuits (more on that below and in my post here). Each FCIP circuit maintains 4 TCP sessions which correspond to the 4 QoS levels (Fabric/F-Class, High, Medium and Low). Unless QoS-High and QoS-Low zones have been defined, all traffic will by default use the Medium TCP session on a circuit. (These are also the sessions where IP QoS markings like DSCP are configured.)

As an example:

portshow fciptunnel all -c --lifetime
-------------------------------------------------------------------------------
 Tunnel Circuit  OpStatus  Flags    Uptime  TxMBps  RxMBps ConnCnt CommRt  Met
-------------------------------------------------------------------------------
 17     -         Up      mf-----  123d19h    0.00    0.00    1      -      -
 17     0 ge1     Up      ---4--s  123d19h    0.00    0.00    1   400/800   0
-------------------------------------------------------------------------------
 Flags:  tunnel: c=compression m=moderate compression a=aggressive compression
                 A=Auto compression f=fastwrite t=Tapepipelining F=FICON
                 T=TPerf i=IPSec l=IPSec Legacy
 Flags: circuit: s=sack v=VLAN Tagged x=crossport 4=IPv4 6=IPv6
                 L=Listener I=Initiator

As you can see I have one tunnel (17) configured with fast-write and moderate compression enabled, consisting of a single circuit configured on interface GE1. The commit rate of that circuit is set to 400/800, which represents the minimum and maximum bandwidth capacity (i.e. I've been given a guarantee that the minimum bandwidth is always at least 400Mb/s). Looking at how the TCP window mechanism works, whereby an entire buffer may be cleared when errors are detected (i.e. slow-start), the default TCP flow-control mechanism drops back to 50% of its last value. So in this case, if my speed was at the maximum rate of 800Mb/s and a form of congestion caused the window to collapse, it would normally fall back to 400Mb/s and grow back to 800Mb/s in a stepping model. By configuring a minimum value of, let's say, 600Mb/s you make sure that the drop is not to 400 but to 600, which means the circuit recovers to its maximum rate in a shorter time-frame.

As a rule of thumb I always apply an 80% rule to the maximum bandwidth the WAN network administrator gives me. If he/she tells me I have a guaranteed bandwidth of 1Gb/s, I use 800Mb/s. (Experience tells me to do this. :-))
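To make the effect of the minimum commit rate tangible, here is a minimal sketch that models the fall-back-and-recover behaviour described above. The fixed step size and linear ramp are illustrative assumptions only, not Brocade's actual ARL algorithm; the 80% derating from the previous paragraph is applied to a guaranteed 1Gb/s WAN link.

# Illustrative model of ARL recovery after a congestion event. The step size
# and the linear ramp are made-up values for the example, not Brocade's
# actual algorithm.

def recovery_steps(min_rate_mbps, max_rate_mbps, step_mbps=50):
    """Count the rate-increase steps needed to climb from the configured
    minimum back to the maximum commit rate."""
    rate = min_rate_mbps              # after congestion ARL falls back to the floor
    steps = 0
    while rate < max_rate_mbps:
        rate = min(rate + step_mbps, max_rate_mbps)
        steps += 1
    return steps

wan_guarantee = 1000                  # what the WAN admin promises (Mb/s)
max_rate = int(wan_guarantee * 0.8)   # 80% rule of thumb -> 800 Mb/s

for floor in (400, 600):
    steps = recovery_steps(floor, max_rate)
    print(f"min {floor} Mb/s -> {steps} steps back to {max_rate} Mb/s")

With these assumptions a 600Mb/s floor needs half the steps a 400Mb/s floor does, which is the whole point of raising the minimum commit rate.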

The four TCP sessions in each circuit are configured differently according to an internal Brocade algorithm, to make sure that in a slow-start situation the different QoS levels get a proportional amount of bandwidth. You can see this in the TCP session overview on each of the circuits.

-------------------------------------------
   Circuit ID: 17.0
      Circuit Num: 0
      Admin Status: Enabled
      Oper Status: Up
      Connection Type: Default
      Remote IP: 10.xxx.xxx.xxx
      Local IP: 10.xxx.xxx.xxx
      Metric: 0
      Min Comm Rt: 400000
      Max Comm Rt: 800000
=======================

-------------------------------------------
      TCP Connection 17.0:2031471624
         Priority: F-Class
         Max Seg Size: 1460
         Adaptive Rate Limiting Statistics:
            None (F-Class)

As you can see there is no rate limit set for F-Class traffic. This kind of traffic will ALWAYS have priority over any other. This is by design, because otherwise normal user-data traffic might lead to congestion and cause essential fabric information to get lost or delayed. That would segment the fabric and no traffic would be able to flow at all.

-------------------------------------------
      TCP Connection 17.0:2031623256
         Priority: Low
         Max Seg Size: 1460
         Adaptive Rate Limiting Statistics:
            Min Rate: 160000 kbps
            Max Rate: 800000 kbps
            Cur Rate: 160000 kbps
            Soft Limit: 800000 kbps
-------------------------------------------
      TCP Connection 17.0:2031572712
         Priority: Medium
         Max Seg Size: 1460
         Adaptive Rate Limiting Statistics:
            Min Rate: 240000 kbps
            Max Rate: 800000 kbps
            Cur Rate: 240000 kbps
            Soft Limit: 240000 kbps
-------------------------------------------
      TCP Connection 17.0:2031522168
         Priority: High
         Max Seg Size: 1460
         Adaptive Rate Limiting Statistics:
            Min Rate: 400000 kbps
            Max Rate: 800000 kbps
            Cur Rate: 400000 kbps
            Soft Limit: 400000 kbps

As you can see, each of the TCP connections has different Min and Cur rate levels. This is not related to the physical connection of the Ethernet port; it is an artificial limit placed on the TCP flow-control mechanism.
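A quick back-of-the-envelope check on the figures in the output above shows the proportional split. The numbers below are simply the Min Rate values from this example (the exact algorithm is Brocade-internal), and F-Class is exempt from rate limiting so it is not listed.

# Minimum rates per data priority as shown in the example output above (kbps).
# F-Class is exempt from rate limiting and therefore not included.
min_rates = {"Low": 160_000, "Medium": 240_000, "High": 400_000}

total = sum(min_rates.values())            # 800,000 kbps in this example
for prio, rate in min_rates.items():
    print(f"{prio:<6} {rate // 1000:>4} Mb/s  ({rate / total:.0%} of the combined floor)")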

There are two things you need to avoid in FCIP tunnels, and those are out-of-order deliveries and slow-starts. These two have the most profound impact on overall performance, and not only on the FCIP link itself; these issues will most certainly propagate to the FC side and thus affect storage I/O performance.

These counters are kept per circuit and TCP connection.

Circuit ID: 17.0
      Circuit Num: 0
      Admin Status: Enabled
      Oper Status: Up
........
Performance Statistics - Priority: F-Class
         Oper Status: Up
         Flow Ctrl State: Off
......
            TCP Stats:
            8270867868 Output Bytes
            89053582 Output Packets
            4686467216 Input Bytes
            89097293 Input Packets
            Retransmits: 40
            Round Trip Time: 0 ms
            Out Of Order: 31
                        Slow Starts: 14

Especially on F-Class you need to make sure that these counters do not ramp up; if they do, you will get very unpredictable fabric behaviour.

Performance Statistics - Priority: Medium
         Oper Status: Up
         Flow Ctrl State: Off
.......
            TCP Stats:
            178287660238420 Output Bytes
            194072570849 Output Packets
            88349912897612 Input Bytes
            163107481894 Input Packets
            Retransmits: 6770
            Round Trip Time: 0 ms
            Out Of Order: 1065051
                        Slow Starts: 17

The counters you see above are examples and are accumulated over the lifetime of the tunnel and circuit. If you suspect IP-related issues you should first reset those counters.

As of FOS v7.1, FCIP Circuit and/or Tunnel stats and error counters can be cleared non-disruptively via CLI.

To clear the whole tunnel's stats along with all circuit stats, plus tunnel uptime and circuit durations:

portshow fciptunnel <VE Port, or all> --reset --tcp --perf --circuits

To clear individual circuit stats plus circuit durations only:

portshow fcipcircuit <VE Port, or all> --reset <Circuit #> --tcp --perf
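If you want to track these counters over time rather than eyeball them, a small script can diff two captured outputs. The sketch below only assumes you have saved the portshow output to text files; the field labels are taken from the sample output above and the file names are placeholders.

import re

# Hypothetical helper to diff the TCP error counters from two captured
# "portshow" outputs. Field labels are taken from the sample output above.
COUNTERS = ("Retransmits", "Out Of Order", "Slow Starts")

def parse_counters(text):
    """Return each counter as a list, one entry per TCP session in the dump."""
    return {name: [int(v) for v in re.findall(rf"{name}:\s+(\d+)", text)]
            for name in COUNTERS}

def report_increases(baseline, current):
    """Print which counters have risen since the baseline capture."""
    for name in COUNTERS:
        for i, (b, c) in enumerate(zip(baseline[name], current[name])):
            if c > b:
                print(f"TCP session {i}: {name} rose from {b} to {c}")

# Usage (placeholder file names):
#   baseline = parse_counters(open("portshow_before.txt").read())
#   current  = parse_counters(open("portshow_after.txt").read())
#   report_increases(baseline, current)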

Tunnel and circuit rules and limitations

There are some rules and limitations on both the tunnel and circuit configurations.

First of all, the maximum committed aggregate bandwidth should not exceed the bandwidth of the WAN link. Be aware that by default the commit rate on a 1Gb/s circuit is indeed 1Gb/s. If the actual WAN bandwidth is less than that you will see a lot of performance issues.

The minimum committed bandwidth cannot exceed the speed of the Ethernet interface. So on an FX8-24 you cannot set the commit rate to 8000Mb/s on a 1GbE link.

The maximum commit rate cannot exceed 5x the minimum commit rate. Basically, the ratio between min and max commit rate is thus at most 1:5.

The speed difference between the slowest and fastest circuit in a single tunnel should not exceed 4x the lowest speed. So if your slowest circuit is 2Gb/s, the fastest should not be over 8Gb/s.
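To avoid tripping over these limits during a design review, a quick sanity check can be scripted. The sketch below encodes only the rules listed in this post (rates in Mb/s); it is not the actual FOS validation logic.

# Sanity check of the rules above for a proposed tunnel (rates in Mb/s).
def check_tunnel(circuits, wan_bandwidth):
    """circuits: list of (min_rate, max_rate) tuples belonging to one tunnel."""
    problems = []
    if sum(maxr for _, maxr in circuits) > wan_bandwidth:
        problems.append("aggregate max commit rate exceeds the WAN bandwidth")
    for minr, maxr in circuits:
        if maxr > 5 * minr:
            problems.append(f"{minr}/{maxr}: max commit rate is more than 5x the minimum")
    fastest = max(maxr for _, maxr in circuits)
    slowest = min(maxr for _, maxr in circuits)
    if fastest > 4 * slowest:
        problems.append("fastest circuit is more than 4x the slowest circuit")
    return problems or ["configuration looks OK"]

print(check_tunnel([(400, 800), (400, 800)], wan_bandwidth=2000))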

Multiple circuits on one Ethernet interface

Especially on higher-speed interfaces, like those on the FX8-24 and 7840 switches, you can have multiple IP addresses configured, each of which can create a separate circuit and tunnel to a different counterpart. The combined aggregate maximum commit rate of all circuits on that interface should not exceed the total speed of that interface. So if you have 4 circuits on a 10GbE interface you can configure 2G/2G/3G/3G but not 2G/1G/5G/3G.
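The same kind of check applies per interface (rates in Gb/s, using the 10GbE example above):

def fits_on_interface(max_rates, interface_speed):
    # All maximum commit rates on one port must fit within the port speed.
    return sum(max_rates) <= interface_speed

print(fits_on_interface([2, 2, 3, 3], 10))   # True  -> allowed
print(fits_on_interface([2, 1, 5, 3], 10))   # False -> 11G oversubscribes the port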

Compression

One of the most heated discussions when it comes to tunnel configuration is very often the topic of compression. It is often perceived that when compression is set to "Aggressive" performance should improve. In many instances the opposite actually happens. When "Aggressive" compression is configured, the switch redirects all traffic to a different chip which uses a software compression method. Although you may achieve a higher compression ratio, 99 times out of 100 the processing capability of that chip becomes a bottleneck, so throughput hits a ceiling and starts flat-lining way below the actual WAN capability. In almost ALL cases the standard hardware compression achieves the same compression ratio (2:1) and the overall throughput is much better. With the new 7840 there have been some changes in the architecture, but I'll get back to that in a later post.
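A simple way to reason about this is that effective throughput is capped by whichever is lower: the compression engine itself, or the WAN link multiplied by the compression ratio. The engine figures in the sketch below are made-up placeholders to illustrate the point, not Brocade specifications.

# Illustrative model only; engine throughput figures are placeholders.
def effective_throughput(wan_mbps, ratio, engine_mbps):
    return min(engine_mbps, wan_mbps * ratio)

wan = 1000   # Mb/s WAN link
print("hardware, 2:1   :", effective_throughput(wan, 2.0, engine_mbps=10_000), "Mb/s")
print("aggressive, 3:1 :", effective_throughput(wan, 3.0, engine_mbps=1_500), "Mb/s")

Even with a better ratio, the software path flat-lines at the engine's limit while the hardware path can fill the WAN link.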

Some very good references for FCIP design are described in Josh Judd's SAN design book.

Adaptive Rate Limiting

One of the configuration pitfalls is the bandwidth setting for ARL. As mentioned above, even though you have a 1Gb/s, 10Gb/s or 40Gb/s Ethernet interface uplink, this doesn't mean the end-to-end link has that capability. Just relying on TCP to adjust the window size, and therefore flow performance, would not be a good option. ARL requires a license called the "Advanced Extension" license. It allows you to configure upper and lower boundaries within which TCP can do its job with regard to adjusting window scaling. This massively improves performance in error scenarios and, in the end, full bandwidth utilization is much better. I highly recommend buying this license.

QoS

In almost all cases the link between two of your sites is leased from a telco or cable provider. This means that in many cases you don't have end-to-end control over what is happening on the circuits of that WAN provider. Even though you have leased a 1Gb/s link, you will very often find this is not 100% guaranteed, especially when the WAN provider has numerous clients using the same infrastructure. You may want to investigate the option of QoS and ask if the WAN provider offers QoS services with either DSCP at Layer 3 or L2 QoS (802.1p) at Layer 2. This can provide you with a better and more stable link between your sites.
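As an illustration, a marking plan along the following lines could be agreed with the WAN provider. The DSCP values are example choices only, not Brocade defaults, and they are only useful if the provider actually honours them end to end.

# Example DSCP marking plan for the per-priority TCP sessions. Illustrative
# values to discuss with the WAN provider, not Brocade defaults.
dscp_plan = {
    "F-Class": 46,   # EF   - fabric services, highest protection
    "High":    26,   # AF31
    "Medium":  18,   # AF21
    "Low":     10,   # AF11
}
for prio, dscp in dscp_plan.items():
    print(f"{prio:<8} -> DSCP {dscp}")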

Hope this helps.

Regards,

Erwin
