I had an interesting case this week. According to the customer, there had been some maintenance on their cabling infrastructure, and shortly afterwards the ISLs would come up but there was absolutely no way the two would form a trunk. My first thought was that something had changed in the configuration: DWDM/CWDM changes, a modified switch configuration, and so on. The customer had a passive CWDM solution in place with just optical splitters, so there were no TDM devices or anything else interfering at the FC link layer. The switch configuration was also correct on both sides, and we had confirmation that both links were exactly the same length. I went on to check the physical layer, and when I looked at the SFP output, something baffled me.
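| Switch | Port | Speed | LW distance | Vendor | Wavelength (nm) | RX power (dBm) | TX power (dBm) |
|---|---|---|---|---|---|---|---|
| 1 | 46 | 8G | 25.5 km | SmartOptics | 1470 | -10.7 | 2.9 |
| 1 | 47 | 8G | 25.5 km | SmartOptics | 1490 | -4.3 | 2.8 |
| 2 | 46 | 8G | 25.5 km | SmartOptics | 1470 | -3.1 | 3.0 |
| 2 | 47 | 8G | 25.5 km | SmartOptics | 1490 | -10.8 | 2.1 |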
This output showed some discrepancies in the power loss values. Port 46 on switch 1 transmitted at 2.9 dBm, and that signal arrived at switch 2 at -3.1 dBm. Port 47 on switch 1 transmitted at 2.8 dBm, but that signal dropped off to -10.8 dBm. In the reverse direction the pattern was mirrored: port 46 on switch 2 transmitted at 3.0 dBm and arrived at switch 1 at -10.7 dBm, whereas port 47 on switch 2 transmitted at 2.1 dBm and arrived at -4.3 dBm. This led me to believe there was either a very bad link or a very long link, and that something had been cabled incorrectly.
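To make the loss per direction explicit, the sketch below subtracts the far-end RX power from the near-end TX power for each ISL, using the values from the table above. It assumes port 46 on switch 1 is cabled to port 46 on switch 2 (and likewise for port 47); the names `readings`, `path_loss_db` and `directional_losses` are hypothetical helpers for illustration only.

```python
# Readings copied from the SFP table: (switch, port) -> (tx_dbm, rx_dbm).
# Assumption: port N on sw1 is connected to port N on sw2.
readings = {
    ("sw1", 46): (2.9, -10.7),
    ("sw1", 47): (2.8, -4.3),
    ("sw2", 46): (3.0, -3.1),
    ("sw2", 47): (2.1, -10.8),
}


def path_loss_db(tx_dbm: float, rx_dbm: float) -> float:
    """Loss in dB between a transmitter and the receiver at the far end."""
    return round(tx_dbm - rx_dbm, 1)


def directional_losses(readings):
    """Return {(src, dst, port): loss_db} for both directions of every ISL."""
    losses = {}
    for (switch, port), (tx, _rx) in readings.items():
        peer = "sw2" if switch == "sw1" else "sw1"
        _peer_tx, peer_rx = readings[(peer, port)]
        losses[(switch, peer, port)] = path_loss_db(tx, peer_rx)
    return losses


if __name__ == "__main__":
    for (src, dst, port), loss in sorted(directional_losses(readings).items()):
        print(f"port {port} {src}->{dst}: {loss:.1f} dB")
```

Each port ends up with one low-loss direction (about 6 dB) and one high-loss direction (about 13.6 dB), which is what suggested a long and a short line somewhere in the path.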
It almost looked as if something had been cabled in such a way that the link between the CWDM equipment had a long and a short line.
Now, this in itself should not cause the trunking problem, since both ports showed the same pattern and thus the same total length. So this required additional digging, which led me to the fabriclog (a very useful piece of info). Normally, when a port comes up as an E_Port, it sends out an EMT (Exchange Mark Timestamp). The remote switch should respond with an ACC (Accept), and when this arrives at the originator you have a good indication of the round-trip time.
Switch 1 port 47 sent the EMT at
00:36:55.031305 *EMT Send D2,I0 D2,I0 47 0x39ed
which arrived at switch 2 at
11:16:30.075425 *EMT Rcv F2,P2 F2,T0 47 0x39ed
The ACC was sent at
11:16:30.076477 EMT Snd ACC F2,T0 F2,T0 47 0x39ed
which arrived on switch 1 at
00:36:55.070439 *EMT ACC Rcv D2,I0 D2,I1 47 0x39ed
This completed exchange 0x39ed, which took 0.039134 seconds as measured on switch 1's clock.
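The fabriclog entries for exchange 0x39ed can be turned into numbers with a few lines of Python. This sketch assumes the timestamps are time-of-day values with microsecond precision and that no midnight rollover happens during the exchange. It only subtracts timestamps taken on the same switch, because the two switch clocks are clearly not synchronised (00:36 versus 11:16). The helper names `parse_ts` and `elapsed` are hypothetical.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse a fabriclog time-of-day stamp such as '00:36:55.031305'."""
    return datetime.strptime(ts, "%H:%M:%S.%f")


def elapsed(start: str, end: str) -> float:
    """Seconds between two stamps from the SAME switch clock.

    Does not handle a rollover past midnight.
    """
    return (parse_ts(end) - parse_ts(start)).total_seconds()


# Exchange 0x39ed on port 47, timestamps copied from the fabriclog.
emt_send_sw1 = "00:36:55.031305"  # switch 1: *EMT Send
emt_rcv_sw2 = "11:16:30.075425"   # switch 2: *EMT Rcv
acc_send_sw2 = "11:16:30.076477"  # switch 2: EMT Snd ACC
acc_rcv_sw1 = "00:36:55.070439"   # switch 1: *EMT ACC Rcv

round_trip = elapsed(emt_send_sw1, acc_rcv_sw1)  # measured on switch 1
turnaround = elapsed(emt_rcv_sw2, acc_send_sw2)  # time spent on switch 2

print(f"round trip on switch 1: {round_trip:.6f} s")
print(f"turnaround on switch 2: {turnaround:.6f} s")
```

The round trip seen by switch 1 includes the turnaround on switch 2, which is why the remote timestamps are still worth looking at even though they cannot be compared directly with the local ones.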
On port 46 the result was different:
| Switch 1 | Switch 2 |
|---|---|
| 00:36:55.641044 *EMT Send D2,I0 D2,I0 46 0x39fd | 11:16:30.683121 *EMT Rcv A0,P2 A0,T0 46 0x39fd |
| 00:36:55.655352 *EMT ACC Rcv D2,I0 D2,I1 46 0x39fd | 11:16:30.683903 EMT Snd ACC A0,T0 A0,T0 46 0x39fd |
Then I looked at the differences between the two ports:
Port 47: 55.070439 - 55.031305 = 0.039134 s
Port 46: 55.655352 - 55.641044 = 0.014308 s
So the timing difference was 0.024826 seconds, which (when you do the speed-of-light maths) translates to roughly 5 km of cable length difference.
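The port comparison can be reproduced with integer microseconds, which keeps the subtraction free of floating-point rounding. This is a minimal sketch: the `exchanges` mapping and the `to_us`/`rtt_us` helpers are hypothetical, only switch 1 timestamps are used, and it stops at the timing difference without attempting the distance conversion.

```python
def to_us(ts: str) -> int:
    """Convert an 'HH:MM:SS.ffffff' stamp to microseconds since midnight."""
    hms, frac = ts.split(".")
    h, m, s = (int(x) for x in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1_000_000 + int(frac.ljust(6, "0"))


# (EMT Send, EMT ACC Rcv) on switch 1 for each ISL, from the fabriclog.
exchanges = {
    47: ("00:36:55.031305", "00:36:55.070439"),  # exchange 0x39ed
    46: ("00:36:55.641044", "00:36:55.655352"),  # exchange 0x39fd
}


def rtt_us(port: int) -> int:
    """Round trip of the EMT/ACC exchange on a port, in microseconds."""
    send, acc = exchanges[port]
    return to_us(acc) - to_us(send)


if __name__ == "__main__":
    rtts = {port: rtt_us(port) for port in exchanges}
    for port, us in sorted(rtts.items()):
        print(f"port {port}: {us / 1e6:.6f} s")
    print(f"difference: {abs(rtts[47] - rtts[46]) / 1e6:.6f} s")
```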
This is obviously too much for trunking to work. I've advised the customer to review the inter-site cabling infrastructure. Results to follow. 🙂
Erwin van Londen