Dear Doctor,
One interface on our System i box connects to an Ethernet switch that connects through a fiber-optic transceiver (FOT) and cable that runs 3,000 feet to our training center, where another FOT turns the signal back into Cat-5e that plugs into another Ethernet switch. We sometimes need to unplug PCs from our main building and take them to the training center for special classes. Often when we do this, the PCs won't communicate with our System i server over the fiber, yet they can connect to other Ethernet devices in the training center. I've tried rebooting the switches at either end (and even rebooting i5/OS) with no change. Oddly enough, computers left in the training center overnight always work fine the next day. What gives?
Gentle User,
Your timing with this question is perfect, as many organizations are expanding their enterprise LANs with fiber cabling and encountering similar problems. The problem itself is one of stale MAC address table entries, as Dr. I Doctor will explain in a moment. But first, here is an immediate solution to your problem: power cycle the FOTs at each end of the fiber link. This may seem like voodoo, but trust your Doctor: it will work.
FOTs are often treated as passive connectors, especially in legacy networks where they're used to interconnect copper-based Ethernet switches. You naturally focus on the switches as the active components and think of the FOTs as just another kind of adapter. But a FOT is, in fact, a full-fledged Ethernet bridge. Like any bridge, it forwards frames between interfaces based on the Ethernet hardware (MAC) addresses it learns from the devices plugged in at each end. Two FOTs also exchange bridge information between themselves. It's a complicated process that involves the Spanning Tree Protocol and, if you're not careful, can cause the symptoms you're seeing.
The FOT in your main office learns the MAC addresses of all the PCs connected to your main office switch and retains them for a preconfigured timeout interval, which could be anywhere from five minutes to five hours. In your case, the timeout appears to be on the long side. If a PC moves to the other end of the fiber link, the FOT at that location may refuse to learn the PC's MAC address because the address is already in its table (obtained from the main-office FOT). The FOT does this to avoid a potential bridge loop.
Power cycling the FOTs clears their MAC address tables, letting each side learn the current location of all PCs, thus solving the problem. Some sophisticated FOTs are programmable, letting you configure the MAC address timeout. If yours aren't, your only options are to keep power cycling the FOTs after each move or to replace your existing switches with fiber-capable switches, eliminating the FOTs.
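For readers who like to see the mechanism spelled out, here is a minimal Python sketch of the behavior described above. It is a toy model, not the firmware of any real FOT: the class name LearningBridge, the port labels, the MAC address, and the five-hour timeout are all assumptions made for illustration.

```python
class LearningBridge:
    """Toy model of a bridge MAC table with a fixed aging timeout (hypothetical)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.table = {}  # MAC address -> (port, time the entry was learned)

    def see_frame(self, mac, port, now):
        """Process a frame from `mac` arriving on `port`; return the port the table holds."""
        entry = self.table.get(mac)
        if entry is not None:
            known_port, learned_at = entry
            if known_port != port and now - learned_at < self.timeout_s:
                # Assumed behavior, per the column: an unexpired entry on a
                # different port wins, so the PC's new location is not learned.
                return known_port
        # No entry, an expired entry, or the same port: learn or refresh.
        self.table[mac] = (port, now)
        return port

    def power_cycle(self):
        """Power cycling empties the table, so every address is relearned."""
        self.table.clear()


PC = "00:11:22:33:44:55"  # made-up address

bridge = LearningBridge(timeout_s=5 * 3600)
bridge.see_frame(PC, "main", now=0)
after_move = bridge.see_frame(PC, "training", now=600)        # still "main"
next_day = bridge.see_frame(PC, "training", now=12 * 3600)    # "training"

cycled = LearningBridge(timeout_s=5 * 3600)
cycled.see_frame(PC, "main", now=0)
cycled.power_cycle()
after_cycle = cycled.see_frame(PC, "training", now=600)       # "training"
```

Ten minutes after the move, the table still points at the main office, which matches the failure in the letter. Twelve hours later the entry has aged out and the PC is learned at its new location, and a power cycle produces the same result immediately.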
Dear Doctor,
Our primary T1 Internet provider has frequent outages, so we're in the process of adding a second upstream Internet provider and will be running the Border Gateway Protocol (BGP) to provide automatic failover between the two providers. I have acquired the necessary Class C IP address space and the Autonomous System Number (ASN), and I have already set up BGP with our existing provider. Our new provider's circuit won't be installed for several weeks, but everything is configured on our router so that we can go live immediately. Alas, since enabling BGP, whenever our existing provider has an outage, we lose Internet connectivity for up to an hour! Our provider says it's "route flapping." What is that and how do we fix it?
Gentle User,
Many smaller enterprises have been switching to BGP to gain the added network resilience it offers. However, because BGP manipulates the core routing tables of the Internet, you must take care that it's implemented correctly. One important requirement is that BGP never be enabled unless you have at least two active upstream providers.
The purpose of BGP is to provide alternate routing for specific IP address blocks, such as the /24 block you've obtained. Should connectivity through one provider fail, the Internet will automatically route traffic to the other provider. In your case, because there is no other provider yet, the BGP route convergence process flips back and forth between several possible destination routes before finally determining that your network is completely offline. This route "flapping" can be dangerous to the Internet as a whole if too many destinations are unreachable, so BGP routers impose a protection mechanism, known as route flap damping, that assigns "route flapping penalties."
These penalties cause the Internet to ignore route changes for your IP address space for a while — the time can vary from a few minutes to several hours. So when your network does come back online, the Internet ignores this event until the accumulated penalty time for your /24 expires. The fix is to go back to static IP routing with your original ISP until your new ISP is fully installed and operational.
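To see why a few quick flaps can keep a route suppressed long after the circuit recovers, consider this rough Python sketch of a flap penalty that decays exponentially over time. All of the numbers (the penalty per flap, the half-life, and the suppress and reuse limits) are illustrative assumptions, not values from your provider or from any particular router, and the function names are hypothetical.

```python
import math

# Illustrative assumptions only; real routers make these configurable.
HALF_LIFE_S = 15 * 60       # penalty halves every 15 minutes
PENALTY_PER_FLAP = 1000     # added each time the route flaps
SUPPRESS_LIMIT = 2000       # above this, the route is ignored
REUSE_LIMIT = 750           # below this, the route is accepted again


def decayed(penalty, elapsed_s):
    """Penalty remaining after `elapsed_s` seconds of exponential decay."""
    return penalty * 0.5 ** (elapsed_s / HALF_LIFE_S)


def penalty_after_flaps(flap_times_s):
    """Accumulated penalty just after the last flap in a sorted list of times."""
    penalty = 0.0
    last = None
    for t in flap_times_s:
        if last is not None:
            penalty = decayed(penalty, t - last)
        penalty += PENALTY_PER_FLAP
        last = t
    return penalty


def seconds_until_reuse(penalty):
    """Time for `penalty` to decay down to the reuse limit."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    return HALF_LIFE_S * math.log2(penalty / REUSE_LIMIT)


# Three flaps one minute apart during a single outage.
penalty = penalty_after_flaps([0, 60, 120])
suppressed = penalty > SUPPRESS_LIMIT
wait_s = seconds_until_reuse(penalty)
```

Under these assumed values, three flaps a minute apart push the penalty past the suppress limit, and the route stays ignored for roughly half an hour after the last flap, even if the circuit comes back right away.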
Send your Internet questions to Dr. I Doctor editor Mel Beckman via e-mail to mbeckman@DrIDoctor.com.