On 21st March 2007 at 20:27 there was an unscheduled outage which affected some routes into our network.
This occurred during a scheduled maintenance window on a small section of our network. The small number of clients that the maintenance would have affected were informed well in advance, as per our usual procedures.
The course of events that led to the network fault was as follows:
1) We took a layer 3 OSPF link down as part of the scheduled maintenance on a small section of our network. Traffic should then have been rerouted automatically across the rest of the network.
2) Problems were observed with some routes by the onsite network team and these were investigated.
3) The link that was taken down was brought back up in order to return to the original configuration.
4) All network devices were investigated for errors. All links were showing as healthy.
5) The Network Team discovered that the CEF table in a core router had entries that had not been updated automatically when the OSPF link was taken down.
6) A Layer 2 interface was taken down because of the corrupt CEF tables, which were then forced to clear. This resolved the problem, and all network diagnostics now show a clean bill of health for routes in and out.
The fault was resolved at 21:14.
The outage lasted 47 minutes.
After the situation was resolved, the Network Team investigated what could have caused the CEF table to retain old entries. At this stage we believe it was the result of a Cisco software bug, as there was no hardware or link failure or error. We will inform Cisco about this immediately and investigate further in the morning during working hours.
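For readers who want a picture of the kind of inconsistency described above, the following is an illustrative Python sketch only. It is not the tooling our Network Team used; the function name, the dictionary layout and the addresses are all hypothetical. It compares the next hop that the routing table expects for each prefix with the next hop still held in the forwarding table, and reports any entries that disagree.

```python
def find_stale_entries(routing_table, forwarding_table):
    """Return prefixes whose forwarding next hop disagrees with the routing table.

    Both arguments are hypothetical dicts mapping a prefix to a next-hop address.
    The result maps each stale prefix to (forwarding_next_hop, routing_next_hop).
    """
    stale = {}
    for prefix, fib_next_hop in forwarding_table.items():
        # None here means the route no longer exists in the routing table at all.
        rib_next_hop = routing_table.get(prefix)
        if rib_next_hop != fib_next_hop:
            stale[prefix] = (fib_next_hop, rib_next_hop)
    return stale


# Made-up example: after the link goes down, routing has moved both prefixes
# to 192.0.2.2, but the forwarding table still points one of them at 192.0.2.1.
routing_table = {"10.0.1.0/24": "192.0.2.2", "10.0.2.0/24": "192.0.2.2"}
forwarding_table = {"10.0.1.0/24": "192.0.2.1", "10.0.2.0/24": "192.0.2.2"}

stale = find_stale_entries(routing_table, forwarding_table)
for prefix, (old_hop, new_hop) in stale.items():
    print(f"{prefix}: forwarding via {old_hop}, routing expects {new_hop}")
```

In a healthy network the two tables agree and the function returns an empty result; an entry like the one above is the sort of stale forwarding information that has to be cleared before traffic follows the new path.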
Date: 21st March 2007
Time: 20:27
Duration: 47 minutes
We would like to express our sincere apologies for the inconvenience this will have caused, and assure all clients that we will be pursuing a resolution from Cisco regarding the error.
We would like to apologise for the length of downtime. This error was very unusual, in that there was no hardware or network link failure, and the configuration of all network devices was correct. Our network staff checked these aspects of the network in this order. The problem was actually caused by an undocumented bug in the Cisco IOS software, which prevented the router's forwarding table from being updated after a layer 3 routing adjustment. This update should happen automatically and transparently. The unusual nature of this error is why it took our staff quite some time to track it down and put a workaround in place.
Regards,
The RapidSwitch Team