Broadcast storm, or how to beat "the small scoundrels" in your network. Part 1
For many people a broadcast storm probably sounds like something out of science fiction, as unrealistic as aliens or someone hacking your network. But hacks and storms, at least, are absolutely real, and one can happen at any time, including what you would consider the worst possible moment :)
So, how does a switch work when all is well? The switch builds a table of MAC addresses in which each address is mapped to a port, so client traffic is not sent to every port, as on old hubs, but only to the specific port listed in the MAC address table. Before the switch has learned which port a client is on, the first frame for that client is sent out of all ports; the client then replies, and the switch records the client's MAC address and port in the table. After that, the switch communicates with the client via that specific port.
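The learning behaviour described above can be sketched as a toy model. This is only an illustration in Python, not any vendor's implementation; the class name, port names and shortened MAC addresses are made up for the example.

```python
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy model of MAC learning (hypothetical, for illustration only)."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = ports
        self.mac_table: Dict[str, str] = {}  # MAC address -> port

    def receive(self, in_port: str, src: str, dst: str) -> List[str]:
        """Learn the sender's port and return the ports the frame leaves on."""
        self.mac_table[src] = in_port
        if dst != BROADCAST and dst in self.mac_table:
            out_port = self.mac_table[dst]
            # Known destination: one port only (or nowhere, if it is the ingress port).
            return [] if out_port == in_port else [out_port]
        # Unknown destination or broadcast: flood to every port except the ingress.
        return [p for p in self.ports if p != in_port]


switch = LearningSwitch(["A1", "A2", "A3"])
first = switch.receive("A1", "aa:aa", "bb:bb")  # bb:bb unknown yet, so flood
reply = switch.receive("A2", "bb:bb", "aa:aa")  # aa:aa already learned on A1
later = switch.receive("A1", "aa:aa", "bb:bb")  # bb:bb now learned on A2
print(first, reply, later)
```

Note that a broadcast frame is always flooded, whatever the table contains. That is exactly the traffic that gets out of hand once there is a loop.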
But one day an inattentive employee creates a loop in your network, or there is a network attack, or some tricky equipment failure, and everything goes bad... The switch sends a packet, but instead of an answer the packet comes back in several copies, and the switch, not knowing which port the MAC address belongs to, dutifully sends the packets out of all ports again and gets even more copies back... Packets multiply very quickly, traffic grows like an avalanche, ports are overloaded, trunks go down, CPUs are overloaded, Spanning Tree, which until that moment was trying to block everything, collapses, and the avalanche smashes into adjacent segments and drowns them. Your network no longer exists...
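To get a feeling for how fast the copies multiply, here is a rough sketch that counts the copies of a single broadcast frame in flight after each hop when nothing blocks the loops. The topologies (a three-switch ring and a four-switch full mesh) and the function name are assumptions made up for the example; real hardware is limited by link speed and buffers, which this model ignores.

```python
from typing import Dict, List, Tuple


def flood_steps(links: Dict[str, List[str]], origin: str, hops: int) -> List[int]:
    """Count copies of one broadcast frame in flight after each hop, with no STP.

    `links` maps each switch to its neighbours; the model assumes at most
    one link between any two switches.
    """
    # Every copy in flight is remembered as (sender, receiver).
    in_flight: List[Tuple[str, str]] = [(origin, n) for n in links[origin]]
    counts = [len(in_flight)]
    for _ in range(hops - 1):
        next_hop: List[Tuple[str, str]] = []
        for sender, receiver in in_flight:
            # Flood out of every link except the one the copy arrived on.
            next_hop.extend((receiver, n) for n in links[receiver] if n != sender)
        in_flight = next_hop
        counts.append(len(in_flight))
    return counts


ring = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
mesh = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}
ring_counts = flood_steps(ring, "A", 6)
mesh_counts = flood_steps(mesh, "A", 6)
print(ring_counts)  # [2, 2, 2, 2, 2, 2]
print(mesh_counts)  # [3, 6, 12, 24, 48, 96]
```

Even in the simple ring the two copies never die, because an Ethernet frame has no hop limit, and every new broadcast adds more circulating copies. With redundant links the number doubles on every hop.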
In this situation there are not many options: because your network is paralyzed, there is nowhere to monitor from and no way to search for the source of the storm, since there is no network :) One option I see is to shut down network segments one after another or, conversely, to shut down all segments and reconnect them one by one, to find out which segment was the source of the chaos. Then you search within the segment, then its links, then its ports. It is a very sad and very slow way to solve the problem. Downtime can be just awful! So we need to prepare for possible malfunctions in advance, to minimize the harm from them!
To begin with, you should never switch off Spanning Tree on your equipment! If possible, tune the protocol's options to match your network topology. If you have no experience with STP and your network is small, leave the default settings; STP copes with almost all problems! Why "almost"? Because STP cannot cope with a loop on unmanaged hardware that has no STP support...
What, it would seem, could overload or even paralyze a network like this:
10 Gigabit links between switches, 1 Gigabit links to end users, plus configured and tuned MSTP. There are no problems with loops on the main network equipment: MSTP blocks any double links or loops there. Moreover, all of the switches have double links between them for fault tolerance, and these work fine under MSTP. I will not consider possible attacks and malfunctions here. I will consider the case where your network has a "stupid" unmanaged switch, like a D-Link, and an employee makes a loop on it... STP and MSTP are powerless in this situation :) This "small scoundrel" will bring down your network in a few minutes or even a few tens of seconds :)
It seems to me there is only one way to protect yourself 100% from unmanaged equipment: ban its use in your network. But if banning it and throwing it in the trash is impossible for some reason, then you should configure your equipment to fight the storm.
Modern switches have tools such as storm-control and broadcast-limit, and probably some variations of the same functions on equipment from other manufacturers.
With broadcast-limit set to 10% on HP and storm-control broadcast level 10% on Cisco, my network stays about 80% alive with loops on the "stupid" equipment. The only thing that bothers me is that STP keeps misbehaving. It still blocks the wrong ports, and the storm from the blocked port continues anyway, though a weak one. Perhaps this is because BPDUs continue to be transmitted when a port is blocked by STP, so that after the initiator is blocked, the broadcast storm turns into a BPDU storm.
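To see what a 10% limit means in numbers, here is a small back-of-the-envelope calculation. It assumes the limit is a share of the link bandwidth and that frames are minimum-size (64 bytes) plus 20 bytes of preamble and inter-frame gap on the wire. How a particular switch actually measures the threshold can differ between vendors and models, so treat the function and its figures as an estimate, not as vendor behaviour.

```python
from typing import Tuple

WIRE_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap


def broadcast_budget(link_bps: int, percent: float, frame_bytes: int = 64) -> Tuple[float, int]:
    """Return (allowed bits per second, allowed frames per second) for a limit.

    Assumption: the limit is a plain percentage of the link bandwidth.
    """
    allowed_bps = link_bps * percent / 100
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return allowed_bps, int(allowed_bps // bits_per_frame)


gigabit = broadcast_budget(1_000_000_000, 10)
ten_gigabit = broadcast_budget(10_000_000_000, 10)
print(gigabit)      # (100000000.0, 148809)
print(ten_gigabit)  # (1000000000.0, 1488095)
```

So under these assumptions even a "limited" 1 Gigabit port may still pass close to 150 thousand small broadcast frames per second, which would fit with the storm becoming weak rather than disappearing completely.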
Here's an example:
A loop on port A5: the port is blocked immediately, and then port L2 is blocked as well. This does not paralyze the network, because there are additional links, but it is still not very nice, because an important port could be blocked this way. Some problems remain, but thanks to the broadcast limit the storm is very weak, and I can already say for sure that limits on broadcast traffic help! Not 100%, but they help a lot :) Next we will configure MSTP, but more on that next time :)
Here is an example of ping while the loop on the D-Link is active. Without these settings ping did not work at all. So this is a preliminary victory :)