Sunday, December 27, 2015

Building a fault-tolerant network infrastructure for your company.


Part 1: Spanning Tree Protocol. MSTP on HP ProCurve


     In the previous articles we solved the problems caused by unmanaged network hardware. With this article we begin a series for "novice professionals": building a fault-tolerant network on expensive equipment with many more useful features :)

     The task is to build a network with no single point of failure. Every switch is connected by at least two links to two other switches, and every server is connected to two switches. For now the scheme does not include Internet uplinks, VPN channels or telephony; redundancy of those channels will be covered later. To complicate our scheme, the network contains equipment from different vendors: Cisco and HP ProCurve. It is good equipment, but it is different and not fully compatible in some protocols. Sometimes it happens: you prepare for a Cisco 6000, then you are told it will be Huawei instead of Cisco, and in the end you get HP ProCurve... ;)
     Using equipment from different vendors in one network is not the best idea! You should avoid a zoo of network equipment. But if it has happened, you have to be able to configure it :) In fact, our scheme does not hide any big problems. ProCurve is good equipment, the whole core LAN is built on HP, and the Cisco switches only perform additional functions, not key tasks. We should not have many problems during the project.
     Of course, I am a devoted fan of Cisco. It is well known that whoever loves VTP, HSRP and EIGRP uses GVRP, VRRP and OSPF only with tears and pain :)

This is our network scheme:

     The Cisco equipment is not new, but it is Gigabit. The HP ProCurve switches are new and modern, all with 10G links, and HP will be the core of the network. Cisco is used as additional equipment for various servers that do not generate much traffic. Cisco and HP are also connected through multiple trunks, though no more than necessary.
  1. Unpack, assemble, connect, power on, and apply the basic settings!
  2. All additional links between switches must stay physically disconnected until Spanning Tree is configured. Otherwise the loops will kill our network.
  3. Configure GVRP, sadly without the beautiful VTP :)
  4. Allow an identical list of VLANs on all trunks between switches! The lists must be identical so that STP does not paralyze the network by blocking a VLAN on the trunks where it is permitted and forwarding it on the trunks where it is prohibited.
  5. Configure a virtual VRRP IP address on HP1 and HP2 for the user VLANs on the different floors. The default gateway for users must always be alive as long as at least one of HP1 and HP2 is working. Oh, where are you, HSRP? :)
     Configure Spanning Tree on HP ProCurve.

     We will configure MSTP, because it is the most logical choice for a large network with many VLANs. In MSTP, VLANs are grouped into "instances", and there are as many spanning-tree processes as instances you define. In PVST the number of processes equals the number of VLANs in your network, so the more VLANs you have, the more memory is spent. For example, with 100 VLANs you will have 100 processes, which is a horror.

     Although we are configuring ProCurve, we will follow Cisco's recommendations for MSTP. Cisco recommends using the same "region" across your whole network, the minimum number of instances, and explicitly set priorities for the "root bridge". If you do not know what a "region", a "root bridge" or an "instance" is, first read this: Wikipedia: Spanning Tree Protocol. It will help you understand everything :)
     I also highly recommend dividing the entire possible VLAN range into "instances" in advance! Every later change to the instance mapping (once you have a working network) leads to a recalculation of the network topology and an unpleasant network outage. And if the instance mappings differ between switches, everything falls apart: switches with mismatched settings end up in different regions and can only communicate through the common instance 0.

     So, let's start! Create two instances: the main links between switches come in pairs, and having more instances than links makes no sense. Make sure spanning tree is disabled and start the configuration:

Configure the root switch HP1:
Set the MSTP region name. It must be the same throughout the network:
spanning-tree config-name "H2SO4"
The configuration revision number must be the same across the network:
spanning-tree config-revision 1
I divide all VLANs into two "instances" according to their load in my network, distributing the load evenly between the instances:
spanning-tree instance 1 vlan 1-35 101 111-500 1001-4094
spanning-tree instance 2 vlan 36-100 102-110 501-1000
This switch will be the root for instance 1:
spanning-tree instance 1 root primary
The overall priority of this switch, which makes it the root bridge of the spanning-tree region:
spanning-tree priority 1

Configure the root switch HP2:
spanning-tree config-name "H2SO4"
spanning-tree config-revision 1
spanning-tree instance 1 vlan 1-35 101 111-500 1001-4094
spanning-tree instance 2 vlan 36-100 102-110 501-1000
Everything here is the same as on HP1, but HP2 is the root for instance 2, and its bridge priority in the region is lower, so the value is 2 (a higher value means a lower priority):
spanning-tree instance 2 root primary
spanning-tree priority 2
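To see why HP1 wins the election for the root of the region, here is a small Python sketch of the comparison STP performs: the bridge with the numerically lowest (priority, MAC address) pair becomes root. It assumes, as HP's documentation describes it, that the value given to spanning-tree priority is a multiplier (0-15) of 4096; the floor switches keep the default priority 32768 (multiplier 8). The MAC addresses are made up for illustration.

```python
# Sketch of STP root bridge election: the lowest bridge ID (priority, MAC) wins.
# Assumption: on ProCurve the configured value is a 0-15 multiplier of 4096.

def bridge_id(multiplier, mac):
    """Return a comparable bridge ID tuple (priority, MAC as an integer)."""
    if not 0 <= multiplier <= 15:
        raise ValueError("ProCurve priority multiplier must be 0-15")
    priority = multiplier * 4096
    mac_value = int(mac.replace("-", "").replace(":", ""), 16)
    return (priority, mac_value)


def elect_root(switches):
    """switches maps a name to (multiplier, mac); return the root's name."""
    return min(switches, key=lambda name: bridge_id(*switches[name]))


# Hypothetical MAC addresses; multiplier 8 is the default priority 32768.
region = {
    "HP1": (1, "00aa00-000009"),
    "HP2": (2, "00aa00-000001"),
    "HP-A1": (8, "000000-000001"),
    "HP-A2": (8, "000000-000002"),
}
print(elect_root(region))  # HP1
```

Note that the MAC address only matters as a tie-breaker: HP1 wins despite having the "largest" MAC here. If HP1 fails, removing it from the dictionary makes HP2 the root, which is exactly the backup behaviour the two different priorities are meant to give.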

Configuring the floor switches HP Ax:
spanning-tree config-name "H2SO4"
spanning-tree config-revision 1
spanning-tree instance 1 vlan 1-35 101 111-500 1001-4094
spanning-tree instance 2 vlan 36-100 102-110 501-1000
The configuration must be the same as on HP1 and HP2, but do not set any priorities. These are the user switches on the floors, and they should have a low priority.
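Because the VLAN-to-instance mapping must cover all VLANs and be identical on every switch, it can help to check it before typing it in. The Python sketch below is a hypothetical helper, not a ProCurve feature: it expands VLAN lists written as above, reports overlaps and unmapped VLANs in the range 1-4094, and compares the mapping of two switches.

```python
# Sketch: validate an MSTP VLAN-to-instance mapping before typing it in.

def expand(vlan_list):
    """Expand a list such as "1-35 101 111-500" into a set of VLAN IDs."""
    vlans = set()
    for item in vlan_list.split():
        if "-" in item:
            start, end = (int(x) for x in item.split("-"))
            vlans.update(range(start, end + 1))
        else:
            vlans.add(int(item))
    return vlans


def check_mapping(instances, first=1, last=4094):
    """Return a list of problems; an empty list means no overlaps and no gaps."""
    problems = []
    owner = {}  # VLAN ID -> instance it was mapped to first
    for inst, vlan_list in instances.items():
        for vlan in sorted(expand(vlan_list)):
            if vlan in owner:
                problems.append(f"VLAN {vlan} in instances {owner[vlan]} and {inst}")
            else:
                owner[vlan] = inst
    missing = set(range(first, last + 1)) - set(owner)
    if missing:
        problems.append(f"{len(missing)} VLANs not mapped, e.g. {min(missing)}")
    return problems


def same_mapping(a, b):
    """True if two switches map exactly the same VLANs to the same instances."""
    return {i: expand(v) for i, v in a.items()} == {i: expand(v) for i, v in b.items()}


hp1 = {1: "1-35 101 111-500 1001-4094", 2: "36-100 102-110 501-1000"}
hp_a1 = {1: "1-35 101 111-500 1001-4094", 2: "36-100 102-110 501-1000"}
print(check_mapping(hp1))        # []
print(same_mapping(hp1, hp_a1))  # True
```

Comparing the expanded sets rather than the text means that "1-35 101" and "101 1-35" count as the same mapping, which is what matters to the switches.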

Enable spanning tree on all switches:
spanning-tree enable

After entering this command you will lose the connection to the switch until the Spanning Tree topology is calculated and all ports are enabled. After that we can connect the additional links between the switches and wait for them to activate :)

   All settings on HP ProCurve are complete! Now we can look at the statistics in the switch console :)

So, let's look at HP1:
sh spanning-tree
Multiple Spanning Tree (MST) Information

  STP Enabled   : Yes
  Force Version : MSTP-operation
  IST Mapped VLANs : 1025-4094
  Switch MAC Address : 001871-b6a000
  Switch Priority    : 32768
  Max Age  : 20
  Max Hops : 20
  Forward Delay : 15

  Topology Change Count  : 9
  Time Since Last Change : 87 secs

  CST Root MAC Address : 001871-b6a000
  CST Root Priority    : 32768
  CST Root Path Cost   : 0
  CST Root Port        : This switch is root

  IST Regional Root MAC Address : 001871-b6a000
  IST Regional Root Priority    : 32768
  IST Regional Root Path Cost   : 0
  IST Remaining Hops            : 20

HP1, as configured, became the root of the MSTP region.

 sh spanning-tree instance 1

  E1    10GbE-SR   2000      128      Designated Forwarding   001b3f-c1a800
  E2    10GbE-SR   2000      128      Designated Forwarding   001b3f-c1a800
  E3    10GbE-SR   2000      128      Designated Forwarding   001b3f-c1a800
  E4               Auto      128      Disabled   Disabled
  F1    10GbE-SR   2000      128      Designated Forwarding   001b3f-c1a800
  F2    10GbE-SR   2000      128      Designated Forwarding   001b3f-c1a800
  F3               Auto      128      Disabled   Disabled
  F4               Auto      128      Disabled   Disabled

sh spanning-tree instance 2

  E1    10GbE-SR   2000      128      Alternate  Blocking     001b3f-582100
  E2    10GbE-SR   2000      128      Alternate  Blocking     001b3f-57c800
  E3    10GbE-SR   2000      128      Alternate  Blocking     0019bb-11ac00
  E4               Auto      128      Disabled   Disabled
  F1    10GbE-SR   2000      128      Alternate  Blocking     0019bb-0e2b00
  F2    10GbE-SR   2000      128      Root       Forwarding   001871-b6a000
  F3               Auto      128      Disabled   Disabled
  F4               Auto      128      Disabled   Disabled

In instance 2, all paths to the floor switches are marked as Alternate and blocked.

Now look at an HP Ax switch on one of the floors:

sh spanning-tree ins 1

 L1    10GbE-SR   2000      128      Root       Forwarding   001b3f-c1a800
 L2    10GbE-SR   2000      128      Alternate  Blocking     001871-b6a000

sh spanning-tree ins 2

  L1    10GbE-SR   2000      128      Designated Forwarding   001b3f-582100
  L2    10GbE-SR   2000      128      Root       Forwarding   001871-b6a000

Result: instance 2 is blocked by HP1 and reaches the floor switches from HP2. The floor switches block instance 1 towards HP2 and receive it from HP1. The load is distributed across the two links, which is exactly what we wanted :) We also need a second trunk between HP1 and HP2 :)
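Reading these tables by eye gets tedious when a switch has many ports. Here is a small Python sketch that parses rows in the layout shown above (port, type, cost, priority, role, state, designated bridge) and groups the ports by state. The column layout is taken from the output in this article, not from a documented format, so treat it as an assumption.

```python
# Sketch: summarise port roles/states from "show spanning-tree instance N" rows.
STATES = {"Forwarding", "Blocking", "Disabled", "Learning", "Listening"}


def parse_ports(text):
    """Return {port: (role, state)} for every table row found in text."""
    ports = {}
    for line in text.splitlines():
        tokens = line.split()
        # the state is the last known state word; the role is the word before it
        state_idx = [i for i, tok in enumerate(tokens) if tok in STATES]
        if not state_idx or state_idx[-1] < 2:
            continue
        i = state_idx[-1]
        ports[tokens[0]] = (tokens[i - 1], tokens[i])
    return ports


def summary(ports):
    """Group port names by state, ports sorted by name."""
    result = {}
    for port, (_role, state) in sorted(ports.items()):
        result.setdefault(state, []).append(port)
    return result


hp1_instance2 = """
  E1    10GbE-SR   2000      128      Alternate  Blocking     001b3f-582100
  E2    10GbE-SR   2000      128      Alternate  Blocking     001b3f-57c800
  E3    10GbE-SR   2000      128      Alternate  Blocking     0019bb-11ac00
  E4               Auto      128      Disabled   Disabled
  F1    10GbE-SR   2000      128      Alternate  Blocking     0019bb-0e2b00
  F2    10GbE-SR   2000      128      Root       Forwarding   001871-b6a000
"""
print(summary(parse_ports(hp1_instance2)))
```

For HP1 in instance 2 this reports F2 as the only forwarding port and E1, E2, E3 and F1 as blocking, which is the picture described above.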

Our scheme. Instance 1 is marked in blue, instance 2 in red:

     Well, we have brought up MSTP between the HP switches; next time we will try to connect Cisco Catalyst to this scheme :)

Saturday, December 19, 2015

Broadcast storm, or how to defeat "the small scoundrels" in your network. Part 2


     So, we continue to fight the storm. Today I will share my experience of using dedicated loop protection on Cisco and HP ProCurve.
     Setting broadcast-limit on HP and storm-control on Cisco helped: the situation in the network became much better, but not completely normal. Cisco and HP have specialized tools to fight loops, including loops through "unmanaged" equipment.

     HP ProCurve.

     On ProCurve this feature is called loop-protect, and it works on a simple and reliable principle: the switch sends a broadcast packet out of a port, and if the packet comes back, there is a "loop" on that port. A switch floods a broadcast packet to all ports except the one it was received on, so in a correct topology the packet cannot return. It is a simple and reliable protocol for detecting and blocking loops, even if the port is connected to a whole garland of unmanaged switches.
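The idea behind loop-protect can be modelled in a few lines. The Python sketch below is a simplified simulation, not HP's implementation: the managed switch sends one probe out of a port, unmanaged switches flood it out of every port except the one it arrived on, end hosts absorb it, and if the probe ever reaches the managed switch again, the port it came back on is reported. All node and port names are hypothetical.

```python
from collections import deque


def connect(links, a, b):
    """Add a bidirectional cable between (node, port) endpoints a and b."""
    links[a] = b
    links[b] = a


def probe_returns(links, ports, managed, tx_port, max_hops=8):
    """Return the managed-switch port the probe came back on, or None.

    links: (node, port) -> (node, port) cables
    ports: node -> list of ports for unmanaged switches (they flood);
           nodes absent from ports (e.g. PCs) simply absorb the frame.
    """
    queue = deque([(managed, tx_port, 0)])  # (node, egress port, hops)
    while queue:
        node, out_port, hops = queue.popleft()
        peer = links.get((node, out_port))
        if peer is None or hops >= max_hops:
            continue
        peer_node, in_port = peer
        if peer_node == managed:
            return in_port  # our own probe came back: there is a loop
        for port in ports.get(peer_node, []):
            if port != in_port:  # flood everywhere except the ingress port
                queue.append((peer_node, port, hops + 1))
    return None


dlink_ports = {"DLINK": list(range(1, 9))}
links = {}
connect(links, ("HP", "A5"), ("DLINK", 1))
connect(links, ("DLINK", 2), ("PC", 0))
print(probe_returns(links, dlink_ports, "HP", "A5"))  # None: no loop yet
connect(links, ("DLINK", 7), ("DLINK", 8))  # the "small scoundrel" cable
print(probe_returns(links, dlink_ports, "HP", "A5"))  # A5
```

The max_hops limit only keeps the simulation finite; on real wires a looped broadcast has no such limit, which is exactly why the storm happens.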
     Configuration of this feature is very simple:
loop-protect A1-A24,B1-B24
A1-A24,B1-B24 is the range of ports the settings apply to. The command also has parameters that describe the action taken when a loop is detected:
... receiver-action send-disable - disable the port that sent the loop-protect packet,
... receiver-action send-recv-dis - disable both the sending and the receiving port.

     Example: "loop-protect A1-A24,B1-B24 receiver-action send-disable" disables the port when a loop is detected. Automatic recovery is probably not needed anyway: first you need to find and remove the loop, and only after that enable the port: int A17 enable
     You can also set global loop-protect options, such as timers:

loop-protect disable-timer - how long after disabling a port to try to re-enable it,
loop-protect mode port / vlan - whether to work per port or per VLAN,
loop-protect transmit-interval 1-10 - the interval (in seconds) between "detection packets" sent on a port,
trap, vlan and PORT-LIST - self-explanatory :)

     Today I tested loop-protect, and it works perfectly! The port connected to a D-Link with a loop between two of its ports was blocked by ProCurve almost immediately:

     Cisco.

     Spanning-tree loopguard will not help us. It is based on the "loss of BPDUs", it is meant for links between switches, and it does not protect from loops on unmanaged network hardware.

     UDLD:
Either enable it globally (this applies to all fiber-optic interfaces):
   conf t
      udld enable

Or on each interface separately:
   conf t
      interface GigabitEthernet0/1
         udld port enable

     This technology works well, but its default message interval is longer than on HP: 15 seconds versus 5 (on many IOS releases the interval can be changed globally with the udld message time command). Because of this, the network can shake a little before the port is blocked. UDLD also has two modes of operation: normal (plain enable) and aggressive, but our small task does not need aggressive mode, so I will not review it :)

     In general, it is a good idea to set BPDU Guard on Cisco client access ports. BPDU Guard blocks the port as soon as an incoming BPDU is seen, for example in the case of a simple loop. This is fine tuning of STP; in future articles I will describe STP in more detail. BPDU Guard, BPDU Filter and much more, I have plenty to tell :)

     For the tests I used an HP ProCurve 5412zl and a Cisco 2960G. As the unmanaged "small scoundrel" I used an 8-port unmanaged D-Link with a loop between its ports.

Friday, December 18, 2015

UnixDaemonReloader - restarting daemons after their configuration files change!
(Update 2016.01.03)

    The series "my crafts", or "reinventing the wheel", continues ;)

Recently I started building clusters for all of the company's services, such as mail, proxy, VoIP, etc. Clusters provide load balancing and fault tolerance, and the work is useful both for my own experience and for my company :) I had to synchronize configuration files between the cluster nodes and also react to changes in those files... For example, the config of the postfix mail server is changed on the master host and the file is replicated to the second node of the cluster. What next? We must somehow restart postfix, or rather reload it. I created several scenarios for synchronizing and comparing files to find differences, and wrote many scripts and methods, but there was no unified scheme. One morning I woke up and decided to write a program that works as a daemon, rereads its own configuration before each work cycle, and performs the prescribed actions when the specified files change. No sooner said than done! And the name of this service is UnixDaemonReloader :)

Configuration file:

     The configuration file is very simple. The path to the Unix shell and its parameter let you execute external commands, like this: /bin/sh -c "ps ax". Then comes a list of entries describing the files and directories to track:

["/directory", "file", "action", "pre-app script", "result of pre-app script","error script"],
["/directory", "mask*of*the*files*", "action", "", "",""],
["/directory", "!all*files*except*this,!except*this,!and*except*this", "action"]
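The second field of an entry is an exact file name, a mask, or a list of "!"-prefixed exclusions. The Python sketch below shows one way such masks could be resolved with shell-style wildcards. It is a hypothetical illustration of the syntax above, not UnixDaemonReloader's Go code, and it assumes that in the "!" form every comma-separated pattern starts with "!" and the result is every file that matches none of them.

```python
import fnmatch
import os
import tempfile


def watched_files(directory, mask):
    """Resolve one watch-list mask to the sorted names of matching files.

    Hypothetical semantics: a plain name or shell-style mask selects files;
    a "!a*,!b*" list selects every file that matches none of the patterns.
    """
    names = sorted(
        n for n in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, n))
    )
    if mask.startswith("!"):
        excludes = [p.strip().lstrip("!") for p in mask.split(",")]
        return [n for n in names
                if not any(fnmatch.fnmatchcase(n, p) for p in excludes)]
    return [n for n in names if fnmatch.fnmatchcase(n, mask)]


with tempfile.TemporaryDirectory() as d:
    for name in ("main.cf", "master.cf", "main.cf.bak"):
        open(os.path.join(d, name), "w").close()
    print(watched_files(d, "*.cf"))    # ['main.cf', 'master.cf']
    print(watched_files(d, "!*.bak"))  # ['main.cf', 'master.cf']
```

fnmatchcase is used instead of fnmatch so that matching is case-sensitive on every platform, as file names are on most Unix systems.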


Update from 2016.01.03:
     Added the parameters "pre-app script", "result of pre-app script" and "error script" to the list of watched files. The "pre-app" script must print its result to stdout, for example "OK" :) If the returned value equals the value from the configuration file, the script that restarts or reloads the service is run; otherwise, once the configured number of attempts to run "pre-app" is used up, the "error script" is run. See README.md for the new WatchList syntax.
     PS: In the "pre-app script" you can check the configuration syntax and back up the configuration file. The "error script" may send an e-mail or SMS and restore the config from the backup copy.
     Added the parameter UDR_ScriptsPath, the path to the pre-app scripts.
     Added the parameter UDR_PreAppAttempt, the number of times the "pre-app" script is executed before the "error script" runs or the attempts stop.
     Fixed: all services were restarted after the file database was initialized for the first time.
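The pre-app logic described in this update can be sketched in Python as follows. This is an illustration under assumptions, not the program's Go source: commands run through the configured shell as shell -c command, the pre-app output is compared after stripping whitespace, and retry_delay is a hypothetical pause between attempts.

```python
import subprocess
import time


def run(shell, command):
    """Run command via `shell -c` and return its stripped stdout."""
    done = subprocess.run([shell, "-c", command], capture_output=True, text=True)
    return done.stdout.strip()


def apply_change(shell, pre_app, expected, action, error_script,
                 attempts, retry_delay=0.0):
    """Run action if pre_app prints expected; otherwise run error_script.

    Returns "applied" or "error" so the caller can log what happened.
    """
    if pre_app:
        for _ in range(attempts):
            if run(shell, pre_app) == expected:
                break
            time.sleep(retry_delay)
        else:  # every attempt failed
            if error_script:
                run(shell, error_script)
            return "error"
    run(shell, action)
    return "applied"
```

For example, a hypothetical postfix entry could use "postfix check && echo OK" as the pre-app command, "OK" as the expected result and "postfix reload" as the action, so a config with a syntax error never reaches the running daemon.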

Update from 2015.12.23: 
UDR_PauseBefore - pause before running the script (seconds). This setting protects your daemons from "your own hands": if, while editing a configuration file, you accidentally save it with an error or unfinished, you still have time to fix it before the daemon is restarted.
UDR_ScriptsPath - path to the "pre-app" scripts,
UDR_PreAppAttempt - number of attempts to execute the "pre-app" scripts,
Sleep_Time - how long to sleep between file checks,
SQLite_DB - path to the SQLite database that stores the file checksums.
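The core of each check cycle, comparing the current checksums with the ones stored in SQLite, might look like this Python sketch. The table layout and the choice of SHA-256 are assumptions made for illustration. A file seen for the first time is only recorded, not reported as changed, which avoids restarting every service right after the database is first initialized (the bug fixed in the 2016.01.03 update).

```python
import hashlib
import sqlite3


def file_digest(path):
    """SHA-256 of a file, read in chunks so large files are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(db, paths):
    """Return paths whose checksum differs from the stored one; update the DB."""
    db.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, digest TEXT)")
    changed = []
    for path in paths:
        digest = file_digest(path)
        row = db.execute("SELECT digest FROM files WHERE path = ?", (path,)).fetchone()
        if row is not None and row[0] != digest:
            changed.append(path)  # known file with new content
        db.execute("INSERT OR REPLACE INTO files (path, digest) VALUES (?, ?)",
                   (path, digest))
    db.commit()
    return changed
```

A daemon loop would call changed_files every Sleep_Time seconds with the database opened from SQLite_DB, wait UDR_PauseBefore seconds for each changed file, and then run the configured action.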

     Actions can be anything, not only restart, reload or "kill -HUP". For example, you can send a message to the administrator :)
     The program is not only for clusters but for any system. It is a helper that saves you from manually restarting services with frequently changing configs :)

     The program is written in Go. It can be compiled from source for Linux, BSD, Mac, Android and other systems.


     Source codes:
          UnixDaemonReloader Source Code

     Compiled binaries for FreeBSD and Linux:
          UnixDaemonReloader on SourceForge
          UnixDaemonReloader on My Google Drive

Saturday, December 12, 2015

Broadcast storm, or how to defeat "the small scoundrels" in your network. Part 1


     Perhaps for many people a broadcast storm is something from the realm of fiction, as unrealistic as aliens or someone hacking your network. But both hacking and storms are absolutely real, and they can happen at any moment, even at the worst possible time :)

     So, how does a switch work when all is well? The switch builds a table of MAC addresses in which each address corresponds to a port, and client traffic is not sent to all ports, as with old hubs, but only to the port listed in the MAC address table. Before the switch has learned a client's port, the first traffic to that client is flooded to all ports; the client sends a reply, and the switch puts the client's MAC address into the table. After that the switch communicates with the client through that specific port.
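A minimal Python sketch of the learning and flooding behaviour just described. It models a single switch with a dictionary as its MAC table and ignores aging, VLANs and STP; the MAC addresses are shortened placeholders.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """A toy transparent bridge: learn source MACs, forward or flood."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port  # learn where the sender lives
        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # unknown destination or broadcast: flood like an old hub
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the same segment: filter
        return [out_port]


sw = LearningSwitch([1, 2, 3, 4])
print(sw.receive(1, "aa", "bb"))  # [2, 3, 4]  bb unknown: flood
print(sw.receive(2, "bb", "aa"))  # [1]        aa learned on port 1
print(sw.receive(1, "aa", "bb"))  # [2]        bb learned on port 2
```

In a loop, copies of the same frame arrive on different ports, so the entry for the sender keeps jumping from port to port in mac_table; that is the switch "not knowing the correspondence between the MAC address and the port".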
     But one day an inattentive employee creates a loop in your network, or there is a network attack, or some crafty equipment failure, and everything goes bad... The switch sends a packet, but instead of an answer the packet comes back in several copies. The switch no longer knows which port the MAC address belongs to, so it dutifully floods the packets to all ports again and again gets many copies back... The packets multiply very quickly, traffic grows like an avalanche, ports are overloaded, trunks go down, CPUs are overloaded, and Spanning Tree collapses, at some point trying to block everything, while the avalanche crashes into adjacent segments and drowns them. Your network no longer exists...
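The avalanche can be made concrete with a small Python model. It counts the copies of a single broadcast frame on the inter-switch links at each forwarding step, assuming switches that flood without STP, one cable between each pair of switches, and no TTL (Ethernet frames have none). The topologies are hypothetical: in a ring of three switches the copies circulate forever without growing, while in a full mesh of four switches they double at every step.

```python
from collections import Counter


def storm_growth(adjacency, origin, steps):
    """Copies of one broadcast on inter-switch links, per forwarding step.

    adjacency: switch -> list of neighbouring switches (one cable each).
    """
    # frames on the wire, keyed by (sending switch, receiving switch)
    in_flight = Counter((origin, n) for n in adjacency[origin])
    history = []
    for _ in range(steps):
        history.append(sum(in_flight.values()))
        next_step = Counter()
        for (src, dst), copies in in_flight.items():
            for n in adjacency[dst]:
                if n != src:  # flood out of every port except the ingress one
                    next_step[(dst, n)] += copies
        in_flight = next_step
    return history


ring = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
mesh = {s: [n for n in "ABCD" if n != s] for s in "ABCD"}
print(storm_growth(ring, "A", 5))  # [2, 2, 2, 2, 2]
print(storm_growth(mesh, "A", 5))  # [3, 6, 12, 24, 48]
```

Even the "harmless" ring never stops: every host in it receives a fresh copy on every round, and each new broadcast adds more frames that never die.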
     In this situation there are not many options: the network is paralyzed, so there is nothing to monitor the storm or search for its source with, because there is no network :) One option I see is to shut down network segments one by one, or the other way round: shut down all segments and bring them back one by one, to find which segment is the source of the chaos. Then search within the segment, then the links and ports: a very sad and very long way to solve the problem. The downtime can be just awful! So we need to prepare for possible malfunctions in advance, to minimize the harm from them!

     First of all, never switch off Spanning Tree on your equipment! If possible, tune the protocol options to match your network topology. If you have no STP experience and a small network, leave the default settings: STP copes with almost all problems! Why "almost"? Because STP cannot cope with a loop on unmanaged hardware that does not support STP...

     It would seem that nothing can overload, let alone paralyze, a network like this:

     10-gigabit links between switches, 1-gigabit links to end users, plus a configured and tuned MSTP. Loops on the core equipment are no problem: MSTP blocks any duplicate links or loops there. Moreover, all switches have double links between them for fault tolerance, and they work successfully through MSTP. I will not consider possible attacks and malfunctions. I will consider the case where your network has a "stupid" unmanaged switch like a D-Link... and an employee makes a loop on it... STP and MSTP are powerless in this situation :) This "small scoundrel" will bring down your network in a few minutes or even a few tens of seconds :)
     It seems to me that there is only one way to protect yourself 100% from unmanaged equipment: ban its use in your network. But if banning it and "throwing it in the trash" is impossible for some reason, you should configure your equipment to fight storms.
     Modern switches have tools such as storm-control and broadcast-limit, and other vendors probably offer variations of the same functions.
     With broadcast-limit set to 10% on HP and storm-control broadcast level 10% on Cisco, my network stays about 80% alive with a loop on the "stupid" equipment. The only thing that bothers me is that STP keeps misbehaving: it still blocks the wrong ports, and the storm from the blocked port continues, though not much. Perhaps this is because BPDUs are still transmitted while the port is blocked by STP, and after the initiator is blocked the broadcast storm turns into a BPDU storm.
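To get a feel for what a 10% limit means, here is the arithmetic as a small Python function. It assumes the percentage is taken of the port's line rate (that is how both features are usually described, but check your platform's documentation) and counts minimum-size frames: 64 bytes plus 20 bytes of preamble and inter-frame gap on the wire.

```python
def broadcast_budget(link_bps, percent, frame_bytes=64):
    """Return (bits per second, frames per second) allowed by a percent limit."""
    allowed_bps = link_bps * percent / 100
    wire_bits = (frame_bytes + 20) * 8  # + preamble (8 B) and inter-frame gap (12 B)
    return allowed_bps, int(allowed_bps // wire_bits)


print(broadcast_budget(1_000_000_000, 10))   # (100000000.0, 148809)
print(broadcast_budget(10_000_000_000, 10))  # (1000000000.0, 1488095)
```

So even with the limit active, a single 1-gigabit access port may still push roughly 148 thousand small broadcast frames per second into the network, which may explain why the storm was weakened rather than stopped.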
      Here's an example:


     Loop on port A5: the port is blocked immediately, and then port L2 is blocked as well. This does not paralyze the network, because there are additional links, but it is still not nice, because an important port could be blocked this way. Some problems remain, but thanks to "broadcast limit" the storm is very weak, and I can already say for sure that the limits on broadcast traffic help! Not 100%, but they help a lot :) Next we will configure MSTP, but more on that next time :)

     Here is an example of ping while the loop on the D-Link is active. Without these settings the ping did not work at all. So, it is almost a victory :)


Thursday, December 10, 2015

Authentication in Active Directory through LDAP in Perl, PHP and Python "with your own hands" ... :)
     I think I am not the only one who, while solving a problem, sometimes finds that none of the existing solutions are quite what I need :) In this situation I see three options: spit and forget, try to improve a found solution, or do it yourself ;)
    In 2005 I was an experienced system administrator with an inclination for programming. I had the task of configuring user authentication for a squid proxy server against Active Directory. I found nothing better than joining Samba to the AD domain and authenticating users with my Perl script via WinBind. This solution worked without any problems for several years. My proxy server ran on FreeBSD. Updating software on FreeBSD in 2005-2008 was even worse than it is now... Updates only through portupgrade, removing and rebuilding all software and dependencies from source: a real nightmare! Errors while rebuilding libraries, and a big chance of breaking the whole system forever. In general, it was easier to update the kernel and world to fix security issues and reinstall the system every 5 years. Of course, software in FreeBSD Ports was always much fresher than in any Linux, but software grows old, Samba with WinBind included. One day the old WinBind could not join a domain controller running Windows 2008, and I realized that I had to move away from Samba with WinBind... :)
     I searched for a better way than Samba and found a way to authenticate against Active Directory via LDAP... Yes, it is simple, cross-platform, and can be programmed in any language. The example I found at php.net ref.ldap.php was almost what I needed, but "another language, not Perl" ;) In the end I modified the PHP code, improved the group membership check function, and added protection against an infinite loop when groups are nested in each other "in a circle". Then I rewrote it in Perl and Python, and recently in GoLang... :)

     The scripts consist of three functions:
  1. Bind to the LDAP server with the user's login and password. If the bind fails, return an error. If it succeeds, check whether group membership needs to be verified: if so, go to step (2); if not, return success.
  2. Search for the DN of the user and of the group.
  3. Check whether the user is a member of the group. If yes, "Success"; if not, "excuse me" :)
This is not a library, these are code examples. Maybe they can help someone ;)
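Step 3, with the protection against groups nested "in a circle", can be sketched as a graph walk that remembers which groups it has already visited. In this Python sketch the directory is replaced by a hypothetical in-memory member_of dictionary (a DN mapped to the group DNs listed in its memberOf attribute); in the real scripts that data comes from LDAP searches, which are not shown here. All DNs are made up.

```python
def is_member(user_dn, group_dn, member_of):
    """True if user_dn belongs to group_dn directly or through nesting.

    member_of maps a DN to the DNs of the groups it directly belongs to.
    The visited set stops the walk when groups contain each other in a circle.
    """
    visited = set()
    stack = list(member_of.get(user_dn, []))
    while stack:
        group = stack.pop()
        if group == group_dn:
            return True
        if group in visited:
            continue  # already expanded: a cycle or a diamond
        visited.add(group)
        stack.extend(member_of.get(group, []))
    return False


USER = "CN=Ivan,OU=Users,DC=corp,DC=local"
SALES = "CN=Sales,OU=Groups,DC=corp,DC=local"
PROXY = "CN=Proxy,OU=Groups,DC=corp,DC=local"
directory = {
    USER: [SALES],
    SALES: [PROXY],
    PROXY: [SALES],  # circular nesting: Proxy is also a member of Sales
}
print(is_member(USER, PROXY, directory))  # True
print(is_member(USER, "CN=Admins,OU=Groups,DC=corp,DC=local", directory))  # False
```

Without the visited set, the second call would bounce between Sales and Proxy forever. Active Directory can also resolve nested membership on the server side with the LDAP_MATCHING_RULE_IN_CHAIN rule (OID 1.2.840.113556.1.4.1941), which is worth knowing as an alternative to walking the groups yourself.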


A note about the GoLang code will follow next time :)

It's my first article in English :)