Searching for the root cause of a network problem may be the first step to solving it, but it is often the most time-consuming one. NME talks to industry experts to find out what administrators can do to locate and fix problems as efficiently as possible.
Network administration is often viewed as an arcane art by IT managers. Many organisations adopt an "If I don't touch it, nothing will go wrong" approach, which leaves them badly underprepared when problems do occur. Because networks are complex, evolving and organic systems, configurations change regularly and administrators are not always aware when weak points emerge.
Mohamed Jameeluddin, IT officer for the UAE's General Civil Aviation Authority (GCAA), is currently overseeing a radical restructuring of the GCAA's network infrastructure. The aviation authority has allocated a substantial budget to rebuild its network so that it can host a variety of e-services for its customers, and Jameeluddin and his team are assessing a range of network management products in order to make the infrastructure as robust and fail-proof as possible. Jameeluddin says this will be important for the future development of the network.
"This redesign will help us eliminate the existing bottlenecks which are known to us. If we don't address them now, the network and the services will grow over time and scalability and manageability will be an issue," says Jameeluddin.
He believes that the most time-consuming aspect of network management is discovering where a problem originates, and his team is looking to use the latest network management software to build a comprehensive picture of the network topology and reduce the time spent searching for weak links.
"In terms of network problems, we cannot immediately say what is wrong, or predict that it is because of a certain server, so you have to look into everything. It could be just one small switch which is causing all the problems," Jameeluddin says.
"If you have proper network management software, which can see the kind of traffic on the network, or the performance utilisation of a device, or the CPU utilisation of the server, then it would make a big difference. We should be able to visualise what is going on in the network and if anything goes wrong it should send a trigger, or an alert or alarm, so we can immediately know that the problem is because of this service, or process, on this machine," he says.
However, Mike Rogers, technical director at European IT and communications provider Claranet, believes that before any network is deployed, administrators need to plan carefully and devise a network map that shows everything on the network, from computers to individual cables and from power supplies to phone lines.
"Start with a pen and paper and draw out everything on your network. Then begin removing components from the map, and watch what happens to the rest of the network. When you can remove a point that completely disables the network, you've identified a single point of failure (SPOF)," says Rogers.
SPOFs are the weak spots in a network that need to be made more resilient. They can be difficult to find, especially given the amount of cabling and the number of connections in today's offices. Identifying areas with multiple SPOFs, for instance where several sit in the same location or run from the same power circuit, is critical to preventing downtime.
"Using multiple technologies is paramount in increasing resilience," says Rogers.
"For example, if an office has one DSL line to a datacentre which is the SPOF, adding another DSL line along the same copper wires will hardly help when a builder puts a spade through the bunch, or a DSLAM goes out of action. Instead, mixing delivery technologies and cable routes greatly reduces the chances of something going wrong with your connections."
Mohamed Hamedi, CEO of Sphere Networks, has taken significant steps with his company's network management software to give administrators as much information as possible on how network devices relate to one another. The key for Hamedi when tackling network problems and SPOFs is the software's discovery engine, which lets users find each device on their network and locate the dependencies between them, that is, how each device connects to the others.
"When managing a network you are only as good as the information you have. So you need to be able to monitor key variables, performance variables, the speed of the link and how the device is handling traffic," he says.
"The device is just an engine that takes data, does something with it, and then behaves how it has been designed to. So if you are pushing a lot of data into something, either the device is fast enough to handle it, or it is going to be dropping data or making mistakes. So you have to monitor these things," says Hamedi.
To monitor the CPU utilisation of a specific device, Sphere's solution uses what it terms 'Active Monitor', which tests how hard a device is being pushed towards its limit. Hamedi compares the network to a freeway, and the management software to the traffic system handling all the cars on the road. On a network, when a data packet arrives, the device opens it and responds. Administrators can then look at the number of data packets being dropped, or see that a thousand users all want to access the same resource.
"If any of the devices go down, if I have any problem with VLANs or subnetting on the switches, or even with the servers, or if we have any problem with the speed of the processor or the memory, then we can look at information on how many threads we have or how many processes are running on the server, and we can handle this information and analyse it," says Bassel Kh, software engineer on Sphere's Active Monitor solution.
However, Hamedi wants to take things a step further. By creating an early warning parameter that detects when a user-defined symptom occurs on a network, his management software can take a specific action to minimise the impact of a problem, such as controlling the number of sessions into the server, or running a script that activates another device and creates policies to control the problem.
"We can create symptoms. So I can say my router can only support, for instance, 10,000 sessions per second, and my switch here can handle 100Mb per second. We can actually create symptoms that say this fault is made up of three factors, and if these three things occur we will lose connectivity to our website. Then I can create an early warning parameter which says that if 50% of these symptoms occur, you have got to do something about it," says Yassine Bensaid, an engineer on Sphere's fault management software.
However, to act quickly and efficiently once a problem becomes apparent, administrators need an intimate knowledge of their network: each component, where it is sourced from and how to replace it. According to Rogers, fixing network problems in real time is almost impossible, so during downtime staff need to know how long an individual part will take to repair or replace. That means network administrators have to make informed decisions about their chosen components, from purchase through to use.
"Which components have the longest fix time? In financial terms, is it better to wait for a technician to fix it, or just to replace it? Is the part from a shop down the road or must an engineer come from Europe? These become major issues during downtime, so in-depth knowledge of the network is paramount," says Rogers.
Network managers also have to plan for the unexpected and find unconventional routes around problems. Network outages are often caused by unforeseen or uncontrollable events, and administrators must plan in advance to mitigate their effects. Rogers believes that while IT managers are excellent at spotting obvious weak spots such as cable breaks, hardware malfunctions or power loss, they also need to think about the unexpected.
"What happens to an SME if their only network engineer is off work for a month? What happens if we have an abnormally hot summer and equipment is damaged? What if a supplier goes out of business?" he says.
Jameeluddin argues that it is also imperative to have employee policies in place to reduce the risk of network performance being degraded through careless or malicious use, a frequent source of network administrators' woes.
The wide range of available products that prevent intrusions from viruses and malware has insulated most networks from external threats fairly comprehensively. However, Jameeluddin points out that no matter how good a network management solution is, it will not stop an employee from needlessly clogging bandwidth with unauthorised internet activity or from carrying out a targeted attack on the internal network.
"Internet access policies need to be there, and network access policies need to be in place too. We need network account maintenance policies and passwords. If there are no policies you are lost. You have your network up and running but you won't have control over it," he says.
As networks grow more complex, being able to test for and diagnose faults - not to mention fixing them in a timely fashion - will become ever more critical for enterprises in the Middle East.