In this blog we introduce the concept of network monitoring as the key process and mechanism to ensure that your telecom or enterprise network is working properly, achieving high standards of reliability and availability. These are the key elements required to provide high quality communication and IT services to your external or internal customers.
What is Network Monitoring?
Monitoring is often related to the continuous and consistent observation of the performance and health of a process, system, or activity. It is implemented through actions of measurement and/or the collection of data, their processing, and representation in a human-readable form.
Network and IT monitoring is a form of monitoring that involves the acquisition and collection of network-specific data, aiming to ensure the good health and optimum performance of the monitored network.
Most people use a desktop computer, laptop, tablet, or smartphone to access the internet or company files and data. The data on the internet or a company’s data is stored on other computers called servers. When someone accesses data, it must move from those servers to their computer. This is achieved by using a communication network.
A communication network consists of network devices and lots of cables connecting them together. Within a communication network, network devices such as routers and switches are used to forward data from one device to another until the data reaches its destination. Cables are used to enable the transfer of data between two devices, for instance between a server and a network device or between two adjacent network devices.
Network devices differ in how they decide to forward data through the network. Switches, routers, and many other types of devices make up a network.
Now, let’s investigate an example of a very simple corporate network as depicted in the figure below.
Diagram of a simple corporate communication network
Imagine you are at the workstation (laptop), and you have to retrieve data from your company’s server to do your job. So, for the data to be transferred from the company’s server to your workstation, data first “travels” from the server to the first network device that the server is connected to – “NETWORK SWITCH 2”. The data then continues its journey by traveling through other network devices until it finally reaches the workstation. The devices through which data travels are network devices.
Of course, other elements are also connected to the network, such as your company’s data center and cloud infrastructure. They are important to your company’s IT, but we can set them aside for this discussion.
Now imagine that a network device in the figure above is accidentally unplugged. Data can no longer be retrieved from the company’s server because an intermediate network device is “down”. Data is flowing from the server, but when it hits the device that is no longer plugged in, the data has nowhere to go. You immediately call your IT department to report the problem.
Diagram of a simple corporate communication network with an unplugged network switch
Well, what now? If the IT department fixes the problem fast, the business won’t suffer too much. But, if it takes too long, then employees, the IT department, and the company can’t provide the excellent customer service that they’ve developed a reputation for. The bottom line? The time it takes to repair the network depends upon the IT department.
The time it takes to repair the network is the sum of two factors:
- Time needed to detect the problem, i.e., its nature and location
- Time needed to fix the problem once it’s been detected
In our example, repair time consists of detecting that the network device in room 1 of your company office is unresponsive, and then fixing the problem by arriving on location and plugging the network device back into the outlet.
Being fast in finding out which device is faulty, what the problem exactly is, the location of the device, and which person can complete the job in minimum time is the key to repairing the network fast. The only way an IT department can do this is by using a proper IT network monitoring system. In telecoms, the concepts do not differ significantly, and a telecom-grade network monitoring system is used for the same purpose.
How Does Monitoring Work?
Thank heavens the engineers who invented network equipment also equipped it with mechanisms to detect and report problems with its operation. Network devices, as well as all other information and communication technology (ICT) equipment, constantly send notifications (unsolicited data messages) about their status to a central monitoring system, provided they are configured properly.
In our simple scenario, it’s not the network device that was unplugged that sends a notification (remember, it’s “down” so it can’t communicate). However, the network device that’s connected to the unplugged network device can report to the monitoring system that it has lost communication with it. This live device can easily detect such a situation because it has stopped receiving an essential signal from one end of the cable (the one connecting it to the unplugged switch).
Diagram of a simple corporate communication network that depicts how different network devices are sending notifications to the central monitoring system (dotted arrows).
The figure above depicts how different network devices send notifications to the central monitoring system (dotted arrows). The central router’s notification will state that there is a broken link. The devices connected to the unplugged switch (laptop and printer) and the unplugged switch itself cannot send or receive data, so they cannot report anything: notifications travel over the network itself.
The central monitoring system then reports this type of situation to IT experts in the form of alarms and they will then try to figure out what’s going on.
Types of Network Monitoring
Monitoring involves receiving notifications from devices, as discussed in the previous section. The common term for these notifications is an "event." An event may signify a routine occurrence in the network, such as a device activating a fan for cooling. Yet, events can also indicate irregular conditions, which are elevated to "alarms" or "alerts." Alarms signal a compromised health status of the monitored system.
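To make the distinction between events and alarms concrete, here is a minimal sketch in Python. The `Event` fields, the severity levels, and the device names are assumptions made up for illustration; real devices and monitoring systems define their own.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical severity scale, lowest to highest.
    INFO = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Event:
    device: str          # hypothetical device name
    kind: str            # hypothetical event type, e.g. "fan_on"
    severity: Severity


def alarms_from(events, threshold=Severity.WARNING):
    """Return only the events serious enough to be treated as alarms."""
    return [event for event in events if event.severity >= threshold]


events = [
    Event("switch-1", "fan_on", Severity.INFO),         # routine occurrence
    Event("router-1", "link_down", Severity.CRITICAL),  # irregular condition
]

for alarm in alarms_from(events):
    print(f"ALARM {alarm.device}: {alarm.kind}")
```

Every notification is stored as an event, but only those at or above the chosen threshold are elevated to alarms.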
While understanding system health through alarms is vital, engineers are equally interested in gauging how effectively the system operates. This is where performance monitoring comes in.
Performance monitoring entails continuously retrieving and processing performance data. For example, monitoring might involve fetching a device's temperature or counting transmitted bits. These performance metrics measure the system's performance level, triggering an alarm if performance deteriorates.
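As a rough illustration, the sketch below turns two readings of a transmitted-bits counter into a rate and raises a flag when a metric stays above a limit. The function names, the limit, and the "three consecutive samples" rule are assumptions chosen for the example, not a description of any specific product.

```python
def bits_per_second(prev_bits, curr_bits, interval_s):
    """Convert two readings of a transmitted-bits counter into a rate.

    Assumes the counter did not wrap or reset between the two readings.
    """
    return (curr_bits - prev_bits) / interval_s


def breaches(samples, limit, consecutive=3):
    """Return True when the last `consecutive` samples all exceed `limit`."""
    if len(samples) < consecutive:
        return False
    return all(value > limit for value in samples[-consecutive:])


# Two counter readings taken 60 seconds apart.
rate = bits_per_second(1_000_000, 61_000_000, 60)
print(f"rate: {rate:.0f} bit/s")

# Hypothetical device temperatures in degrees Celsius.
temperatures = [40, 71, 72, 75]
if breaches(temperatures, limit=70):
    print("ALARM: temperature above 70 for 3 consecutive samples")
```

Requiring several consecutive samples above the limit is one simple way to avoid raising an alarm on a single spike.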
Monitoring, therefore, encompasses both alarm and performance monitoring, categorized as Fault (or Alarm or Alert) Management and Performance Management.
Active and Passive Network Monitoring
Let’s get into the types of network monitoring starting with the distinction between passive monitoring and active monitoring. Passive monitoring relies on devices sending data to a central monitoring system. This approach depends on devices notifying the system of any issues. However, its drawback lies in the fact that if a device becomes inactive, it stops sending data, and the monitoring system remains unaware of its health. Engineers address this by employing active monitoring, which periodically contacts or connects to devices to check their health. When a connection is impossible, the monitoring system deduces a problem and triggers an alarm.
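A minimal sketch of active monitoring could look like the following. It assumes the device exposes some TCP service (for example SSH) that the monitoring system can connect to; the function names and the alarm text are made up for illustration.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Active check: try to open a TCP connection to the device."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable: treat all as "not reachable".
        return False


def poll(devices, check=is_reachable):
    """Return an alarm message for every device that does not answer.

    `devices` is a list of (host, port) pairs; `check` can be swapped
    for another probe, which also makes the function easy to test.
    """
    return [
        f"ALARM: {host}:{port} unreachable"
        for host, port in devices
        if not check(host, port)
    ]
```

A monitoring system would call something like `poll` on a schedule, for instance once a minute, and raise or clear alarms based on the result.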
Real-Time Network Monitoring
In terms of timing, monitoring can be real-time or near-real-time, meaning that the alarm appears on the monitoring system's screen immediately, or within a very short time (up to a few minutes), after the faulty condition occurs in the network.
Non-Real-Time Network Monitoring
Conversely, non-real-time monitoring relies on periodic batch processing of substantial data chunks (events or performance data), occurring at intervals like every 15 minutes or an hour. Historical data processing, log analysis, and other methods fall into this non-real-time category.
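The sketch below shows the general idea of batch processing: performance samples are grouped into 15-minute windows and averaged per window. The function names and sample values are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_start(ts, minutes=15):
    """Round a timestamp down to the start of its batch window.

    Works for window sizes that divide an hour evenly (5, 15, 30, 60).
    """
    return ts - timedelta(
        minutes=ts.minute % minutes,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )


def average_per_window(samples, minutes=15):
    """Group (timestamp, value) samples by window and average each group."""
    windows = defaultdict(list)
    for ts, value in samples:
        windows[bucket_start(ts, minutes)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(windows.items())}


samples = [
    (datetime(2024, 1, 1, 10, 1, 0), 10.0),
    (datetime(2024, 1, 1, 10, 14, 30), 20.0),
    (datetime(2024, 1, 1, 10, 16, 0), 30.0),
]

for start, mean in average_per_window(samples).items():
    print(start.isoformat(), mean)
```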
We go more in depth on these types of network monitoring in this popular blog.
Common Network Monitoring Protocols
Now, let’s look at typical sources of event and performance data and how they arrive from devices to network monitoring systems via protocols. We’ll just touch on a few but be sure to read more about them in this blog post about understanding network monitoring protocols.
Ping
Ping is the most basic method of active polling, and it uses a protocol called ICMP. Many network monitoring set-ups rely on it as the primary method for generating alarms.
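For the curious, this is roughly what a ping message looks like on the wire. The sketch builds an ICMP echo request (type 8, code 0) with its checksum. It only constructs the bytes; actually sending them needs a raw socket, which usually requires elevated privileges. The function names and the payload are illustrative assumptions.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"ping") -> bytes:
    """Build an ICMP echo request: type 8, code 0, checksum, id, sequence."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


packet = build_echo_request(identifier=1, sequence=1)
print(packet.hex())
```

A handy property of this checksum is that recomputing it over the finished packet gives zero, which is how a receiver verifies the message.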
SNMP
SNMP is one of the oldest and most widely used protocols for monitoring networks. Even though SNMP is a management protocol intended for remotely managing (configuring) devices, it is most often used for monitoring, both actively and passively. The protocol has gone through two evolutionary steps: SNMP v1 was followed by SNMP v2c and then SNMP v3, with each succeeding version introducing significant improvements to match the industry’s challenges. SNMP v1 was introduced in 1988, while SNMP v3 was released in 2002.
Syslog
Syslog is a standard protocol supported by a wide range of network and IT devices that allows them to send free-text log messages to a central server. It is a great source of event and even performance data and is, along with Ping and SNMP, one of the key sources of data for any monitoring environment.
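Although the message body is free text, every syslog message starts with a priority value in angle brackets that encodes both a facility and a severity (priority = facility × 8 + severity). The sketch below decodes just that prefix; the function name and the sample message are assumptions for illustration, and the rest of the message is left unparsed.

```python
import re

# Severity names in numeric order, 0 (most severe) to 7.
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]

PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_syslog(line):
    """Split a syslog line into facility, severity name, and message text."""
    match = PRI_PATTERN.match(line)
    if not match:
        raise ValueError("missing <PRI> prefix")
    facility, severity = divmod(int(match.group(1)), 8)
    if facility > 23:
        raise ValueError("priority value out of range")
    return {
        "facility": facility,
        "severity": SEVERITIES[severity],
        "message": match.group(2),
    }


# Hypothetical message: 187 = facility 23 * 8 + severity 3.
print(parse_syslog("<187>link down on port 3"))
```

A monitoring system can then map the severity to an alarm level and use the message text to describe what happened.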
Log files
Another method of obtaining event data is by directly reading log files generated by a device. Log records then represent events that can be mapped to alarms when the record describes a faulty situation or a change of a faulty situation.
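Here is a small sketch of that mapping. It assumes a made-up log format of the form "timestamp device LINK UP|DOWN port"; real devices each have their own formats, so the pattern would need to be adapted. A "DOWN" record raises an alarm and a later "UP" record for the same port clears it.

```python
import re

# Hypothetical log format: "<timestamp> <device> LINK <UP|DOWN> <port>"
LINE = re.compile(r"^(\S+) (\S+) LINK (UP|DOWN) (\S+)$")


def active_alarms(lines):
    """Replay log records and return the alarms still active at the end.

    The result maps (device, port) to the timestamp the alarm was raised.
    """
    active = {}
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # not a record this sketch understands
        timestamp, device, state, port = match.groups()
        key = (device, port)
        if state == "DOWN":
            active.setdefault(key, timestamp)  # raise, keeping the first time seen
        else:
            active.pop(key, None)              # clear, if it was raised
    return active


log = [
    "2024-01-01T10:00:00 sw1 LINK DOWN ge0/1",
    "2024-01-01T10:01:00 sw2 LINK DOWN ge0/2",
    "2024-01-01T10:05:00 sw1 LINK UP ge0/1",
    "some unrelated line",
]

print(active_alarms(log))
```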
Element management systems & platforms
Element management systems are a great source of standardized and vendor-specific data. The same applies to different platforms such as virtualization platforms or cloud platforms (AWS, Azure).
Importance & Benefits of Network Monitoring
We’ve covered what telecom and IT network monitoring is. Now let’s look at why it’s so important for modern digital operations. Ignoring or underestimating the role of efficient network monitoring could lead to operational disruptions, poor digital service quality, security vulnerabilities, and increased operational costs.
We’ll just highlight a few benefits here. If you need to be convinced further, make sure to read Importance and Benefits of Network Monitoring.
Improved Network Performance
Network performance will improve thanks to proactive measures, continuous analysis, and timely response to identified issues. When all data related to the network is in one spot, engineers can manage the network and proactively prevent degradations, leading to more satisfied customers and less customer churn.
Effective Resource Allocation
Network monitoring helps with resource allocation, and when resources are allocated effectively, the network is more productive and efficient. Monitoring data lets engineers regularly review and optimize router and switch configurations and verify that network devices are properly configured to handle the expected traffic. Engineers can also use this data to implement load balancing that distributes traffic evenly across network links.
Cost Savings
Network monitoring helps with capacity planning which uses historical data to predict future capacity requirements and plan upgrades accordingly.
Cost savings can also come in the form of increased employee productivity, because a stable and well-monitored network will minimize disruptions and ensure that critical applications and services are consistently available.
The return on investment (ROI) is often realized through a combination of reduced downtime, improved operational efficiency, enhanced security posture, and better overall utilization of network resources. Companies that invest in robust network monitoring solutions are likely to see financial benefits over time as they experience fewer disruptions, improved performance, and a more secure IT environment.
Enhanced Security and Risk Mitigation
Another benefit of network monitoring is that it can quickly identify anomalies, suspicious patterns, or potential security threats. This proactive approach enables swift detection of security incidents, allowing for timely response and mitigation measures to be implemented. Additionally, network monitoring helps with enforcing security policies, ensuring compliance, and providing visibility into vulnerabilities, thereby reducing the overall risk of security breaches and unauthorized access.
Network Monitoring Best Practices
Strategies and practical advice for effective network monitoring abound, and we're going to give our two cents' worth. We believe the following are network monitoring best practices:
- Discovery of all resources required to provide services including network, data center, power, etc.
- Implementation of an umbrella approach by consolidating monitoring with enriched data to provide full awareness of all aspects of network health and performance.
- Implementation of alarm correlation and automation rules and scripts to automate repeating tasks and alarm resolution.
- Configuration of root cause rules and scripts on enriched data to reduce problem resolution time.
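To give a feel for what a root cause rule can do, the sketch below applies one simple idea: if a device and everything behind it are in alarm, only the device closest to the monitoring system is reported as the likely root cause. The topology, the device names, and the function are assumptions for illustration; real correlation rules work on much richer data.

```python
def root_causes(alarmed, upstream):
    """Return alarmed devices that have no alarmed device upstream of them.

    `upstream` maps each device to the next device toward the monitoring
    system (None for the top device). Assumes the topology is a tree.
    """
    roots = set()
    for device in alarmed:
        parent = upstream.get(device)
        # Walk toward the top until we meet another alarmed device.
        while parent is not None and parent not in alarmed:
            parent = upstream.get(parent)
        if parent is None:
            roots.add(device)
    return roots


# Hypothetical topology loosely based on the simple corporate network above.
upstream = {
    "laptop": "switch-1",
    "printer": "switch-1",
    "switch-1": "router",
    "server": "switch-2",
    "switch-2": "router",
    "router": None,
}

alarmed = {"laptop", "printer", "switch-1"}
print(root_causes(alarmed, upstream))
```

Instead of three separate alarms, the engineer sees one probable cause, which shortens the time needed to detect the nature and location of the problem.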
Get more insight and tips on network monitoring best practices in our detailed blog.
The Future of Network Monitoring
Like anything, network monitoring is constantly evolving and adapting to the latest technology trends and challenges.
For example, one trend is for network and other IT systems to provide a continuous stream of health and performance data delivered through an event streaming platform (e.g., Apache Kafka). This is conceptually different from classic event-level monitoring.
The concepts of event management and fault management, as well as performance data analysis, are being reinforced by employing machine learning algorithms. For instance, data forecasting, which is the base for trending analysis, capacity planning and other functions, heavily relies on a family of ML algorithms to help engineers with network management in general. Root-cause analysis as well as automatic triggering of external actions are based on utilizing learning algorithms to reduce time to respond to network issues.
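As a very simple stand-in for the forecasting algorithms mentioned above, the sketch below fits a straight line to past measurements and extends it into the future. Production systems typically use more capable models; the function names and the sample utilization figures are assumptions for illustration.

```python
def linear_fit(values):
    """Least-squares line through the points (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, steps_ahead):
    """Predict the value `steps_ahead` periods after the last measurement."""
    slope, intercept = linear_fit(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


# Hypothetical monthly link utilization in percent.
utilization = [10, 20, 30, 40]
print(forecast(utilization, steps_ahead=2))
```

For capacity planning, the same fitted line can be used to estimate when a link is likely to reach its limit, so an upgrade can be scheduled in advance.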
Autonomous assurance is becoming the focus of network monitoring systems as more systems are capable of autonomously engaging remediation processes based on learned lessons from the historic remedy activities of network engineers.
It goes without saying that modern networks are becoming more complex with the inclusion of hybrid and multi-cloud environments and having a network monitoring system that covers it all is key. Network monitoring solutions are adapting to support cloud-native architectures including multi-cloud environments, containers, and serverless computing.
Scalability is an ever-present challenge for engineers, whose goal is to scale their network monitoring systems seamlessly as the infrastructure expands.
UMBOSS & Network Monitoring
Finally, we’d like to introduce the ultimate monitoring concept – umbrella monitoring.
Umbrella monitoring is a term used to describe the concept of monitoring everything there is in a telecom network or corporate IT system in one place. An umbrella monitoring system, like UMBOSS, consolidates monitoring data from all parts of a company’s ICT: communication network, data center infrastructure (computing, storage, power supply, HVAC), voice and other service platforms, etc.
This type of platform then provides unified tools for event processing, alarm monitoring, performance monitoring, monitored data enrichment, network and IT discovery, and many other features. An umbrella portal combines all the data with information from other IT systems (such as asset management) to provide engineers with a 360-degree view over the network, data center, customers, services, and other data needed to manage the entire ICT efficiently.
If you want to learn more about an umbrella approach to network monitoring, let us know. We’re happy to help. Ask us your question today or schedule a demo to see UMBOSS in action.