In our previous blog post, “Telecom Service Assurance Data Models,” we introduced the concepts of service modeling in telecom and digital service providers (DSPs) in general. In this post, we extend the discussion to explore the concepts of telecom service monitoring through service modeling. This discussion will lead us to further examine service inventory and introduce the concepts of service-level agreements (SLA), service-level objectives (SLO), key quality indicators (KQI), and, finally, the monitoring of service quality through service quality management (SQM).
Why do we need model-based monitoring?
All network monitoring systems and tools are inherently tied to collecting data from resources such as networks, network functions, data center infrastructure, and virtual or logical devices. However, what telecoms and DSPs truly need is to understand how customers' services are performing so that, when complaints arise, they can respond quickly and efficiently.
Some monitoring systems use direct measurement methods, utilizing performance data gathered through active or passive taps (probes) installed in the network. These systems may also provide customer location-based measurements related to specific services provided at that location. But ultimately, this is still resource-based data.
To truly understand how services—and by extension, products—are performing, it’s necessary to combine resource alarms and performance data with the service model to calculate the perceived quality of the service. This is an indirect method, but it remains the most cost-effective approach.
Monitoring key aspects of service health is one thing, but telecoms and DSPs are primarily concerned with identifying which services are degraded and which customers are affected. Therefore, there must be a mechanism to check the conformance of service-oriented quality measurements and calculations against predefined quality criteria and alert engineers when a service is degraded.
This service quality management mechanism allows telecoms to prevent problems from occurring, resolve issues before customers even notice something is wrong, and fix problems quickly when customers complain. Local or global outages are easily detected, affected customers are quickly identified, and necessary communication with them is streamlined to minimize dissatisfaction.
From what we've just discussed, it's clear that one must have definitions of well-performing services (SLAs), mechanisms to compare calculated service performance with SLAs, and most importantly, a well-designed and updated service inventory to know precisely which customers are using which products and services, and what resources are being used to deliver them.
Our next section will cover service inventory and its relationship to the fulfillment process as its major source of truth.
Populating Service Inventory: Service Fulfillment and Service Discovery
Service inventory is a key component required by service assurance, and it is the central component of any decent service fulfillment system. It is the repository that links products (and thus customers) to customer-facing services (CFSs), resource-facing services (RFSs), and resources. Without this mapping, there is no way to execute functions like service impact analysis (SIA) or SQM that we mentioned earlier. Therefore, it is clear that having a service inventory in place and populating it with all active services is essential. But how do you populate it?
There are two complementary approaches. One is populating service inventory data from existing data sources for a newly introduced service inventory, and the other is maintaining the inventory based on data from the service fulfillment process of the telecom.
Of course, there is a third alternative approach—manually inputting the data. However, this is often quite challenging, especially when there are many services and frequent changes to those services.
Newly established Service Inventory
If a service inventory is being newly introduced, there may be an existing database with service and product data that data engineers can extract and use to populate the inventory. However, this is rarely the case, as data in existing systems are often incorrect, obsolete, or simply lack the ability to be related effectively. Therefore, an additional (or rather alternative) approach can be taken—service discovery.
Service discovery involves analyzing network configurations across all devices and using specific rule sets to extrapolate data on the resources, their configurations, topologies, and relationships. This data is then mapped to customers, product specifications, and offerings, and used to generate RFSs and CFSs according to their specifications. This is a very complex technical task and requires a significant amount of custom code development to accomplish.
Service Inventory interacting with the fulfillment system
Once the baseline records are in the service inventory, the way to continually maintain the service records is through the inventory's interaction with the fulfillment process, which uses the service inventory (SI) to store its key data.
The fulfillment process encompasses all the steps taken from the moment a customer purchases or leases a product to the point they begin consuming the product. For instance, when a customer purchases a triple-play (TPP) product, dozens of interrelated processes occur to configure services on resources, deliver the home router and set-top box to the customer’s home, and activate and bill the product.
TM Forum prescribes a comprehensive set of standards in the area of fulfillment, which generally revolve around capturing and fulfilling orders (among other things) at the customer, service, and resource levels. For that purpose, TM Forum’s SID model introduces a pattern of order and order item entities applied at all three levels. An order is simply an entity that contains its order items.
We can explain this concept by revisiting the TPP example from our previous blog post, “Telecom Service Assurance: A Data Modelling Perspective”; for the sake of the discussion, we will use a very simplified explanation.
Imagine a customer who purchases a product. Naturally, the first thing that happens is capturing the customer’s order, which requires the customer’s data, location, and the precise product offering the customer wants. Once this is completed, the fulfillment process is initiated, and the first order is created, also known as the product order.
The product order contains product order items. One product order item might be voice, the second Internet access, and the third IPTV. The fourth product order item would be the resources delivered to the customer’s premises: the home router and STB. This is the order as perceived on the product level. However, one must generate a service order that describes what needs to be provided at the service level.
Following the same pattern, the service order contains an order item for each of the CFSs (voice, Internet, IPTV). Each CFS corresponds to its product order item. Each CFS is then internally mapped to its corresponding RFSs.
Next, for each RFS, there are one or more resource orders, each with its own resource order items. For instance, there might be a resource order for the ISP platform service composed of a single resource order item for the ISP platform function. One must not forget the resource order for the home router and STB, which is generated at the product level when the product order is decomposed into its order items.
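The order/order-item pattern described above can be sketched as a small data model. This is only an illustration of the concept: the class and field names below are invented for this example and are not SID-exact.

```python
from dataclasses import dataclass, field

@dataclass
class OrderItem:
    name: str    # e.g. "voice", "internet", "iptv", "cpe"
    level: str   # "product", "service", or "resource"

@dataclass
class Order:
    level: str             # the level this order belongs to
    items: list = field(default_factory=list)

# Product order for a triple-play purchase: three service-backed items
# plus one item for the delivered resources (home router + STB).
product_order = Order("product", [
    OrderItem("voice", "product"),
    OrderItem("internet", "product"),
    OrderItem("iptv", "product"),
    OrderItem("cpe", "product"),
])

# Service order: one order item per CFS, mirroring the product order items.
service_order = Order("service", [
    OrderItem(i.name, "service") for i in product_order.items if i.name != "cpe"
])

# Resource orders: one for the ISP platform RFS, and one for the CPE
# delivery generated directly from the product level.
isp_resource_order = Order("resource", [OrderItem("isp-platform-function", "resource")])
cpe_resource_order = Order("resource", [OrderItem("home-router", "resource"),
                                        OrderItem("stb", "resource")])
```

A real fulfillment system would of course carry far more state per order (status, references to specifications, characteristics), but the containment pattern stays the same.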
These orders and their relationships are depicted in detail in the figure below. This is only one possible (and very simplified) version of the many scenarios that telecoms may devise in their operational setup; here we use it solely to illustrate the basic concepts.
Decomposition of Product Orders and Service Orders into their respective Order Items
Another important element to fill in the gaps in the previous figure is understanding how service orders and service order items relate to the concepts of CFSs and RFSs. The following figure illustrates this relationship in a very simplified way.
Service Order and Service Order Items vs CFSs
Service orders and service order items are related to (have data references to) services and their specifications. The service catalog and fulfillment engine together provide information on RFSs, their characteristics, and values.
Now, with the introduction of the concept of orders and order items and how they are incorporated and related to the data model, one can easily understand fulfillment at the customer, service, and resource levels.
At the customer level, it is clear that the customer places an order for product offerings, which are materialized as products and stored in the product inventory. A customer may order one or more products. The set of processes and functions that captures customer requests, validates and decomposes them into product items and related services, and places orders to fulfill service orders is called customer order management (COM). COM is usually a very sophisticated system with orchestration and complex business logic, often part of CRM systems.
Besides generating service orders, COM also generates resource orders for the resources (e.g., customer premises equipment or SIM card, etc.) defined in the product specification. Of course, COM includes many more elements and functions, but for the sake of this discussion, this will suffice.
Once COM generates service orders, these are handed over to service order management (SOM). SOM decomposes service order items (referring to CFSs and their specifications) into internal RFS order items, based on service catalog data and the data from the service order. It then further decomposes them into a number of resource orders for the resources that need to be configured for the respective RFSs and hands them over to the resource order management (ROM) system, which is generally concerned with orchestrating equipment provisioning, configuring network elements and platforms, and activation.
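The SOM decomposition step can be sketched as a catalog lookup: each CFS order item expands into the RFSs defined for it in the service catalog, and each RFS then yields a resource order. The CFS-to-RFS mapping below is invented purely for illustration.

```python
# Illustrative CFS -> RFS mapping, as it might live in a service catalog.
CATALOG = {
    "internet": ["access-link-rfs", "isp-platform-rfs"],
    "voice":    ["access-link-rfs", "voice-platform-rfs"],
    "iptv":     ["access-link-rfs", "iptv-platform-rfs"],
}

def decompose_service_order(cfs_items):
    """SOM step (simplified): expand CFS order items into RFS items,
    then emit one resource order per RFS for ROM to execute."""
    rfs_items = []
    for cfs in cfs_items:
        rfs_items.extend(CATALOG.get(cfs, []))
    # Deduplicate shared RFSs: one access link serves all three CFSs.
    rfs_items = sorted(set(rfs_items))
    resource_orders = [{"rfs": rfs, "items": [f"configure {rfs}"]}
                       for rfs in rfs_items]
    return rfs_items, resource_orders

rfs, resource_orders = decompose_service_order(["internet", "voice", "iptv"])
```

In practice SOM also carries over characteristics and values from the service order into each resource order, and the decomposition logic is driven by orchestration workflows rather than a flat dictionary.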
In general, SOM and ROM are separate specialized software units, usually containing advanced business process management systems and business logic with clearly defined APIs. Since they contain order records as previously explained, they can easily add, update, modify, or delete information in the service inventory and resource inventory. The following figure illustrates the relationship between COM, SOM, and ROM and how these interact with service inventory and resource inventory.
Interaction of Service Fulfillment with Service Inventory and Resource Inventory
As one can see, an existing service inventory can be (and must be) easily updated by SOM, as this is the main way to maintain its consistency and accuracy and to allow for successful service assurance!
Product Offering, Service Level Agreement (SLA) and Service Level Objectives (SLO)
All customers naturally expect that purchased products meet a certain level of service, satisfying their expectations of how well the product should perform. These performance expectations constitute what is known as a Service Level Agreement (SLA).
According to the TM Forum, an SLA is defined to cover the performance aspects of all segments of a product’s lifecycle, including point of sale, contract signing, provisioning, usage, and termination. However, for the purpose of this discussion, we will focus on the in-use aspects of the SLA.
Managing the SLA during the in-use phase of a product’s lifecycle involves monitoring the services that make up the product and comparing their performance against the agreed-upon SLA service levels. This, of course, includes alerting engineering and customer support staff to any pending or current SLA violations (service alarming).
To facilitate SLA management, the TM Forum defines a specialized data structure centered around a key entity called the Service Level Specification (SLS). The SLS is the data structure that formally records the SLA: it defines and documents the expectations of both the customer and the DSP/telecom regarding product and service performance. In other words, the SLA is expressed in terms of the SLS.
Later on, we will briefly mention how SID addresses the SLA data structure as well.
Service Level Specification, Service Level Objective (SLO)
We will explain the relationship between the SLS and other service model entities using a simplified representation in the figure below, which extends our discussion on service modeling from a previous blog post “Telecom Service Models for Assurance: an introduction”.
Service Level Specification as the measure of CFS and RFS quality
The uppermost part of the figure once again depicts the building elements of the product and service catalog: product offering, product specification, CFS specification, and RFS specification. An SLS can be associated with any of them, and naturally, a single SLS can be shared among any number of specifications and product offerings.
The SLS in itself is merely a container for a number of Service Level Objectives (SLOs) and Service Level Specification Consequences. A single SLO is one of the quality goals of its SLS, defined in terms of measurable parameters and metrics with associated threshold values and tolerances. For instance, it can be monthly service availability with a well-defined minimum conformance value.
An SLS Consequence is defined as the action that takes place when the quality goal defined in an SLO is not met. For instance, this can be a warning or escalation to the technical department, or sending an apology letter to the customer when the availability goal is not met. We will only mention that different consequences may be defined for the same SLO, applicable according to an applicability entity (not depicted in the figure). For instance, the consequences may differ between the weekend and the workweek.
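The containment described above (SLS holding SLOs, each SLO holding its consequences) might be represented as follows. The field names and the example values are illustrative, not taken from the SID model verbatim.

```python
from dataclasses import dataclass, field

@dataclass
class SLSConsequence:
    action: str                     # what happens when the SLO goal is not met
    applicability: str = "always"   # e.g. "workweek", "weekend"

@dataclass
class SLO:
    parameter: str                  # measurable parameter the goal is based on
    conformance_target: float       # well-defined minimum conformance value
    consequences: list = field(default_factory=list)

@dataclass
class SLS:
    name: str
    objectives: list = field(default_factory=list)

# An SLS with a single availability objective and two consequences,
# one of them restricted to the workweek via applicability.
availability_slo = SLO(
    parameter="monthly-availability-pct",
    conformance_target=99.5,
    consequences=[
        SLSConsequence("escalate to technical department", "workweek"),
        SLSConsequence("send apology letter to customer"),
    ],
)
sls = SLS("l2-vpn-gold", objectives=[availability_slo])
```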
Now, the nature of an SLS and its associated SLOs largely depends on these associations. SLSs associated with CFSs, product specifications, and product offerings will tend to define customer-oriented SLOs, while SLSs associated with RFSs will obviously tend to define internal (telecom/DSP) technical SLOs, those of internal concern to the provider.
For instance, a customer-oriented SLS at the product level may include objectives related to minimum service availability, maximum response time, maximum resolution time, etc. A CFS-related SLO may deal with perceived aspects of the service. For an L2 VPN service, for instance, this may include a promise that the VPN’s speed will be at the nominal level for at least 99.5% of the time during any calendar month, as this is what the customer expects.
On the other hand, RFS-related SLSs and SLOs will focus on very technical aspects. For instance, one might define that the maximum packet drop rate will not be higher than 10 packets per hour on a microwave link for 99.9% of the time during any month.
In that respect, we can distinguish between SLSs that constitute customer SLAs and SLSs that constitute internal (telecom/DSP) SLAs. We therefore have customer SLAs, which necessarily involve the customer as a party, and internal SLAs, which may include other parties such as suppliers, vendors, etc. The discussion could go into much more detail, but this post is not the place for it.
We will only state that SLA is recognized as a type of Agreement in TM Forum’s SID model, and SLSs, SLOs and other associated entities are part of an agreement data structure.
KPIs, KQIs for Service Quality Monitoring
Now, let's take a closer look at how SLS defines its SLOs and the parameters on which they are based. The following figure should shed some light on this.
Relationship between SLS, SLO and SLS Parameters
SLOs are related to their respective Service Level Specification Parameters, which can take two forms: KPIs and KQIs. Whether an SLO is associated with a KPI or KQI depends on its goal.
In terms of the service model, a KPI (Key Performance Indicator) measures a specific aspect of the performance of a resource, an RFS, or a group of RFSs or resources, regardless of whether the resource is a network resource. There are, of course, many examples of network- and data-center-based KPIs, such as link utilization, but KPIs can also be non-network-related, like the time it takes to answer a customer’s call in a call center.
However, for service quality monitoring, a KQI (Key Quality Indicator) is used. A KQI is defined as 'a measure of a specific aspect of the performance of a product (ProductSpecification, ProductOffering, or Product) or a service (ServiceSpecification or Service).' A KQI draws its data from various sources, including KPIs. For example, we can analyze a KQI for the CFS that represents the availability of internet access service.
Since this KQI depends on the availability of the access link, access network, core network, and ISP platform—each represented by a separate KPI—the KQI can be defined as the multiplication of these four KPIs. The method used to calculate a KQI from KPIs, performance metrics, and other data is called the KQI's transformation algorithm.
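The availability example above can be written down directly. The multiplicative transformation algorithm shown here follows the text's example; the four segment KPI values are illustrative numbers only.

```python
def internet_availability_kqi(kpis: dict) -> float:
    """One possible KQI transformation algorithm: the internet access CFS
    is available only when every segment in the delivery chain is, so the
    segment availabilities (expressed as fractions) multiply."""
    return (kpis["access_link"] * kpis["access_network"]
            * kpis["core_network"] * kpis["isp_platform"])

# Monthly availability KPIs per segment (illustrative values).
kpis = {
    "access_link": 0.999,
    "access_network": 0.9995,
    "core_network": 0.9999,
    "isp_platform": 0.998,
}

# End-to-end availability of the CFS; necessarily lower than any single KPI.
kqi = internet_availability_kqi(kpis)
```

Real transformation algorithms can be far richer, weighting KPIs, mixing in probe measurements or ticket data, but the principle of deriving a service-level figure from resource-level inputs is the same.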
In general, the primary role of KQIs is to facilitate compliance with SLAs, or more precisely, compliance with the SLS that incorporates the SLA and its SLOs. The calculation of KQIs and their compliance check against SLOs is executed in a computational system called Service Quality Management (SQM). SQM provides outputs in the form of service-level alarms that may trigger consequence-defined actions, quality reports, and more.
It is important to note that the SID entities KPI SLS Parameter and KQI SLS Parameter are not intended to store the actual values of the parameters. They specify what those parameters are, their properties, and their association with the corresponding SLO. The actual calculation and maintenance of the KPIs and KQIs is the responsibility of network monitoring systems and SQM.
Checking SLO compliance
Now, for the sake of understanding the SLO compliance process, which represents the interaction between KPIs, KQIs, and SLOs, here are some important facts:
- An SLO has two target values for its parameter: conformance and threshold. Conformance is the value beyond which the parameter violates the SLO’s objective. The threshold, on the other hand, is the value that indicates a warning: the point at which there is a risk that the SLO’s objective may not be met.
- A comparator is defined to specify whether the violation occurs when the value is higher or lower than the conformance target. The SLO validity definition specifies the period of time during which the SLO is applicable.
- For the warning threshold, a tolerance period is defined, which must expire before a warning alarm is triggered—assuming the threshold has been violated. This helps eliminate false-positive warnings.
- In the case of the conformance target, the SLO defines a time interval during which it must be measured, as well as grace periods — the tolerance for a number of unsuccessful updates to the objective status that still allows the SLO to remain compliant.
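Putting the facts above together, a single evaluation cycle of an SLO could look like the following sketch. The comparator, tolerance period, and grace handling are heavily simplified, and all names are illustrative rather than taken from any standard.

```python
from dataclasses import dataclass

@dataclass
class SLOSpec:
    conformance: float    # boundary beyond which the objective is violated
    threshold: float      # boundary that indicates a warning
    comparator: str       # "lt": violated/warned when value is lower than target
    tolerance_s: int      # seconds a threshold breach must persist before warning
    grace_updates: int    # missed status updates tolerated before non-compliance

def evaluate(slo: SLOSpec, value: float,
             breach_duration_s: int, missed_updates: int) -> str:
    """Return 'violated', 'warning', or 'ok' for one measurement cycle."""
    lt = slo.comparator == "lt"
    violated = value < slo.conformance if lt else value > slo.conformance
    if violated or missed_updates > slo.grace_updates:
        return "violated"
    warned = value < slo.threshold if lt else value > slo.threshold
    # Tolerance period filters out short-lived, false-positive warnings.
    if warned and breach_duration_s >= slo.tolerance_s:
        return "warning"
    return "ok"

# Availability SLO: violated below 99.5%, warned below 99.7%,
# warning only after a 15-minute tolerance period, 2 missed updates allowed.
slo = SLOSpec(conformance=99.5, threshold=99.7, comparator="lt",
              tolerance_s=900, grace_updates=2)
```

For example, `evaluate(slo, 99.6, 1200, 0)` yields a warning (inside the threshold band and past the tolerance period), while `evaluate(slo, 99.4, 0, 0)` is an outright violation.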
Concepts of Service Quality and Performance Monitoring
Finally, with an understanding of service models, how they are populated, their interactions with inventories, and SLA definitions, we can complete the picture of service quality monitoring by providing a brief introduction to Service Quality Management (SQM) and its role in service assurance.
Now, to be completely aligned with the newest TM Forum terminology, we must distinguish between applications and functions. The applications that do the job of service monitoring are called “Service Performance Management” and “Service Quality Management.”
Service Performance Management is related to providing an end-to-end service performance view based on data available from resource (network) monitoring and end-to-end performance test data. Service Quality Management (SQM), on the other hand, is the application used to manage levels of service in terms of comparing KQIs to SLOs.
From a functional perspective, the terminology that unifies things happening in both applications is “Service Quality and Performance Management.” Very logical, right? Anyway, the industry today still uses the well-established term for both the application and functional view – Service Quality Management or SQM.
Simply said, Service Quality Management is a software function (an application) primarily designed to assess the quality of services as perceived by customers. To achieve this, SQM ingests various types of data, including network (resource) events, alarms, performance metrics, calculated KPIs, and other non-network-related data from sources such as network and element management systems, call centers, ticketing systems, and more.
These data are combined to calculate KQIs, which serve as the primary quality metrics for services that form products. This process constitutes the service monitoring aspect of SQM.
The second function of SQM is the continuous conformance check of all defined SLOs against the calculated KQIs to identify which services are degraded and, where possible, to what extent. SQM detects threshold and conformance target violations, generates service degradation alarms when SLAs are at risk, provides reports on degraded services, and offers service quality trending, spatial and time-domain analysis, and reporting.
These alarms and reports are crucial for the successful execution of assurance processes and the management of customer SLAs and satisfaction.
The internal mechanisms of SQM are not within TM Forum’s scope. TM Forum only defines SQM’s external behavior via its REST API (Open API). It assumes that SQM is fully aware of SLSs and SLOs and that it should provide notifications for the creation, removal, and attribute value changes of all SLSs and SLOs. The API also defines methods for retrieving, creating, and updating SLSs and SLOs. A pub/sub API method is assumed to be used by each service instance to deliver its health state, execution state, failures, and metrics.
Ultimately, the implementation of an SQM is the responsibility of a DSP and its suppliers. In our next blog post, we will delve deeper into how UMBOSS approached the implementation of an SQM Module and the impact that real-world situations in many telecoms had on its design and development. The following figure illustrates one possible architecture for SQM implementation, providing insight into how the SQM component interacts with other elements of the OSS landscape mentioned in this and our previous post, “Telecom Service Models for Assurance: an introduction”.
Example architecture of an SQM system
As one can see, SQM ingests all network and non-network events, alarms, and performance data (both metrics and calculated KPIs), as well as data from non-infrastructure systems like call centers, ticketing systems, and more. Naturally, access to service inventories and catalogs containing SLSs, SLOs, parameter definitions, and SLA agreements is essential. The internal mechanisms of SQM handle KQI calculations, compliance checks (comparing SLO parameters against thresholds and generating alarms), reporting, and other functions. All of this should be accessible via a TM Forum-compliant API.
One can note a module called “Internal Service Modeling.” This function is particularly important as it is part of TM Forum’s “Service Quality and Performance Development” function, which has two elements. The first, Service Performance Development, is dedicated to developing the indicators used to calculate and measure service performance and quality, and the conditions for service alarm activation. The second, Service Quality Development, is concerned with service quality indicator specifications and the rules connected to them.
SQMs often implement service modeling to cascade KQI calculations from the RFS to CFS and product levels. You will be able to learn more about these concepts in one of our following blog posts that will discuss SQM implementation concepts in much more detail.
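Cascading, as mentioned above, can be sketched as a tree of KQIs in which each parent aggregates its children bottom-up. The tree, the leaf values, and the worst-child aggregation rule below are all illustrative assumptions; real SQMs support configurable aggregation per node.

```python
# Illustrative service tree: product -> CFSs -> RFSs.
# KQI values (fractions, 0..1) exist only at the RFS leaves;
# CFS and product KQIs are computed by cascading upward.
TREE = {
    "tpp-product":  ["internet-cfs", "voice-cfs", "iptv-cfs"],
    "internet-cfs": ["access-rfs", "isp-platform-rfs"],
    "voice-cfs":    ["access-rfs", "voice-platform-rfs"],
    "iptv-cfs":     ["access-rfs", "iptv-platform-rfs"],
}
LEAF_KQI = {
    "access-rfs": 0.998,
    "isp-platform-rfs": 0.999,
    "voice-platform-rfs": 0.97,
    "iptv-platform-rfs": 0.995,
}

def cascade(node: str) -> float:
    """Propagate KQIs bottom-up; here a parent is only as healthy
    as its worst child (one of many possible aggregation rules)."""
    if node in LEAF_KQI:
        return LEAF_KQI[node]
    return min(cascade(child) for child in TREE[node])

product_kqi = cascade("tpp-product")  # dominated by the degraded voice platform
```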
UMBOSS & Service Quality Monitoring
UMBOSS is an umbrella network and service assurance platform. Its main components supporting service assurance are Service Inventory, Service Quality Management, and Assurance Automation. UMBOSS takes a pragmatic approach to SQM, applying the complex concepts and functions we’ve introduced in this and previous posts in the context of actual telecoms/DSPs. Some of these concepts will be explained in one of our upcoming posts.
Have any questions? Want to learn more? Get in touch and let us know how we can help. Send us a message or book a demo today.