Moving IT Operations from Silo to Services Management
A collaborative approach is the ideal solution for service management and infrastructure triage.
By Srinivas Ramanathan, CEO, eG Innovations
The concept of a service is well understood in many spheres. For instance, in telephony, service generally refers to the always-on, utility grade-delivery of a telephone dial tone. Similarly, in cable TV, the service is the programming available live and on-demand from a cable operator.
The concept of service in the IT domain is not as clear. In fact, most users and IT administrators still confuse “applications” with “services.” Although applications are individual software components (such as a Web server, a database server, or an ERP application server), a service refers to functionality that a user receives. For example, when you bank online and check your account balances or pay bills, online banking is an IT service. So is buying a book online.
As businesses use IT to support their key revenue-generating functions, there is a strong desire to achieve utility-grade reliability and performance through service level agreements (SLAs). The challenge for IT organizations is to maintain maximum end-to-end availability and performance of IT services that span multiple application, network, server, and storage tiers.
This article examines the problems IT organizations encounter when trying to apply silo-specific tools and processes to the management of services that span multiple tiers of the enterprise infrastructure.
The Challenge of Managing IT Services
One of the challenges with true service management is defining the service levels appropriately. Most service-level management initiatives end up focusing on metrics that usually relate to one application (silo). For example, one service-level agreement may cover application latency and another Web server availability. However, these are not examples of true service-level guarantees because they look at only the performance of an individual silo instead of measuring how well the silos work together in delivering a service.
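To make the gap between silo targets and service levels concrete, here is a minimal Python sketch. The tier names and the 99.9% figures are hypothetical, and the calculation assumes that the tiers fail independently and that the service needs every tier up at once, with no redundancy. Real infrastructures rarely satisfy these assumptions exactly, so treat the result as an illustration rather than a prediction.

```python
from math import prod

# Hypothetical per-tier availabilities: each silo meets its own 99.9% target.
tier_availability = {
    "web": 0.999,
    "application": 0.999,
    "database": 0.999,
    "storage": 0.999,
    "network": 0.999,
}


def end_to_end_availability(tiers):
    """Availability of a service that needs every tier up at the same time.

    Assumes tiers fail independently and there is no redundancy.
    """
    return prod(tiers.values())


service = end_to_end_availability(tier_availability)
print(f"Worst single tier:  {min(tier_availability.values()):.3%}")
print(f"End-to-end service: {service:.3%}")
```

Under these assumptions, five tiers that each meet a 99.9% target yield an end-to-end availability of roughly 99.5%, so every silo can satisfy its own agreement while the service as a whole falls short of the same figure.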
Another challenge is that application silos are organized and managed independently using performance metrics specific to each silo. Yet end users are not at all concerned with the infrastructure topography or how the individual silos operate (e.g., a user has no idea of whether an application is running on a physical server or a virtual machine and what kind of storage it is using). The user only cares that the service is functioning properly and responding in a timely manner.
A third challenge is that an IT operations team often lacks a common view of a service topology. An IT service involves multiple applications, network devices, virtualization platforms, and storage devices that all have to work together. The service topology is rarely documented, and the different administrators involved in supporting the service are not aware of the exact data flow and dependencies. Lacking basic knowledge of how a service functions, the help desk and operations personnel spend hours trying to deduce the root cause of a problem.
Finally, in an IT infrastructure, there are tight dependencies between the different tiers. If one application tier fails or slows down, this can affect all the other tiers. IT operations teams need a way to go beyond silo monitoring and management and look at the whole service topology and the inter-dependencies between the tiers.
Going from Silo to Service Management
An effective service monitoring and management solution is one that measures the true user experience, monitors each of the tiers involved in supporting the service, understands the inter-dependencies between the tiers, and uses these inter-dependencies to correlate the performance of the different tiers and deduce the root cause of a problem when it arises. Is it in the network, the database, or in the application? Could it be in the virtualization platform or in the storage tier? In effect, the service monitoring and management system provides a single-pane-of-glass view for the entire IT operations team.
To be effective, a service monitoring and management solution needs visibility into the different tiers of the infrastructure. However, this is easier said than done. The different tiers of the infrastructure have different administrators who use different tools for monitoring their respective silos. These administrators are often unwilling or not interested in exposing the performance of their portions of the infrastructure to the other administrators. This leads to the silo-oriented management approach we see in most organizations.
How, then, do we get to a service-oriented, end-to-end service management solution? One approach that organizations have tried is the top-down approach, dictated by the organization’s management. A top-down approach can work, but it takes a long time, much of it spent convincing the different silo administrators to cooperate.
A Collaborative Approach Accelerates Service Management Adoption
A collaborative approach may be the best way to shift IT operations from silo to service management. The collaborative approach starts with one (or some) of the tiers of infrastructure. Very often, this is the front-end-facing tier -- e.g., the Web tier, Citrix tier, desktop tier, etc. When users access an IT service, they only see the front end. For example, a user accessing an online banking service will complain that the website is slow. Likewise, a user logging into Citrix to access corporate applications will complain that Citrix is not working well. IT help desks that receive user complaints pass the problems along to the front-end application administrators without realizing that the user complaint is not about a specific application but about the IT service in its entirety. The front-end administrators then face the problem of proving exactly where the problem lies.
Inadvertently, the front-end administrators become the administrators responsible for monitoring and troubleshooting the service end-to-end. To do this effectively, they need visibility into as much of the supporting infrastructure as possible. Because other administrators may not be willing to provide access to their domains, the front-end administrators have to work with limited visibility into the infrastructure. To start the problem diagnosis, front-end administrators need monitoring tools that can:
- Measure service performance end-to-end and alert them proactively to situations when there may be a problem.
- Provide in-depth visibility where possible into any tier on which they are deployed. The monitoring solution should have the expertise to detect and alert on most common performance problems in those tiers.
- Provide cross-silo indicators of performance where possible, so that a performance problem can be quickly triaged down to one of the tiers. It is not practical to require all of the administrators in an organization to deploy one common toolset. Hence, being able to surmise in which tier a problem may lie without requiring in-depth visibility into all of the tiers is a critical requirement for a service management solution.
There are several ways to achieve the last point:
- Network packet tracking tools snoop on network traffic to get an idea of where time is being spent in processing a request.
- Some of the monitoring tools achieve this by tracing a transaction and tracking the hand-off between tiers (e.g., how much time was spent in the database, how much in the messaging server, etc.).
- Even if application-level tracing is not possible, by observing key performance indicators at hand-off points between tiers, it is possible to detect performance anomalies caused by individual tiers. For example, the number of connections active to the database may indicate a possible database bottleneck. Likewise, a large number of network packet retransmissions may signal a network bottleneck.
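The last technique, watching key indicators at the hand-off points between tiers, can be sketched in a few lines of Python. The metric names, tier labels, and threshold values below are hypothetical and chosen only for illustration. A real deployment would derive thresholds from baselines for its own environment.

```python
# Hypothetical hand-off indicators that a front-end administrator can observe
# without access to the other tiers. Thresholds are illustrative only.
THRESHOLDS = {
    "db_active_connections": ("database", 200),
    "tcp_retransmit_pct": ("network", 2.0),
}


def triage(sample):
    """Return the tiers whose hand-off indicator exceeds its threshold."""
    suspects = []
    for metric, (tier, limit) in THRESHOLDS.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            suspects.append((tier, metric, value, limit))
    return suspects


# One measurement taken at the front-end tier (made-up values).
sample = {"db_active_connections": 340, "tcp_retransmit_pct": 0.4}
for tier, metric, value, limit in triage(sample):
    print(f"Suspect tier: {tier} ({metric}={value}, limit {limit})")
```

A static threshold is the simplest possible rule. It only points at a tier worth investigating, which is the first-level triage the article describes, and it does not replace in-depth diagnosis by that tier's administrator.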
With such a service-monitoring and management solution, when a problem arises, an IT administrator will be able to conduct first-level triage and pinpoint where the potential problems lie. With tangible evidence of bottlenecks in their domains, other administrators will be forced to dig deeper into the performance of their tiers with the service context in mind. The successful detection and resolution of a problem, made possible by the cross-silo visibility of the service monitoring solution, will give administrators confidence in using the tool. This is often the trigger that gets administrators to collaborate. As the adoption of the solution grows, collaboration between administrators in the organization will also grow.
To successfully complete the move from silo- to service-oriented management, the service management solution should:
- Provide easy-to-understand topology views of each IT service, allowing even non-experts to easily see which parts of the infrastructure are working well and which are not.
- Automate the analysis of the infrastructure. The service-monitoring solution must maintain service topologies to indicate interdependencies between applications and other infrastructure tiers. The solution should also use service topologies as a means of correlating alarms detected by the system. This way, alerts can be prioritized and sent to the administrators so they can focus on problem causes and not be distracted by the effects.
- Abstract the details of each tier. Because IT administrators may not be experts at every silo, the service-monitoring solution must use a common paradigm to represent the health of each tier. A consistent user interface view makes it easy for administrators to quickly determine which tier is working and which is not, and, for each problematic tier, to determine where the problem lies.
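The idea of using a service topology to separate causes from effects can be illustrated with a short Python sketch. The topology, the component names, and the rule itself are assumptions made for this example: an alarmed component is treated as a probable cause only if nothing it depends on, directly or indirectly, is also alarmed. Commercial tools may use different or more elaborate correlation logic.

```python
# Hypothetical service topology: each component maps to what it depends on.
DEPENDS_ON = {
    "web": ["app"],
    "app": ["database", "network"],
    "database": ["storage", "network"],
    "storage": [],
    "network": [],
}


def has_alarmed_dependency(component, alarms, seen=None):
    """True if anything this component depends on (directly or not) is alarmed."""
    seen = set() if seen is None else seen
    for dep in DEPENDS_ON.get(component, []):
        if dep in seen:
            continue  # already examined; also guards against cycles
        seen.add(dep)
        if dep in alarms or has_alarmed_dependency(dep, alarms, seen):
            return True
    return False


def correlate(alarms):
    """Split alarmed components into probable causes and probable effects."""
    causes = sorted(c for c in alarms if not has_alarmed_dependency(c, alarms))
    effects = sorted(c for c in alarms if c not in causes)
    return causes, effects


causes, effects = correlate({"web", "app", "database"})
print("Probable causes: ", causes)
print("Probable effects:", effects)
```

With alarms on the web, application, and database tiers, this rule reports the database as the probable cause and the other two as effects, so the alert that reaches the administrators first is the one worth acting on.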
The use of multi-tier architectures has increased the complexity of IT infrastructures. Because no administrator is an expert in all the technologies involved in delivering a service to the end user, most organizations have resorted to silo-based monitoring and management, using specialized tools for each silo. As a result, administrators do not have the holistic or end-user view of the service. Problems go undetected and it is often difficult to find the root cause of the problem.
Solving this problem requires a collaborative, service management approach. Solutions that provide in-depth visibility across heterogeneous IT tiers, offer cross-silo indicators of performance, operate with limited visibility into specific domains, and correlate performance across the different tiers are essential for such a collaborative service-management approach to be successful.
Srinivas Ramanathan is the founder and CEO of eG Innovations, an award-winning provider of intelligent performance monitoring solutions for cloud, physical, and virtual environments. Prior to starting eG Innovations, Srinivas was a senior research scientist at Hewlett-Packard Laboratories in Palo Alto, California. At HP, Srinivas was the chief architect of Firehunter, an ISP performance-monitoring solution. He was also a key contributor to the second version of HP's WebQoS product for enabling quality of service for Web applications. You can contact the author at Srinivas@eginnovations.com.