How to Apply ITIL to SOA Operations Management
An overview of its disciplines in the context of SOA requirements
Oct. 11, 2008 11:00 PM
Incident Management
Incident management aims to restore services to their normal operational levels as quickly as possible so that service levels are maintained. The service desk plays a key role in this discipline: it captures the incident details, uses configuration management to check whether the incident was reported earlier or is related to other incidents, categorizes the incident if required, looks for known causes, and escalates if service level agreements are being breached. It also keeps the user informed of the status of the problem, when a fix can be expected, and any actions the user can take to work around the issue.
A key consideration for SOA incident management is the number of component parts that make up a service, and the need to correlate the incidents reported by each of these parts. For instance, if there is a database issue, every service that depends on that database will start reporting errors. Coordinated error logging and incident management are required to prevent a flurry of independent support activity.
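The database example can be sketched as a small correlation step. This is a minimal illustration, not a reference to any particular tool: the `DEPENDENCIES` map, the `Incident` type, and the service names are all hypothetical, and a real deployment would read the dependency data from the configuration database.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical dependency map: service -> components it relies on.
DEPENDENCIES = {
    "order-service": {"orders-db", "message-bus"},
    "billing-service": {"orders-db"},
    "catalog-service": {"catalog-db"},
}


@dataclass(frozen=True)
class Incident:
    incident_id: str
    service: str


def correlate(incidents, dependencies):
    """Group incidents by the components their services depend on,
    keeping only components implicated by more than one incident."""
    by_component = defaultdict(list)
    for inc in incidents:
        for component in dependencies.get(inc.service, ()):
            by_component[component].append(inc.incident_id)
    return {c: ids for c, ids in by_component.items() if len(ids) > 1}


incidents = [
    Incident("INC-1", "order-service"),
    Incident("INC-2", "billing-service"),
    Incident("INC-3", "catalog-service"),
]
suspects = correlate(incidents, DEPENDENCIES)
print(suspects)  # {'orders-db': ['INC-1', 'INC-2']}
```

Here two separately reported incidents point at the same shared component, so they can be handled as one investigation rather than two.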
Problem Management
Incident management aims to restore the services that support the business as quickly as possible, performing tasks such as searching the configuration database for known errors. Problem management, by contrast, focuses on determining the root causes of incidents, resolving them, and preventing their recurrence.
The distributed nature of SOA presents additional challenges to the discipline of problem management. For example:
- All service incidents should share common elements that enable a generalist support person to investigate the problem and perform an initial diagnosis.
- All infrastructure components and services must be monitored and provide consistent error messages that can be linked to items in the configuration database.
- Because multiple support groups across the enterprise may be involved, the challenge is to identify the appropriate support groups and, if required, to coordinate effort across them.
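One way to picture the points above is a common error record that every service emits, carrying the identifier of the configuration item that raised it. The sketch below is illustrative only: the field names, the `CMDB_OWNERS` lookup table, and the support group names are assumptions, not part of ITIL or of any specific product.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical extract of the configuration database:
# configuration item -> owning support group.
CMDB_OWNERS = {
    "ci-orders-db": "database-support",
    "ci-order-service": "order-team",
}


@dataclass
class ServiceError:
    ci_id: str           # configuration item that raised the error
    error_code: str      # code drawn from a shared, enterprise-wide catalogue
    message: str
    correlation_id: str  # ties together errors from one business request


def route(error, owners, default_group="service-desk"):
    """Return the support group that owns the failing configuration item."""
    return owners.get(error.ci_id, default_group)


err = ServiceError("ci-orders-db", "DB-TIMEOUT", "Query timed out", "req-42")
print(json.dumps(asdict(err)))  # the same structure for every service
print(route(err, CMDB_OWNERS))  # database-support
```

Because every record has the same shape, a generalist can read it without knowing the service, and unknown items fall back to the service desk rather than being lost.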
Change Management
Change management is responsible for managing the changes to configuration items and involves processes such as change initiation, logging, prioritization, assessment, scheduling, building, testing, implementation, and review.
SOA requires that the relationships between services and their clients be well understood, in particular the business criticality and versioning considerations of those relationships. Being able to assess the quality and impact (backward compatibility) of changes easily is also crucial for effective SOA change management. This is greatly facilitated by automated build, deployment, unit, regression, and performance testing, as well as build-time checking of quality criteria and test coverage.
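As an illustration of an automated backward-compatibility check, the sketch below compares two versions of a service contract. It assumes a deliberately simplified contract model (operation name mapped to its set of required parameters); real contracts such as WSDL or schema definitions carry far more detail, and the operation names are made up.

```python
def breaking_changes(old, new):
    """Compare two contracts mapping operation name -> set of required
    parameters, and list the changes that would break existing clients."""
    problems = []
    for op, old_required in old.items():
        if op not in new:
            problems.append(f"operation removed: {op}")
            continue
        added = new[op] - old_required
        if added:
            problems.append(f"{op}: new required parameters {sorted(added)}")
    return problems


v1 = {"getOrder": {"orderId"}, "cancelOrder": {"orderId"}}
v2 = {
    "getOrder": {"orderId"},
    "cancelOrder": {"orderId", "reason"},  # existing clients do not send this
    "listOrders": set(),                   # additions are harmless
}
print(breaking_changes(v1, v2))
# ["cancelOrder: new required parameters ['reason']"]
```

A check of this kind could run at build time and fail the build when the list is non-empty, which makes the impact of a change visible before it reaches change assessment.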
Release Management
Whereas configuration management is concerned with the logical aspects of configuration items, release management is concerned with the physical deployment of those items.
In the case of SOA, the requirement is that services can be easily deployed to multiple locations while existing versions remain running, for example to support long-running processes.
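A minimal sketch of side-by-side versions, assuming a simple in-memory registry: the `ServiceRegistry` class, the service name, and the example.com endpoints are hypothetical and stand in for whatever registry or router an organization actually uses.

```python
class ServiceRegistry:
    """Tracks every deployed version of a service; nothing is replaced in place."""

    def __init__(self):
        self._endpoints = {}  # (service, version) -> list of endpoint URLs
        self._default = {}    # service -> version offered to new clients

    def deploy(self, service, version, endpoint, make_default=False):
        self._endpoints.setdefault((service, version), []).append(endpoint)
        if make_default or service not in self._default:
            self._default[service] = version

    def resolve(self, service, version=None):
        """Return endpoints for a pinned version, or for the current default."""
        version = version or self._default[service]
        return self._endpoints[(service, version)]


registry = ServiceRegistry()
registry.deploy("orders", "1.0", "http://node1.example.com/orders/v1")
registry.deploy("orders", "2.0", "http://node2.example.com/orders/v2", make_default=True)

new_client = registry.resolve("orders")                       # gets version 2.0
running_process = registry.resolve("orders", version="1.0")   # stays on 1.0
```

A long-running process that started against version 1.0 keeps resolving to it, while new clients pick up version 2.0.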
Availability Management
Availability management ensures that services are available to the users who are authorized to use those services, when they are needed, and is concerned with managing the following aspects of services: availability, reliability, maintainability, serviceability, and security.
For SOA, we need to understand how well services are running, expressed as a percentage of defined and measurable service level agreement (SLA) targets. Measurement is required both for raising alerts and for capacity planning.
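The arithmetic behind such a measurement is simple. The following sketch uses made-up figures (a 30-day month, 90 minutes of downtime, a 99.9% target) purely for illustration.

```python
def availability_pct(total_minutes, downtime_minutes):
    """Percentage of the period during which the service was available."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime(total_minutes, target_pct):
    """Downtime budget, in minutes, implied by an availability target."""
    return total_minutes * (100.0 - target_pct) / 100.0


month = 30 * 24 * 60  # 43,200 minutes in a 30-day month
target = 99.9         # hypothetical SLA target, in percent

measured = availability_pct(month, downtime_minutes=90)
budget = allowed_downtime(month, target)

print(f"measured {measured:.2f}%, target {target}%")
print(f"downtime budget {budget:.1f} min, breached: {measured < target}")
```

With these numbers the service was available for roughly 99.79% of the month against a budget of about 43 minutes of downtime, so the target is missed and an alert would be raised.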
The availability management "Nirvana" is effective SLA monitoring and reporting of each component of the service, supporting capacity planning, and even resulting in the automatic deployment of additional services to spare capacity infrastructure in order to support SLAs.
Services must be deployed in a clustered environment that supports transparent failover. This generally means that the monitoring system detects that a service has failed before the client does and transparently takes the failed part out of service. It also means that the transactional and client retry semantics of services must be clearly understood.
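The two ideas in this paragraph, proactive removal of failed nodes and explicit retry semantics, can be sketched as follows. This is a simplified illustration: `ServicePool`, `call_with_failover`, and the node names are hypothetical, and real clusters usually delegate this work to a load balancer or service bus.

```python
class ServicePool:
    def __init__(self, nodes):
        self.in_service = list(nodes)
        self.out_of_service = []

    def run_health_checks(self, is_healthy):
        """Take nodes that fail the probe out of service before clients hit them."""
        for node in list(self.in_service):
            if not is_healthy(node):
                self.in_service.remove(node)
                self.out_of_service.append(node)


def call_with_failover(pool, request, send, idempotent):
    """Try each in-service node in turn. Non-idempotent requests are not
    retried, because a failed attempt may already have been applied."""
    last_error = None
    for node in list(pool.in_service):
        try:
            return send(node, request)
        except ConnectionError as exc:
            last_error = exc
            if not idempotent:
                break
    raise RuntimeError("no node could serve the request") from last_error


def send(node, request):
    # Stand-in transport: node-a is down, every other node answers.
    if node == "node-a":
        raise ConnectionError("node-a is down")
    return f"{node} handled {request}"


pool = ServicePool(["node-a", "node-b"])
print(call_with_failover(pool, "getOrder", send, idempotent=True))
# node-b handled getOrder
```

A read such as `getOrder` can safely be retried on another node. A request that changes state is only retried if the service guarantees idempotency, which is why the retry semantics need to be agreed per operation.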
Capacity Management
Capacity management deals with the daily monitoring and reporting of workloads, resource usage, and service performance. It is also responsible for capacity planning by identifying trends and predicting future needs. Capacity management draws on data about the past and present environment to help optimize current performance, estimate future needs and demands, and take steps to be ready to meet them when required. This requires a sophisticated monitoring infrastructure capable of gathering historical data as well as generating real-time alerts.
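As a rough illustration of trend-based planning, the sketch below fits a straight line to historical peak load and estimates when the trend reaches the installed capacity. The weekly figures and the capacity of 800 are invented, and a linear trend is only one of many possible models.

```python
def linear_fit(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...
    Returns (slope, intercept). Needs at least two samples."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(ys, capacity):
    """Periods after the last sample until the trend reaches capacity,
    or None if the load is not growing."""
    slope, intercept = linear_fit(ys)
    if slope <= 0:
        return None
    return max(0.0, (capacity - intercept) / slope - (len(ys) - 1))


weekly_peak_requests = [400, 450, 500, 550, 600]  # hypothetical history
print(periods_until(weekly_peak_requests, capacity=800))  # 4.0
```

With load growing by 50 requests per week from a current peak of 600, the trend reaches a capacity of 800 in four weeks, which is the lead time available to provision more.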
For SOA the challenge is providing sufficient capacity to allow new clients to quickly come on board.
Service Level Management
Service level management is the process of negotiating, defining and managing measurable levels of service that are required and cost-justified.
SOA requires that each consumer-producer interaction be subject to a service level agreement that is well defined and measurable. Measurement is required, for example, to monitor service performance against pre-agreed targets and to raise alerts.
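A per-interaction check might look like the sketch below. It assumes, for illustration only, that each SLA is expressed as a 95th percentile latency target in milliseconds and keyed by a (consumer, producer) pair; the names and figures are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical SLAs: (consumer, producer) -> 95th percentile latency target in ms.
SLAS = {
    ("web-shop", "order-service"): 200,
    ("web-shop", "catalog-service"): 100,
}


def breaches(samples, slas, pct=95):
    """Return {(consumer, producer): observed latency} for pairs over target."""
    result = {}
    for pair, target in slas.items():
        latencies = samples.get(pair)
        if not latencies:
            continue  # nothing measured for this pair yet
        observed = percentile(latencies, pct)
        if observed > target:
            result[pair] = observed
    return result


samples = {
    ("web-shop", "order-service"): [120, 150, 180, 190, 450],
    ("web-shop", "catalog-service"): [40, 50, 60, 70, 80],
}
print(breaches(samples, SLAS))  # {('web-shop', 'order-service'): 450}
```

Keying the targets by pair reflects the point that the same producer may owe different service levels to different consumers.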
Continuity Management
Continuity management is concerned with managing risks to ensure that an organization can, at all times (even if disasters of various types strike), continue to operate to at least a predetermined minimum level.
Business units across the enterprise must develop and maintain plans to continue business in case of a disaster, and this applies no less to SOA.
Financial Management
Financial management is concerned with identifying the costs necessary to provide services, and establishing a fair means of recovering these costs from the business.
Charging for services should only be implemented if it gives clear value to the organization and if the environment is ready for it. The benefits of charging include improved cost consciousness and better utilization and prioritization of resources, as well as allowing IT to recover costs in an equitable manner.
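One simple charging model, shown here only as an example, splits the cost of running a shared service across business units in proportion to measured usage. The business unit names, call counts, and cost figure are invented.

```python
def allocate_costs(total_cost, usage):
    """Split total_cost across consumers in proportion to their measured usage."""
    total_usage = sum(usage.values())
    if total_usage == 0:
        raise ValueError("no usage recorded")
    return {consumer: total_cost * used / total_usage
            for consumer, used in usage.items()}


# Hypothetical monthly figures: service calls per business unit.
calls = {"sales": 600_000, "finance": 300_000, "hr": 100_000}
charges = allocate_costs(10_000, calls)
print(charges)  # {'sales': 6000.0, 'finance': 3000.0, 'hr': 1000.0}
```

A usage-based split depends on the same measurement infrastructure as service level and capacity management, which is one reason to introduce charging only when that infrastructure is in place.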
As organizational SOAs grow, systems will become more fragile and potentially chaotic as the network of moving parts involved increases. This will require sophisticated service management capabilities and the formality of a framework such as ITIL. To assume otherwise will result in failure.
References
- Krafzig, et al. Enterprise SOA - Service-oriented Architecture Best Practices. Prentice Hall.
- Darmawan, et al. Business Service Management Best Practices. IBM Redbook.
- Manoel, et al. Service Level Management Using IBM Tivoli Service Level Advisor and Tivoli Business Systems Manager. IBM Redbook.
- ITIL (IT Infrastructure Library).