When most people think of IT, the incident management process typically comes to mind. It focuses solely on handling and escalating incidents as they occur to restore defined service levels. Incident management does not deal with root cause analysis or problem resolution. The main goal is to take user incidents from a reported stage to a closed stage.
Once established, effective incident management provides recurring value for the business. It allows incidents to be resolved in timeframes previously unseen. For most organizations, the process moves support from emailing back and forth to a formal ticketing system with built-in:
- Prioritization
- Categorization
- SLA requirements
The formal structures take time to develop but result in better outcomes for users, support staff, and the business. The data gathered from tracking incidents allows for better problem management and business decisions.
Incident management also involves creating incident models, which give support staff defined processes for resolving recurring issues quickly. In some organizations, a dedicated staff has incident management as their only role. In most businesses, the task is relegated to the service desk and its owners, managers, and stakeholders.
The visibility of incident management makes it the easiest process to implement and get buy-in for, since its value is evident to users at all levels of the organization. Everyone has issues they need support or facilities staff to resolve, and handling them quickly aligns with the needs of users at all levels.
What Is ITIL Incident Management?
An incident is an unplanned interruption to a service or a reduction in the quality of a service. The purpose of the incident management practice is to minimize the negative impact of incidents by restoring normal service operation as quickly as possible.
This article describes the incident management process. It will be used as a reference for the implementation and ongoing use of the incident management process. This process guide is based on the best practices described in the Information Technology Infrastructure Library (ITIL).
Every participant is expected to understand and adhere to the process. It also serves as the roadmap from which the service improvement team and IT Delivery staff can define and implement lower-level operating procedures.
Incident Management Overview
The policy governs the Incident Management process and all procedures implemented for the management and execution of the process. This policy statement defines the system of governance that is used to ensure that support team members and contractors follow the prescribed process as required. The Incident Management policy is defined as the set of rules listed below:
- All incidents should be recorded in an ITSM ticketing tool as the single source of incident data.
- There shall be a defined escalation process to ensure timely resolution of incidents within the agreed SLA.
- There shall be one parent incident for an issue; other related incidents shall be marked as children.
- The Service Desk will be the single focal point for Requestors.
- The incident manager is responsible for producing all reports/KPI reports at the defined frequency.
- The Service Desk team should strive to match similar incidents and use the known error database (KEDB) to find workarounds or permanent solutions to known issues.
- If the support team resolves the issue and finds that the incident should have a different operational category than the one under which it was opened, the support team personnel should change the category at resolution to reflect this.
- Incident priority shall be classified based on impact and urgency in accordance with documented criteria.
- The support team shall route tickets to the appropriate support group after mutual discussion to avoid re-assignment. Tickets can be escalated to the service desk team if the assigned support group does not know the resolver group; the service desk team will then send the ticket to the appropriate team.
- Service Desk teams should escalate incidents to the incident manager if the service desk team cannot find an appropriate support group or if the technical team does not feel it can resolve the issue.
- The incident manager will provide all support for hierarchical escalations. For all major incidents (P1/P2 tickets), the incident manager will inform and escalate all issues that require decision-making at the highest level. The support team should therefore route incidents to the Incident Manager if they require hierarchical escalation.
- Tickets can be moved to the Pending/On Hold state only if there is a justified reason and it has been communicated to the requestor.
- All resolved incidents should be closed after an agreed time.
- After resolution, if the support analyst feels that the incident resolution was just a workaround and a permanent solution would need more investigation, the support personnel should propose a problem for it and inform the problem manager. In addition, any known error should also be communicated to the problem manager.
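The parent/child rule in the policy can be illustrated with a minimal Python sketch. The `Incident` class, the `issue_key` field used to decide that two tickets describe the same issue, and the `link_incident` helper are all hypothetical names chosen for illustration; a real ITSM tool would have its own matching logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    ticket_id: str
    issue_key: str  # hypothetical label identifying the underlying issue
    parent_id: Optional[str] = None
    children: List[str] = field(default_factory=list)


def link_incident(new: Incident, open_incidents: List[Incident]) -> Incident:
    """Mark `new` as a child if a parent for the same issue already exists."""
    for existing in open_incidents:
        # Only a ticket without a parent can itself be the parent.
        if existing.issue_key == new.issue_key and existing.parent_id is None:
            new.parent_id = existing.ticket_id
            existing.children.append(new.ticket_id)
            break
    open_incidents.append(new)
    return new
```

In this sketch the first ticket logged for an issue becomes the parent, and every later ticket with the same `issue_key` is attached to it, so each issue keeps exactly one parent.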
Tasks in Incident Management
The incident management process consists of the following major sub-processes, each of which includes further procedures:
- Incident Detection and Recording.
- Incident Classification and Support.
- Incident Investigation and Diagnosis.
- Incident Resolution and Recovery.
- Confirmation and Closure.
Incident Management Main Function
The document defines the standards for delivering IT services. If a customer has specific requirements, the document can be customized to meet them. This document also provides material for high-level training and education for end requestors and IT communities.
The intended audience for this document includes all incident management process roles: Service Desk Analysts, Managers, other service management process owners (Problem Manager, Change Manager), and Application Development and Maintenance staff involved in incident management.
Incident Process
The high-level Incident Management process is depicted in the following diagram.
Organization Roles & Responsibilities
Requestor
The Requestor is the person authorized to report issues:
- Contact Service Desk for reporting issue(s).
- Use Self-Help tool to report the issue(s).
- Provide a detailed description of the issue.
Service Desk Analyst (SDA)
The Service Desk is the first role the Requestor interfaces with, and it provides initial support.
- Acts as the Single Point of Contact for all incidents.
- Captures all required incident details and logs/updates the incident.
- Categorizes and prioritizes the incident.
- Relates new incidents to existing ones when applicable.
- Provides initial support for the issue the Requestor reported and routes the incident to the relevant support team, if needed.
- Tracks the incident till closure to ensure incidents are resolved within agreed SLAs.
- Escalates unresolved incidents as appropriate after pre-determined threshold points are reached.
- Keeps the Requestor informed of the incident status.
- Escalates the incident to the incident manager if unable to find an appropriate support group.
Support Analyst/ Technical Support
- Resolve incidents within agreed service levels.
- Escalate the unresolved incidents to higher support levels at the appropriate time.
- Make appropriate use of available resources to resolve incidents (people, tools, and processes).
- Communicate the incident status internally and externally as applicable.
- Interface with other processes as required to resolve the incident.
- Maintain up-to-date knowledge on the relevant technical platform.
Incident Manager
- Reviews effectiveness and efficiency of the process.
- Creates procedures for incident management.
- Acts as an escalation point to action any misrouted tickets.
- Ensures that incident management processes and tools are integrated with other processes.
- Is responsible for the success or failure of the process.
- Ensures that the process is defined, documented, maintained and communicated at all levels within the organization and to vendors.
- Is responsible for the requirements and guidelines of tool usage.
- Establishes and communicates the process roles and responsibilities.
- Establishes and communicates the process, service levels, and process performance metrics.
- Provides adequate process training for the organization.
- Establishes targets for process improvement.
- Monitors and reports on the performance of the process.
- Participates in other ITSM process initiatives and process reviews.
- Ensures third-party performance against the incident management process.
Major Incident Manager
- Takes ownership of a major incident.
- Opens War Rooms/Bridge Calls and manages communications.
- Coordinates among various teams/suppliers for resolution.
- Initiates the major incident bridge and involves all required stakeholders.
- Determines stakeholders for communication updates and reports.
- Determines content of the communication.
- Prepares the major incident report and presents it to the management.
- Works closely with the Problem Manager after closure of a major incident and hands over the major incident synopsis to the Problem Manager.
- Participates in the RCA call to detail the incident (if required).
- Reports the highest priority assigned to the incident.
Detection and Recording
Detailed Description of Detection and Recording
Procedure | Input | Description | Output |
Identify Incident | Disruption of service | Requestor identifies a disruption of service. | Incident identified |
Email/Phone | Disruption of service | Requestor calls or sends an email to the Service Desk. | Call/email sent to Service Desk |
Monitoring Tool Alert | Disruption of service | Monitoring tool opens a ticket with the support team without any human interaction. | Incident logged |
Validate Details | Email incidents | Service Desk validates the data submitted by the Requestor. | Validation done |
Collect and Record Information | Incident identified | Service Desk collects and verifies the basic details. | Incident details collected |
Existing Record? | Incident details collected | Service Desk checks whether the reported incident relates to an existing incident record. | Record identified |
Create Incident | Incident details on call | Service Desk fills in the incident form based on the details received. | Incident created |
Is Incident Escalated? | Incident details on call | Service Desk checks whether the incident is due for escalation. | Decision |
Invoke Escalation Procedure | Incident details on call | For calls due for escalation or repeat calls, initiate the escalation procedure. | Escalation invoked |
Trigger Priority Change | Incident details on call | Follow the priority change process. | Decision |
Update Incident | Incident details on call | Update the existing record with the purpose of the Requestor's call. | Updated incident |
Classification and Initial Support
Detailed Description of Classification and Initial Support
Procedure | Input | Description | Output |
Operational and Product Categorization | Incident Creation | Service Desk performs the operational and product categorization and checks whether the ticket qualifies as an incident. | Categorization completed. |
Prioritization & Linkage to CI | Categorization completed. Prioritization | Service Desk sets the impact and urgency of the ticket to generate the priority of the incident and links the CI. | Prioritization & CI Linkage |
Initial Support | Prioritization | Provide initial support to drive the incident towards resolution. | Resolved Incident & Assignment |
Incident Resolved? | Initial Support | Checks whether the incident is resolved. | Assign to support Analyst. |
Assign to Support Analyst | Initial Support | After the technical team is verified, investigation and diagnosis start. | Assign to support Analyst. |
Is the incident routed correctly? | Validation of Incident | Incident routed incorrectly. | Escalated to Incident Manager |
Escalation | Support group is not identified. | Ticket assigned to correct support team/ hierarchical escalation done. | Appropriate support group is assigned. |
Investigation & Diagnosis
Description of Investigation and Diagnosis
Procedure | Input | Description | Output |
Accept the ticket. | Ticket Assigned | The Support Analyst accepts the ticket and ensures the response SLA is met. | Ticket Accepted |
Is it a Major Incident? | Ticket Acknowledged | Validate whether the incident qualifies as a Major Incident. | Major Incident Validated |
Refer Knowledge Article | Ticket Accepted | Knowledge article is consulted for a workaround/solution. | Workaround/Solution Checked |
Apply Workaround | Workaround Found | Workaround/Solution is applied. | Workaround/Solution applied. |
Investigate Further | Workaround not found. | Technical support specialist will investigate further. | Investigation started. |
Vendor Support | Vendor Support required. | Vendor contacted and ticket opened. | Vendor ticket logged. |
Contact Customer | Vendor Support not required. | Customer is contacted if further information is required. | Information gathered. |
Resolution and Recovery
Description of Resolution and Recovery
Procedure | Input | Description | Output |
Carry out the tasks for incident resolution. | Solution/Workaround identified. | Once the solution/workaround is identified, the Service Desk/Support Analyst can start executing the resolution tasks. | Resolution tasks executed. |
Is Incident Resolved? | Incident with updated logs | Support Analyst resolves the incident. | Decision taken. |
Functional Escalation Required | Incident with resolution steps | Support Analyst follows functional escalation. | Decision taken. |
Update worklog and resolve incident. | Resolved Incident | Incident work log updated. | Updated incident status and worklog |
Confirmation & Closure
Description of Confirmation and Closure
Procedure | Input | Description | Output |
Confirm resolution with Requestor. | Incident Resolved | Resolution confirmed with Requestor. | Resolution accepted/rejected by Requestor |
Solution Accepted? | Resolved incident. | Requestor to validate incident resolution. | Decision taken. |
Reopen Incident | Resolution rejected by Requestor. | Requestor contacts service desk to reopen incident. | Incident Reassigned |
Closure of the incident | Incident was not reopened in 5 calendar days. | Incident is auto-closed once the 5-day window has passed. | Incident Closed |
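The closure rule in the last table row can be sketched in Python as follows. This is only an illustration: it assumes the 5-day window is counted in calendar days from the resolution time, and it assumes a reopened incident returns to an "Assigned" status (the table only says the incident is reassigned). The function name `status_after_resolution` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: the auto-close window is 5 calendar days after resolution.
AUTO_CLOSE_AFTER = timedelta(days=5)


def status_after_resolution(resolved_at: datetime, now: datetime,
                            reopened: bool = False) -> str:
    """Return the status of a resolved incident at time `now`."""
    if reopened:
        # Requestor rejected the resolution; the incident is reassigned.
        return "Assigned"
    if now - resolved_at >= AUTO_CLOSE_AFTER:
        # Nobody reopened the incident within the window, so it auto-closes.
        return "Closed"
    return "Resolved"
```

A scheduled job could call a check like this for every resolved incident; switching to business days would need a working-calendar lookup instead of a plain `timedelta`.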
Major Incident Management
Definition
A major incident (MI) is an incident that results in significant disruption to the business and demands a response beyond the routine incident management process. Major incidents have a separate procedure with shorter timescales and greater urgency, which is required to accelerate the resolution process for incidents with high business impact.
Major Incident Priority Assessment Criteria
Incident priority is based on two factors: impact and urgency. Impact is defined as the measure of the criticality of the issue. Urgency is defined as how quickly the incident needs to be resolved.
Urgency Code | Description |
Critical | A full-service outage of a critical system. The system is non-operational and an urgent response is required. The damage caused by the incident increases rapidly. A delay in resolution may lead to high revenue/business/productivity loss. |
Impact Code | Description |
Critical | Multiple systems are non-operational with major financial implications and need to be restored immediately. A large number of customers are affected and/or unable to perform their BAU activities, and business reputation is at risk. Workaround not available. Examples: outage caused to a financial application; data corruption; 100% impact to network; business critical services are impacted; severe problem during critical periods (e.g., month-end processing); security violation (e.g., denial of service, widespread virus). |
Description of Major Incident Management Handling
Procedure | Input | Description | Output |
MI Identified | Incident Submitted as Major Incident | Major incident process is invoked by Service Desk/ Support Specialist. | Incident accepted by Major Incident Manager |
Open Conference Bridge & notify stakeholders. | Incident Reviewed and communication sent. | Major Incident Manager opens a conference bridge and the initial communication is sent. | Conference Bridge Opened & Communications sent. |
Inform related support groups. | Incident Reviewed | Major Incident Manager drives the bridge towards incident resolution and support groups are involved. | Related support groups involved. |
Determine stakeholders for communication. | Incident Reviewed | Stakeholders and communication plan are determined. | Stakeholders identified. Communication plan decided. |
Co-ordinate resolution | Fix applied/to be applied. | Major Incident Manager and support groups coordinate to resolve the incident. | Co-ordination for resolution. Incident Resolved. |
Collect Status | Co-ordination for resolution | If incident is not yet resolved, the status is collected by Major Incident Manager, and history is updated. | Status update |
Communicate the status to stakeholders. | Stakeholders identified. Communication plan determined. Status update | Major Incident Manager sends communication to stakeholders. | Communication to stakeholders |
Perform Major Incident Review | Incident Resolved | Major Incident Review and Incident report is prepared. | Incident report prepared and submitted. |
Lessons Learned and Follow-up | Major Incident Review | Lessons learned are recorded in the Incident Review and Report. | Lessons learned/preventive actions documented, and follow-up done. |
Prioritization Guidelines
This section describes how urgency and impact are assessed and how the priority matrix is calculated.
Urgency Assessment | Urgency Code |
Immediate attention is required. | Critical |
Urgent attention is required as impact is same day. | High |
Urgent attention is required as impact is within 3 working days. | Medium |
There is no immediate attention required. Business as usual can continue, possibly with a workaround until resolved. | Low |
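A priority matrix combines the urgency code above with an impact code to produce a priority. The article does not give the matrix itself, so the following Python sketch is purely illustrative: it assumes impact uses the same four levels as urgency, assumes four priorities P1 to P4, and uses a simple rank-sum rule that an organization would replace with its own documented criteria.

```python
# Rank of each level; 0 is the most severe. Shared by impact and urgency
# (assumption: impact uses the same four levels as urgency).
LEVEL_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


def priority(impact: str, urgency: str) -> str:
    """Derive a priority label from impact and urgency (illustrative rule)."""
    score = LEVEL_RANK[impact] + LEVEL_RANK[urgency]  # ranges from 0 to 6
    # Hypothetical mapping: score 0 -> P1, 1-2 -> P2, 3-4 -> P3, 5-6 -> P4.
    return f"P{1 + (score + 1) // 2}"


if __name__ == "__main__":
    for impact in LEVEL_RANK:
        row = [priority(impact, urgency) for urgency in LEVEL_RANK]
        print(f"{impact:<8} {row}")
```

Under this rule only a Critical impact combined with a Critical urgency yields P1. An explicit lookup table keyed by `(impact, urgency)` would work just as well and is easier to audit against documented criteria.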
The Incident Statuses
The following section explains the statuses of an Incident:
- New: The status New cannot be selected by users; it is assigned by the application after an Incident is logged.
- Assigned: After the Incident is logged, it is assigned to a Workgroup, based on the selected Tenant. The Workgroup then specifies the Category, Classification, Urgency, Impact, Priority, Workgroup, and SLA Service Window based on the Symptom provided by the End User. The status of the Incident changes to Assigned.
- In Progress: When the Incident is assigned to an Analyst, the status of the Incident is changed to In Progress. The Analyst can refer to various Knowledge Articles or Similar Incidents to work on the Incident.
- Pending: If the Analyst cannot continue working on the Incident as the End User needs to provide some details or the Incident is dependent on any other activity to complete, the status of the Incident is moved to Pending.
- Resolved: After the Incident is In Progress, the Analyst should resolve the Incident within the provided Service Window. After an Incident is resolved, the status of the Incident is changed to Resolved. Resolved incidents can be added to the Knowledge Base by selecting the Add to KB check box on the Incident Details page. The End User can reopen the Incident if the resolution is not satisfactory.
- Closed: After an Incident is resolved, the status of the Incident can be changed to Closed based on the configuration (manual closure or auto closure).
- Canceled: The status of an Incident can be changed to Canceled if the End User does not want any further investigation on the Incident (for reasons such as the issue being resolved or the issue not being reproducible). This option is available to the users if configured by the Administrator.
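The statuses above form a small state machine, which can be sketched in Python as a table of allowed transitions. Two details are assumptions rather than statements from the list: a reopened incident is taken to return to Assigned, and cancellation is allowed from any status before Resolved. The names `ALLOWED_TRANSITIONS` and `change_status` are hypothetical.

```python
# Allowed status changes, keyed by the current status.
ALLOWED_TRANSITIONS = {
    "New": {"Assigned", "Canceled"},
    "Assigned": {"In Progress", "Canceled"},
    "In Progress": {"Pending", "Resolved", "Canceled"},
    "Pending": {"In Progress", "Canceled"},
    # Assumption: a reopened incident goes back to Assigned.
    "Resolved": {"Closed", "Assigned"},
    "Closed": set(),
    "Canceled": set(),
}


def change_status(current: str, target: str) -> str:
    """Return `target` if the transition is allowed, else raise ValueError."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"Cannot move incident from {current!r} to {target!r}")
    return target
```

For example, `change_status("In Progress", "Pending")` succeeds, while `change_status("Closed", "In Progress")` raises `ValueError` because Closed is a terminal status in this sketch.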
Best Practice for Implementing Incident Management
The following incident management practice is designed for all parties that participate in service management, whether internal (such as IT departments or users) or external service providers. All of them, including but not limited to IT functional areas such as Application, Infrastructure, and Information Security, will adhere to the incident management process.
The process goal describes a specific purpose or achievement toward which the efforts of the process are directed. The purpose of the incident management practice is to minimize the negative impact of incidents by restoring normal service operation as quickly as possible in a controlled and predictable manner. It is a fundamental element of service management. The quick restoration of a service is a key factor in user and customer satisfaction, the credibility of the provider, and the value the organization creates in its service relationships.
The scope of the practice covers the activities undertaken to reach the goal of the practice. The scope of incident management includes:
- Detecting and registering incidents.
- Diagnosing and investigating incidents.
- Managing incident records.
- Communicating with relevant stakeholders throughout the incident lifecycle.
- Reviewing incidents and initiating improvements to service and to the incident management practice after resolution.
Advantages of Incident Management
- Helps minimize the business impact of incidents and increase effectiveness by timely resolution.
- Enables proactive identification of beneficial system amendments and enhancements.
- Improves proactive monitoring, thus enabling accurate measurement of performance against SLAs.
- Promotes dissemination of information on different aspects of service quality.
- Enables better utilization of staff that in turn leads to greater efficiency.
- Enhances customer and user satisfaction.
Conclusion
The incident management process is triggered when a Requestor contacts the service desk Single Point of Contact (SPOC) to report a service disruption, when an auto-detected event generates an incident in the tracking tool, or when an internal support group identifies a service disruption (or potential disruption) on a managed system and generates an incident. Incident management is considered complete once a workaround or solution is implemented and the incident is resolved and closed.