By Julene Marr, Raees Gabier & Jack Thompson
In the fast-paced world of technology, operational teams are the backbone of service reliability, ensuring systems run smoothly and customers remain satisfied. These dedicated professionals work behind the scenes, often going unnoticed, to keep everything running seamlessly for us all.
Yet as infrastructures grow in complexity - with intricate codebases, integrations, and networking layers - managing them while also containing costs becomes increasingly challenging.
Enter Artificial Intelligence (AI), poised to revolutionise operational support by not only enhancing our ability to prevent and swiftly resolve incidents but also providing context-aware insights that bridge the gap between technical teams and business stakeholders.
A day in the life of an operational team
Imagine it's 8:00 AM on a typical Tuesday. The operational team at a major financial services company is preparing for the day when an alert surfaces: transaction processing times are spiking. The team members, who know the stakes and the pressure of such days, gear up for what could be a challenging morning. Previously, this would have led to a frantic search through logs and network diagnostics. Today, however, an AI-powered system has already investigated the issue.
Drawing on comprehensive access to operational logs, performance metrics, and the application's codebase, the AI identifies that the spike began after a new update to the payment gateway service. It recognises a specific function causing a bottleneck due to inefficient encryption handling. But the AI doesn't stop there - it understands the business context, assessing the potential user impact and forecasting how the issue could escalate if unresolved.
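One simple way to picture the first step of this analysis is a check that lines up the start of a latency spike with recent deployments. The sketch below is illustrative only: the function names, the 1.5x baseline factor, the 30-minute look-back window, and the sample data format are all assumptions, not a description of any particular product.

```python
from datetime import datetime, timedelta
from statistics import mean


def first_spike(samples, baseline_count=3, factor=1.5):
    """Return the timestamp of the first sample whose latency exceeds
    `factor` times the mean of the first `baseline_count` samples.

    `samples` is a time-ordered list of (timestamp, latency_ms) tuples.
    """
    baseline = mean(latency for _, latency in samples[:baseline_count])
    for timestamp, latency in samples[baseline_count:]:
        if latency > factor * baseline:
            return timestamp
    return None


def suspect_deployment(samples, deployments, window=timedelta(minutes=30)):
    """Return the most recent (timestamp, service) deployment that landed
    within `window` before the spike, or None if there is no candidate."""
    spike_at = first_spike(samples)
    if spike_at is None:
        return None
    candidates = [d for d in deployments if spike_at - window <= d[0] <= spike_at]
    return max(candidates, key=lambda d: d[0], default=None)
```

A real system would draw on far richer signals (code diffs, traces, dependency maps), but the same idea of ordering events in time sits underneath.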
Understanding systems through AI's deep and contextual insight
The team lead asks the AI assistant, "Why are transaction processing times increasing, and what's the potential impact?" Concerned about the possible fallout, they need fast and clear answers. The AI responds in natural language:
"The recent deployment at 7:45 AM introduced a change in the encryptTransaction function, increasing computational complexity and processing times. Currently, 15% of transactions are experiencing delays, primarily affecting customers in the Asia-Pacific region during peak usage. If unaddressed, we anticipate a 30% increase in abandoned transactions over the next hour, potentially impacting revenue by $500,000 today."
This personalised response showcases the AI's fundamental understanding of the code and its ability to contextualise the technical issue within the business landscape. By providing insights into user impact and financial implications, the AI equips the team to prioritise actions effectively.
Speeding up resolution with proactive suggestions and business coordination
Armed with precise technical and business information, the team quickly reverts the recent code change to stabilise the system, working with urgency and following the AI’s guidance. The AI then suggests an optimisation:
"Refactor the encryptTransaction function to utilise a more efficient encryption library, reducing processing time by 40%."
It provides code snippets and estimates the positive impact on transaction throughput as well.
Simultaneously, the AI generates an update for business stakeholders:
"We are experiencing increased transaction processing times due to a recent deployment. The technical team is implementing a fix expected to resolve the issue within 15 minutes. Current impact is a 15% delay in transactions for Asia-Pacific customers."
To ensure everyone is on the same page, it also references relevant Business Continuity Plans (BCP):
"If the issue persists beyond 30 minutes, consider initiating BCP Protocol 3, which includes notifying high-value clients and providing alternative payment options."
By offering these insights, the AI not only aids in technical resolution but also enhances communication between technical teams and business units, ensuring everyone is informed and prepared.
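The BCP recommendation above boils down to a time-based escalation rule. A minimal sketch, assuming the 30-minute threshold from the scenario and a hypothetical function name, might look like this:

```python
from datetime import datetime, timedelta

# Threshold taken from the scenario; a real plan would define its own.
BCP_THRESHOLD = timedelta(minutes=30)


def should_initiate_bcp(started_at, now, resolved, threshold=BCP_THRESHOLD):
    """Return True when an unresolved incident has lasted longer than
    `threshold`, signalling that the continuity plan should be considered."""
    return (not resolved) and (now - started_at) > threshold
```

In the scenario, an incident that began at 7:45 AM and was still open at 8:20 AM would trigger the recommendation, while one resolved by 8:15 AM would not.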
Reducing impact on the organisation and customers
Thanks to the AI's comprehensive analysis and proactive communication, the incident is resolved swiftly. Business stakeholders are kept in the loop and well informed, enabling them to manage customer expectations. The usual influx of support calls is minimised, preserving the company's reputation and customer trust. Potential financial losses are averted, and the operational team strengthens its collaboration and relationship with the business side.
Aggregated knowledge for operational and business excellence
The AI system aggregates data from logs, code repositories, performance metrics, integration points, network traffic, and business metrics such as transaction volumes and regional usage patterns, building a detailed understanding of the organisation’s operations. This holistic knowledge base allows it to detect anomalies and assess their broader impact on the organisation.
For example, if a service outage is detected, the AI can quickly determine which customer segments are affected, the potential financial implications, and suggest communication strategies aligned with the company's policies.
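To make the idea of detecting anomalies and mapping them to customer segments concrete, here is a deliberately simplified sketch. It flags segments whose current latency sits far above their own history using a z-score. The function name, the threshold of 3, and the data shapes are assumptions for illustration; production systems typically use more sophisticated models.

```python
from statistics import mean, stdev


def affected_segments(history, current, z_threshold=3.0):
    """Return a sorted list of segments whose current value is more than
    `z_threshold` standard deviations above their historical mean.

    `history` maps segment name -> list of past values (at least two each).
    `current` maps segment name -> latest observed value.
    """
    flagged = []
    for segment, past in history.items():
        if segment not in current:
            continue  # no fresh reading for this segment
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            continue  # no variation in history, z-score is undefined
        z = (current[segment] - mu) / sigma
        if z > z_threshold:
            flagged.append(segment)
    return sorted(flagged)
```

Combining a result like this with business metrics such as transaction volumes per segment is what would let the system estimate financial impact rather than just raise a technical alert.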
The idea of “knowledge” as a key enabler is nothing new to IT Service Management (ITSM) practice. Over the past decade, the benefits of a robust Knowledge Management framework have been well established, with the DIKW (Data, Information, Knowledge, Wisdom) pyramid standing out as a catalyst. AI brings transformative solutions to nearly every challenge associated with Knowledge Management databases, from addressing outdated or redundant content and automating time-intensive updates to fostering contextual understanding and enabling practical application of knowledge.
The evolving role of operational teams
Operational professionals develop new skills in interpreting AI-generated insights, including business impact assessments. They become adept at cross-functional communication, ensuring that technical resolutions align with business objectives. This evolution enhances job satisfaction and contributes to the organisation's overall agility and responsiveness. Working alongside AI in this way gives team members a greater sense of achievement and value, while still going unnoticed for all the right reasons.
Conclusion
AI is ushering in a new era of operational excellence by providing deep, context-aware insights that bridge technical and business domains. By enhancing our ability to understand, prevent, and swiftly resolve incidents - with a clear view of user impact and business implications - AI reduces the overall impact on organisations and their customers.