One of the primary responsibilities of a Security Operations Center (SOC) is to manage issues that arise from monitoring the security perimeter. This involves analyzing alerts, creating the corresponding cases, and then either escalating incidents to the client through ticketing systems or closing them as false positives (FP).
However, this raises questions like: How is this process orchestrated? How does it ensure issues are resolved promptly? How can an SOC analyst efficiently handle all those triggered alerts?
That’s where Service Level Agreements (SLAs) play a key role.
A Service Level Agreement represents a formal commitment that an SOC follows to execute specific tasks, such as investigating or mitigating cases, within a specified time frame. These agreements establish a clear framework of expectations and responsibilities for both parties involved (SOC and client), taking into consideration aspects like availability, performance, and service support. Their primary purpose is to provide customers with a transparent understanding of the level of service they will receive. At the same time, they help SOC analysts decide which alerts to handle first, based on priority and severity.
Analysts therefore manage an alert differently depending on its priority and severity. Severity describes the level of impact on a specific service or infrastructure element, while priority relates to incidents and establishes the order in which they are addressed.
SLAs identify the following levels of urgency:
Before diving into the standards and how they function, let me give you a brief overview of the process that every SOC analyst follows for alert management.
First, when a detection rule is triggered, an alert is generated. An analyst then takes ownership of the alert by creating a case for it and attaching evidence. The case signals to others that the alert is being actively handled and that the analysis phase is in progress.
Once the analysis is complete and all relevant information has been gathered, the case enters its last phase. The analyst either opens a ticket, informing the end customer about the incident and suggesting remediation steps if needed, or closes the case as a false positive without escalating it to the customer.
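The lifecycle described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the state names and the `advance` helper are hypothetical and do not come from any specific SIEM or ticketing product.

```python
from enum import Enum


class CaseState(Enum):
    # Hypothetical state names mirroring the phases described in the text.
    ALERT_TRIGGERED = "alert_triggered"
    UNDER_ANALYSIS = "under_analysis"
    ESCALATED = "escalated"
    CLOSED_FALSE_POSITIVE = "closed_false_positive"


# Each state maps to the set of states it may move to next.
ALLOWED_TRANSITIONS = {
    CaseState.ALERT_TRIGGERED: {CaseState.UNDER_ANALYSIS},
    CaseState.UNDER_ANALYSIS: {
        CaseState.ESCALATED,
        CaseState.CLOSED_FALSE_POSITIVE,
    },
    CaseState.ESCALATED: set(),              # terminal: customer was notified
    CaseState.CLOSED_FALSE_POSITIVE: set(),  # terminal: no escalation
}


def advance(current: CaseState, target: CaseState) -> CaseState:
    """Return the target state, or raise if the transition is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


state = advance(CaseState.ALERT_TRIGGERED, CaseState.UNDER_ANALYSIS)
state = advance(state, CaseState.CLOSED_FALSE_POSITIVE)
```

Keeping the two terminal states separate makes it easy to tell, later on, whether a case ended with a customer notification or with an FP closure.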
At this point, we can see that three timestamps are fundamental for calculating SLAs: the moment the alert is triggered in the SIEM, the moment an analyst starts the investigation, and the moment the case is either escalated to the customer or closed as a false positive.
Together, these three elements allow us to identify the Initial Response Time and the Process Time. The first is the time between the alert triggering in the SIEM and the moment an SOC analyst starts the investigation. The second is the time the analyst spends investigating the alert. This period begins when the analyst starts the investigation and ends with either an escalation to the customer (via a ticket, phone call, or other agreed notification) or the closure of the incident as a false positive.
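To make the two intervals concrete, here is a minimal sketch of how they could be derived from the three timestamps. The `AlertTimeline` class and its field names are hypothetical, and the timestamps are made-up sample values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AlertTimeline:
    # Hypothetical field names for the three timestamps described above.
    triggered_at: datetime              # alert fires in the SIEM
    investigation_started_at: datetime  # analyst starts the investigation
    resolved_at: datetime               # escalation or FP closure

    def initial_response_time(self) -> timedelta:
        """Time between the alert triggering and the investigation start."""
        return self.investigation_started_at - self.triggered_at

    def process_time(self) -> timedelta:
        """Time between the investigation start and escalation or closure."""
        return self.resolved_at - self.investigation_started_at


# Sample values, for illustration only.
timeline = AlertTimeline(
    triggered_at=datetime(2024, 1, 15, 9, 0),
    investigation_started_at=datetime(2024, 1, 15, 9, 12),
    resolved_at=datetime(2024, 1, 15, 10, 27),
)

print(timeline.initial_response_time())  # 0:12:00
print(timeline.process_time())           # 1:15:00
```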
Analysts have 30 minutes to conduct a brief analysis of the alert and determine whether opening a new case is necessary or if it links to a previous one. After making this initial decision, they begin the real in-depth analysis. Depending on the SLA, there is a limited amount of time to escalate the ticket to the client or close it as an FP.
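A compliance check along these lines might look like the sketch below. Only the 30-minute figure comes from the text, and treating it as the limit on the Initial Response Time is an assumption. The priority names and the per-priority Process Time limits are hypothetical placeholders, since the real values depend on the agreement with each customer.

```python
from datetime import timedelta

# From the text: 30 minutes for the initial brief analysis.
# Using it as the Initial Response Time limit is an assumption.
INITIAL_RESPONSE_LIMIT = timedelta(minutes=30)

# Hypothetical limits; actual values are defined in each customer's SLA.
PROCESS_TIME_LIMITS = {
    "high": timedelta(hours=1),
    "medium": timedelta(hours=4),
    "low": timedelta(hours=8),
}


def check_sla(initial_response: timedelta,
              process_time: timedelta,
              priority: str) -> dict:
    """Report whether each interval stayed within its limit."""
    return {
        "initial_response_met": initial_response <= INITIAL_RESPONSE_LIMIT,
        "process_time_met": process_time <= PROCESS_TIME_LIMITS[priority],
    }


result = check_sla(
    initial_response=timedelta(minutes=12),
    process_time=timedelta(hours=1, minutes=15),
    priority="medium",
)
print(result)  # {'initial_response_met': True, 'process_time_met': True}
```

The same measurements would breach the hypothetical "high" limit, which shows why the priority assigned to an alert matters as much as the raw timings.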
In conclusion, well-crafted and effectively implemented SLAs can deliver substantial benefits to both customers and users. SLAs play a crucial role in enhancing customer satisfaction by establishing precise service benchmarks and performance criteria.
When SOCs consistently meet or exceed their SLAs, they not only elevate the overall customer experience, but also increase trust and loyalty. This level of transparency plays a vital role in managing customer expectations and ensuring mutual alignment regarding service levels.
Additionally, SLAs often address critical aspects such as service availability, downtime, and disaster recovery protocols. As a result, businesses have recovery plans in place to minimize service interruptions, address technical challenges, and recover from incidents, which supports business continuity.