27. 05. 2014 Andrea di Lernia NetEye

Solve the incident, analyze the problem with NetEye

In organizations where IT support handles both the Service Desk and new project work, the difference between an incident and a problem is often overlooked. When an incident occurs (one that a system administrator can often resolve with a short sequence of commands), we tend to focus too much on finding and permanently removing its cause, when the Service Desk should instead restore the service as quickly as possible, providing a workaround if necessary.

Fast recovery of an IT service is a priority, especially when many users or customers are affected by the outage. The analysis to identify the root cause can be carried out in parallel or at a later stage, as part of the problem management process.

In some cases the workaround can be built into a monitoring system that detects the malfunction and applies the workaround to restore the service automatically. If this happens while users or customers are not using the IT service, the service is restored without any impact. This means higher user satisfaction and, at the same time, more peace of mind for the IT department.

Let’s see a typical example:

The DFS service on our Windows file server crashed. Of course, in keeping with Murphy's Law, this happened over the weekend, so on Monday morning no one could access the files on the network until the system administrator on duty restarted the service.

Our system engineers analyzed the problem during the week: the service was crashing but not restarting automatically because of a dependency on the Remote Registry service, which was set to DISABLED. They were not able, however, to find the real root cause of the crash.

The following Monday we had a déjà vu: the DFS service had crashed again. This is what happens when you mix the roles of Service Desk / Incident Management and Problem Management.

What we did:

We introduced a check in NetEye, our monitoring system, to verify the status of the DFS service, and we created an automated procedure that sets the Remote Registry service back to automatic start (auto_start) and restarts the DFS service.

This procedure is linked to the same NetEye check, so that it is executed automatically when the check reports an error.
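The check and the linked procedure described above could look roughly like the following Python sketch. It is not the script we actually deployed: the service names `Dfs` and `RemoteRegistry`, the use of `sc.exe`, and all function names are assumptions made for illustration, and the exit codes follow the common Nagios-style plugin convention (0 for OK, 2 for CRITICAL).

```python
import subprocess

# Assumed Windows service names; the article does not state them.
DFS_SERVICE = "Dfs"
REMOTE_REGISTRY_SERVICE = "RemoteRegistry"

# Nagios-style plugin exit codes.
OK, CRITICAL = 0, 2


def run(args):
    """Run a command and return (exit_code, stdout)."""
    done = subprocess.run(args, capture_output=True, text=True)
    return done.returncode, done.stdout


def is_running(service, runner=run):
    """Return True if `sc.exe query` reports the service as RUNNING."""
    _, output = runner(["sc.exe", "query", service])
    return "RUNNING" in output


def remediation_commands():
    """Workaround: re-enable Remote Registry, start it, then start DFS."""
    return [
        ["sc.exe", "config", REMOTE_REGISTRY_SERVICE, "start=", "auto"],
        ["sc.exe", "start", REMOTE_REGISTRY_SERVICE],
        ["sc.exe", "start", DFS_SERVICE],
    ]


def check_and_heal(runner=run):
    """Check DFS; if it is down, apply the workaround and check once more."""
    if is_running(DFS_SERVICE, runner):
        return OK, "OK - DFS service is running"
    for command in remediation_commands():
        runner(command)
    # Note: `sc.exe start` returns before the service is fully up, so a
    # production script would poll for a while instead of checking once.
    if is_running(DFS_SERVICE, runner):
        return OK, "OK - DFS service restarted by workaround"
    return CRITICAL, "CRITICAL - DFS service is down, workaround failed"
```

A wrapper script on the file server would call `check_and_heal()`, print the returned message and exit with the returned code, so that the monitoring system still raises an alert when the workaround fails. The `runner` parameter exists only so the logic can be exercised without touching real services.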

The procedure has also been made available through the NetEye Action Launchpad to our company's administration department, whose staff have no IT skills but are the first to start work in the early morning. With this tool they can solve the problem on their own if the automatic procedure fails.

Now that the incident is closed in the right way, we can concentrate on the problem. If the incident happens again and the monitoring system cannot resolve it automatically, the administration department has a self-service solution. This removes the need for IT support to be on site as early as 7 am, when the first users start work, and leaves us more time for problem analysis.

Andrea di Lernia

Profit Center Manager at Würth Phoenix

Hi everybody, I’m Andrea, and my contribution to this blog is to offer insights on monitoring from an IT manager's point of view. I was born in Bolzano in 1965, and my professional path started 25 years ago in technical roles: programmer, system/database administrator, network engineer, consultant and so on. I lived in Milan for 10 years, working for multinational IT companies, and decided to return to Bolzano after my marriage and the birth of my daughter. I love sailing and diving in the summer, skiing in the winter and travelling off-road with my Landcruiser anytime.

