Monitoring Ollama in NetEye with ollama-metrics and check_prometheus
Running Ollama locally or on dedicated hardware is straightforward until you need to know whether a model is actually loaded in RAM, how fast it generates tokens under load, or when memory consumption reaches a threshold that affects other workloads. A simple TCP port check confirms only that the process is alive; it tells you none of this. This tutorial shows…