Before making decisions it’s good practice to gather data. Important data, I should say. In this post I’ll show how to use Telegraf to gather that data. Telegraf is open source software that collects raw data (metrics) through the input plugins you configure and forwards it to the destinations of your choice, which you define through output plugins.
In brief, Telegraf is based on input and output plugins. The input plugins collect data from various sources such as local counters (e.g., memory or CPU usage), while the output plugins send these measurements to the desired destinations, such as a time series database like Influx.
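To make the input/output split concrete, here is a minimal sketch of a Telegraf configuration with two local input plugins and one output plugin. The InfluxDB URL and the database name are placeholder assumptions, not values from a real setup, so adjust them to your environment.

```toml
# Minimal sketch: two local inputs feeding one output.
# The URL and database name below are assumed placeholders.
[agent]
  interval = "10s"               # default collection interval for all inputs

# Input plugin: CPU usage counters from the local machine
[[inputs.cpu]]
  percpu = false                 # skip per-core metrics
  totalcpu = true                # report the aggregate over all cores

# Input plugin: memory usage from the local machine
[[inputs.mem]]

# Output plugin: write every collected metric to InfluxDB
[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "telegraf"
```

Every input plugin produces metrics on its own schedule, and every configured output receives them, so adding a new data source usually means adding one more input section.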
Besides local data, Telegraf can also collect remote data, for instance with the SNMP plugin (https://github.com/influxdata/telegraf/tree/master/plugins/inputs/snmp). In this case the server where Telegraf is installed is transformed into a monitoring “satellite”.
Consider a use case in which you have hundreds of routers and want to measure the network traffic on individual interfaces along with their state. To solve this problem you can use the classic functionality of NetEye, or else use Telegraf and Grafana, the former to gather data and the latter to visualize it.
So all we need to do is configure an SNMP input section in Telegraf where we specify the router to check, the community string, and how often to collect measurements. Next come the sections that indicate exactly which metrics we want to monitor. Here is an example configuration:
[name@IP ~]# cat /etc/telegraf/telegraf.d/router.domain.local.conf
[[inputs.snmp]]
  # Router to poll; replace "IP" with its address
  agents = [ "IP" ]
  version = 2
  community = "public"
  interval = "60s"
  timeout = "5s"
  retries = 3

  # System name, stored as a tag so it can be inherited by the table below
  [[inputs.snmp.field]]
    name = "hostname"
    oid = "RFC1213-MIB::sysName.0"
    is_tag = true

  [[inputs.snmp.field]]
    name = "uptime"
    oid = "DISMAN-EVENT-MIB::sysUpTimeInstance"

  # IF-MIB::ifTable contains counters on input and output traffic as well as errors and discards.
  [[inputs.snmp.table]]
    name = "interface"
    inherit_tags = [ "hostname" ]
    oid = "IF-MIB::ifTable"

    # Interface tag - used to identify interface in metrics database
    [[inputs.snmp.table.field]]
      name = "ifDescr"
      oid = "IF-MIB::ifDescr"
      is_tag = true
Once the Telegraf service has been restarted, the collected data will be available in Influx and can be used to create dashboards showing the network traffic, availability, and error information extracted from the router.
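Since the use case involves many routers, it may help to see how a single SNMP input section could poll more than one device. The following is only a sketch: the two addresses come from the documentation range 192.0.2.0/24 and are assumptions, as is the choice to reuse the same community string for both routers.

```toml
# Sketch: one SNMP input polling two routers (addresses are assumed placeholders).
[[inputs.snmp]]
  agents = ["udp://192.0.2.1:161", "udp://192.0.2.2:161"]
  version = 2
  community = "public"
  interval = "60s"

  # System name of each router, kept as a tag
  [[inputs.snmp.field]]
    name = "hostname"
    oid = "RFC1213-MIB::sysName.0"
    is_tag = true

  # Interface counters, one metric row per interface
  [[inputs.snmp.table]]
    name = "interface"
    inherit_tags = ["hostname"]
    oid = "IF-MIB::ifTable"

    [[inputs.snmp.table.field]]
      name = "ifDescr"
      oid = "IF-MIB::ifDescr"
      is_tag = true
```

The SNMP plugin also tags each metric with the agent it came from (agent_host), so the routers should remain distinguishable in the database even if two of them report the same system name. Routers with different community strings or polling intervals would need their own input sections.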
Do you have routers that don’t have SNMP enabled? Then try Telegraf’s ping plugin and you’ll even get the percentage of packet loss on the line!
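A ping input could look roughly like the sketch below. The target address is an assumed placeholder from the documentation range, and the count and timeout values are illustrative rather than recommendations.

```toml
# Sketch: ping one router every minute (the address is an assumed placeholder).
[[inputs.ping]]
  urls = ["192.0.2.10"]          # hosts to ping
  count = 5                      # echo requests sent per collection interval
  timeout = 1.0                  # seconds to wait for each reply
  interval = "60s"               # how often to run the check
```

Among the fields this plugin reports are percent_packet_loss and average_response_ms, which are usually enough for a basic availability panel.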
So measure, measure, measure. And the next step? Visualize it!