28. 12. 2023 Davide Gallo Cloud, ITOA, NetEye

Using Jinja2 to Automate Configuration Files

As you may know, NetEye Cloud is our multi-tenant SaaS solution for monitoring your infrastructure. It's crucial for us to keep every tenant aligned with the latest configurations and patches. We've managed to automate and align the agents via Desired State Configuration (DSC) and Ansible, but we still had to manually check those agents' configurations. Luckily, there's a tool that helps tremendously in automating these configurations: Jinja2.

Requirements

None: Jinja2 is already available in your NetEye Master/Satellite installation.

Use case

We need to collect metrics via telegraf on ServerIIS and ServerDB. Each server has different performance counters to be collected, as they have different applications installed. The telegraf agents will connect to the host on which we’re running this Ansible playbook.

Set up your environment

Create an inventory

For this use case, a simple inventory.yml like this is enough:

all:
  hosts:
    ServerIIS.lab.local:
      telegraf_role: IIS
    ServerDB.lab.local:
      telegraf_role: MSSQL

Create a telegraf template

We need to create a templates folder, and inside it a telegraf.j2 file.

NOTE: In Jinja2, statements (logic such as if and for) are enclosed in {% ... %} delimiters, while double curly brackets {{ ... }} print the value of a variable or expression.

[agent]
  interval = "5s"
  round_interval = true
  metric_buffer_limit = 1000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  debug = false
  quiet = true
  hostname = ""

[global_tags]

###############################################################################
#                                  OUTPUTS                                    #
###############################################################################

[[outputs.nats]]
  servers = ["nats://{{ nats_server }}:4222"]
  subject = "telegraf.metrics"
  data_format = "influx"
  secure = true

  ## TLS Config
  tls_ca = "c:\\Program Files\\telegraf\\certs\\root-ca.crt"
  tls_cert = "c:\\Program Files\\telegraf\\certs\\telegraf-agent.crt.pem"
  tls_key = "c:\\Program Files\\telegraf\\certs\\private\\telegraf-agent.key.pem"

###############################################################################
#                                  INPUTS                                     #
###############################################################################

################# Monitor telegraf itself #################
[[inputs.win_perf_counters]]
    UseWildcardsExpansion = true
    LocalizeWildcardsExpansion = false

[[inputs.win_perf_counters.object]]
    ObjectName = "Process"
    Counters = ["% Processor Time","% Privileged Time","Handle Count","Thread Count","Page File Bytes","Working Set","Working Set - Private","IO Read Bytes/sec","IO Write Bytes/sec","ID Process"]
    Instances = ["telegraf"]
    Measurement = "agent"

############################################################################################
################# START WIN #################
[[inputs.win_perf_counters.object]]
   ObjectName = "Memory"
   Counters = ["Available KBytes","Commit Limit","Committed Bytes","Page Faults/sec","Page Reads/sec","Page Writes/sec","Pages Input/sec","Pages Output/sec","Pages/sec","Pool Nonpaged Bytes","Pool Paged Bytes","Standby Cache Reserve Bytes","System Cache Resident Bytes"]
   Instances = ["------"]
   Measurement = "Memory"

[[inputs.win_perf_counters.object]]
   ObjectName = "Network Interface"
   Counters = ["Bytes Received/sec","Bytes Sent/sec","Bytes Total/sec","Current Bandwidth","Offloaded Connections","Output Queue Length"]
   Instances = ["*"]
   Measurement = "Network_Interface"
   IncludeTotal = true

[[inputs.win_perf_counters.object]]
   ObjectName = "Paging File"
   Counters = ["% Usage","% Usage Peak"]
   Instances = ["*"]
   Measurement = "Paging_File"
   IncludeTotal = true

[[inputs.win_perf_counters.object]]
   ObjectName = "PhysicalDisk"
   Counters = ["Avg. Disk Bytes/Read","Avg. Disk Bytes/Transfer","Avg. Disk Bytes/Write","Avg. Disk Queue Length","Avg. Disk Write Queue Length","Avg. Disk sec/Read","Avg. Disk sec/Write","Disk Read Bytes/sec","Disk Reads/sec","Disk Write Bytes/sec","Disk Writes/sec"]
   Instances = ["*"]
   Measurement = "PhysicalDisk"
   IncludeTotal = true

[[inputs.win_perf_counters.object]]
   ObjectName = "Processor"
   Counters = ["% Privileged Time","% Processor Time"]
   Instances = ["*"]
   Measurement = "Processor"
   IncludeTotal = true

[[inputs.win_perf_counters.object]]
   ObjectName = "System"
   Counters = ["Context Switches/sec","Processes","Processor Queue Length","Threads"]
   Instances = ["------"]
   Measurement = "System"

################# END WIN #################

{% if telegraf_role == 'IIS' %}
################# START iis ###################

[[inputs.win_perf_counters.object]]
   ObjectName = "HTTP Service Request Queues"
   Counters = ["ArrivalRate","CacheHitRate","CurrentQueueSize","MaxQueueItemAge","RejectedRequests","RejectionRate"]
   Instances = ["*"]
   Measurement = "HTTP_Service_Request_Queues"
   IncludeTotal = true

[[inputs.win_perf_counters.object]]
   ObjectName = "HTTP Service Url Groups"
   Counters = ["BytesReceivedRate","BytesSentRate","BytesTransferredRate","ConnectionAttempts","CurrentConnections","GetRequests","HeadRequests","MaxConnections"]
   Instances = ["*"]
   Measurement = "HTTP_Service_Url_Groups"
   IncludeTotal = true

[[inputs.win_perf_counters.object]]
   ObjectName = "HTTP Service"
   Counters = ["CurrentUrisCached","TotalFlushedUris","TotalUrisCached","UriCacheFlushes","UriCacheHits","UriCacheMisses"]
   Instances = ["------"]
   Measurement = "HTTP_Service"

################# END iis ###################
{% endif %}

{% if telegraf_role == 'MSSQL' %}
################# START SQL #################
[[inputs.win_perf_counters.object]]
   ObjectName = "SQLAgent:JobSteps"
   Counters = ["Active steps","Total step retries"]
   Instances = ["*"]
   Measurement = "JobSteps"
   IncludeTotal = true

[[inputs.win_perf_counters.object]]
   ObjectName = "SQLAgent:Jobs"
   Counters = ["Active jobs","Failed jobs","Jobs activated/minute"]
   Instances = ["*"]
   Measurement = "Jobs"
   IncludeTotal = true

[[inputs.win_perf_counters.object]]
   ObjectName = "SQLAgent:SystemJobs"
   Counters = ["Active system jobs"]
   Instances = ["*"]
   Measurement = "SystemJobs"
   IncludeTotal = true

################# END SQL  #################
{% endif %}
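The {% if %} statements and {{ }} expressions used in this template can also be exercised outside of Ansible with the jinja2 Python package. Here's a minimal sketch (with made-up values, not the full template) showing how the conditional sections are included or dropped per role:

```python
from jinja2 import Template

# {% ... %} wraps statements (logic), {{ ... }} prints a variable
template = Template(
    'servers = ["nats://{{ nats_server }}:4222"]\n'
    "{% if telegraf_role == 'MSSQL' %}# SQL counters enabled{% endif %}"
)

rendered = template.render(nats_server="neteye.lab.local", telegraf_role="MSSQL")
print(rendered)
```

Rendering with telegraf_role="IIS" instead would omit the SQL comment entirely, which is exactly how the IIS and MSSQL blocks above end up in only the relevant server's configuration.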

Create the playbook

As mentioned in the use case, the agents will connect to the host we're running the playbook from. Let's create a playbook.yml:

- name: Create telegraf configurations
  hosts: all
  connection: local
  vars:
    nats_server: "{{ hostvars['localhost'].ansible_nodename }}"
  tasks:
    - name: Gather facts from localhost for later use
      setup:
      delegate_to: localhost
      delegate_facts: true

    - name: Create output folders
      file:
        path: "output/{{ inventory_hostname }}/"
        state: directory
      delegate_to: localhost

    - name: Generate configurations locally
      template:
        src: templates/telegraf.j2
        dest: "output/{{ inventory_hostname }}/telegraf.conf"
      delegate_to: localhost
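Under the hood, the template task does essentially this for every inventory host: render templates/telegraf.j2 with that host's variables and write the result to output/&lt;hostname&gt;/telegraf.conf. A rough Python equivalent (a sketch using an inline stand-in template instead of the real file, with hypothetical hostnames) looks like this:

```python
from pathlib import Path
from jinja2 import Template

# Per-host variables, as they come from inventory.yml
hosts = {
    "ServerIIS.lab.local": {"telegraf_role": "IIS"},
    "ServerDB.lab.local": {"telegraf_role": "MSSQL"},
}
nats_server = "neteye.lab.local"  # Ansible takes this from the control node's facts

# Stand-in for templates/telegraf.j2
template = Template(
    'servers = ["nats://{{ nats_server }}:4222"]\n'
    "{% if telegraf_role == 'IIS' %}# IIS counters{% endif %}"
    "{% if telegraf_role == 'MSSQL' %}# SQL counters{% endif %}\n"
)

for name, host_vars in hosts.items():
    out_dir = Path("output") / name
    out_dir.mkdir(parents=True, exist_ok=True)  # like the 'file' task
    conf = template.render(nats_server=nats_server, **host_vars)
    (out_dir / "telegraf.conf").write_text(conf)  # like the 'template' task
```

The playbook does the same thing declaratively, with the added benefit that the inventory, facts gathering, and templating are all handled by Ansible.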

Run the playbook

ansible-playbook playbook.yml -i inventory.yml

Now you'll find the rendered configuration files in the newly created output folder.

Conclusion

With this simple playbook and template, we now have the telegraf configurations to install on the servers. It seems like a lot of effort for only two servers, but the possibilities are endless when you have to keep huge system environments up-to-date with the latest configurations! Furthermore, it’s possible to generate and install telegraf directly from the NetEye servers, but that’s a topic for another blog 😊.

Sources

How to manage Apache web servers using Jinja2 templates and filters | Enable Sysadmin (redhat.com)

How to build your inventory — Ansible Documentation

Davide Gallo

Site Reliability Engineer at Würth Phoenix