Monitoring / Metrics with Zabbix

We are using the Zabbix 5.0 LTS server with a PostgreSQL database. We started with a manual configuration in a test VM and then automated it for deployment; the Ansible roles zabbix-server and zabbix-agent are the results of this PoC work. See the FAQ for how to access the staging deployment of Zabbix.

zabbix-server

This role is ready at a basic level, but more work will be needed as the monitoring grows more complex. At its current level, it:

  • installs the packages needed for the server

  • configures the Zabbix, Apache and PostgreSQL configuration files

  • configures the web UI

  • configures Kerberos authentication
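One way to verify the configuration step after a run of the role is to parse the rendered zabbix_server.conf and report database settings that are unset. The Python sketch below does that; the function names are hypothetical and not part of the role, and the set of keys to require is an assumption (the stock config file marks only DBName as mandatory, and an empty DBHost makes the server use the local PostgreSQL socket).

```python
"""Report unset database settings in a rendered zabbix_server.conf (hypothetical check)."""
from typing import Dict, Iterable, List

# Assumption: keys the role is expected to template. Only DBName is marked
# mandatory in the stock file; DBHost may be empty (local PostgreSQL socket).
EXPECTED_DB_KEYS = ("DBName", "DBUser", "DBPassword")


def parse_conf(text: str) -> Dict[str, str]:
    """Parse Key=Value lines, skipping blank lines and # comments.

    If a key appears more than once, this parser keeps the last value.
    """
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_db_settings(text: str, expected: Iterable[str] = EXPECTED_DB_KEYS) -> List[str]:
    """Return the expected keys that are absent or set to an empty value."""
    settings = parse_conf(text)
    return [key for key in expected if not settings.get(key)]
```

On a managed host it could be called as `missing_db_settings(open("/etc/zabbix/zabbix_server.conf").read())`, assuming the packaged default path; an empty list means the expected keys are all set.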

While this is enough for the PoC, the role is not ready for production until we have done the following:

  • add inventory files for groups and users, and have zabbix-cli restore them on a fresh installation

  • audit the network configuration (see Common challenges)
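zabbix-cli works on top of the Zabbix JSON-RPC API, so the restore step can be pictured as a diff between the inventory and what the server already has. The sketch below is not zabbix-cli itself: it assumes a hypothetical inventory layout, takes the existing names as input (they could be fetched with `hostgroup.get` and `usergroup.get`), and only builds `hostgroup.create` / `usergroup.create` request bodies in the Zabbix 5.0 style, with the session token in the `auth` field. Users are left out because `user.create` also needs the IDs of the groups created here.

```python
"""Plan Zabbix API calls that recreate missing groups from an inventory (sketch)."""
from typing import Dict, List

# Hypothetical inventory kept in version control next to the Ansible roles.
INVENTORY: Dict[str, List[str]] = {
    "hostgroups": ["Linux servers", "Staging"],
    "usergroups": ["Monitoring admins", "Read-only"],
}

# Zabbix API method that creates each kind of object.
CREATE_METHOD = {
    "hostgroups": "hostgroup.create",
    "usergroups": "usergroup.create",
}


def missing(wanted: List[str], existing: List[str]) -> List[str]:
    """Names from the inventory that the server does not have yet, in inventory order."""
    present = set(existing)
    return [name for name in wanted if name not in present]


def create_requests(
    inventory: Dict[str, List[str]],
    existing: Dict[str, List[str]],
    auth: str,
    start_id: int = 1,
) -> List[dict]:
    """Build one JSON-RPC 2.0 request body per missing group.

    Nothing is sent here; posting each body to the server's
    api_jsonrpc.php endpoint is left to the caller.
    """
    requests: List[dict] = []
    request_id = start_id
    for kind, names in inventory.items():
        for name in missing(names, existing.get(kind, [])):
            requests.append({
                "jsonrpc": "2.0",
                "method": CREATE_METHOD[kind],
                "params": {"name": name},
                "auth": auth,  # session token from user.login (Zabbix 5.0 style)
                "id": request_id,
            })
            request_id += 1
    return requests
```

On a fresh installation, where `existing` is empty, this yields one create call per inventory entry; run again after a successful restore, it yields none, so the restore can be repeated safely.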

zabbix-agent

This role is ready to use, and the existing templates are sufficient to gather basic information. However, exactly which common data should be collected from all agent nodes still needs to be discussed more widely and then set in a template. Besides the common metrics, custom metrics can also be exported using zabbix-sender (see FAQ).
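For custom metrics, zabbix_sender pushes values to the server's trapper port (10051 by default). The Python sketch below builds the same kind of packet by hand, assuming the Zabbix 5.0 protocol framing (`ZBXD`, a flags byte of 0x01, a 4-byte little-endian body length, 4 reserved bytes) followed by a JSON body with `"request": "sender data"`. The host name and item key are placeholders; the item must exist on that host as a Zabbix trapper item for the server to accept the value.

```python
"""Build and send a Zabbix sender packet for custom metrics (sketch)."""
import json
import socket
import struct
from typing import Dict, List

HEADER_LEN = 13  # b"ZBXD" + flags byte + 4-byte length + 4 reserved bytes


def build_sender_packet(items: List[Dict[str, str]]) -> bytes:
    """Frame a 'sender data' request; each item has host, key and value."""
    body = json.dumps({"request": "sender data", "data": items}).encode("utf-8")
    header = b"ZBXD" + b"\x01" + struct.pack("<II", len(body), 0)
    return header + body


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full response")
        buf += chunk
    return buf


def send(items: List[Dict[str, str]], server: str = "127.0.0.1",
         port: int = 10051, timeout: float = 5.0) -> dict:
    """Send items to the server's trapper port and return its JSON reply."""
    packet = build_sender_packet(items)
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(packet)
        header = _recv_exact(sock, HEADER_LEN)
        if header[:4] != b"ZBXD":
            raise ValueError("unexpected response header")
        length, _reserved = struct.unpack("<II", header[5:HEADER_LEN])
        return json.loads(_recv_exact(sock, length))
```

A call might look like `send([{"host": "web01", "key": "custom.queue_length", "value": "42"}], server="zabbix.example")`, where all names are hypothetical. With the packaged tool, the rough equivalent is `zabbix_sender -z <server> -s web01 -k custom.queue_length -o 42`.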

Common challenges

Due to our lack of experience with SELinux policies and network configuration, we are not very confident in those parts of the setup. A veteran sysadmin will be needed to audit them.
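As a possible starting point for the network part of that audit, the sketch below checks whether TCP connections to the Zabbix ports succeed from the machine it runs on: 10050 is the agent's default listening port (passive checks, server to agent) and 10051 the server's default trapper port (active checks and zabbix_sender, agent to server). A successful connect only shows that the port is reachable; it says nothing about SELinux denials inside a host, which still need an experienced review.

```python
"""Check TCP reachability of Zabbix ports from the current machine (sketch)."""
import socket
from typing import Dict, Iterable, Tuple

AGENT_PORT = 10050    # default agent listening port (passive checks)
TRAPPER_PORT = 10051  # default server trapper port (active checks, zabbix_sender)


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False


def audit(targets: Iterable[Tuple[str, int]],
          timeout: float = 2.0) -> Dict[Tuple[str, int], bool]:
    """Map each (host, port) pair to whether it accepted a connection."""
    return {(host, port): port_open(host, port, timeout) for host, port in targets}
```

Run from an agent node, `audit([("zabbix-server.example", TRAPPER_PORT)])` checks the active direction; run from the server, `audit([("web01.example", AGENT_PORT)])` checks the passive one (host names are hypothetical).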