Monitoring / Metrics with Zabbix
We are using the Zabbix 5.0 (LTS) server with a PostgreSQL database. We started with a manual configuration in a test VM and then automated it for deployment; the Ansible roles zabbix-server and zabbix-agent are the results of this PoC work. Please see the FAQ for how to access the staging deployment of Zabbix.
zabbix-server
This role is ready at the base level, but as the complexity of the monitoring increases, more work will be needed. At the current level, it:
Installs the packages needed for the server
Configures the Zabbix, Apache and PostgreSQL configuration files
Configures the web UI
Configures Kerberos authentication
While these basics are good enough for a PoC, the role is not ready for production until we have done the following:
Add inventory files for groups and users, and have zabbix-cli restore them in case of a fresh installation
Audit the network configuration (see Common challenges)
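As a rough illustration of restoring groups from an inventory file, the sketch below turns a small inventory into requests for the Zabbix JSON-RPC API. Note that it uses the API method usergroup.create directly rather than zabbix-cli, and that the inventory layout, the group names and the function name build_restore_requests are all assumptions made for this example, not part of the PoC. It only builds the request bodies and does not contact a server.

```python
import json


def build_restore_requests(inventory, auth_token):
    """Build one usergroup.create JSON-RPC request per group in the inventory.

    The inventory layout ({"usergroups": [{"name": ...}, ...]}) is a
    hypothetical format chosen for this sketch.
    """
    requests = []
    for request_id, group in enumerate(inventory.get("usergroups", []), start=1):
        requests.append({
            "jsonrpc": "2.0",
            "method": "usergroup.create",
            "params": {"name": group["name"]},
            "auth": auth_token,  # Zabbix 5.0 expects the session token in the body
            "id": request_id,
        })
    return requests


if __name__ == "__main__":
    # Hypothetical inventory; a real one would be loaded from a file.
    inventory = {"usergroups": [{"name": "sysadmin-main"}, {"name": "sysadmin-noc"}]}
    for request in build_restore_requests(inventory, auth_token="<token from user.login>"):
        print(json.dumps(request))
```

Users could be restored in a second pass once the IDs of the created groups are known, since a user must be assigned to existing groups.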
zabbix-agent
This role is ready to be used, and the existing templates are good enough to gather basic information. However, the specifics of which common data should be collected from all agent nodes still need to be discussed widely and then set in a template. Besides the common metrics, one can also export custom metrics using zabbix-sender (see FAQ).
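Exporting a custom metric could look roughly like the sketch below, which wraps the zabbix_sender command-line tool. The server name, host name and item key are placeholders, and the helper names build_sender_command and send_metric are invented for this example. It assumes zabbix_sender is installed on the node and that a matching "Zabbix trapper" item exists on the server.

```python
import shlex
import subprocess


def build_sender_command(server, host, key, value, port=10051):
    """Build the argument list for one zabbix_sender invocation."""
    return [
        "zabbix_sender",
        "-z", server,      # Zabbix server (or proxy) address
        "-p", str(port),   # trapper port, 10051 by default
        "-s", host,        # host name as registered in Zabbix
        "-k", key,         # key of a "Zabbix trapper" item
        "-o", str(value),  # value to send
    ]


def send_metric(server, host, key, value, port=10051):
    """Run zabbix_sender and return True when it exits with status 0."""
    cmd = build_sender_command(server, host, key, value, port)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder names; only prints the command instead of running it.
    cmd = build_sender_command(
        "zabbix.example.org", "agent01.example.org", "custom.queue.length", 42
    )
    print(shlex.join(cmd))
```

Keeping the command construction separate from the subprocess call makes it possible to check the arguments without a running Zabbix server.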
Common challenges
Because we lack experience with SELinux policies and network configuration, we are not very confident in those parts of the setup. A veteran sysadmin would be needed to audit them.