Explaining modern server monitoring stacks for self-hosting
Written by Solène, on 11 September 2022.
Tags:
#nixos
Hello, it has been a long time since I last had to look at monitoring servers. I set up a Grafana server six years ago, and I was using Munin for my personal servers.
However, I recently moved my server to a small virtual machine which has CPU and memory constraints (1 core / 1 GB of memory), and Munin didn't work very well. I was curious to learn if the Grafana stack had changed since the last time I used it, and YES.
There is that project named Prometheus which is used absolutely everywhere, so it was time for me to learn about it. And as I like to go against the flow, I tried various changes to the industry standard stack by using VictoriaMetrics.
In this article, I'm using NixOS configuration for the examples, but it should be obvious enough that you can still understand the parts if you don't know anything about NixOS.
VictoriaMetrics is a Prometheus drop-in replacement that is a lot more efficient (faster and using fewer resources), which also provides various APIs such as Graphite or InfluxDB. It's the component storing data. It comes with various programs like VictoriaMetrics agent to replace various parts of Prometheus.
VictoriaMetrics official website
Prometheus is a time series database, which also provides a collecting agent named Node Exporter. It's also able to pull (scrape) data from remote services offering a Prometheus API.
NixOS is an operating system built with the Nix package manager. It has a declarative approach that requires reconfiguring the system when you need to make a change.
Collectd is an agent gathering metrics from the system and sending them to a compatible remote database.
Grafana is a powerful Web interface pulling data from time series databases to render them as useful charts for analysis.
Node exporter full Grafana dashboard
In this first setup, a Prometheus server is running on a server along with Grafana, and connects to remote servers running node_exporter to gather data.
Running it on my server, Grafana takes 67 MB, the local node_exporter 12.5 MB and Prometheus 63 MB.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
grafana 837975 0.1 6.7 1384152 67836 ? Ssl 01:19 1:07 grafana-server
node-ex+ 953784 0.0 1.2 941292 12512 ? Ssl 16:24 0:01 node_exporter
prometh+ 983975 0.3 6.3 1226012 63284 ? Ssl 17:07 0:00 prometheus
- model: pull, Prometheus is connecting to all servers
Pros §
- it's the industry standard
- can use the “node exporter full” Grafana dashboard
Cons §
- uses memory
- you need to be able to reach all the remote nodes
Server §
{
  services.grafana.enable = true;
  # Local node_exporter, scraped below on 127.0.0.1:9100
  services.prometheus.exporters.node.enable = true;
  services.prometheus = {
    enable = true;
    scrapeConfigs = [
      {
        # Remote server running node_exporter
        job_name = "kikimora";
        static_configs = [
          {targets = ["10.43.43.2:9100"];}
        ];
      }
      {
        # The monitoring server itself
        job_name = "interbus";
        static_configs = [
          {targets = ["127.0.0.1:9100"];}
        ];
      }
    ];
  };
}
Client §
{
  # Let the central server reach node_exporter
  networking.firewall.allowedTCPPorts = [9100];
  services.prometheus.exporters.node.enable = true;
}
In this second setup, a VictoriaMetrics server is running on a server along with Grafana. A VictoriaMetrics agent is running locally to gather data from remote servers running node_exporter.
Running it on my server, Grafana takes 67 MB, the local node_exporter 12.5 MB, VictoriaMetrics 30 MB and its agent 13.8 MB.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
grafana 837975 0.1 6.7 1384152 67836 ? Ssl 01:19 1:07 grafana-server
node-ex+ 953784 0.0 1.2 941292 12512 ? Ssl 16:24 0:01 node_exporter
victori+ 986126 0.1 3.0 1287016 30052 ? Ssl 18:00 0:03 victoria-metric
root 987944 0.0 1.3 1086276 13856 ? Sl 18:30 0:00 vmagent
- model: pull, VictoriaMetrics agent is connecting to all servers
Pros §
- can use the “node exporter full” Grafana dashboard
- lightweight and more performant than Prometheus
Cons §
- you need to be able to reach all the remote nodes
Server §
let
  # Prometheus-style scrape configuration read by vmagent
  configure_prom = builtins.toFile "prometheus.yml" ''
    scrape_configs:
    - job_name: 'kikimora'
      stream_parse: true
      static_configs:
      - targets:
        - 10.43.43.1:9100
    - job_name: 'interbus'
      stream_parse: true
      static_configs:
      - targets:
        - 127.0.0.1:9100
  '';
in {
  services.victoriametrics.enable = true;
  services.grafana.enable = true;
  # vmagent scrapes the targets and writes to the local VictoriaMetrics
  systemd.services.export-to-prometheus = {
    path = with pkgs; [victoriametrics];
    enable = true;
    after = ["network-online.target"];
    wantedBy = ["multi-user.target"];
    script = "vmagent -promscrape.config=${configure_prom} -remoteWrite.url=http://127.0.0.1:8428/api/v1/write";
  };
}
Client §
{
  # Let the central vmagent reach node_exporter
  networking.firewall.allowedTCPPorts = [9100];
  services.prometheus.exporters.node.enable = true;
}
In this third setup, a VictoriaMetrics server is running on a server along with Grafana. On each server, node_exporter and VictoriaMetrics agent are running to export data to the central VictoriaMetrics server.
Running it on my server, Grafana takes 67 MB, the local node_exporter 12.5 MB, VictoriaMetrics 30 MB and its agent 13.8 MB, which is exactly the same as setup 2, except the VictoriaMetrics agent is running on all remote servers.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
grafana 837975 0.1 6.7 1384152 67836 ? Ssl 01:19 1:07 grafana-server
node-ex+ 953784 0.0 1.2 941292 12512 ? Ssl 16:24 0:01 node_exporter
victori+ 986126 0.1 3.0 1287016 30052 ? Ssl 18:00 0:03 victoria-metric
root 987944 0.0 1.3 1086276 13856 ? Sl 18:30 0:00 vmagent
- model: push, each agent is connecting to the VictoriaMetrics server
Pros §
- can use the “node exporter full” Grafana dashboard
- memory efficient
- can bypass firewalls easily
Cons §
- you need to be able to reach all the remote nodes
- more maintenance as you have one extra agent on each remote
- may be bad for security, you need to allow remote servers to write to your VictoriaMetrics server
Server §
{
  # Remote agents push their data to VictoriaMetrics on port 8428
  networking.firewall.allowedTCPPorts = [8428];
  services.victoriametrics.enable = true;
  services.grafana.enable = true;
  services.prometheus.exporters.node.enable = true;
}
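The security concern listed in the cons can be reduced by opening port 8428 only on a trusted network interface instead of globally. This is a minimal sketch, assuming the remote servers reach this host through a private interface; the name `wg0` is hypothetical (for example a VPN) and is not part of the setup measured above.

```nix
{
  # Assumption: "wg0" is a hypothetical private/VPN interface name.
  # Port 8428 is opened on that interface only, replacing the global
  # networking.firewall.allowedTCPPorts = [8428]; line.
  networking.firewall.interfaces."wg0".allowedTCPPorts = [8428];
  services.victoriametrics.enable = true;
  services.grafana.enable = true;
  services.prometheus.exporters.node.enable = true;
}
```

With this variant, the agents would have to use the server's address on that private network in their `-remoteWrite.url`.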
Client §
let
  # The local node_exporter is the only scrape target; the job is named after the host
  configure_prom = builtins.toFile "prometheus.yml" ''
    scrape_configs:
    - job_name: '${config.networking.hostName}'
      stream_parse: true
      static_configs:
      - targets:
        - 127.0.0.1:9100
  '';
in {
  services.prometheus.exporters.node.enable = true;
  # vmagent scrapes locally and pushes to the central VictoriaMetrics server
  systemd.services.export-to-prometheus = {
    path = with pkgs; [victoriametrics];
    enable = true;
    after = ["network-online.target"];
    wantedBy = ["multi-user.target"];
    script = "vmagent -promscrape.config=${configure_prom} -remoteWrite.url=http://victoria-server.domain:8428/api/v1/write";
  };
}
In this fourth setup, a VictoriaMetrics server is running on a server along with Grafana, and the servers are running Collectd, which sends data to the VictoriaMetrics Graphite API.
Running it on my server, Grafana takes 67 MB, VictoriaMetrics 30 MB and Collectd 172 kB (yes).
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
grafana 837975 0.1 6.7 1384152 67836 ? Ssl 01:19 1:07 grafana-server
victori+ 986126 0.1 3.0 1287016 30052 ? Ssl 18:00 0:03 victoria-metric
collectd 844275 0.0 0.0 610432 172 ? Ssl 02:07 0:00 collectd
- model: push, VictoriaMetrics receives data from the Collectd servers
Pros §
- super memory efficient
- can bypass firewalls easily
Cons §
- you can't use the “node exporter full” Grafana dashboard
- may be bad for security, you need to allow remote servers to write to your VictoriaMetrics server
- you need to configure Collectd for each host
Server §
The server requires VictoriaMetrics to run exposing its Graphite API on port 2003.
Note that in Grafana, you will have to escape “-” characters using “\-” in the queries. I also didn't find a way to automatically discover hosts in the data to use variables in the dashboard.
UPDATE: Using the write_tsdb exporter in Collectd, and exposing a TSDB API with VictoriaMetrics, you can set a label on each host, and then use the query “label_values(status)” in Grafana to automatically discover hosts.
{
  networking.firewall.allowedTCPPorts = [2003];
  services.victoriametrics = {
    enable = true;
    # Accept Graphite plaintext protocol on port 2003
    extraOptions = [
      "-graphiteListenAddr=:2003"
    ];
  };
  services.grafana.enable = true;
}
Client §
We only need to enable Collectd on the client:
{
  services.collectd = {
    enable = true;
    autoLoadPlugin = true;
    extraConfig = ''
      Interval 30
    '';
    plugins = {
      # Send metrics to the VictoriaMetrics Graphite listener
      "write_graphite" = ''
        <Node "${config.networking.hostName}">
          Host "victoria-server.fqdn"
          Port "2003"
          Protocol "tcp"
          LogSendErrors true
          Prefix "collectd_"
        </Node>
      '';
      cpu = ''
        ReportByCpu false
      '';
      memory = "";
      df = ''
        Mountpoint "/"
        Mountpoint "/nix/store"
        Mountpoint "/home"
        ValuesPercentage True
        ValuesAbsolute False
      '';
      load = "";
      uptime = "";
      swap = ''
        ReportBytes false
        ReportIO false
        ValuesPercentage true
      '';
      interface = ''
        ReportInactive false
      '';
    };
  };
}
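The write_tsdb variant mentioned in the update could look like the following sketch. It is not the configuration used for the measurements above, and several values are assumptions: port 4242, the hostname `victoria-server.fqdn`, and setting the `status` tag to the machine's hostname so that `label_values(status)` returns one entry per host. On the server side, VictoriaMetrics would need its OpenTSDB-compatible listener enabled with `-opentsdbListenAddr=:4242` in `extraOptions`, with that port opened in the firewall.

```nix
{ config, ... }:
{
  services.collectd = {
    enable = true;
    autoLoadPlugin = true;
    plugins = {
      # Replaces the write_graphite plugin from the client configuration.
      "write_tsdb" = ''
        <Node "${config.networking.hostName}">
          # Assumed hostname and port of the VictoriaMetrics server
          Host "victoria-server.fqdn"
          Port "4242"
          # Assumption: tag every metric with status=<hostname>
          HostTags "status=${config.networking.hostName}"
        </Node>
      '';
      load = "";
    };
  };
}
```

Only the `load` plugin is kept here to keep the example short; the other plugins from the client configuration can be carried over unchanged.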
The first section, named “#!/bin/introduction”, is on purpose and not a mistake. It felt super fun when I started writing the article, and I wanted to keep it that way.
The Collectd setup is the most minimalistic while still powerful, but it requires a lot of work to make the dashboards and configure the plugins correctly.
The setup I like best is setup 2.