https://media.ccc.de/v/all-systems-go...
How do you continually test and release new versions of systemd with confidence? And once a version is released, how do you monitor PID 1 itself, and your usage of PID 1, across your server fleet? This talk dives into Meta’s way of answering these questions so we can minimize the risk of breaking changes (and the "fun") that each systemd release brings us. Some of the technology in the talk is OSS, so you too can join in on the fun and learn how systemd is used across your own infrastructure!
This talk will dive into how Meta baselines its systemd usage across the fleet and uses that data for CI, releasing, and monitoring systemd.
Who am I + what do I work on
The big monitoring hole that many bare-bones infrastructures have
PID 1
PID 1 usage
systemd @ Meta
Imaging initrd
Initrd
Main OS
Twine containers
Overview of OS image building and deployment @ Meta
How we build images
How we provision servers
Chef’s role
What we check in our PID 1 statistics to ensure a box is “healthy” enough to take workloads
Usage of Hyperscale’s systemd-cd @ Meta
What is systemd-cd
[https://sigs.centos.org/hyperscale/in...](https://sigs.centos.org/hyperscale/in...)
How do we use it
What issues has it found for us
Monitoring of Meta’s systemd usage across millions of hosts
Stats collected
Introduce monitord
D-Bus (fun) vs. Varlink
Mention OSS alternative(s) found and explain why we invented monitord
Introduce monitord-exporter
Show usage outside of Meta (will be my small home infra + VPSs)
Cooper Ry Lees
https://cfp.all-systems-go.io/all-sys...
#asg2024
Licensed to the public under https://creativecommons.org/licenses/...