Monitoring & Observability
Observability is the practice of understanding a system's internal state from the signals it emits: metrics, logs, and traces, with alerts built on top of them. This section groups the tools that commonly appear together in a production monitoring stack.
Recommended Split
- Metrics and alerting: Prometheus
- Dashboards and exploration: Grafana
- Log shipping and transformation: Fluentd
- Log storage and queries: Loki
- Distributed tracing: Jaeger (tracing backend) and OpenTracing (instrumentation API)
- Managed observability: Datadog
- Legacy host and service monitoring: Nagios
- Packaging and delivery for Kubernetes apps: Replicated KOTS
How To Read This Section
- Start with Prometheus and Grafana if you want a self-hosted metrics stack.
- Add Loki and Fluentd when logs need to be collected and queried centrally.
- Use Jaeger, with services instrumented through the OpenTracing API, for request-level tracing across services.
- Use Datadog when you want a managed platform instead of running the stack yourself.
- Keep Nagios in mind for older environments that still rely on host and service checks.
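The reading guide above can be pictured as a small lookup from a monitoring need to the tools this section suggests. The sketch below is only an illustration: the need names (such as `self_hosted_metrics`) and the `suggest_tools` helper are hypothetical and do not belong to any of the listed tools.

```python
# Hypothetical mapping from a monitoring need to the tools this section suggests.
# The keys are illustrative labels invented for this sketch.
STACK_BY_NEED = {
    "self_hosted_metrics": ["Prometheus", "Grafana"],
    "central_logs": ["Fluentd", "Loki"],
    "request_tracing": ["Jaeger", "OpenTracing"],
    "managed_platform": ["Datadog"],
    "legacy_checks": ["Nagios"],
}


def suggest_tools(needs):
    """Return the suggested tools for the given needs, in order, without duplicates.

    Needs that are not in STACK_BY_NEED are silently ignored.
    """
    tools = []
    for need in needs:
        for tool in STACK_BY_NEED.get(need, []):
            if tool not in tools:
                tools.append(tool)
    return tools


if __name__ == "__main__":
    # A self-hosted stack that also needs central log collection.
    print(suggest_tools(["self_hosted_metrics", "central_logs"]))
```

Calling `suggest_tools(["self_hosted_metrics", "central_logs"])` returns the metrics tools followed by the logging tools, which mirrors the order in which the guide recommends adopting them.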