Monitoring & Observability

Observability is the practice of understanding a system's internal state from the signals it emits: metrics, logs, and traces, with alerts built on top of them. This section groups the tools that commonly appear together in a production monitoring stack.

Recommended Split

Metrics and alerting: Prometheus
Dashboards and exploration: Grafana
Log shipping and transformation: Fluentd
Log storage and queries: Loki
Distributed tracing: Jaeger and OpenTracing
Managed observability: Datadog
Legacy host and service monitoring: Nagios
Packaging and delivery for Kubernetes apps: Replicated KOTS

How To Read This Section

Start with Prometheus and Grafana if you want a self-hosted metrics stack.
Add Loki and Fluentd when logs need to be collected and queried centrally.
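Use Jaeger as the tracing backend, with OpenTracing as the instrumentation API, for request-level tracing across services. The OpenTracing project is archived, so new instrumentation generally targets its successor, OpenTelemetry.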
Use Datadog when you want a managed platform instead of running the stack yourself.
Keep Nagios in mind for older environments that still rely on host and service checks.
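
As a concrete picture of what the self-hosted metrics path starts from, the sketch below renders a counter in the plain-text exposition format that Prometheus scrapes over HTTP. It is only an illustration: `render_counter`, the metric name, and the label values are hypothetical, and label values are assumed to need no escaping. A real service would normally use an official Prometheus client library instead of formatting this text by hand.

```python
# Hypothetical helper (not part of any library): formats one counter metric
# in the Prometheus text exposition format.
def render_counter(name, help_text, samples):
    """Render a counter.

    samples maps a tuple of (label_name, label_value) pairs to a number.
    Assumption: label values contain no quotes, backslashes, or newlines,
    so no escaping is applied here.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    # Sort by label set so the output is stable between calls.
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


# Illustrative data only; these numbers are made up for the example.
example = render_counter(
    "http_requests_total",
    "Total HTTP requests handled.",
    {
        (("method", "GET"), ("status", "200")): 42,
        (("method", "POST"), ("status", "500")): 3,
    },
)
print(example, end="")
```

Each output line is one time series: the metric name, its labels in braces, and the current value. Grafana dashboards and Prometheus alert rules are then written as queries over series of this shape.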