
Introduction to Grafana and Observability

In this foundational lesson, we'll explore what Grafana is, why observability matters in modern software systems, and how Grafana helps teams monitor and understand their applications. By the end of this lesson, you'll understand the core concepts that make Grafana an essential tool for developers and operations teams.

Learning Goals:

  • Understand what Grafana is and its role in observability
  • Learn the three pillars of observability
  • Identify common use cases for Grafana
  • Recognize how Grafana fits into modern monitoring workflows

What is Grafana?

Grafana is an open-source platform for monitoring and observability that lets you query, visualize, alert on, and understand your metrics, logs, and traces no matter where they are stored. Think of it as the "window" into your systems: it does not store that telemetry itself but connects to various data sources to build rich, interactive dashboards.

Grafana Features
  • Data visualization through charts, graphs, and tables
  • Alerting and notification systems
  • Dashboard templating for reusability
  • Multi-data source support
  • Plugin ecosystem for extensibility
Important

Grafana is often mistaken for a database, but it is primarily a visualization tool. It connects to your existing data sources rather than storing the data itself.

The Three Pillars of Observability

Observability in software systems is built on three fundamental pillars that help you understand what's happening inside your applications:

Metrics
Quantitative measurements that reflect the health, performance, and behavior of systems over time — like CPU usage, memory, latency, or request rate.
Logs
Detailed event records that capture what happened and when. Useful for debugging and auditing specific system actions or failures.
Traces
End-to-end request tracking across distributed services, showing how a request flows and where latency or errors occur.

Metrics

Metrics are numerical measurements collected over time. They help you track system performance, resource usage, and business KPIs.

Example Metric Structure
interface Metric {
  name: string;       // e.g., "http_requests_total"
  value: number;      // e.g., 1500
  timestamp: Date;    // When the measurement was taken
  labels: {           // Key-value pairs for filtering
    method: string;   // "GET", "POST", etc.
    status: string;   // "200", "404", "500"
    endpoint: string; // "/api/users", "/health"
  };
}
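To make the role of labels concrete, here is a minimal sketch of filtering and aggregating metric samples by label. The sample values and the `sumByLabels` helper are hypothetical, and `labels` is loosened to a generic string map for brevity; in practice this kind of aggregation is usually done by the data source's query language rather than in application code.

```typescript
// Same shape as the Metric structure above, with labels as a generic string map.
interface Metric {
  name: string;
  value: number;
  timestamp: Date;
  labels: Record<string, string>;
}

// Hypothetical samples: one counter value per label combination.
const takenAt = new Date("2024-01-15T10:30:00Z");
const samples: Metric[] = [
  { name: "http_requests_total", value: 1500, timestamp: takenAt,
    labels: { method: "GET", status: "200", endpoint: "/api/users" } },
  { name: "http_requests_total", value: 40, timestamp: takenAt,
    labels: { method: "GET", status: "404", endpoint: "/api/users" } },
  { name: "http_requests_total", value: 60, timestamp: takenAt,
    labels: { method: "POST", status: "500", endpoint: "/api/users" } },
];

// Sum the values of every sample with the given name whose labels
// match all of the requested key-value pairs.
function sumByLabels(
  metrics: Metric[],
  name: string,
  match: Record<string, string>
): number {
  return metrics
    .filter((m) => m.name === name)
    .filter((m) => Object.keys(match).every((key) => m.labels[key] === match[key]))
    .reduce((total, m) => total + m.value, 0);
}

const totalRequests = sumByLabels(samples, "http_requests_total", {});
const serverErrors = sumByLabels(samples, "http_requests_total", { status: "500" });
const errorRatio = serverErrors / totalRequests;

console.log({ totalRequests, serverErrors, errorRatio });
```

An empty match object selects every sample with that name, so the same helper yields both the total and the filtered count.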

Logs

Logs are timestamped records of discrete events that occurred in your system. They provide detailed context about specific operations.

Example Log Entry - Traditional Log Format
2024-01-15T10:29:58Z INFO  user-service  Starting request for /api/login user_id=user-456 trace_id=abc-123-xyz
2024-01-15T10:29:59Z DEBUG user-service Querying database for user credentials trace_id=abc-123-xyz
2024-01-15T10:30:00Z WARN user-service Slow response from database host=db-prod-1 duration=4800ms trace_id=abc-123-xyz
2024-01-15T10:30:00Z ERROR user-service Database connection timeout after 5000ms user_id=user-456 trace_id=abc-123-xyz
2024-01-15T10:30:00Z INFO gateway Request completed status=500 latency=5050ms trace_id=abc-123-xyz
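Lines like these are easy to read but harder to query. As a rough sketch, the following turns one such line into a structured object. It assumes the layout seen in the sample lines (timestamp, level, service, then free text mixed with `key=value` pairs); the `parseTraditionalLine` helper and `ParsedLog` type are hypothetical, and values containing spaces or quotes are not handled.

```typescript
interface ParsedLog {
  timestamp: string;
  level: string;
  service: string;
  message: string;
  fields: Record<string, string>;
}

// Split a line into its three leading columns, then separate the remaining
// tokens into free-text message words and key=value fields.
function parseTraditionalLine(line: string): ParsedLog | null {
  const match = /^(\S+)\s+(\S+)\s+(\S+)\s+(.*)$/.exec(line.trim());
  if (!match) return null;

  const [, timestamp, level, service, rest] = match;
  const fields: Record<string, string> = {};
  const messageWords: string[] = [];

  for (const token of rest.split(/\s+/)) {
    const eq = token.indexOf("=");
    if (eq > 0) {
      fields[token.slice(0, eq)] = token.slice(eq + 1);
    } else {
      messageWords.push(token);
    }
  }

  return { timestamp, level, service, message: messageWords.join(" "), fields };
}

const parsedWarn = parseTraditionalLine(
  "2024-01-15T10:30:00Z WARN user-service Slow response from database host=db-prod-1 duration=4800ms trace_id=abc-123-xyz"
);

console.log(parsedWarn);
```

Once the `trace_id` is available as a field, log lines from different services can be matched to the same request.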
Example Log Entry - JSON Log Format
{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "ERROR",
  "message": "Database connection timeout",
  "service": "user-service",
  "trace_id": "abc-123-xyz",
  "user_id": "user-456",
  "duration_ms": 5000
}
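Structured logs can be processed without custom parsing. The sketch below assumes one JSON document per line with the field names shown above; the input lines and the `parseLogLine` and `groupByTrace` helpers are hypothetical. It groups entries by `trace_id` and lists the traces that contain at least one error.

```typescript
interface LogEntry {
  timestamp: string;
  level: string;
  message: string;
  service: string;
  trace_id: string;
  user_id?: string;
  duration_ms?: number;
}

// Hypothetical input: one JSON document per line, plus one malformed line.
const rawLines: string[] = [
  '{"timestamp":"2024-01-15T10:29:58Z","level":"INFO","message":"Starting request","service":"user-service","trace_id":"abc-123-xyz"}',
  '{"timestamp":"2024-01-15T10:30:00Z","level":"ERROR","message":"Database connection timeout","service":"user-service","trace_id":"abc-123-xyz","user_id":"user-456","duration_ms":5000}',
  '{"timestamp":"2024-01-15T10:30:01Z","level":"INFO","message":"Health check ok","service":"gateway","trace_id":"def-456-uvw"}',
  "not json",
];

// Parse one line, returning null for invalid JSON or for objects
// missing the fields this sketch relies on.
function parseLogLine(line: string): LogEntry | null {
  try {
    const parsed = JSON.parse(line);
    const usable =
      typeof parsed === "object" &&
      parsed !== null &&
      typeof parsed.trace_id === "string" &&
      typeof parsed.level === "string";
    return usable ? (parsed as LogEntry) : null;
  } catch {
    return null;
  }
}

// Collect entries that belong to the same request.
function groupByTrace(entries: LogEntry[]): Record<string, LogEntry[]> {
  const groups: Record<string, LogEntry[]> = {};
  for (const entry of entries) {
    const bucket = groups[entry.trace_id] || [];
    bucket.push(entry);
    groups[entry.trace_id] = bucket;
  }
  return groups;
}

const entries = rawLines
  .map(parseLogLine)
  .filter((e): e is LogEntry => e !== null);
const byTrace = groupByTrace(entries);
const failedTraces = Object.keys(byTrace).filter((traceId) =>
  byTrace[traceId].some((entry) => entry.level === "ERROR")
);

console.log(failedTraces);
```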

Traces

Traces track the journey of a request as it propagates through multiple services in a distributed system.

Trace Components
trace:
  trace_id: "trace-789"
  spans:
    - span_id: "span-1"
      operation: "HTTP GET /api/orders"
      start_time: "2024-01-15T10:30:00Z"
      duration: 150ms
      service: "api-gateway"
    - span_id: "span-2"
      operation: "database.query"
      start_time: "2024-01-15T10:30:00.100Z"
      duration: 50ms
      service: "order-service"
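To illustrate how span data reveals where time is spent, here is a small sketch that computes each span's offset from the start of the trace and its self time (its duration minus the time spent in direct child spans). The `parentSpanId` field is an assumption added for illustration, since the example above does not show parent links, and `selfTimeMs` is a hypothetical helper that assumes child spans do not overlap each other.

```typescript
interface Span {
  spanId: string;
  parentSpanId?: string; // assumption: parent link not shown in the example above
  operation: string;
  service: string;
  startTime: string; // ISO 8601 timestamp
  durationMs: number;
}

const spans: Span[] = [
  { spanId: "span-1", operation: "HTTP GET /api/orders", service: "api-gateway",
    startTime: "2024-01-15T10:30:00Z", durationMs: 150 },
  { spanId: "span-2", parentSpanId: "span-1", operation: "database.query",
    service: "order-service", startTime: "2024-01-15T10:30:00.100Z", durationMs: 50 },
];

// Duration of a span minus the durations of its direct children.
// Assumes the children run one after another rather than in parallel.
function selfTimeMs(span: Span, all: Span[]): number {
  const childTotal = all
    .filter((s) => s.parentSpanId === span.spanId)
    .reduce((sum, s) => sum + s.durationMs, 0);
  return span.durationMs - childTotal;
}

// The earliest span start marks the start of the trace.
const traceStart = Math.min(...spans.map((s) => Date.parse(s.startTime)));

const timeline = spans.map((s) => ({
  spanId: s.spanId,
  offsetMs: Date.parse(s.startTime) - traceStart,
  selfMs: selfTimeMs(s, spans),
}));

console.log(timeline);
```

Under these assumptions, the gateway span accounts for 100 ms of its own work, and the database query starts 100 ms into the request.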

Why Grafana for Observability?

Grafana excels at bringing together all three pillars of observability into a unified interface. Here's how it addresses common monitoring challenges:

Multiple Data Sources in One Dashboard
// Grafana can display data from:
- Prometheus (metrics)
- Loki (logs)
- Tempo (traces)
- Elasticsearch
- MySQL/PostgreSQL
- Cloud monitoring services
- Custom data sources

Common Grafana Use Cases

Infrastructure & Cloud Monitoring
Monitor servers, containers, and Kubernetes clusters using Prometheus, Node Exporter, and cloud integrations.
Application Performance & Tracing
Correlate metrics, logs, and traces using Tempo and OpenTelemetry for deep visibility into service health.
Log Aggregation & Analysis
Collect, query, and visualize logs via Grafana Loki — correlate events across multiple services easily.
Real-time Alerting & OnCall
Set unified alert rules, contact points, and on-call schedules with Grafana Alerting and OnCall.
Business & Product Analytics
Use SQL data sources or BigQuery to visualize KPIs, user engagement, and operational metrics together.
Frontend & User Experience Monitoring
Leverage Grafana Faro to track web vitals, session errors, and performance directly from your frontend apps.

How Grafana Fits in Your Toolchain

Grafana typically sits at the visualization layer of your monitoring stack:

Your Applications
        ↓
[ Data Collection: Prometheus, Telegraf, Fluentd ]
        ↓
[ Data Storage: Time series DBs, Log DBs ]
        ↓
[ Grafana - Visualization & Alerting ]
        ↓
Your Team (Devs, Ops, Business)
Note

While Grafana works well with the Prometheus-Loki-Tempo stack, it is data source agnostic. You can connect it to virtually any time-series database, SQL database, or cloud monitoring service.

Common Pitfalls

  • Assuming Grafana stores data: Grafana queries and visualizes data held in external sources; it does not store your metrics, logs, or traces itself
  • Over-complicating dashboards: Start simple with key metrics before building complex visualizations
  • Ignoring data retention: Your underlying data sources (like Prometheus) have retention policies that affect historical data availability
  • Poor alert design: Avoid alert fatigue by setting meaningful thresholds and grouping related alerts
  • Security oversights: Always secure your Grafana instance and use appropriate data source permissions
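As an illustration of the alert design point, the sketch below contrasts a naive threshold check with one that fires only when the threshold is breached for several consecutive samples. This is similar in spirit to the pending period that Grafana alert rules support, but it is not Grafana's implementation. The `shouldFire` helper, the sample values, and the 800 ms threshold are all hypothetical.

```typescript
// Fire only when the most recent `requiredConsecutive` samples
// all exceed the threshold, so a single spike does not page anyone.
function shouldFire(
  values: number[],
  threshold: number,
  requiredConsecutive: number
): boolean {
  if (requiredConsecutive <= 0 || values.length < requiredConsecutive) {
    return false;
  }
  return values.slice(-requiredConsecutive).every((v) => v > threshold);
}

// Hypothetical latency samples in milliseconds, one per minute.
const spikeThenRecovery = [120, 5050, 130];
const sustainedSlowdown = [120, 5050, 130, 900, 950, 990];

// A naive check fires on any single sample above the threshold.
const naiveFires = spikeThenRecovery.some((v) => v > 800);

// Requiring three consecutive breaches ignores the isolated spike
// but still catches the sustained slowdown.
const spikeFires = shouldFire(spikeThenRecovery, 800, 3);
const sustainedFires = shouldFire(sustainedSlowdown, 800, 3);

console.log({ naiveFires, spikeFires, sustainedFires });
```

The trade-off is detection delay: requiring more consecutive breaches reduces noise but postpones the first notification.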

Summary

In this lesson, we've covered the fundamentals of Grafana and observability. You now understand that Grafana is a powerful visualization tool that connects to various data sources to provide insights through metrics, logs, and traces. The three pillars of observability work together to give you comprehensive visibility into your systems, and Grafana excels at bringing these elements together in meaningful ways.

Remember

Grafana doesn't store your data; it helps you understand it. This distinction is crucial as we move on to building effective monitoring solutions.
