Understanding Grafana Query Languages
Now that you've connected your data sources to Grafana, it's time to master the most critical skill for effective monitoring: writing queries. In this lesson, you'll learn how to communicate with your data sources using Grafana's query languages to extract exactly the information you need.
Learning Goals:
- Understand the role of query languages in Grafana
- Learn the basics of PromQL for Prometheus data sources
- Master essential SQL patterns for database queries
- Explore Loki's LogQL for log analysis
- Write effective queries for different visualization types
What Are Grafana Query Languages?
Grafana doesn't have its own query language. Instead, it acts as a translator that speaks the native language of your data source. When you select a data source in a panel, Grafana provides the appropriate query editor for that specific system.
Think of Grafana as a universal remote control for your monitoring data. Each data source has its own "language," and Grafana helps you speak all of them through a consistent interface.
Prometheus Query Language (PromQL)
Prometheus is one of Grafana's most common data sources, and PromQL is essential for time series data analysis.
Basic PromQL Syntax
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
This query calculates CPU usage percentage by:
- Selecting the idle CPU time counter (mode="idle")
- Calculating its per-second rate over a 5-minute window
- Averaging across cores, grouped by instance
- Multiplying by 100 to get idle percentage, then subtracting from 100 to get usage
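The same pattern generalizes: drop the mode filter and group by mode to see where CPU time is going. A minimal sketch, assuming the standard node_exporter metric name:

```promql
# Per-second CPU time spent in each mode, averaged per instance
# (assumes node_exporter's node_cpu_seconds_total metric)
avg by (instance, mode) (
  rate(node_cpu_seconds_total[5m])
)
```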
Common PromQL Functions
# Current memory usage (bytes)
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes

# Memory usage percentage
(
  (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
    / node_memory_MemTotal_bytes
) * 100

# 5-minute average memory usage percentage (subquery syntax)
avg_over_time(
  (
    (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
      / node_memory_MemTotal_bytes
  )[5m:]
) * 100
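Aggregation operators like topk() are useful when you only care about the worst offenders rather than every series. A sketch reusing the same node_exporter metrics:

```promql
# Three instances with the highest memory usage percentage
topk(3,
  (
    (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
      / node_memory_MemTotal_bytes
  ) * 100
)
```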
SQL for Database Data Sources
For SQL databases like PostgreSQL, MySQL, or ClickHouse, you write standard SQL queries with time series considerations.
Time Series SQL Patterns
PostgreSQL:

SELECT
  $__timeGroup("timestamp", '1h'),
  percentile_cont(0.95) WITHIN GROUP (ORDER BY response_time_ms) AS p95_response_time,
  avg(response_time_ms) AS avg_response_time
FROM application_metrics
WHERE
  $__timeFilter("timestamp")
  AND status_code = 200
GROUP BY 1
ORDER BY 1

MySQL:

SELECT
  $__timeGroup(created_at, '5m'),
  COUNT(*) AS connection_count,
  AVG(duration_ms) AS avg_duration
FROM database_connections
WHERE
  $__timeFilter(created_at)
  AND status = 'success'
GROUP BY 1
ORDER BY 1
Grafana provides macros like $__timeFilter() and $__timeGroup() that automatically adapt to your dashboard's time range. These make your SQL queries time-range aware without manual date manipulation.
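To make the macros concrete, here is roughly what Grafana substitutes before sending the query to a PostgreSQL data source. The exact timestamps come from the dashboard's time picker; the expansion shown is illustrative, not the literal output:

```sql
-- What you write:
--   WHERE $__timeFilter("timestamp")
-- Roughly what PostgreSQL receives:
WHERE "timestamp" BETWEEN '2024-01-01T00:00:00Z' AND '2024-01-01T06:00:00Z'

-- What you write:
--   SELECT $__timeGroup("timestamp", '1h')
-- Roughly what PostgreSQL receives (epoch seconds bucketed into 1-hour groups):
SELECT floor(extract(epoch from "timestamp")/3600)*3600 AS "time"
```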
Loki's LogQL for Log Analysis
Loki uses LogQL, which combines log stream selection with aggregation operations.
Log Stream Selection
{container="api-service", environment="production"} |= "error"
This selects logs where:
- Container name is "api-service"
- Environment is "production"
- Log line contains "error"
Log Aggregations
sum by (container) (
  rate(
    {container=~"api-service|auth-service"}
      |= "error"
      | json
      | status >= 500
    [5m]
  )
)
This query:
- Selects error logs from multiple services
- Parses JSON log structure
- Filters for HTTP 500 errors
- Calculates error rate per container
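LogQL can also extract numeric fields from logs and aggregate them, much like PromQL. A sketch, assuming the logs are JSON objects with a duration_ms field (both the field name and containers are illustrative):

```logql
# 99th-percentile request duration per container, from JSON logs
quantile_over_time(0.99,
  {container="api-service"}
    | json
    | unwrap duration_ms [5m]
) by (container)
```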
Query Variables and Templating
You can use queries to populate dashboard variables, making your dashboards dynamic.
SQL data source (populates a "service" variable):

SELECT DISTINCT service_name
FROM metrics
WHERE environment = 'production'
ORDER BY service_name

Prometheus data source (populates an "instance" variable):

label_values(node_cpu_seconds_total, instance)
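Once a variable is defined, you reference it with $ syntax inside any panel query, and Grafana substitutes the selected value at query time. A sketch, where the variable name "service" and the metric are assumed:

```promql
# PromQL using a dashboard variable named "service"
sum by (instance) (
  rate(http_requests_total{service="$service"}[5m])
)
```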
Query Optimization Tips
Reduce Data Points
# Instead of high resolution
rate(http_requests_total[1m])
# Use resolution matching your needs
rate(http_requests_total[5m])
Use Efficient Time Ranges
# For real-time monitoring (last hour)
rate(http_requests_total[1m])
# For trend analysis (last week)
rate(http_requests_total[15m])
Common Pitfalls
- Too broad label matchers: Using {job=~".*"} can overload your data source
- Ignoring resolution: Querying 1-second intervals for weekly trends wastes resources
- Missing error handling: Not accounting for missing metrics or null values
- Over-aggregation: Losing important details by averaging across too many dimensions
- Hard-coded time ranges: Not using Grafana's time macros in SQL queries
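The first pitfall is easy to avoid by scoping matchers to a known set of labels. A before/after sketch with assumed job names:

```promql
# Too broad: matches every series of every job
up{job=~".*"}

# Scoped: matches only the jobs you actually monitor
up{job=~"api-service|auth-service"}
```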
Summary
You've learned that Grafana leverages the native query languages of your data sources. PromQL excels at time series calculations, SQL handles relational data with time series extensions, and LogQL specializes in log analysis. The key to effective querying is understanding your data source's capabilities and writing efficient, focused queries that provide the insights you need for visualization.
Quiz
1. What happens when you switch data sources in a Grafana panel?
   - A) Your existing query automatically converts to the new data source's language
   - B) The query editor changes to match the new data source's language
   - C) Grafana translates your query using AI
   - D) The panel becomes empty and you must start over

2. Which PromQL function calculates the per-second rate of change over a time window?
   - A) increase()
   - B) delta()
   - C) rate()
   - D) deriv()

3. What is the purpose of Grafana's SQL macros like $__timeFilter()?
   - A) To make SQL queries faster
   - B) To automatically filter results based on the dashboard's time range
   - C) To convert SQL to PromQL
   - D) To cache query results

4. In LogQL, what does the |= "error" filter accomplish?
   - A) Excludes log lines containing "error"
   - B) Selects log lines where the JSON field "error" exists
   - C) Filters for log lines containing the string "error"
   - D) Counts the number of errors

5. Why should you avoid using very short time ranges (like [1m]) for long-term trend analysis?
   - A) It's not supported by Prometheus
   - B) It provides too many data points and wastes resources
   - C) It makes the graphs too detailed
   - D) Grafana limits query resolution
Answers:
1. B) The query editor changes to match the new data source's language
2. C) rate()
3. B) To automatically filter results based on the dashboard's time range
4. C) Filters for log lines containing the string "error"
5. B) It provides too many data points and wastes resources