Grafana Dashboard with Prometheus: The Ultimate Guide to Monitoring Your Infrastructure in 2025

In 2025, observability is no longer optional — it's a core requirement for every production system. Whether you're running microservices on Kubernetes, managing bare-metal servers, or operating a hybrid cloud, you need real-time visibility into your infrastructure. That's exactly where the Prometheus and Grafana stack comes in. Together, they form the most powerful open-source monitoring solution available today, trusted by companies like Uber, DigitalOcean, GitLab, and thousands of DevOps teams worldwide.

In this complete guide, you'll learn how to set up Prometheus to scrape metrics, build powerful Grafana dashboards, write PromQL queries, configure alerting rules, and monitor your Kubernetes cluster — all from scratch. Whether you're a beginner or an experienced SRE, this tutorial has something for you.

What Is Prometheus and Why Use It With Grafana?

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud and now a graduated CNCF project. It works by scraping HTTP endpoints that expose metrics in a text-based format, storing them in a time-series database, and making them queryable via its powerful PromQL language.

Grafana is an open-source analytics and monitoring platform that connects to dozens of data sources — including Prometheus — and lets you build beautiful, interactive dashboards. While Prometheus handles data collection and storage, Grafana handles visualization. Together they cover the complete monitoring lifecycle: collection, storage, alerting, and visualization.

Key reasons to choose this stack in 2025: it's completely free and open source, it scales from a single server to thousands of nodes, it has a massive ecosystem of exporters, and it natively integrates with Kubernetes, Docker, AWS CloudWatch, and hundreds of other platforms.

Prometheus Architecture: How It Works

Understanding Prometheus architecture is critical before diving into setup. The core components are: the Prometheus Server (which scrapes and stores metrics), Exporters (which expose metrics from services like Linux nodes, MySQL, Redis, Nginx), Alertmanager (which handles routing and silencing of alerts), and Pushgateway (for short-lived batch jobs that can't be scraped).

Prometheus follows a pull model — instead of services pushing metrics to a central server, Prometheus periodically polls each configured target's /metrics endpoint. This design makes it easy to discover and monitor targets dynamically, especially in cloud-native environments where services come and go constantly.

Step 1: Installing Prometheus

The fastest way to get started is with Docker Compose. Create a docker-compose.yml file with a Prometheus service pointing to your prometheus.yml config, and a Grafana service on port 3000. In your prometheus.yml, define the scrape interval (typically 15s), set external labels like environment and region, and list your scrape targets.
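As a minimal sketch (image tags, file paths, and label values here are illustrative, not prescriptive), the two files might look like this:

```yaml
# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
```

```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  external_labels:
    environment: production
    region: us-east-1
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
```

Run docker compose up -d and both services come up together, with Prometheus scraping itself as a first target.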

For production deployments on Kubernetes, the recommended approach is the kube-prometheus-stack Helm chart, which bundles Prometheus Operator, Grafana, node-exporter, kube-state-metrics, and Alertmanager in a single, production-ready installation. Simply add the prometheus-community Helm repo and install the chart with your custom values file.
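The install is three commands (the release name monitoring and the values.yaml file are assumptions; substitute your own):

```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  -f values.yaml
```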

Key configuration options you should set from day one: retention time (default 15 days, increase for long-term trends), storage path, memory limits (Prometheus can be memory-hungry at scale), and remote_write if you want to send metrics to a long-term storage backend like Thanos, Cortex, or VictoriaMetrics.
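Note that retention and storage path are set via command-line flags on the Prometheus server, while remote_write lives in prometheus.yml. A sketch (the Thanos receive endpoint URL is an assumption for illustration):

```yaml
# Flags, passed to the prometheus binary rather than set in prometheus.yml:
#   --storage.tsdb.retention.time=30d
#   --storage.tsdb.path=/prometheus

# prometheus.yml — ship a copy of every sample to long-term storage
remote_write:
  - url: http://thanos-receive:19291/api/v1/receive
```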

Step 2: Setting Up Exporters

Exporters are the bridge between your services and Prometheus. The most important exporters for 2025 are: Node Exporter (for Linux server metrics like CPU, memory, disk, network), cAdvisor (for Docker container metrics), kube-state-metrics (for Kubernetes object state), Blackbox Exporter (for HTTP/HTTPS/TCP uptime checks), MySQL Exporter, Redis Exporter, and NGINX Exporter.

Node Exporter is the first exporter you should install on every Linux host. Once running, it exposes hundreds of metrics at the /metrics endpoint on port 9100. Add each host's IP and port to your Prometheus scrape config, and within seconds you'll have CPU usage, memory pressure, disk I/O, network bandwidth, and filesystem stats flowing into Prometheus.
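The scrape config addition is a single job (the IPs below are placeholders for your hosts):

```yaml
# prometheus.yml — append under scrape_configs
scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - "10.0.0.11:9100"
          - "10.0.0.12:9100"
```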

Step 3: Connecting Prometheus to Grafana

Once Grafana is running (default port 3000, default credentials admin/admin), navigate to Connections > Add new data source > Prometheus. Set the URL to your Prometheus server (e.g., http://prometheus:9090 for Docker Compose setups or http://prometheus-operated:9090 for Kubernetes). Click Save & Test — if the connection is successful, you're ready to build dashboards.
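If you prefer configuration-as-code over clicking through the UI, Grafana can provision the data source from a file at startup. A sketch (the provisioning path is Grafana's default on Linux installs):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```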

Grafana 10, released in 2023, introduced a redesigned UI, and subsequent releases have added the Explore Metrics view, which lets you browse your metrics visually without writing PromQL queries. This is a game-changer for teams new to Prometheus who want instant dashboards before climbing the query-language learning curve.

Essential PromQL Queries Every DevOps Engineer Should Know

PromQL (Prometheus Query Language) is what makes Prometheus powerful. Here are the most useful queries for your dashboards. For CPU usage: rate(node_cpu_seconds_total{mode!='idle'}[5m]) gives the rate of non-idle CPU time per core and mode; average it by instance and multiply by 100 to get a utilization percentage. For memory: use (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 for memory usage percentage.

For HTTP error rate monitoring: rate(http_requests_total{status=~'5..'}[5m]) / rate(http_requests_total[5m]) gives you the error ratio (multiply by 100 for a percentage). For Kubernetes pod restarts: increase(kube_pod_container_status_restarts_total[1h]) alerts you to crashing containers. For disk usage: (node_filesystem_size_bytes - node_filesystem_free_bytes) / node_filesystem_size_bytes * 100 gives disk fill percentage.

The golden signals of SRE (Latency, Traffic, Errors, Saturation) should be the foundation of every dashboard. Use histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) for P99 latency — this tells you how your slowest 1% of requests are performing, which is critical for SLO tracking.
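Collected for copy-paste, the queries above look like this (metric names assume Node Exporter and a conventional http_requests_total / http_request_duration_seconds instrumentation scheme in your apps):

```promql
# CPU utilization per instance, as a percentage of non-idle time
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))

# Memory usage percentage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
  / node_memory_MemTotal_bytes * 100

# HTTP 5xx error ratio across all instances
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))

# P99 latency (the golden-signal Latency query)
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```

Note the sum by (le) in the quantile query: histogram_quantile needs the bucket label le preserved when you aggregate across instances.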

Top 5 Grafana Dashboard Templates for Prometheus

Grafana's community dashboard library (grafana.com/grafana/dashboards) has thousands of pre-built templates. Here are the must-have dashboards for 2025. Dashboard ID 1860 (Node Exporter Full) is the gold standard for Linux server monitoring — it includes over 30 panels covering CPU, memory, disk, network, and system load with beautiful gauge and time-series visualizations.

Dashboard ID 315 (Kubernetes cluster monitoring) provides a high-level view of cluster health including node status, pod counts, CPU and memory requests vs limits, and network I/O. Dashboard ID 7249 (Kubernetes Pods) lets you drill into individual pod metrics with variable selectors for namespace and pod name. Dashboard ID 9628 (PostgreSQL Database) covers query performance, connections, and replication lag.

To import any dashboard, go to Dashboards > New > Import in Grafana, enter the dashboard ID, select your Prometheus data source, and click Import. You'll have a production-ready monitoring dashboard in under 60 seconds.

Step 4: Configuring Alerting Rules and Alertmanager

Alerting is where monitoring moves from reactive to proactive. In Prometheus, you define alerting rules in YAML files. A typical high CPU alert fires when CPU usage exceeds 80% for more than 5 minutes. You set a severity label (warning or critical), an annotations block with a summary and description, and Alertmanager routes the alert to the right receiver — Slack, PagerDuty, email, OpsGenie, or webhook.
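The high-CPU example above can be sketched as a Prometheus rule file (the group and alert names are illustrative):

```yaml
# rules/node-alerts.yml
groups:
  - name: node-alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU usage has been above 80% for more than 5 minutes."
```

The for: 5m clause is what keeps brief spikes from paging anyone: the expression must stay true for the full window before the alert fires.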

Alertmanager configuration involves three key concepts: routes (which alerts go where), receivers (Slack channels, PagerDuty integrations, email groups), and inhibition rules (silencing lower-severity alerts when a critical alert fires). A well-designed alert routing tree ensures your on-call engineers get the right context at the right time without alert fatigue.
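A minimal alertmanager.yml showing all three concepts (the Slack webhook and PagerDuty key are placeholders you must supply):

```yaml
# alertmanager.yml
route:
  receiver: slack-default          # fallback for anything unmatched
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-oncall   # criticals page the on-call engineer

receivers:
  - name: slack-default
    slack_configs:
      - channel: "#alerts"
        api_url: https://hooks.slack.com/services/REPLACE_ME
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: REPLACE_ME

inhibit_rules:
  # Silence warnings on a host while a critical for the same alert is firing
  - source_matchers: [severity="critical"]
    target_matchers: [severity="warning"]
    equal: [alertname, instance]
```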

Since Grafana 9, you can also manage alerts directly in the Grafana UI via Grafana Unified Alerting — which supports multiple data sources, not just Prometheus. This is the recommended approach for teams who want a single pane of glass for both dashboards and alerts.

Monitoring Kubernetes With Grafana and Prometheus in 2025

The kube-prometheus-stack Helm chart is the easiest way to get full Kubernetes observability. It automatically discovers your cluster's nodes, pods, deployments, and services, creates ServiceMonitor resources that Prometheus Operator uses to configure scraping, and provisions pre-built Grafana dashboards for every Kubernetes component including the API server, etcd, scheduler, and controller manager.
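As a sketch, a ServiceMonitor for a hypothetical app labelled app: myapp might look like this (the release: monitoring label is an assumption — it must match whatever label selector your chart's Prometheus instance uses to pick up ServiceMonitors):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
  labels:
    release: monitoring
spec:
  selector:
    matchLabels:
      app: myapp        # matches the Service, not the Pods directly
  endpoints:
    - port: http        # named port on the Service
      path: /metrics
      interval: 15s
```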

For application-level monitoring, instrument your services with a Prometheus client library (available for Go, Python, Java, Node.js, and more). Expose a /metrics endpoint from your app, create a ServiceMonitor resource, and within minutes your custom business metrics — request counts, processing times, queue depths — flow into Prometheus and become queryable in Grafana.
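A minimal sketch in Python using the official prometheus_client library — the metric names, labels, and endpoint here are illustrative, not a standard:

```python
import random
import time

from prometheus_client import Counter, Histogram, generate_latest

# Counter: monotonically increasing totals, labelled by endpoint and status.
REQUESTS = Counter(
    "myapp_requests_total", "Total HTTP requests", ["endpoint", "status"]
)
# Histogram: request durations, bucketed so Grafana can run
# histogram_quantile() queries against the generated _bucket series.
LATENCY = Histogram(
    "myapp_request_duration_seconds", "Request latency in seconds", ["endpoint"]
)


def handle_request(endpoint: str) -> None:
    """Simulated request handler that records both metrics."""
    with LATENCY.labels(endpoint=endpoint).time():
        time.sleep(random.uniform(0.001, 0.005))  # pretend to do some work
    REQUESTS.labels(endpoint=endpoint, status="200").inc()


for _ in range(5):
    handle_request("/api/orders")

# In a real service you would serve these on /metrics, e.g.:
#   from prometheus_client import start_http_server
#   start_http_server(8000)
print(generate_latest().decode())
```

The printed exposition text is exactly what Prometheus scrapes: one line per series, including the _bucket, _sum, and _count series the Histogram generates automatically.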

Grafana Dashboard Best Practices for 2025

Great dashboards tell a story. Start every dashboard with a row of high-level health indicators (stat panels showing green/yellow/red) before drilling into time-series charts. Use template variables to make dashboards reusable across environments, clusters, and namespaces. Always set appropriate time ranges and refresh intervals — a dashboard refreshing every 5 seconds when your scrape interval is 15 seconds wastes resources.

Color your panels consistently: green for healthy, yellow for warning, red for critical. Use Grafana's threshold feature to automatically colorize stat panels based on values. Set alert annotations on time-series graphs so you can visually see when alerts fired and correlate them with metric spikes. Store your dashboard JSON files in Git alongside your infrastructure-as-code so dashboards are version-controlled and deployable.

Grafana + Prometheus vs Datadog vs New Relic in 2025

The open-source Grafana + Prometheus stack competes directly with paid observability platforms like Datadog, New Relic, and Dynatrace. The key advantage of the open-source stack is cost — at scale, Datadog can cost tens of thousands of dollars per month, while a self-hosted Prometheus and Grafana setup costs only the underlying compute and storage.

Datadog wins on ease of setup, automatic service discovery, and APM/tracing integration. Prometheus + Grafana wins on flexibility, cost, and control. For startups and scale-ups with strong DevOps teams, the open-source stack almost always makes more economic sense. Grafana Cloud also offers a managed hosted option with a generous free tier if you want the best of both worlds.

Conclusion: Start Monitoring Smarter Today

Grafana and Prometheus form the backbone of modern cloud-native observability. With the right setup, you'll catch issues before they become outages, understand your infrastructure's behavior under load, and meet your SLO targets with confidence. Whether you're running on bare metal, Docker, Kubernetes, or a hybrid cloud, this stack will give you the visibility you need.

Start with Node Exporter and the kube-prometheus-stack Helm chart, import the top community dashboards, define your golden signal alerts, and build from there. Observability is a journey, not a destination — but with Grafana and Prometheus, you're starting with the best tools the open-source ecosystem has to offer.