Log and metric collection/aggregation via Grafana Alloy
By Daniel Wietrzychowski
Introduction
Understanding what every part of your IT infrastructure is doing is crucial for keeping it running at peak performance. On my journey to set up monitoring for VMs, clusters, containers, services, and applications, I encountered several hurdles—chief among them was selecting the right tools for log and metric collection and aggregation. Initially, I opted for a Prometheus/Grafana stack paired with Promtail, only to discover that Promtail had been deprecated in favor of Grafana Alloy. Grafana Alloy immediately caught my attention as it seemed to combine all the tools I had planned to use into one powerful, unified solution.
As I delved into Grafana Alloy, it quickly stood out as the ideal tool to streamline the often complex process of collecting, aggregating, and shipping logs, metrics, and traces. With IT infrastructures growing increasingly intricate—thanks to the surge in cloud-native applications, containerization, and hybrid-cloud environments—the demand for robust telemetry and analytics has never been higher. Grafana Alloy seemed perfectly poised to meet this challenge, offering the insights needed to optimize these sophisticated systems for peak performance.
This is where Grafana Alloy truly shines—a ‘batteries-included’ solution crafted to simplify observability. By offering a unified platform for managing telemetry and analytics, it eliminates much of the complexity involved in monitoring modern IT infrastructures. Whether you’re dealing with VMs, containers, or hybrid-cloud environments, Grafana Alloy can be configured in countless ways, making it a versatile and reliable foundation for achieving comprehensive IT infrastructure observability.
When we decided to use Grafana Alloy to collect and aggregate logs and metrics for our infrastructure and applications, the first question was, “Where do we even begin?” The online documentation provides instructions for installing and integrating Alloy almost anywhere, which is great—but getting started still felt daunting. While the basics were straightforward—installing Alloy and setting up its configuration file—getting the configuration just right proved to be a challenge.
The documentation, while helpful, only scratches the surface of Alloy’s capabilities and settings. As a relatively young project (at the time of writing), there weren’t many external resources or community solutions to fall back on. This meant a lot of trial and error, with multiple iterations of the configuration file before achieving the desired results.
Another challenge we encountered was the inability to split the configuration into multiple files. This limitation initially caused headaches with configuration management tools, as they would overwrite the entire configuration instead of selectively updating specific parts. It made managing complex setups unnecessarily cumbersome.
To make matters worse, when Grafana Alloy was set up as a service, any errors in the configuration file would fail silently. This meant that a single misstep could leave the service running with an invalid configuration, making debugging a frustrating and time-consuming process.
Solution
Getting basic metric and log collection up and running on VMs was surprisingly smooth, thanks to the well-documented installation flow and the helpful examples provided. The process felt intuitive and straightforward, making it easy to get started.
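For reference, installing on a Debian-based VM boils down to adding Grafana's apt repository and installing the alloy package. The steps below mirror Grafana's documented flow at the time of writing; treat them as a sketch and check the docs for other platforms:

# Add Grafana's apt repository and signing key (Debian/Ubuntu assumed)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

# Install Alloy and run it as a service
sudo apt-get update && sudo apt-get install -y alloy
sudo systemctl enable --now alloy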
By default, Grafana Alloy on Linux stores its configuration in /etc/alloy/config.alloy. To make managing configurations more flexible and organized, we opted to split the configuration into multiple manageable parts using *.config files. These files, written in Alloy-compatible syntax, were stored in /etc/alloy/. For example, we separated configurations into files like /etc/alloy/core.config and /etc/alloy/application.config.
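To make the split concrete, here's a minimal sketch of what a hypothetical /etc/alloy/core.config might hold: instance-wide settings such as Alloy's own logging block, while application-specific collectors live in their own files. The file names are our convention; Alloy itself knows nothing about the split:

// /etc/alloy/core.config -- instance-wide settings shared by every host
logging {
    level  = "info"
    format = "logfmt"
}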
To combine these files into a single configuration file that Alloy could use, we employed a simple one-liner:

cat /etc/alloy/*.config > /etc/alloy/config.alloy
This approach allowed us to easily generate the final configuration file in the default location. Once the file was updated, restarting the Alloy service ensured it utilized the latest configuration seamlessly. This method not only streamlined the process but also made managing complex setups far more efficient.
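Put together, the regenerate-and-restart step fits in a small helper script that configuration management can drop on each host. A minimal sketch, assuming the systemd unit is named alloy (as installed by the Linux packages):

#!/bin/sh
# Rebuild the merged configuration from the individual *.config parts
cat /etc/alloy/*.config > /etc/alloy/config.alloy

# Restart the service so it picks up the merged configuration
sudo systemctl restart alloy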
One way to address silent failures in configuration files is the alloy fmt [flags] [path config-file] command. This handy tool auto-formats the configuration file and fails if the file contains any syntax errors. Keep in mind, however, that while it ensures the configuration is syntactically correct, it doesn't validate whether the Alloy components themselves are configured properly.
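Slotted into the helper script above in place of the unconditional restart, such a check might look like this (a sketch; alloy fmt prints the formatted file to stdout, so we discard it and only use the exit code):

# Only restart if the merged file is syntactically valid
if alloy fmt /etc/alloy/config.alloy > /dev/null; then
    sudo systemctl restart alloy
else
    echo "config.alloy contains syntax errors; service not restarted" >&2
    exit 1
fi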
For example, here's a configuration snippet that collects log files from the /tmp/logs/ directory and forwards them to a Loki endpoint. This setup uses a simple file match with an asterisk-glob expansion, but it could easily be adapted to use a regex if needed:
config.alloy

local.file_match "tmp" {
    path_targets = [{"__path__" = "/tmp/logs/**/*.log"}]
}

loki.source.file "files" {
    targets    = local.file_match.tmp.targets
    forward_to = [loki.write.endpoint.receiver]
}

loki.write "endpoint" {
    endpoint {
        url = LOKI_URL

        basic_auth {
            username = USERNAME
            password = PASSWORD
        }
    }
}
Another example uses prometheus.exporter.unix to provide functionality similar to the Prometheus node_exporter on Linux, albeit via push (remote_write) instead of pull:
config.alloy

prometheus.exporter.unix "demo" { }

// Configure a prometheus.scrape component to collect unix metrics.
prometheus.scrape "demo" {
    targets    = prometheus.exporter.unix.demo.targets
    forward_to = [prometheus.remote_write.demo.receiver]
}

prometheus.remote_write "demo" {
    endpoint {
        url = PROMETHEUS_REMOTE_WRITE_URL

        basic_auth {
            username = USERNAME
            password = PASSWORD
        }
    }
}
Grafana Alloy offers a wide range of components, all documented on the Grafana Alloy reference pages, allowing you to tailor your configuration to meet your specific monitoring needs. However, be prepared for a bit of trial and error—especially for more complex scenarios—as the documentation can sometimes feel a bit sparse when tackling advanced use cases.
Conclusion
Setting up Grafana Alloy to monitor our IT infrastructure wasn't without its hurdles. From configuration challenges to limited documentation, the process required persistence and creativity. But the payoff was well worth it. Grafana Alloy brings everything together into a single, unified platform for collecting, aggregating, and shipping logs, metrics, and traces. It simplifies the observability process and makes it far more accessible.
As IT infrastructures grow increasingly complex, the need for robust observability tools has never been greater. Grafana Alloy rises to the challenge, providing a comprehensive, flexible, and scalable solution that ensures our monitoring setup is ready to meet the demands of modern computing. With Grafana Alloy, we’ve gained the insights we need to keep our systems running smoothly and efficiently, no matter how intricate they become.