Fluentd or Logstash - Which log collector to use in 2024?
Choosing the right log collector matters in 2024. In this article, we explore the fundamental differences between Fluentd and Logstash, looking at performance, the plugin ecosystem and routing approaches, to help you decide which log collector fits your needs.
Introduction
In large-scale distributed systems, logging is essential for observability, monitoring and security. Whatever the architecture, monolith or microservices, these systems are complex because of their many moving parts and the challenges of managing, deploying and scaling them.
In this context, log management tools help DevOps and SRE teams monitor and improve performance, prevent and correct errors and visualize events.
The tools
In simple terms, a log analysis tool has the following main components:
- Log exporter (agents that collect logs on each host)
- Log collector (concentrators/processors)
- Log storage (log backend)
- Log visualization (dashboards)
In this article, we discuss two very popular open source log collectors, Logstash and Fluentd. The log collector is the heart of any log analysis tool. Both tools run on servers, collect server metrics, parse the logs and ship them to backends such as Elasticsearch or Grafana Loki. It is their routing mechanism that makes log analysis possible.
Both Logstash and Fluentd are open source data collectors released under the Apache 2.0 license. Logstash is best known as the "L" in the ELK stack, maintained by Elastic itself. Fluentd, on the other hand, was developed by Treasure Data and has since been adopted as a Cloud Native Computing Foundation (CNCF) project.
Comparing Fluentd vs Logstash
We've assembled a table to give you a brief overview of the main differences. Later on, we'll discuss each point in detail.
Tool comparison table
Main differences between Fluentd and Logstash in detail
Plugin ecosystem
Plugins extend a tool's functionality. Both Fluentd and Logstash have rich plugin ecosystems, covering inputs (files, TCP/UDP), filters and output destinations (InfluxDB, Grafana Loki, Elasticsearch, AWS, GCP, etc.).
The main difference lies in how these plugins are managed. Logstash keeps all of its plugins in one centralized place, the logstash-plugins organization on GitHub, which hosts 199 plugins. Fluentd, on the other hand, takes a decentralized approach: its official repository hosts only 10 plugins plus an index of all the plugins maintained in other repositories. Even so, Fluentd supports more plugins than Logstash, around 500.
You can browse all the Logstash plugins in the logstash-plugins organization on GitHub, and the Fluentd plugins in Fluentd's plugin index.
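Whatever the ecosystem, both tools organize their pipelines around input, filter and output plugins (alongside other types such as codecs or parsers). The Python sketch below illustrates that idea only; it is not either tool's real plugin API, and the registries, the plugin names (`lines`, `add_host`, `memory`) and the `run_pipeline` helper are all hypothetical. Plugins register under a name, and the pipeline is wired together by name, much as a collector assembles its pipeline from a configuration file.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]

# Hypothetical registries, one per plugin type, keyed by plugin name.
INPUTS: Dict[str, Callable[..., Iterable[Record]]] = {}
FILTERS: Dict[str, Callable[[Record], Record]] = {}
OUTPUTS: Dict[str, Callable[[Record], None]] = {}


def register(registry: Dict, name: str) -> Callable:
    """Decorator that adds a plugin function to the given registry."""
    def wrap(fn: Callable) -> Callable:
        registry[name] = fn
        return fn
    return wrap


@register(INPUTS, "lines")
def lines_input(lines: List[str]) -> Iterator[Record]:
    # Stand-in for a file or TCP/UDP input: one record per line.
    for line in lines:
        yield {"message": line}


@register(FILTERS, "add_host")
def add_host(record: Record) -> Record:
    # Enrich each record with a hypothetical host name.
    return {**record, "host": "web-01"}


collected: List[Record] = []


@register(OUTPUTS, "memory")
def memory_output(record: Record) -> None:
    # Stand-in for an Elasticsearch or Grafana Loki output.
    collected.append(record)


def run_pipeline(input_name: str, input_args: tuple,
                 filter_names: List[str], output_name: str) -> None:
    """Look plugins up by name and pass every record input -> filters -> output."""
    output = OUTPUTS[output_name]
    for record in INPUTS[input_name](*input_args):
        for name in filter_names:
            record = FILTERS[name](record)
        output(record)


run_pipeline("lines", (["GET /", "POST /login"],), ["add_host"], "memory")
print(collected)
```

Adding a new destination in this model means registering one more output function; the rest of the pipeline is untouched, which is what makes a large plugin ecosystem possible.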
Memory usage/performance
Performance depends on each user's use case, but Logstash consumes about 120 MB of memory, compared with about 40 MB for Fluentd. On modern hardware, this 80 MB difference may seem insignificant. Deployed across 1,000 infrastructure devices, however, it adds up to 80 MB × 1,000 = 80,000 MB, a remarkable 80 GB of additional memory usage.
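The arithmetic behind that estimate is easy to redo for your own fleet size. This Python sketch uses the article's approximate per-instance figures (120 MB for Logstash, 40 MB for Fluentd) as placeholder constants; real memory use depends on your configuration, your plugins and, for Logstash, the JVM heap settings, so replace them with your own measurements.

```python
# Approximate per-instance figures quoted in this article; replace with your own measurements.
LOGSTASH_MB = 120
FLUENTD_MB = 40


def extra_memory_gb(hosts: int, logstash_mb: float = LOGSTASH_MB,
                    fluentd_mb: float = FLUENTD_MB) -> float:
    """Extra memory, in decimal GB (1 GB = 1,000 MB), of running Logstash instead of Fluentd on every host."""
    return (logstash_mb - fluentd_mb) * hosts / 1000


print(extra_memory_gb(1_000))  # 80.0
```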
With Logstash, you can avoid this overhead by running Elastic Beats, rather than Logstash itself, on each leaf node. Beats are resource-efficient, purpose-built shippers: each Beat focuses on just one data source and does it well. The Fluentd counterpart is Fluent Bit, a lightweight, embeddable log processor from the Fluentd ecosystem, written in C.
For small applications, we recommend Fluent Bit, which is also hosted by the Cloud Native Computing Foundation (CNCF). On the Elastic side, Beats are the lightweight counterpart to Logstash, so they are the better choice for smaller workloads. If your use case involves data processing in addition to data transport, however, you'll need to combine Beats with Logstash.
Data transfer
Logstash originally had no built-in buffer for data transfer. More recent versions added persistent queues (PQ) and a dead letter queue (DLQ), both of which are disabled by default. To make sure events persist across restarts, we recommend adding an external queue broker such as Redis or Kafka to the pipeline; Redis, for example, acts as a broker that queues events sent by remote Logstash shippers.
Fluentd has a built-in, configurable buffer system (in memory or on file) and does not depend on external queues for persistence, although configuring it can be quite complex. Because of this built-in buffering, Fluentd is a safer choice than Logstash for data transport. For extra reliability, it is still recommended to add a queue broker with persistent queues, such as Apache Kafka or RabbitMQ.
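To see why a built-in buffer matters, here is a minimal Python sketch of the idea behind a file-backed buffer such as Fluentd's file buffer: events are written to disk before delivery and removed only once the output accepts them, so a failed flush or a restart does not lose data. It is a deliberate simplification (a single chunk, no retry timer, no size limits, single-threaded), and the `FileBuffer` class and the two output functions are hypothetical, not Fluentd's implementation.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List

Event = Dict[str, str]


class FileBuffer:
    """Hypothetical file-backed buffer: events are persisted before delivery."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, event: Event) -> None:
        # Write the event to disk first, so it survives a crash or restart.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def pending(self) -> List[Event]:
        # Everything still on disk has not been acknowledged by the output yet.
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def flush(self, send: Callable[[List[Event]], bool]) -> bool:
        chunk = self.pending()
        if not chunk:
            return True
        if send(chunk):
            os.remove(self.path)  # delivered: the chunk can be discarded
            return True
        return False  # delivery failed: keep the chunk on disk for a later retry


def output_down(chunk: List[Event]) -> bool:
    return False  # e.g. the backend is unreachable


received: List[Event] = []


def output_up(chunk: List[Event]) -> bool:
    received.extend(chunk)
    return True


path = os.path.join(tempfile.mkdtemp(), "buffer.log")
buf = FileBuffer(path)
buf.append({"msg": "first"})
buf.append({"msg": "second"})
buf.flush(output_down)        # output is down: events stay on disk
restarted = FileBuffer(path)  # simulate a process restart
restarted.flush(output_up)    # the buffered events are delivered after the restart
print(received)
```

Logstash's persistent queues follow the same principle once enabled; an external broker such as Kafka moves this durable storage out of the collector process entirely.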
Event routing
Event routing means sending data and messages between applications and systems. It is a crucial criterion when evaluating a logging system.
Fluentd uses a tagging approach to event routing, while Logstash uses if/else conditional statements. In Logstash, you define criteria in if/else statements to decide which actions to perform on your data. In Fluentd, you tag each of your data sources (inputs); Fluentd then compares each event's tag with the patterns of the different outputs and forwards the event to the matching output. The tagging approach seems a little easier to use than conditional statements.
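The difference between the two styles can be sketched in Python. `tag_matches` implements a simplified version of Fluentd's match patterns, where `*` matches exactly one dot-separated tag part and `**` matches zero or more parts (braces and partial-part wildcards such as `app*` are not supported). `route_by_tag` checks the rules in order and uses the first match, as Fluentd does with its match directives, while `route_by_conditionals` shows an equivalent Logstash-style if/else chain over event fields. The tags, field names and output names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def tag_matches(pattern: str, tag: str) -> bool:
    """Simplified Fluentd-style tag matching with whole-part * and ** wildcards."""
    return _match(pattern.split("."), tag.split("."))


def _match(pat: List[str], parts: List[str]) -> bool:
    if not pat:
        return not parts  # pattern exhausted: the tag must be exhausted too
    head, rest = pat[0], pat[1:]
    if head == "**":
        # ** consumes zero or more tag parts.
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    if not parts:
        return False
    if head == "*" or head == parts[0]:
        return _match(rest, parts[1:])  # * (or a literal) consumes exactly one part
    return False


# Tag-based routing: rules are checked top to bottom and the first match wins.
ROUTES: List[Tuple[str, str]] = [
    ("app.access.**", "elasticsearch"),
    ("app.*", "loki"),
    ("**", "stdout"),
]


def route_by_tag(tag: str) -> Optional[str]:
    for pattern, output in ROUTES:
        if tag_matches(pattern, tag):
            return output
    return None


def route_by_conditionals(event: Dict[str, str]) -> str:
    # Logstash-style routing: explicit if / else if / else over event fields.
    if event.get("type") == "access":
        return "elasticsearch"
    elif event.get("service") == "app":
        return "loki"
    else:
        return "stdout"


print(route_by_tag("app.access.nginx"))  # elasticsearch
print(route_by_tag("app.worker"))        # loki
print(route_by_tag("system.kernel"))     # stdout
```

With tags, the routing decision is made once at the input and expressed as a flat list of patterns; with conditionals, every rule inspects event fields and the logic grows with each new case.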
Log parsing
Log parsing components differ from tool to tool. Fluentd ships with standard built-in parsers (JSON, regex, CSV, etc.), whereas Logstash relies on plugins for parsing. This gives Fluentd an advantage over Logstash, since we don't have to deal with any external plugin for this feature. Both Logstash and Fluentd also let you write your own plugins, in both cases in Ruby; Vericode, for example, has created a FIX protocol log parsing plugin for Logstash and another for Fluentd.
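Fluentd's `regexp` parser and Logstash's grok filter both turn raw lines into structured fields using regular expressions. As a rough illustration of what a custom parser such as a FIX protocol parser does, here is a minimal Python sketch. It is not Vericode's plugin (Fluentd and Logstash plugins are written in Ruby). It assumes a hypothetical log layout of `<timestamp> <level> <FIX message>` and relies only on the standard FIX encoding: `tag=value` fields separated by the SOH character (`\x01`), which many logs replace with `|`. Only a handful of standard tag numbers are mapped to readable names.

```python
import re
from typing import Dict, Optional

SOH = "\x01"  # standard FIX field delimiter

# A few standard FIX tag numbers mapped to readable field names.
FIX_TAG_NAMES = {
    "8": "BeginString", "35": "MsgType", "49": "SenderCompID",
    "56": "TargetCompID", "55": "Symbol", "54": "Side", "38": "OrderQty",
}

# Hypothetical log layout: "<timestamp> <level> <FIX message>".
LINE_RE = re.compile(r"^(?P<time>\S+) (?P<level>[A-Z]+) (?P<fix>8=FIX.*)$")


def parse_fix(message: str, delimiter: str = SOH) -> Dict[str, str]:
    """Split a FIX message into fields, naming the tags we know."""
    fields: Dict[str, str] = {}
    for pair in message.strip(delimiter).split(delimiter):
        tag, _, value = pair.partition("=")
        fields[FIX_TAG_NAMES.get(tag, tag)] = value
    return fields


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return a structured record, or None if the line is not in the expected layout."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    record = {"time": m["time"], "level": m["level"]}
    # Many logs replace the non-printable SOH with '|'.
    delimiter = SOH if SOH in m["fix"] else "|"
    record.update(parse_fix(m["fix"], delimiter))
    return record


line = "2023-09-01T10:00:00Z INFO 8=FIX.4.2|35=D|49=CLIENT|56=BROKER|55=ACME|54=1|38=100|"
print(parse_line(line))
```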
Docker support
Docker provides a built-in Fluentd logging driver, which sends container logs to the Fluentd collector as structured log data. In the case of Logstash, an extra agent (Filebeat) is required to send the logs to Logstash. So if you run your applications with Docker, Fluentd is the more natural choice: logs go straight from the containers' STDOUT to the Fluentd service, which makes Docker's overall logging architecture simpler and less risky.
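To make "structured log data" concrete, this Python sketch approximates the (tag, record) pair that Docker's fluentd logging driver sends for each line a container writes to STDOUT or STDERR. The field names (`container_id`, `container_name`, `source`, `log`) and the default tag (the first 12 characters of the container ID) follow Docker's documentation for the driver; the actual transport (Fluentd's forward protocol, MessagePack over TCP) is omitted. The `expand_json` helper and the container ID are hypothetical; the helper shows a typical collector-side step, lifting a JSON payload in `log` into top-level fields, similar to what a Fluentd parser filter can do.

```python
import json
from typing import Any, Dict, Tuple

Record = Dict[str, Any]


def docker_fluentd_event(container_id: str, container_name: str,
                         source: str, line: str) -> Tuple[str, Record]:
    """Approximate (tag, record) produced by Docker's fluentd logging driver."""
    tag = container_id[:12]  # default tag: first 12 characters of the container ID
    record = {
        "container_id": container_id,
        "container_name": container_name,
        "source": source,  # "stdout" or "stderr"
        "log": line,
    }
    return tag, record


def expand_json(record: Record) -> Record:
    """Hypothetical collector-side step: lift JSON written to STDOUT into fields."""
    try:
        payload = json.loads(record["log"])
    except json.JSONDecodeError:
        return record  # plain-text line: keep it as-is
    if not isinstance(payload, dict):
        return record
    expanded = {k: v for k, v in record.items() if k != "log"}
    expanded.update(payload)
    return expanded


cid = "3f2a9c1b7d4e" + "0" * 52  # hypothetical 64-character container ID
tag, record = docker_fluentd_event(cid, "/web", "stdout", '{"level": "info", "msg": "started"}')
print(tag, expand_json(record))
```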
When should you prefer Fluentd over Logstash, or vice versa?
For Kubernetes environments or teams working with Docker, Fluentd is the ideal log collector. Docker has a built-in Fluentd logging driver and Fluentd has built-in log parsers, so you don't need an extra agent in the container to send the logs to Fluentd. This keeps the architecture simpler and reduces the risk of logging errors. Also, if memory is tight, opt for Fluentd: it is more memory-efficient because it does not depend on the JVM and the Java runtime.
Fluentd's processing capacity is greater than Logstash's, so if you want to save on your cloud budget, go for Fluentd.
If you have multiple data sources and want to visualize them all in a "single pane of glass", with Grafana or another visualization tool, Fluentd is the best option.
On the other hand, Logstash works well with Elasticsearch and Kibana. So if you already have Elasticsearch and Kibana in your infrastructure, Logstash is your best bet for a log collector.
Vericode's choice for Log Analytics - Prefer Fluentd
In our comparison, Fluentd proved more efficient and consumed fewer resources than Logstash, since it does not need the Java Virtual Machine (JVM). In addition, Fluentd has many more plugins, and they are decentralized, unlike Logstash, where all the plugins are kept in one place on GitHub.
Fluentd also has a better routing approach: it labels events with tags, which is easier than writing if/else conditions.
Therefore, if you need to collect a high volume of logs, Fluentd is a much more efficient solution: it needs fewer computing resources than Logstash to achieve the same objectives, and its plugins provide an output for virtually any log platform.
Your budget benefits too, since fewer computing resources mean lower cloud costs.
Authors
Article based on “Fluentd vs Logstash – Choosing a Log collector for Log Analytics” September 2023.