Posted by Intent Media on 04 May 2015

Making Information Manageable with Fluentd

Why are metrics and log consumption treated as separate things? By creating and maintaining an artificial separation between log data and metrics, we add complexity and fragility to the information-gathering stack. In this post I'll describe some of the difficulties this separation causes, and then demonstrate a specific example of how you might manage information regardless of its source. My hope is that by the end of this article you'll have a different perspective on information management, along with an example implementation you can use to evaluate your own practices.

The difficulty with segregating information sources is that we have built processes and configuration around treating these data sources as separate things. Logs and metrics are often collected by different processes, configured in different locations, using different syntaxes. We then configure each tool to interface with third-party vendors, often in such a way that changing metrics or log-aggregation vendors becomes a daunting task for any large infrastructure.

Solutions

Let’s imagine an ideal implementation that doesn’t have these constraints. Configuration for both metrics and log consumption would live in the same location. Those configurations would be written in a single, easy-to-understand configuration language. Data sources would be decoupled from their display and aggregation services to mitigate vendor lock-in.

Prototype

I wanted to see if a system like this could be created using existing tools. After some research, I chose Fluentd to implement my prototype. Fluentd lets us take a three-step approach to information collection:

  1. We define a source that our information comes from.
  2. We perform some transformations on the source to derive some specific information from it.
  3. We route that information to other services for display and aggregation.
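
Concretely, those three steps map onto configuration blocks roughly like the skeleton below. This is only a sketch to show the shape of a pipeline; the plugin names and tags here are placeholders, and the real configuration for our nginx example follows in the rest of the post.

# 1. Where the information comes from
<source>
  type an_input_plugin
  tag myapp.events
</source>

# 2. Transform the events and re-emit them under a new tag
<match myapp.events>
  type record_reformer
  tag derived.${tag}
</match>

# 3. Route the derived events to a display or aggregation service
<match derived.**>
  type an_output_plugin
</match>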

Let’s consider a simple example: a web server host running nginx. By default, nginx records all access attempts in /var/log/nginx/access.log. We’ll use the following configuration to source the access.log file from that directory.

<source>
  type tail
  format nginx
  pos_file /var/log/td-agent/nginx_access.log.pos
  tag webserver.access
  path /var/log/nginx/access.log
</source>
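
For context, format nginx tells Fluentd to use its built-in parser for the default nginx access log format. A raw line such as this made-up example:

192.0.2.10 - - [04/May/2015:12:00:01 +0000] "GET /search HTTP/1.1" 200 512 "-" "curl/7.35.0"

comes out as an event tagged webserver.access with a record along these lines:

{"remote":"192.0.2.10","host":"-","user":"-","method":"GET","path":"/search","code":"200","size":"512","referer":"-","agent":"curl/7.35.0"}

The code field is the one we will key our metrics off of below.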

Then we’ll duplicate the message…

<match webserver.access>
  type copy

  # Emit to log consumer
  <store>
    type record_reformer
    tag log.${tag}
  </store>

  # Reform the tags and reemit for counter
  <store>
    type record_reformer
    tag webserver.${code}
  </store>
</match>

and send one copy of the raw log messages to Papertrail.

<match log.**>
  type remote_syslog
  host your.host.papertrail.com
  port your_papertrail_port
  output_include_time no
  hostname ${hostname}
  tag webserver
</match>

But from this log we can also derive a few interesting metrics. We can learn how many 200, 404, 502, and other three-digit response codes are being returned. To do this we count the other copy of the message, which the second store above re-tagged as webserver.200, webserver.404, and so on.

<match webserver.**>
  type grepcounter
  count_interval 3
  threshold 1
  input_key code
  regexp ^\d\d\d$
  add_tag_prefix counted
</match>
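
To make that concrete: if, say, five requests return a 200 status during a three-second window, grepcounter should emit a single event along these lines (an illustration rather than captured output; the plugin adds a few bookkeeping fields that I have omitted):

tag:    counted.webserver.200
record: {"count":5}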

Then we will reformat the resulting counts into a message format that our metrics aggregator understands and ship them to Librato.

<match counted.webserver.**>
  type record_reformer
  tag metric.webserver.${tag_suffix[-1]}
  renew_record true

  <record>
    key webserver.${tag_parts[-1]}
    value ${count}
    source ${hostname}
  </record>
</match>

<match metric.**>
  type librato
  email your_librato_email@example.com
  apikey your_librato_api_key
  type_key metric_type
</match>
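
Following that same hypothetical event through the last two blocks: record_reformer rewrites the tag counted.webserver.200 to metric.webserver.200 and, because renew_record is true, builds a fresh record from just the three fields defined above, which the Librato output then forwards. Roughly (web-01 stands in for whatever ${hostname} expands to on the host):

tag:    metric.webserver.200
record: {"key":"webserver.200","value":5,"source":"web-01"}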

You can find this example stack in our GitHub repo.

Conclusion

This is just one simple example of how we can simplify our information-gathering system by breaking down the separation between log data and metrics. In my next post, I’d like to show you how we can use a unified information-gathering system like the one we just described to drive highly detailed alerts that also live alongside the code.


Will Weaver is a DevOps Engineer at Intent Media, Inc., where he manages the technical infrastructure and promotes DevOps practices throughout the organization. He enjoys experimenting with new deployment methodologies, learning different programming paradigms, and board game nights with friends and family. You can find him on Twitter and on GitHub.

Comments (2)

Posted by Kiyoto Tamura on May 6, 2015

One of the Fluentd maintainers here. Glad to see it being adopted at Intent Media. I see that you guys use Papertrail for log search. Do you send your logs into any other SaaS?
    Posted by Will Weaver on May 7, 2015

    Thanks for commenting! Right now we're piloting Papertrail for one of our applications. One of the things I really like about this type of solution is that, to pilot any new target, we should be able to swap the Papertrail block for any other service and give it a shot. If we wanted to, we could even run them both in parallel for direct comparisons.