10 Ways to Analyse your Apache Access Logs using Bash

Since its original launch in 1995, the Apache Web Server continues to be one of the most popular web servers in use today. One of my favourite parts of working with Apache during this time has been discovering and analysing the wealth of valuable information contained in its access logs (i.e. the files produced as a result of users viewing the websites served by Apache on a given server).

And while there are plenty of free tools available to help analyse and visualise this information (in an automated or report-driven way), it can often be quicker (and more efficient) to just get down and dirty with the raw logs and fetch the data you need directly (subject to traffic volumes and log file sizes, of course). It is also a lot more fun.

For the purposes of this discussion, let’s assume that you are working with compressed backups of the access logs and have a single, compressed file for each day of the month (e.g. the logs were rotated by the operating system). I’m also assuming that you are using the standard “combined” LogFormat.
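For reference, a line in the combined format looks something like the following (the IP address, URL, referrer and user agent here are invented for illustration). When split on spaces, the client IP is field 1 and the timestamp is field 4, which is what the cut commands below rely on:

127.0.0.1 - - [31/Dec/2014:23:59:59 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://www.example.com/start.html" "Mozilla/5.0 (X11; Linux x86_64)"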

1. Total Number of Requests

This is just a simple line count (wc -l) on the relevant file(s):

Requests For Single Day

$ zcat access.log-2014-12-31.gz | wc -l

Requests For Entire Month

$ zcat access.log-2014-12-*.gz | wc -l

2. Requests per Day (or Month)

This will show the number of requests per day for a given month:

$ for d in {01..31}; do echo "$d = $(zcat access.log-2014-12-$d.gz | wc -l)"; done

Or the requests per month for a given year:

$ for m in {01..12}; do echo "$m = $(zcat access.log-2014-$m-*.gz | wc -l)"; done
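Note that both loops assume a log file exists for every day (or month). If some files are missing, one way to avoid zcat errors is a small existence check, e.g.:

$ for d in {01..31}; do f=access.log-2014-12-$d.gz; [ -f "$f" ] && echo "$d = $(zcat "$f" | wc -l)"; done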

3. Traffic Sources

Show the list of unique IP addresses the given requests originated from:

$ zcat access.log-2014-12-31.gz | cut -d" " -f1 | sort | uniq
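To simply count how many unique addresses there were, append another wc -l:

$ zcat access.log-2014-12-31.gz | cut -d" " -f1 | sort | uniq | wc -l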

List the unique IP addresses along with the number of requests from each, from lowest to highest:

$ zcat access.log-2014-12-31.gz | cut -d" " -f1 | sort | uniq -c | sort -n
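Or, to see just the top 10 sources by request count, reverse the sort and take the head:

$ zcat access.log-2014-12-31.gz | cut -d" " -f1 | sort | uniq -c | sort -rn | head -10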

4. Traffic Volumes

Show the number of requests per minute over a given day (because log entries are written in chronological order, uniq -c can count the consecutive timestamps without a prior sort):

$ zcat access.log-2014-12-31.gz | cut -d" " -f4 | cut -d":" -f2,3 | uniq -c

Show the number of requests per second over a given day:

$ zcat access.log-2014-12-31.gz | cut -d" " -f4 | cut -d":" -f2,3,4 | uniq -c

5. Traffic Peaks

Show the requests-per-minute counts for a given day, sorted so that the peak minutes appear last:

$ zcat access.log-2014-12-31.gz | cut -d" " -f4 | cut -d":" -f2,3 | uniq -c | sort -n

Show the requests-per-second counts for a given day, sorted so that the peak seconds appear last:

$ zcat access.log-2014-12-31.gz | cut -d" " -f4 | cut -d":" -f2,3,4 | uniq -c | sort -n
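And to see only the single busiest second (or minute), take the last line of the sorted output, e.g.:

$ zcat access.log-2014-12-31.gz | cut -d" " -f4 | cut -d":" -f2,3,4 | uniq -c | sort -n | tail -1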

Command Reference

The Unix commands used to conduct the above analysis (along with what each one is doing there) were:

  • zcat – prints the contents of a gzip-compressed file.
  • wc – counts the lines (via the -l option), words or characters in a file (or input stream).
  • cut – splits data based on a delimiting character.
  • sort – sorts data, either alphabetically or numerically (via the -n option).
  • uniq – removes adjacent duplicates from a list, or counts how often each one repeats (via the -c option).
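
To round things off, here is a minimal sketch of a script (the name summarise.sh is hypothetical) that pulls a few of the above checks together for a single day's log, passed as its first argument, assuming the same compressed files and combined LogFormat as before:

#!/usr/bin/env bash
# summarise.sh – sketch: summarise one compressed access log
# (combined LogFormat assumed), passed as the first argument.
log="$1"

# Total number of requests for the day
echo "Total requests: $(zcat "$log" | wc -l)"

# Top 5 source IP addresses by request count
echo "Top 5 sources:"
zcat "$log" | cut -d" " -f1 | sort | uniq -c | sort -rn | head -5

# Busiest minute of the day (count, then HH:MM)
echo "Peak minute:"
zcat "$log" | cut -d" " -f4 | cut -d":" -f2,3 | uniq -c | sort -n | tail -1

It would then be run along the lines of:

$ ./summarise.sh access.log-2014-12-31.gz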