Regular expression to extract cookie value from Apache access logs

First published on January 21, 2016

I was recently troubleshooting a problem where I needed to extract cookie values and IP addresses from Apache access logs. In short, cookies were being shared across sessions instead of being unique to each session. The Apache log entries looked something like this:

- [29/Oct/2015:23:59:46 -0400] "GET /user/profile HTTP/1.1" 503 17839 "" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36" "SESSID=59n42qa556o2k08ekmbmlhgdg1; othercookie=(direct)" -

Using this command, I could extract the cookie values for SESSID and save them to a file:

grep -ir 'GET /user/profile HTTP/1.1" 503' /web/logs/access_log | sed -r 's/.*SESSID=(.*)[;"].*/\1/' > 503_cookies.txt

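The sed regular expression wasn’t stopping the match at the first semicolon or quote, however. The .* inside the capture is greedy: it consumes as much of the line as it can and only gives back enough to reach the last semicolon or quote. Instead of using (.*) in the capture for any character, I had to use [^;"]* for “any run of characters that are not a semicolon or quote”, even though the same characters are matched again just outside the parentheses: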

grep -ir 'GET /user/profile HTTP/1.1" 503' /web/logs/access_log | sed -r 's/.*SESSID=([^;"]*)[;"].*/\1/' > 503_cookies.txt
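The difference between the two captures is easiest to see on a single line. The following is a minimal sketch, not the exact commands I ran: the sample line is shortened, the IP address 192.0.2.10 is an invented documentation address, and -E is used as the portable spelling of sed's -r option.

```shell
# Hypothetical, shortened log line; the IP address is invented for illustration.
line='192.0.2.10 - - [29/Oct/2015:23:59:46 -0400] "GET /user/profile HTTP/1.1" 503 17839 "" "Mozilla/5.0" "SESSID=59n42qa556o2k08ekmbmlhgdg1; othercookie=(direct)" -'

# Greedy capture: (.*) runs on to the LAST semicolon or quote on the line.
greedy=$(printf '%s\n' "$line" | sed -E 's/.*SESSID=(.*)[;"].*/\1/')

# Negated class: [^;"]* stops at the FIRST semicolon or quote.
exact=$(printf '%s\n' "$line" | sed -E 's/.*SESSID=([^;"]*)[;"].*/\1/')

printf 'greedy: %s\n' "$greedy"
printf 'exact:  %s\n' "$exact"
```

On this sample the greedy version returns the session ID plus the rest of the cookie string, while the negated class returns only the session ID.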

Further work was needed to grab the IP addresses for the matches and save them to another file. Here I didn’t need a regular expression, as I could just grab the first column with awk:

grep -ir 'GET /user/profile HTTP/1.1" 503' /web/logs/access_log | awk '{print $1}' > 503_ips.txt

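Then I could use the paste command to put each IP address and its cookie value on the same line of the report, and sort and uniq to collapse duplicate entries. This works because both files come from the same grep output, so line N of one file corresponds to line N of the other: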

paste 503_ips.txt 503_cookies.txt | sort | uniq
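If the two intermediate files ever got out of step (for example, a matching request with no SESSID cookie), paste would pair the wrong values. One possible alternative, sketched here under assumptions rather than taken from what I actually ran, is to pull both fields out in a single awk pass using the standard match() function. The log lines and IP addresses below are invented for illustration, and the temporary file stands in for the real access log.

```shell
# Hypothetical log lines; the IP addresses are invented documentation addresses.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [29/Oct/2015:23:59:46 -0400] "GET /user/profile HTTP/1.1" 503 17839 "" "Mozilla/5.0" "SESSID=59n42qa556o2k08ekmbmlhgdg1; othercookie=(direct)" -
192.0.2.10 - - [29/Oct/2015:23:59:47 -0400] "GET /user/profile HTTP/1.1" 503 17839 "" "Mozilla/5.0" "SESSID=59n42qa556o2k08ekmbmlhgdg1; othercookie=(direct)" -
198.51.100.7 - - [29/Oct/2015:23:59:48 -0400] "GET /user/profile HTTP/1.1" 503 17839 "" "Mozilla/5.0" "SESSID=59n42qa556o2k08ekmbmlhgdg1" -
198.51.100.7 - - [29/Oct/2015:23:59:49 -0400] "GET /user/profile HTTP/1.1" 200 17839 "" "Mozilla/5.0" "SESSID=aaaa" -
EOF

# Keep only the 503 responses, then print "IP<TAB>session ID" for each line
# that carries a SESSID cookie. "SESSID=" is 7 characters long, so the value
# starts at RSTART + 7 and is RLENGTH - 7 characters long.
report=$(grep -F 'GET /user/profile HTTP/1.1" 503' "$log" |
  awk 'match($0, /SESSID=[^;"]*/) { print $1 "\t" substr($0, RSTART + 7, RLENGTH - 7) }' |
  sort -u)

rm -f "$log"
printf '%s\n' "$report"
```

Here sort -u does the work of sort | uniq, and the sample output has two lines with the same session ID but different IP addresses, which is the kind of sharing I was looking for.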

Each line then looked something like this: 59n42qa556o2k08ekmbmlhgdg1

