samuel edited this page Sep 14, 2010 · 3 revisions

Quick Preliminary Benchmarks

206M access log

Squawk

time squawk "SELECT remote_addr, COUNT(1) as n FROM access.log GROUP BY remote_addr ORDER BY n DESC LIMIT 10"
...
real	0m20.360s
user	0m16.073s
sys	0m0.272s

Unix tool chain

time cut -f 1 -d' ' access.log | sort | uniq -c | sort -rn | head -n 10
...
real	0m8.603s
user	0m6.196s
sys	0m0.484s

Simple Python script

# Python 2: count requests per client IP, then print the ten most frequent.
import sys
counts = {}
with open(sys.argv[1], "r") as fp:
    for line in fp:
        ip = line.split(' ', 1)[0]
        if ip in counts:
            counts[ip] += 1
        else:
            counts[ip] = 1
counts = sorted(counts.iteritems(), key=lambda c:c[1], reverse=True)
for ip, count in counts[:10]:
    print ip, count
...
real	0m2.259s
user	0m1.632s
sys	0m0.256s
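
The script above is Python 2 (`iteritems`, `print` statement). For readers on Python 3, the same counting can be sketched with `collections.Counter`. This is only an illustrative port: the function names `top_addresses` and `main` are made up here, it was not part of the benchmark, and the timings above do not apply to it.

```python
from collections import Counter


def top_addresses(lines, limit=10):
    """Return the `limit` most frequent first fields as (ip, count) pairs.

    Assumes, as the original script does, that the client address is the
    first space-separated field on each log line.
    """
    counts = Counter(line.split(' ', 1)[0] for line in lines)
    # most_common() sorts by count, highest first.
    return counts.most_common(limit)


def main(path):
    """Print the top addresses for the log file at `path` (hypothetical helper)."""
    with open(path, "r") as fp:
        for ip, count in top_addresses(fp):
            print(ip, count)
```

Calling `main(sys.argv[1])` at the bottom of the file (after `import sys`) would make it usable from the command line the same way as the original script.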
