A Qunar (去哪网) interview question: counting how often each IP appears in a log with shell

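From a Qunar interview: when the data set is not large, awk is the most convenient tool, but I had not used it for a long time and had forgotten how awk arrays work. This post is a quick refresher. Assume the data looks like this (IP address, then hostname):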

178.60.128.31 www.google.com.hk
193.192.250.158 www.google.com
210.242.125.35 adwords.google.com
210.242.125.35 accounts.google.com.hk
210.242.125.35 accounts.google.com
210.242.125.35 accounts.l.google.com
64.233.181.49 www.google.com
212.188.10.167 www.google.com
23.239.5.106 www.google.com
64.233.168.41 www.google.com
62.1.38.89 www.google.com
62.1.38.89 chrome.google.com
193.192.250.172 www.google.com
212.188.10.241 www.google.com
37.228.69.57 www.google.com
222.255.120.42 www.google.com
222.255.120.42 www.gstatic.com
212.188.10.167 www.googleapis.com
64.233.181.49 www.googleapis.com
64.233.181.49 fonts.googleapis.com
193.192.250.158 plus.google.com
193.192.250.158 talkgadget.google.com 
193.192.250.158 ssl.gstatic.com
193.192.250.158 images-pos-opensocial.googleusercontent.com
193.192.250.158 images1-focus-opensocial.googleusercontent.com
193.192.250.158 images2-focus-opensocial.googleusercontent.com
193.192.250.158 images3-focus-opensocial.googleusercontent.com
193.192.250.158 images4-focus-opensocial.googleusercontent.com
193.192.250.158 images5-focus-opensocial.googleusercontent.com
193.192.250.158 images6-focus-opensocial.googleusercontent.com
193.192.250.158 clients4.google.com
222.255.120.42 google.com
222.255.120.42 apis.google.com
222.255.120.42 clients1.google.com
193.192.250.158 clients2.google.com
193.192.250.158 clients3.google.com
193.192.250.158 clients5.google.com
64.233.181.49 maps.google.com
64.233.181.49 mts0.google.com
64.233.181.49 maps.gstatic.com

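The awk code for the count uses an associative array keyed on the first field (the IP): awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt. The order in which for(i in arr) visits the keys is unspecified, so the output is not sorted. Output: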

[blog@AY1310301904525972ddZ ~]$ awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt
212.188.10.241 1
64.233.168.41 1
23.239.5.106 1
193.192.250.158 15
178.60.128.31 1
37.228.69.57 1
212.188.10.167 2
193.192.250.172 1
62.1.38.89 2
64.233.181.49 6
210.242.125.35 4
222.255.120.42 5
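
For readers who, like me, have forgotten the array syntax, the one-liner above can be spelled out with comments. This is only a sketch: the function name count_ips and the three-line demo input are my own choices for illustration, and the final sort is there just to make the demo output stable.

```shell
#!/bin/sh
# count_ips (hypothetical helper name): read "IP hostname" lines on stdin
# and print "IP count" for each distinct IP.
count_ips() {
    awk '
        # For every input line, use the first field (the IP) as the array key.
        # An element that does not exist yet is created on first use,
        # and ++ treats its empty value as 0.
        { count[$1]++ }

        # After the last line, walk all keys. The visiting order is unspecified.
        END {
            for (ip in count)
                print ip, count[ip]
        }
    '
}

# Demo with three lines taken from the sample data.
printf '%s\n' \
    '62.1.38.89 www.google.com' \
    '62.1.38.89 chrome.google.com' \
    '37.228.69.57 www.google.com' | count_ips | sort
```

Against the real file it would be called as count_ips < test.txt.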

With sorting added (sort -n -k 2 sorts numerically on the second field, the count):

[blog@AY1310301904525972ddZ ~]$ awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt | sort -n -k 2
178.60.128.31 1
193.192.250.172 1
212.188.10.241 1
23.239.5.106 1
37.228.69.57 1
64.233.168.41 1
212.188.10.167 2
62.1.38.89 2
210.242.125.35 4
222.255.120.42 5
64.233.181.49 6
193.192.250.158 15
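
In practice the usual follow-up is "which IPs appear most often", so a descending, top-N variant may be useful. The sketch below is one possible way to do it, not part of the original answer: top_ips and its default of 3 are hypothetical, -k2,2nr limits the key to the count field and reverses it, and the second key -k1,1 is only there to break ties in a predictable way.

```shell
#!/bin/sh
# top_ips (hypothetical helper name): print the N most frequent IPs read
# from stdin, most frequent first. N defaults to 3 (an arbitrary choice).
top_ips() {
    n=${1:-3}
    awk '{ count[$1]++ } END { for (ip in count) print ip, count[ip] }' |
        sort -k2,2nr -k1,1 |   # count descending, then IP as a tie-breaker
        head -n "$n"
}

# Demo: three distinct IPs with counts 3, 2 and 1; keep the top two.
printf '%s\n' \
    '193.192.250.158 www.google.com' \
    '193.192.250.158 plus.google.com' \
    '193.192.250.158 ssl.gstatic.com' \
    '62.1.38.89 www.google.com' \
    '62.1.38.89 chrome.google.com' \
    '37.228.69.57 www.google.com' | top_ips 2
```

With the real data this would be top_ips 5 < test.txt.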

=============Addendum in reply to reader [hattah]'s answer===============

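I tested the efficiency of the two approaches. In theory, sort gets slower as the amount of data it has to sort grows: the sort | uniq -c pipeline has to sort every input line, whereas the awk version aggregates first and sorts only one line per distinct IP. Measured results (test.txt is now much larger, as the counts show):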

[blog@AY1310301904525972ddZ ~]$ time awk '{print $1}' test.txt |sort|uniq -c
   1380 178.60.128.31
  17312 193.192.250.158
   1160 193.192.250.172
   4640 210.242.125.35
   2320 212.188.10.167
   1160 212.188.10.241
   5734 222.255.120.42
   1160 23.239.5.106
   1160 37.228.69.57
   2320 62.1.38.89
   1160 64.233.168.41
   6894 64.233.181.49

real    0m0.236s
user    0m0.228s
sys     0m0.004s

Compared with the awk array version:

[blog@AY1310301904525972ddZ ~]$  time  awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt | sort -n -k 2  
193.192.250.172 1160
212.188.10.241 1160
23.239.5.106 1160
37.228.69.57 1160
64.233.168.41 1160
178.60.128.31 1380
212.188.10.167 2320
62.1.38.89 2320
210.242.125.35 4640
222.255.120.42 5734
64.233.181.49 6894
193.192.250.158 17312

real    0m0.025s
user    0m0.022s
sys     0m0.001s
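
Before comparing timings it is worth confirming that the two pipelines really produce the same counts. The sketch below wraps each approach in a function and compares their normalized output. The function names, the generated three-line pattern and the repeat count of 1000 are all assumptions made for illustration, not the file used in the measurements above.

```shell
#!/bin/sh
# Approach 1: sort every line, then let uniq -c count adjacent duplicates.
# uniq -c prints "count ip", so swap the columns to get "ip count".
count_with_sort_uniq() {
    awk '{ print $1 }' | sort | uniq -c | awk '{ print $2, $1 }' | sort
}

# Approach 2: aggregate in an awk array; only the distinct IPs reach sort.
count_with_awk_array() {
    awk '{ count[$1]++ } END { for (ip in count) print ip, count[ip] }' | sort
}

# Build a throwaway input file by repeating a few sample lines.
sample=$(mktemp)
i=0
while [ "$i" -lt 1000 ]; do
    printf '%s\n' \
        '193.192.250.158 www.google.com' \
        '62.1.38.89 www.google.com' \
        '193.192.250.158 plus.google.com'
    i=$((i + 1))
done > "$sample"

if [ "$(count_with_sort_uniq < "$sample")" = "$(count_with_awk_array < "$sample")" ]; then
    echo "both approaches agree"
else
    echo "results differ"
fi
rm -f "$sample"
```

In bash, where time is a shell keyword, each function can then be timed on the real file, for example time count_with_awk_array < test.txt. The absolute numbers will of course depend on the machine and the size of the file.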