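A Qunar (去哪网) interview question: using shell to count how many times each IP appears in a log
From the Qunar interview: when the data set is small, awk is the most convenient tool, but I had not used it in a long time and had forgotten how awk arrays work. Here is a quick review. Assume the data looks like this: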
178.60.128.31 www.google.com.hk
193.192.250.158 www.google.com
210.242.125.35 adwords.google.com
210.242.125.35 accounts.google.com.hk
210.242.125.35 accounts.google.com
210.242.125.35 accounts.l.google.com
64.233.181.49 www.google.com
212.188.10.167 www.google.com
23.239.5.106 www.google.com
64.233.168.41 www.google.com
62.1.38.89 www.google.com
62.1.38.89 chrome.google.com
193.192.250.172 www.google.com
212.188.10.241 www.google.com
37.228.69.57 www.google.com
222.255.120.42 www.google.com
222.255.120.42 www.gstatic.com
212.188.10.167 www.googleapis.com
64.233.181.49 www.googleapis.com
64.233.181.49 fonts.googleapis.com
193.192.250.158 plus.google.com
193.192.250.158 talkgadget.google.com
193.192.250.158 ssl.gstatic.com
193.192.250.158 images-pos-opensocial.googleusercontent.com
193.192.250.158 images1-focus-opensocial.googleusercontent.com
193.192.250.158 images2-focus-opensocial.googleusercontent.com
193.192.250.158 images3-focus-opensocial.googleusercontent.com
193.192.250.158 images4-focus-opensocial.googleusercontent.com
193.192.250.158 images5-focus-opensocial.googleusercontent.com
193.192.250.158 images6-focus-opensocial.googleusercontent.com
193.192.250.158 clients4.google.com
222.255.120.42 google.com
222.255.120.42 apis.google.com
222.255.120.42 clients1.google.com
193.192.250.158 clients2.google.com
193.192.250.158 clients3.google.com
193.192.250.158 clients5.google.com
64.233.181.49 maps.google.com
64.233.181.49 mts0.google.com
64.233.181.49 maps.gstatic.com
The awk code for counting: awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt
Output:
[blog@AY1310301904525972ddZ ~]$ awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt
212.188.10.241 1
64.233.168.41 1
23.239.5.106 1
193.192.250.158 15
178.60.128.31 1
37.228.69.57 1
212.188.10.167 2
193.192.250.172 1
62.1.38.89 2
64.233.181.49 6
210.242.125.35 4
222.255.120.42 5
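To make the array usage easier to remember, here is the same one-liner spelled out with comments. This is only a sketch: the helper name count_ips, the temporary sample file and the example.com hostnames are all made up for illustration and are not part of the original test.txt.

```shell
# Hypothetical sample data in the same "IP hostname" format as above.
sample=$(mktemp)
printf '%s\n' \
  '10.0.0.1 a.example.com' \
  '10.0.0.2 b.example.com' \
  '10.0.0.1 c.example.com' > "$sample"

# count_ips is a hypothetical helper name; $1 of the function is the input file.
count_ips() {
  awk '
    # $1 is the IP; a key seen for the first time starts at 0, so ++ makes it 1
    { count[$1]++ }
    END {
      # "for (key in array)" visits every key, in an unspecified order
      for (ip in count)
        print ip, count[ip]
    }
  ' "$1"
}

count_ips "$sample"
```

Because the for-in loop does not guarantee any order, the lines come out unsorted, which is why the output above is neither in IP order nor in count order.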
Adding sorting by count:
[blog@AY1310301904525972ddZ ~]$ awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt | sort -n -k 2
178.60.128.31 1
193.192.250.172 1
212.188.10.241 1
23.239.5.106 1
37.228.69.57 1
64.233.168.41 1
212.188.10.167 2
62.1.38.89 2
210.242.125.35 4
222.255.120.42 5
64.233.181.49 6
193.192.250.158 15
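A common follow-up is to list the busiest IPs first. A minimal sketch, assuming the same two-column input: the helper name top_ips and the sample data are hypothetical. It uses -k2,2nr to sort numerically and in reverse on the second field only, with -k1,1 as a tie-breaker, and head to keep the first N lines.

```shell
# Hypothetical sample data: 10.0.0.1 appears 3 times, 10.0.0.3 twice, 10.0.0.2 once.
sample=$(mktemp)
printf '%s\n' \
  '10.0.0.1 a.example.com' \
  '10.0.0.2 b.example.com' \
  '10.0.0.1 c.example.com' \
  '10.0.0.3 d.example.com' \
  '10.0.0.1 e.example.com' \
  '10.0.0.3 f.example.com' > "$sample"

# top_ips is a hypothetical helper: $1 = input file, $2 = number of rows to keep.
top_ips() {
  awk '{ count[$1]++ } END { for (ip in count) print ip, count[ip] }' "$1" |
    sort -k2,2nr -k1,1 |   # highest count first, ties broken by the IP string
    head -n "$2"
}

top_ips "$sample" 2
```

With this sample the two lines printed are 10.0.0.1 3 and 10.0.0.3 2.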
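=============Addendum in reply to reader [hattah]===============
I tested the efficiency of the two approaches. In theory, the more data sort has to process, the slower it gets: the sort | uniq -c pipeline sorts every input line, whereas the awk array version only sorts one line per distinct IP. Measured results (on a much larger test.txt, as the counts show):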
[blog@AY1310301904525972ddZ ~]$ time awk '{print $1}' test.txt |sort|uniq -c
1380 178.60.128.31
17312 193.192.250.158
1160 193.192.250.172
4640 210.242.125.35
2320 212.188.10.167
1160 212.188.10.241
5734 222.255.120.42
1160 23.239.5.106
1160 37.228.69.57
2320 62.1.38.89
1160 64.233.168.41
6894 64.233.181.49
real 0m0.236s
user 0m0.228s
sys 0m0.004s
Compared with:
[blog@AY1310301904525972ddZ ~]$ time awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt | sort -n -k 2
193.192.250.172 1160
212.188.10.241 1160
23.239.5.106 1160
37.228.69.57 1160
64.233.168.41 1160
178.60.128.31 1380
212.188.10.167 2320
62.1.38.89 2320
210.242.125.35 4640
222.255.120.42 5734
64.233.181.49 6894
193.192.250.158 17312
real 0m0.025s
user 0m0.022s
sys 0m0.001s
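For anyone who wants to repeat the comparison, the sketch below builds a larger input and checks that both methods produce the same counts. The function names, the generated data and the file size are assumptions for illustration, not the data used in the timings above.

```shell
# Generate 3000 lines of hypothetical data: 10.0.0.1 twice and 10.0.0.2 once per round.
big=$(mktemp)
awk 'BEGIN {
  for (i = 0; i < 1000; i++) {
    print "10.0.0.1 a.example.com"
    print "10.0.0.2 b.example.com"
    print "10.0.0.1 c.example.com"
  }
}' > "$big"

# Method 1: sort every line, let uniq -c count adjacent duplicates,
# then swap the columns so the output format matches method 2.
method_sort_uniq() {
  awk '{ print $1 }' "$1" | sort | uniq -c |
    awk '{ print $2, $1 }' | sort -k2,2n -k1,1
}

# Method 2: count in an awk array, then sort only the per-IP totals.
method_awk_array() {
  awk '{ count[$1]++ } END { for (ip in count) print ip, count[ip] }' "$1" |
    sort -k2,2n -k1,1
}

if [ "$(method_sort_uniq "$big")" = "$(method_awk_array "$big")" ]; then
  echo "both methods agree"
fi
```

In bash, prefixing each call with time (for example time method_awk_array "$big") gives timings like the ones above. The actual numbers depend on the machine and on the size of the input.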