Analyzing Baidu Web Spider Visits to List Pages in nginx Logs with Shell
#!/bin/bash
# desc: this script analyzes Baidu web spider (Baiduspider/2.0) visits in an nginx access log
# date: 2014.02.25
# tested on CentOS 5.9 x86_64
# saved in /usr/local/bin/baidu-web.sh
# written by [email protected] www.zjyxh.com

# Yesterday's date as MMDD, used in the output file names
dt=$(date -d "yesterday" +%m%d)

if [ -n "$1" ]; then
    if [ -e "$1" ]; then
        # Keep only the requests made by the Baidu web spider
        grep -i "Baiduspider/2.0" "$1" > "baiduspider-${dt}.txt"
        num=$(wc -l < "baiduspider-${dt}.txt")
        echo "baiduspider number is ${num}, file is baiduspider-${dt}.txt"
        # Count hits per URL (field 7 of each log line), most-visited first
        awk '{print $7}' "baiduspider-${dt}.txt" | sort | uniq -c | sort -rn > "$(ls "$1" | cut -c 1-10)-${dt}.txt"
        echo "$1 was done"
    else
        echo "$1 does not exist!"
    fi
else
    echo "usage: $0 file_path"
fi
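Analyzing the Baidu web spider with shell works the same way as analyzing the Baidu news spider: the only change is the search keyword, which goes from baiduspider-news to baiduspider/2.0.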
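Since the two analyses differ only in the keyword, the shared steps can be wrapped in a single function that takes the keyword as an argument. The following is a minimal sketch, not part of the original script: the function name `spider_url_counts` and the example log path are hypothetical, and it assumes the request path is the 7th whitespace-separated field of each log line, as in nginx's default combined log format.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): print per-URL hit
# counts for any spider keyword, most-visited URL first.
# Assumption: the request path is field 7 of each log line.
spider_url_counts() {
    local keyword="$1"
    local logfile="$2"
    # -i: case-insensitive match, -F: treat the keyword as a literal string
    grep -iF -- "$keyword" "$logfile" \
        | awk '{print $7}' \
        | sort \
        | uniq -c \
        | sort -rn
}

# Example usage (the log path is a placeholder):
# spider_url_counts "Baiduspider/2.0"  /var/log/nginx/access.log
# spider_url_counts "Baiduspider-news" /var/log/nginx/access.log
```

The final `sort -rn` compares the counts numerically; a plain `sort -r` compares them as text, which can misorder counts that have different numbers of digits.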
This article comes from the blog "崔晓辉的博客" (Cui Xiaohui's blog); please keep this attribution: http://coralzd.blog.51cto.com/90341/1590956