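Getting the child links of a given URL
Following the approach in http://blog.csdn.net/lming_08/article/details/44710779, this script fetches a given URL and extracts the child links of interest together with their descriptions.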
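#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import urllib2
import re

# Expect exactly one argument: the URL of the listing page.
if len(sys.argv) != 2:
    print "%s url" % __file__
    sys.exit(-1)

url = sys.argv[1]
# Send a browser-like User-Agent header with the request.
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'
headers = {'User-Agent': user_agent}

# Sample of the markup that the pattern below is written against.
'''
<a href="http://faxian.smzdm.com/p/488573" target="_blank" onclick="ga('send', 'event','发现频道','列表_文章图片','488573_HITACHI 日立 CM-N1000 冷冻收缩毛孔多功能美容仪');" class="picBox">
<img src="http://ym.zdmimg.com/201503/29/5517b316c0c752738.jpg_d200.jpg" alt="HITACHI 日立 CM-N1000 冷冻收缩毛孔多功能美容仪" title="" height=
'''

req = urllib2.Request(url, headers=headers)
try:
    html = urllib2.urlopen(req).read()
    # Match each <a ...> tag together with the <img ...> tag on the next line.
    pattern = re.compile(r"<a href=.* target=\"_blank\" onclick=.*\s?.*<img src=.*\.jpg\" alt=.*title=\"\".*height=")
    res_list = pattern.findall(html)
    for content in res_list:
        # Child link: the item URL ending in /p/<6 digits>.
        pat = re.compile(r"http://.*p/\d{6}")
        url = pat.search(content).group()
        # Description: the value of the alt attribute.
        pat = re.compile(r"alt=\".*\" title")
        # Strip the leading 'alt="' (5 chars) and the trailing '" title' (7 chars).
        # The original slice was [5:-8], which also cut off the last character
        # of every description, as the sample output below shows.
        desc = pat.search(content).group()[5:-7]
        # Remove all whitespace from the description before printing.
        print url, re.sub(r"\s+", "", desc)
except urllib2.HTTPError:
    print "failed parsing web url"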
The output is:
lming_08@ubuntu:~/MyWorkSpace/Pycode/htmlparse$ python get_smzdm_productinfo.py http://faxian.smzdm.com/fenlei/nvshixiangshui
http://faxian.smzdm.com/p/487641 TOMMYHILFIGER都市新贵女士EDT淡香水30m
http://faxian.smzdm.com/p/487231 GUERLAIN娇兰AquaAllegoria花草水语系列橙花伊甸园女士淡香
http://faxian.smzdm.com/p/482913 山东福利:LANCOME兰蔻珍爱爱恋女士香水30m
http://faxian.smzdm.com/p/479941 SalvatoreFerragamo菲拉格慕仲夏之梦淡香水喷雾100ml/3.4o
http://faxian.smzdm.com/p/478681 VIVIENNEWESTWOODBoudoir密室女士香水(50ml
http://faxian.smzdm.com/p/478055 SwissArmyMountainWater香
http://faxian.smzdm.com/p/475269 BURBERRY博柏利周末香水DEP50m
http://faxian.smzdm.com/p/473353 MOSCHINO雾仙浓奥莉芙娃娃淡香水4.9m
http://faxian.smzdm.com/p/472327 GALIMARD加利马尔蓝色妖姬绽放夏日限量版30m
http://faxian.smzdm.com/p/471217 Dior迪奥真我淡香水50m
http://faxian.smzdm.com/p/470015 BVLGARI宝格丽淡香水喷雾100m
http://faxian.smzdm.com/p/469435 ANNASUI安娜苏幻境绮缘女士持久淡香水50m
http://faxian.smzdm.com/p/468123 CalvinKlein卡文克莱因为你女用淡香水100ml(简装
http://faxian.smzdm.com/p/467927 BURBERRY博柏利body肌体香水喷雾35M
http://faxian.smzdm.com/p/467535 SalvatoreFerragamo菲拉格慕闪耀光采淡香水喷雾100m
http://faxian.smzdm.com/p/467391 SalvatoreFerragamo菲拉格慕花水时刻淡香水喷雾100m
http://faxian.smzdm.com/p/464821 BURBERRY博柏利周末香水喷雾50m
http://faxian.smzdm.com/p/462473 Annasui安娜苏摇滚心情淡香水喷雾50m
http://faxian.smzdm.com/p/461755 LANVIN浪凡我愿意女士香水4.5m
http://faxian.smzdm.com/p/461189 Lanvin浪凡光韵女士香水5m

lming_08@ubuntu:~/MyWorkSpace/Pycode/htmlparse$ python get_smzdm_productinfo.py http://faxian.smzdm.com/fenlei/gehuhuazhuang/
http://faxian.smzdm.com/p/488705 海淘券码:BEAUTYEXPERTCaudalie欧缇丽品牌大
http://faxian.smzdm.com/p/488573 HITACHI日立CM-N1000冷冻收缩毛孔多功能美容
http://faxian.smzdm.com/p/488509 Lifebuoy卫宝先进除菌香皂115g*5块(3块十效多护+2块乳润呵护
http://faxian.smzdm.com/p/488505 Bioderma贝德玛净妍洁肤液500ml*
http://faxian.smzdm.com/p/488487 飞利浦(PHILIPS)HX9362/67奢宠礼遇粉钻牙刷声波震动牙
http://faxian.smzdm.com/p/488453 沙宣修护水养洗发露750m
http://faxian.smzdm.com/p/488451 TJOY/丁家宜化妆品套装正品美白保湿补水四件套装女士美容护
http://faxian.smzdm.com/p/488449 丸美巧克力青春丝滑眼乳霜超值套装(眼乳霜25g+面膜3片+小红瓶臻皙粉透焕采露20ml
http://faxian.smzdm.com/p/488403 ZHONGHUA中华魔丽迅白冰极薄荷味牙膏170g+卓效倍护炫闪皓白牙膏40
http://faxian.smzdm.com/p/488385 移动端:Panasonic松下ES-ERT3-S405电动剃须
http://faxian.smzdm.com/p/488387 lion狮王渍脱超亮白牙膏150g*4支(特卖
http://faxian.smzdm.com/p/488379 优惠券:乐蜂网新用户注
http://faxian.smzdm.com/p/488371 丝塔芙Cetaphil经典温和特惠组合装(洁面乳237ml+保湿润肤乳237ml+29ml洁面乳保湿乳套装
http://faxian.smzdm.com/p/488365 亚马逊中国:PEHCHAOLIN百雀羚品牌促
http://faxian.smzdm.com/p/488363 当当优品甜梦如风保湿修复护手霜礼盒50g*3
http://faxian.smzdm.com/p/488351 Syoss丝蕴水润洗护4周年套装(500洗+500润送90发膜+45营养水)+臻粹莹润洗发水100ml*2*2
http://faxian.smzdm.com/p/488349 VS沙宣垂坠质感750ml润发乳限量版*2
http://faxian.smzdm.com/p/488313 法拉利红色力量淡香水喷雾125m
http://faxian.smzdm.com/p/488299 促销活动:CLARINS美国官网任意订
http://faxian.smzdm.com/p/488255 Schwarzkopf施华蔻多效修护19活幻滋养润发液150ml*2
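Every description in the output above is missing its final character (for example "30m" instead of "30ml"), which points at the fixed-offset slicing of the alt match in the script. One possible way around this is to let capture groups pull out the link and the alt text directly, so no character counting is needed. The sketch below is not the script's own method: the names SAMPLE, ITEM_RE and extract_items are made up for illustration, the pattern is only checked against the sample markup quoted in the script, and it runs offline on that sample (written for Python 3) rather than fetching a live page.

```python
# -*- coding: utf-8 -*-
import re

# Sample markup copied from the script's docstring (an <a> tag followed by an <img> tag).
SAMPLE = '''<a href="http://faxian.smzdm.com/p/488573" target="_blank" onclick="ga('send', 'event','发现频道','列表_文章图片','488573_HITACHI 日立 CM-N1000 冷冻收缩毛孔多功能美容仪');" class="picBox">
<img src="http://ym.zdmimg.com/201503/29/5517b316c0c752738.jpg_d200.jpg" alt="HITACHI 日立 CM-N1000 冷冻收缩毛孔多功能美容仪" title="" height='''

# Hypothetical pattern: group 1 is the item link, group 2 is the alt text.
# Assumes the <a> tag contains no '>' inside its attributes, as in the sample.
ITEM_RE = re.compile(
    r'<a href="(http://[^"]*/p/\d+)"[^>]*>\s*<img [^>]*?alt="([^"]*)"')


def extract_items(html):
    """Return a list of (link, description) pairs found in html."""
    items = []
    for link, alt in ITEM_RE.findall(html):
        # Drop all whitespace from the description, as the script does.
        items.append((link, re.sub(r'\s+', '', alt)))
    return items


if __name__ == '__main__':
    for link, desc in extract_items(SAMPLE):
        print(link, desc)
```

Because the groups capture only what sits between the quotes, the description keeps its last character. For pages whose markup varies more than this sample, a real HTML parser would likely be more robust than a regular expression.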