Writing an information gatherer in Python by scraping Chinaz (站长之家) - Bai Hongyu




Published: 2019-06-09


Preface:

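I couldn't decide what to write about, and after going round in circles I ended up back on crawlers.

I've already scraped all the sites I covered before, so this time I'm scraping a more useful one.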

0x01:

Make sure the requests module is installed yourself (the code also imports BeautifulSoup from the bs4 package).

Target site: http://tool.chinaz.com/
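Both chinaz lookups used here are plain GET requests with the target placed in the query string. A minimal sketch of building those URLs with the standard library's `urllib.parse.urlencode`, which percent-escapes the target instead of pasting it in raw. The endpoint paths and parameter names (`domain`, `s`) are the ones the script queries on `m.tool.chinaz.com`; whether chinaz still serves them is an assumption, and `ENDPOINTS` / `build_query_url` are hypothetical names.

```python
from urllib.parse import urlencode

# Endpoint paths and query parameter names as used by the script
# (assumption: chinaz may have changed them since the post was written)
ENDPOINTS = {
    'subdomain': ('http://m.tool.chinaz.com/subdomain/', 'domain'),
    'side': ('http://m.tool.chinaz.com/same/', 's'),
}


def build_query_url(kind, target):
    """Hypothetical helper: return the chinaz query URL for one lookup kind."""
    base, param = ENDPOINTS[kind]
    # urlencode escapes spaces, '&', '#' and similar characters in the target
    return base + '?' + urlencode({param: target})


if __name__ == '__main__':
    print(build_query_url('subdomain', 'baidu.com'))
    print(build_query_url('side', 'www.baidu.com'))
```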

0x02:

Code:

```python
import optparse
import re
import sys

import requests
from bs4 import BeautifulSoup

# Browser User-Agent sent with every request
HEADER = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
                        '(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
# Rough hostname pattern: labels of up to 63 characters separated by dots
DOMAIN_RE = re.compile(r'[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?')


def main():
    usage = "%prog [-z Subdomain mining] [-p Side of the station inquiries] [-x http status query]"
    parser = optparse.OptionParser(usage)
    parser.add_option('-z', dest='Subdomain', help='Subdomain mining')
    parser.add_option('-p', dest='Side', help='Side of the station inquiries')
    parser.add_option('-x', dest='http', help='http status query')
    options, args = parser.parse_args()
    # Only the first option given (in this order) is executed
    if options.Subdomain:
        Subdomain(options.Subdomain)
    elif options.Side:
        Side(options.Side)
    elif options.http:
        Http(options.http)
    else:
        parser.print_help()
        sys.exit()


def Subdomain(subdomain):
    print('-----------Subdomains quickly tap-----------')
    url = "http://m.tool.chinaz.com/subdomain/?domain={}".format(subdomain)
    html = requests.get(url, headers=HEADER, timeout=10).content.decode('utf-8', errors='ignore')
    soup = BeautifulSoup(html, 'html.parser')
    # Print every table cell whose whole text is a hostname
    for td in soup.find_all('td'):
        text = td.get_text(strip=True)
        if DOMAIN_RE.fullmatch(text):
            print(text)


def Side(side):
    print('--------Side of the station inquiries--------')
    url = "http://m.tool.chinaz.com/same/?s={}".format(side)
    html = requests.get(url, headers=HEADER, timeout=10).content.decode('utf-8', errors='ignore')
    soup = BeautifulSoup(html, 'html.parser')
    # Print the absolute URLs (scheme://...) of the result links
    for a in soup.find_all('a', href=True):
        if re.match(r'[a-zA-Z]+://\S+', a['href']):
            print(a['href'])


def Http(http):
    print('--------Http status query--------')
    url = "http://{}".format(http)
    r = requests.get(url, headers=HEADER, timeout=10)
    print('Status :', r.status_code)
    # Dump every response header
    for name, value in r.headers.items():
        print(name, ':', value)


if __name__ == '__main__':
    main()
```
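The page-parsing step is the part most likely to break when chinaz changes its markup, so it helps to test it offline against a saved page. A minimal sketch using only the standard library's `html.parser` in place of BeautifulSoup: it collects `<td>` text (the `-z` path) and absolute `<a href>` values (the `-p` path). `ResultPageParser`, `parse_results` and the `SAMPLE` markup are hypothetical; the sample is not chinaz's real layout.

```python
import re
from html.parser import HTMLParser

# Hostname and URL patterns as used by the script
DOMAIN_RE = re.compile(r'[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?')
URL_RE = re.compile(r'[a-zA-Z]+://[^\s]*')


class ResultPageParser(HTMLParser):
    """Collect <td> text and absolute <a href> values from a result page."""

    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self.in_td = True
            self.cells.append('')
        elif tag == 'a':
            href = dict(attrs).get('href') or ''
            if URL_RE.fullmatch(href):
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_td = False

    def handle_data(self, data):
        # Text inside a cell (including nested tags such as <a>) belongs to that cell
        if self.in_td:
            self.cells[-1] += data


def parse_results(html):
    """Return (hostnames found in table cells, absolute link URLs)."""
    parser = ResultPageParser()
    parser.feed(html)
    parser.close()
    domains = [c.strip() for c in parser.cells if DOMAIN_RE.fullmatch(c.strip())]
    return domains, parser.links


# Hypothetical sample markup, not the real chinaz page
SAMPLE = """
<table>
  <tr><td>1</td><td>www.example.com</td></tr>
  <tr><td>2</td><td><a href="http://mail.example.com">mail.example.com</a></td></tr>
</table>
<a href="/same/">relative link, ignored</a>
"""

if __name__ == '__main__':
    domains, links = parse_results(SAMPLE)
    print(domains)  # ['www.example.com', 'mail.example.com']
    print(links)    # ['http://mail.example.com']
```

To check against the live site, save a result page to disk and pass its text to `parse_results`.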

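Screenshots of the tool in action:

-h  help

-z  subdomain mining

-p  side-station lookup (other sites hosted on the same server)

-x  HTTP status query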
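Because `main()` checks the options with `if`/`elif`, only one lookup runs per invocation, and `-z` wins over `-p`, which wins over `-x`. A small sketch that reproduces that dispatch with the same `optparse` options but takes an explicit argument list, so the choice can be checked without any network access; `build_parser` and `pick_action` are hypothetical names.

```python
import optparse


def build_parser():
    # Same three options as the script's main()
    parser = optparse.OptionParser(
        "%prog [-z Subdomain mining] [-p Side of the station inquiries] [-x http status query]")
    parser.add_option('-z', dest='Subdomain', help='Subdomain mining')
    parser.add_option('-p', dest='Side', help='Side of the station inquiries')
    parser.add_option('-x', dest='http', help='http status query')
    return parser


def pick_action(argv):
    """Return (option dest, target) that main() would act on, or (None, None) for help."""
    options, _ = build_parser().parse_args(argv)
    # Same precedence as the if/elif chain in main()
    for dest in ('Subdomain', 'Side', 'http'):
        value = getattr(options, dest)
        if value:
            return dest, value
    return None, None


if __name__ == '__main__':
    print(pick_action(['-z', 'baidu.com']))                # ('Subdomain', 'baidu.com')
    print(pick_action(['-p', 'a.com', '-z', 'b.com']))     # ('Subdomain', 'b.com')
    print(pick_action([]))                                 # (None, None)
```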

[Screenshot: -z output]

[Screenshot: -p output]

[Screenshot: -x output]
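The `-x` query simply requests `http://<target>` and prints what comes back in the response headers. To try that logic without hitting a real site, here is a standard-library sketch (`urllib.request` instead of requests) run against a throwaway local server. `DemoHandler`, `start_demo_server` and `http_status` are hypothetical names, and the empty `ProxyHandler` (which ignores `*_proxy` environment variables) is an assumption made so the local demo is not routed through a proxy.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the target site: answers every GET with 200 and a short body."""

    def do_GET(self):
        body = b'ok'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_demo_server():
    """Serve DemoHandler on 127.0.0.1 with an OS-chosen port, in a background thread."""
    server = HTTPServer(('127.0.0.1', 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def http_status(host):
    """GET http://<host>/ and return (status code, header pairs); error statuses are returned, not raised."""
    req = urllib.request.Request('http://{}/'.format(host), headers={'User-Agent': 'Mozilla/5.0'})
    # Empty ProxyHandler: ignore proxy environment variables (assumption for the local demo)
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(req, timeout=10) as resp:
            return resp.status, resp.getheaders()
    except urllib.error.HTTPError as err:
        return err.code, list(err.headers.items())


if __name__ == '__main__':
    server = start_demo_server()
    try:
        status, headers = http_status('127.0.0.1:{}'.format(server.server_address[1]))
    finally:
        server.shutdown()
        server.server_close()
    print('Status :', status)
    for name, value in headers:
        print(name, ':', value)
```

Catching `HTTPError` keeps a 404 or 500 reportable as a status, which matches how requests returns error responses instead of raising.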

Five days left until school starts. Aaaaaaaaaaargh!

Reposted from: https://www.cnblogs.com/haq5201314/p/8455448.html
