

import re
import threading

import requests
from bs4 import BeautifulSoup
from pymongo import MongoClient as Client  # assumption: Client is pymongo's MongoClient
from tqdm import trange  # assumption: trange comes from tqdm

# NOTE: city, e_city, eshouse and header are used below but are not defined
# in this excerpt; they must be supplied by the rest of the original script.

proxies = []


def write_to_mongo(ips, city):
    '''Write the scraped listings to MongoDB.'''
    client = Client(host='localhost', port=27017)
    db = client['fs_db']
    coll = db[city + '_good']
    for ip in ips:
        coll.insert_one({'name': ip[0],
                         'price': ip[1],
                         'addresses': ip[2],
                         'areas': ip[3],
                         'eq': ip[4]})
    client.close()


def read_from_mongo(city):
    client = Client(host='localhost', port=27017)
    db = client['fs_db']
    coll = db[city + '_good']
    # Materialize the cursor before closing the client; a lazy cursor
    # cannot be iterated once the connection is closed.
    li = list(coll.find())
    client.close()
    return li


class Consumer(threading.Thread):
    def __init__(self, args):
        threading.Thread.__init__(self, args=args)

    def run(self):
        global is_craw
        url_demo, i, city_id, lock = self._args
        print("{}, page {}".format(city[city_id], i))
        url = url_demo.format(i)
        soup = get_real(url)
        # Listing titles
        names = []
        for name in soup.select('.tit_shop'):
            names.append(name.text.strip())
        # Addresses: link text plus the trailing span
        addresses = []
        for item in soup.find_all('p', attrs={'class': 'add_shop'}):
            address = item.a.text + " " + item.span.text
            addresses.append(address.replace('\t', '').replace('\n', ''))
        # Contents of the 'tel_shop' paragraphs
        es = []
        for item in soup.find_all('p', attrs={'class': 'tel_shop'}):
            es.append(item.text.replace('\t', '').replace('\n', ''))
        # Prices
        moneys = []
        for money in soup.find_all("span", attrs={"class": 'red'}):
            moneys.append(money.text.strip())
        # Last span inside each 'price_right' block
        areas = []
        for area in soup.find_all('dd', attrs={'class': 'price_right'}):
            areas.append(area.find_all('span')[-1].text)
        houses = []
        for idx in range(len(names)):
            try:
                item = [names[idx], moneys[idx], addresses[idx], areas[idx], es[idx]]
                print(item)
                houses.append(item)
            except Exception as e:
                print(e)
        # Serialize database writes across threads
        with lock:
            write_to_mongo(houses, e_city[city_id])
        print("Thread finished: page {}".format(i))


def dict2proxy(dic):
    s = dic['type'] + '://' + dic['ip'] + ':' + str(dic['port'])
    return {'http': s, 'https': s}


def get_real(url):
    resp = requests.get(url, headers=header)
    soup = BeautifulSoup(resp.content, 'html.parser', from_encoding='gb18030')
    title = soup.find('title').text.strip()
    if title == '跳转...':  # redirect page: rebuild the real URL from the inline script
        pattern1 = re.compile(r"var t4='(.*?)';")
        script = soup.find("script", string=pattern1)
        t4 = pattern1.search(str(script)).group(1)
        pattern1 = re.compile(r"var t3='(.*?)';")
        script = soup.find("script", string=pattern1)
        t3 = re.findall(pattern1, str(script))[-2]
        url = t4 + '?' + t3
        HTML = requests.get(url, headers=header)
        soup = BeautifulSoup(HTML.content, 'html.parser', from_encoding='gb18030')
    elif title == '访问验证-房天下':  # access-verification page: no handling here
        pass
    return soup


def read_proxies():
    client = Client(host='localhost', port=27017)
    db = client['proxies_db']
    coll = db['proxies']
    # Check first, then write, to avoid duplicates
    dic = list(coll.find())
    client.close()
    return dic


def craw():
    lock = threading.Lock()
    for idx in trange(len(e_city)):
        url = eshouse[idx]
        soup = get_real(url.format(2))
        try:
            page_number = int(soup.find('div', attrs={'class': 'page_al'}).find_all('span')[-1].text[1:-1])
            pages = list(range(1, page_number + 1))
        except (AttributeError, IndexError, ValueError):
            # Page count not found; fall back to the first 100 pages
            pages = list(range(1, 101))
        url_demo = url
        ts = []
        # pages = [1, 2, 3]
        while len(pages) != 0:
            # Start up to 10 threads per batch
            for i in range(10):
                t = Consumer((url_demo, pages.pop(), idx, lock))
                t.start()
                ts.append(t)
                if len(pages) == 0:
                    break
            # Join every thread, then clear the list; removing items while
            # iterating would skip every other thread.
            for t in ts:
                t.join()
            ts.clear()


if __name__ == '__main__':
    craw()
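The batching pattern used by craw() (start a bounded group of threads, wait for all of them, then move on to the next group) can be tried without a network connection or MongoDB. The following is a minimal sketch under stated assumptions: fetch_page, Worker and crawl_in_batches are hypothetical names introduced for illustration, and fetch_page returns fake rows in place of a real download.

```python
import threading


def fetch_page(page):
    """Hypothetical stand-in for a page download; returns two fake rows."""
    return [("listing-{}-{}".format(page, n), page) for n in range(2)]


class Worker(threading.Thread):
    """Fetches one page and appends its rows to a shared list."""

    def __init__(self, page, lock, results):
        super().__init__()
        self.page = page
        self.lock = lock
        self.results = results

    def run(self):
        rows = fetch_page(self.page)
        # Guard the shared list, mirroring the lock around the database write
        with self.lock:
            self.results.extend(rows)


def crawl_in_batches(pages, batch_size=10):
    """Process pages with at most batch_size threads alive at a time."""
    lock = threading.Lock()
    results = []
    pending = list(pages)
    while pending:
        # Take the next batch off the front of the pending list
        batch_pages, pending = pending[:batch_size], pending[batch_size:]
        batch = [Worker(p, lock, results) for p in batch_pages]
        for t in batch:
            t.start()
        # Wait for the whole batch before starting the next one
        for t in batch:
            t.join()
    return results


if __name__ == '__main__':
    rows = crawl_in_batches(range(1, 26), batch_size=10)
    print(len(rows))
```

Building a fresh list for each batch avoids mutating a list while iterating over it, which is easy to get wrong when threads are joined and removed in the same loop.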
