Python Notes 5: Extracting Email Addresses from a PubMed Search Results List with Python

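This post shows how to use Python to extract email addresses from a PubMed search results list. The method collects the addresses from one full page of the article list; the next post covers collecting addresses across multiple pages.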

import re

import requests
from bs4 import BeautifulSoup

# PubMed search URL: term "aav", filtered to reviews and a 10-year date range
url = "https://pubmed.ncbi.nlm.nih.gov/?term=aav&filter=pubt.review&filter=datesearch.y_10"
headers = {"user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"}

# Fetch the first page of search results
resp = requests.get(url, headers=headers)
#print(resp.status_code)
resp.encoding = "utf-8"
#print(resp.text)

# Collect each article's permalink from the buttons on the results page, skipping duplicates
soup = BeautifulSoup(resp.text, "html.parser")
tags = soup("button")

lstA = list()
for tag in tags:
    link = tag.get("data-permalink-url")
    if link and link not in lstA:
        lstA.append(link)
#print(lstA)
#print(len(lstA))

# Visit each article page and collect the affiliation text of every author
lstB = list()
for link in lstA:
    resp = requests.get(link, headers=headers)
    #print(resp.status_code)
    resp.encoding = "utf-8"
    #print(resp.text)
    soup = BeautifulSoup(resp.text, "html.parser")
    tags = soup("li")
    for tag in tags:
        if tag.get("data-affiliation-id"):
            # The second child node of the <li> is taken as the affiliation text
            lstB.append(tag.contents[1])
#print(lstB)

# Pull the email-like tokens out of each affiliation string.
# Each entry of maillist is the list returned by findall for one affiliation,
# so the final count is the number of distinct lists, not of single addresses.
maillist = list()
for text in lstB:
    if re.search("@", text):
        emails = re.findall(r"\S+@\S+", text)
        #print(emails)
        if emails in maillist:
            continue
        else:
            maillist.append(emails)
print(len(maillist),"email address were retrived, done")

Running the code gives the following result:

7 email address were retrived, done
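
One detail of the extraction step is worth a closer look. The pattern `\S+@\S+` matches any run of non-whitespace characters around an `@`, so an affiliation that ends with "a.b@example.org." yields the address with the trailing period attached. Also, `maillist` stores the lists returned by `re.findall`, so the count above is a count of lists, not of individual addresses. The following is a minimal sketch of one possible alternative, not part of the original script: the helper name `extract_emails`, the stricter pattern `EMAIL_RE`, and the sample strings are all assumptions made for illustration, and the pattern is not a complete email validator.

```python
import re

# Assumed pattern (not from the original post): stricter than \S+@\S+ so that
# surrounding punctuation is left out. It is not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(texts):
    """Return unique addresses from an iterable of strings, in first-seen order."""
    seen = set()
    result = []
    for text in texts:
        for email in EMAIL_RE.findall(str(text)):
            key = email.lower()  # compare case-insensitively
            if key not in seen:
                seen.add(key)
                result.append(email)
    return result


# Hypothetical affiliation strings, made up for this example
sample = [
    " Department of X, University Y. Electronic address: a.b@example.org.",
    " Institute Z; c_d@example.com, a.b@example.org",
]
print(extract_emails(sample))
# ['a.b@example.org', 'c_d@example.com']
```

With a helper like this, `extract_emails(lstB)` would return a flat list of distinct addresses, and its length would count addresses rather than affiliation strings, so it may differ from the number printed above.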
  • Author: 括囊无誉
  • Link to this post: python-5-get-email/
  • Copyright notice: All articles on this blog are original works; please credit the source when reposting.