Simulating Login with Python

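Sometimes you want to call an API directly with the Python requests library, but the API requires cookie information such as a session ID. In that case you need to log in first, obtain the cookies, and then send them along with each HTTP request so that the other API endpoints become accessible.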

Using the selenium library to simulate login and obtain cookies

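from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Remember to call driver.quit() once the cookies have been read (Selenium 4 API)
driver = webdriver.Chrome(service=Service('/usr/local/bin/chromedriver'))

# The URL must include the scheme, otherwise Selenium rejects it
driver.get('http://www.xxx.com/login.html')

# find_element returns a single element; find_elements would return a list
username_elem = driver.find_element(By.ID, 'username')
passwd_elem = driver.find_element(By.ID, 'passwd')

username_elem.clear()
passwd_elem.clear()

# Enter the username
username_elem.send_keys('username')
# Enter the password
passwd_elem.send_keys('passwd')

# Locate the login button (assumed here to be a <button> inside the form)
button_elem = driver.find_element(By.TAG_NAME, 'button')
# Submit the form
button_elem.submit()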

Once the code above has run, the cookies are stored in the driver.
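
To make the stored cookies more concrete, here is a minimal sketch of the data shape that `driver.get_cookies()` returns: a list of dicts, each with at least `name` and `value`. The sample values and the helper name `cookies_to_dict` are hypothetical and only serve as illustration.

```python
def cookies_to_dict(selenium_cookies):
    """Collapse Selenium-style cookie dicts into a {name: value} mapping."""
    return {cookie['name']: cookie['value'] for cookie in selenium_cookies}


# Hypothetical sample in the shape that driver.get_cookies() returns
sample_cookies = [
    {'name': 'sessionid', 'value': 'abc123', 'domain': 'www.xxx.com',
     'path': '/', 'secure': False, 'httpOnly': True},
    {'name': 'csrftoken', 'value': 'xyz789', 'domain': 'www.xxx.com', 'path': '/'},
]

cookie_dict = cookies_to_dict(sample_cookies)
print(cookie_dict)
```

A plain dict like this can also be passed to a one-off call through the `cookies=` parameter of `requests.get`, if a full session is not needed.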

import requests

session = requests.Session()

# Retrieve the cookies from the driver
cookies = driver.get_cookies()

# Copy each cookie into the requests session
for cookie in cookies:
    session.cookies.set(cookie['name'], cookie['value'])

response = session.get('http://www.xxx.com/api/xxx')

Requests sent by the code above carry the cookies obtained during login, so they pass the target site's authentication and can call the API directly.
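
Copying only the name and value discards each cookie's domain and path. If the site relies on those attributes, a variant along the following lines may help. This is a sketch, not a definitive implementation: the helper name `copy_cookies` is hypothetical, and `jar` is assumed to expose `set(name, value, **kwargs)` the way the `cookies` attribute of a `requests.Session` does.

```python
def copy_cookies(selenium_cookies, jar):
    """Copy Selenium-style cookie dicts into a cookie jar, keeping scope attributes.

    `jar` is assumed to behave like session.cookies from requests,
    i.e. to accept set(name, value, domain=..., path=..., secure=...).
    """
    for cookie in selenium_cookies:
        # Only forward the attributes that are actually present on this cookie
        extra = {key: cookie[key] for key in ('domain', 'path', 'secure') if key in cookie}
        jar.set(cookie['name'], cookie['value'], **extra)


# Intended usage (not executed here, since it needs a logged-in driver):
#   session = requests.Session()
#   copy_cookies(driver.get_cookies(), session.cookies)
#   response = session.get('http://www.xxx.com/api/xxx')
```

Keeping the domain and path means a cookie is only sent to the hosts and URLs it was issued for, which matters when one session talks to more than one site.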

Reference:

https://zhuanlan.zhihu.com/p/28587931