There is actually an even simpler way, but I don't yet understand why it works, so I'll write about it once I do.
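Source code (Python implementation)
This post uses Python to get past Youdao Translate's anti-scraping measures and call its translation endpoint, so that words and short phrases can be translated along with their usage notes. The result is shown in the figure.
Without further ado, here is the code.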
import hashlib
import random
import time

import requests


def salt_sign(e):
    # Mirrors the r(e) function in fanyi.min.js; e is the text to translate.
    navigator_appVersion = "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    t = hashlib.md5(navigator_appVersion.encode("utf-8")).hexdigest()
    r = str(int(time.time() * 1000))  # millisecond timestamp
    # parseInt(10 * Math.random(), 10) yields a digit from 0 to 9
    i = r + str(random.randint(0, 9))
    return {
        "ts": r,
        "bv": t,
        "salt": i,
        "sign": hashlib.md5(("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5").encode("utf-8")).hexdigest()
    }


def translate(word):
    url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
    r = salt_sign(word)
    data = {
        "i": word,
        "from": "AUTO",
        "to": "AUTO",
        "smartresult": "dict",
        "client": "fanyideskweb",
        "salt": r["salt"],
        "sign": r["sign"],
        "lts": r["ts"],
        "bv": r["bv"],
        "doctype": "json",
        "version": "2.1",
        "keyfrom": "fanyi.web",
        "action": "FY_BY_REALTlME"
    }
    headers = {
        "Cookie": "OUTFOX_SEARCH_USER_ID=-286220249@10.108.160.17;",
        "Referer": "http://fanyi.youdao.com/",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    }
    res = requests.post(url=url, data=data, headers=headers).json()
    print(word + " translates to: " + res['translateResult'][0][0]['tgt'])
    print("Translation type: " + res['type'])
    # entries[0] can be an empty string and smartResult may be missing, so filter first
    entries = [s.strip() for s in res.get('smartResult', {}).get('entries', []) if s.strip()]
    for n, entry in enumerate(entries[:2], start=1):
        print("Usage ({}): {}".format(n, entry))


if __name__ == '__main__':
    while True:
        try:
            word = input("Enter the word or short phrase to translate: ")
            translate(word)
        except Exception as e:
            print("Error:", e)
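Implementation process
Finding the interface
Target URL:
Translate any word, press F12 to open the developer tools, switch to the Network tab, and filter by XHR. An endpoint shows up almost immediately.
Looking at the form data sent with the request, the fields are ordinary form parameters and the response is requested as JSON (doctype is json), so we can send the same request from Python. A first attempt is simple.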
import requests


def translate(word):
    url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
    # salt, sign, lts and bv are copied verbatim from one request captured in the browser
    data = {
        "i": word,
        "from": "AUTO",
        "to": "AUTO",
        "smartresult": "dict",
        "client": "fanyideskweb",
        "salt": "16632096342368",
        "sign": "f68df14c3fd6c01e6820cd3ffd826e55",
        "lts": "1663209634236",
        "bv": "47edca4d7e6ec9bf4fca7156ea36b8ef",
        "doctype": "json",
        "version": "2.1",
        "keyfrom": "fanyi.web",
        "action": "FY_BY_REALTlME"
    }
    headers = {
        "Cookie": "OUTFOX_SEARCH_USER_ID=-286220249@10.108.160.17;",
        "Referer": "http://fanyi.youdao.com/",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    }
    res = requests.post(url=url, data=data, headers=headers).json()
    print(res)


if __name__ == '__main__':
    while True:
        try:
            word = input("Enter the string to look up: ")
            translate(word)
        except Exception as e:
            print("Error:", e)
I typed in an arbitrary word and, surprisingly, got an error instead of a translation. What is going on?
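To make this kind of failure easier to spot, here is a minimal sketch of a response check. It assumes a rejected request shows up as a non-zero errorCode in the JSON body (the successful sample later in this post has errorCode 0). The helper name check_response and the value 50 in the demo are illustrative assumptions, not something taken from Youdao's documentation.

```python
def check_response(res):
    """Return res unchanged if it looks successful, otherwise raise ValueError."""
    code = res.get("errorCode")
    if code != 0:
        # Assumption: any non-zero errorCode means the request was rejected.
        raise ValueError("request rejected, errorCode={}".format(code))
    if "translateResult" not in res:
        raise ValueError("response has no translateResult field")
    return res


# A successful body shaped like the sample response shown later in the post.
ok = {"errorCode": 0, "translateResult": [[{"tgt": "你好", "src": "hello"}]]}
print(check_response(ok)["translateResult"][0][0]["tgt"])

# A hypothetical rejected body; the code 50 is only an example value.
try:
    check_response({"errorCode": 50})
except ValueError as exc:
    print("Error:", exc)
```

Calling such a check right after `.json()` turns a confusing KeyError further down into a message that says the server refused the request.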
Getting past the anti-scraping measure
Let's take another look at the form data:
data = {
    "i": word,
    "from": "AUTO",
    "to": "AUTO",
    "smartresult": "dict",
    "client": "fanyideskweb",
    "salt": r["salt"],
    "sign": r["sign"],
    "lts": r["ts"],
    "bv": r["bv"],
    "doctype": "json",
    "version": "2.1",
    "keyfrom": "fanyi.web",
    "action": "FY_BY_REALTlME"
}
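Changing the input word shows that salt, sign, and lts are different on every request, while bv stays the same as long as the browser does. sign and bv are 32-character hex strings, which suggests MD5 hashes, and salt looks like lts with one extra digit appended. This is probably an anti-scraping measure on Youdao's side, much like the token I wrote about earlier.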
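These observations can be checked against the values captured in the first attempt above. The sketch below only tests the shape of each value (32 lowercase hex characters is what an MD5 hex digest looks like); it does not prove that MD5 was actually used. The helper name looks_like_md5 is my own.

```python
import re

# An MD5 hex digest is exactly 32 hexadecimal characters.
MD5_HEX = re.compile(r"[0-9a-f]{32}")


def looks_like_md5(value):
    return MD5_HEX.fullmatch(value) is not None


# Values copied from the hard-coded request earlier in this post.
captured = {
    "salt": "16632096342368",
    "sign": "f68df14c3fd6c01e6820cd3ffd826e55",
    "lts": "1663209634236",
    "bv": "47edca4d7e6ec9bf4fca7156ea36b8ef",
}

for name, value in captured.items():
    print(name, "looks like MD5:", looks_like_md5(value))

# salt with its last character removed is identical to lts
print(captured["salt"][:-1] == captured["lts"])
```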
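Let's look through the page source of Youdao Translate to find out how these values are generated.
There is a script named fanyi.min.js, and my guess was that these four values are produced there. Opening it reveals a dense wall of minified, unformatted code.
Searching it for sign and salt turns up the relevant part. After formatting, the key code looks like this.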
var r = function (e) {
    var t = n.md5(navigator.appVersion),
        r = "" + (new Date).getTime(),
        i = r + parseInt(10 * Math.random(), 10);
    return {
        ts: r,
        bv: t,
        salt: i,
        sign: n.md5("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5")
    }
};
t.recordUpdate = function (e) {
    var t = e.i,
        i = r(t);
    n.ajax({
        type: "POST",
        contentType: "application/x-www-form-urlencoded; charset=UTF-8",
        url: "/bettertranslation",
        data: {
            i: e.i,
            client: "fanyideskweb",
            salt: i.salt,
            sign: i.sign,
            lts: i.ts,
            bv: i.bv,
            tgt: e.tgt,
            modifiedTgt: e.modifiedTgt,
            from: e.from,
            to: e.to
        },
        success: function (e) { },
        error: function (e) { }
    })
}, t.recordMoreResultLog_get = function (e) {
    n.ajax({
        type: "POST",
        contentType: "application/x-www-form-urlencoded; charset=UTF-8",
        url: "/ctlog",
        data: {
            i: e.i,
            action: "GET_MORE_TRANSLATION",
            from: e.from,
            to: e.to
        },
        success: function (e) { },
        error: function (e) { }
    })
}, t.recordMoreResultLog_choose = function (e) {
    n.ajax({
        type: "POST",
        contentType: "application/x-www-form-urlencoded; charset=UTF-8",
        url: "/ctlog",
        data: {
            i: e.i,
            tgt: e.tgt,
            systemName: e.systemName,
            pos: e.pos,
            action: "SELECT_OTHER_TRANSLATION",
            from: e.from,
            to: e.to
        },
        success: function (e) { },
        error: function (e) { }
    })
};
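From here it is straightforward. Going through this code, the form parameters can be summarized as follows.
i: the string to translate
from: the source language
to: the target language
smartresult: smart result, fixed value
client: client, fixed value
salt: the salt used for the signature, to be determined
sign: the signature string, to be determined
lts: millisecond timestamp
bv: an MD5 value of unknown origin, fixed value
doctype: document type, fixed value
version: version, fixed value
keyfrom: key source, fixed value
action: the action to perform, fixed value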
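The split between fixed and per-request parameters can be written down directly. This is only a sketch: the names FIXED_FIELDS, DYNAMIC_FIELDS, and build_form are hypothetical, and bv is grouped with the per-request values because the code in this post computes it rather than hard-coding it, even though it does not change for a given browser.

```python
# Parameters the summary above marks as fixed (from/to are left on AUTO).
FIXED_FIELDS = {
    "from": "AUTO",
    "to": "AUTO",
    "smartresult": "dict",
    "client": "fanyideskweb",
    "doctype": "json",
    "version": "2.1",
    "keyfrom": "fanyi.web",
    "action": "FY_BY_REALTlME",
}

# Parameters that have to be supplied for each request.
DYNAMIC_FIELDS = ("salt", "sign", "lts", "bv")


def build_form(word, dynamic):
    """Combine the text, the fixed fields, and the per-request fields."""
    missing = [key for key in DYNAMIC_FIELDS if key not in dynamic]
    if missing:
        raise KeyError("missing dynamic fields: " + ", ".join(missing))
    form = {"i": word}
    form.update(FIXED_FIELDS)
    form.update({key: dynamic[key] for key in DYNAMIC_FIELDS})
    return form


# Demo with the values captured from the browser earlier in this post.
form = build_form("hello", {
    "salt": "16632096342368",
    "sign": "f68df14c3fd6c01e6820cd3ffd826e55",
    "lts": "1663209634236",
    "bv": "47edca4d7e6ec9bf4fca7156ea36b8ef",
})
print(len(form), sorted(form))
```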
Next, work out from the source how salt, sign, lts, and bv are produced.
// The key piece of code
var r = function (e) {
    var t = n.md5(navigator.appVersion),
        r = "" + (new Date).getTime(),
        i = r + parseInt(10 * Math.random(), 10);
    return {
        ts: r,
        bv: t,
        salt: i,
        sign: n.md5("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5")
    }
};
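- salt: the current millisecond timestamp with a random digit from 0 to 9 appended
- sign: the MD5 of "fanyideskweb" + i + salt + "Ygy_4c=r#e#4EX^NUGUc5", where i is the text to translate
- ts: the current millisecond timestamp (sent in the form as lts)
- bv: the MD5 of the browser version string (navigator.appVersion)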
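The four rules above can be sketched as a small function whose timestamp and random digit are passed in as arguments, which makes the result reproducible and easy to check by hand. The names make_params, now_ms, and digit are my own, and the shortened app_version string in the demo is just a placeholder.

```python
import hashlib

CLIENT = "fanyideskweb"
SECRET = "Ygy_4c=r#e#4EX^NUGUc5"  # constant string found in fanyi.min.js


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_params(text, app_version, now_ms, digit):
    """Build ts, bv, salt, and sign from explicit inputs (no clock, no randomness)."""
    ts = str(now_ms)                               # millisecond timestamp
    salt = ts + str(digit)                         # timestamp plus one digit, 0 to 9
    return {
        "ts": ts,
        "bv": md5_hex(app_version),                # MD5 of navigator.appVersion
        "salt": salt,
        "sign": md5_hex(CLIENT + text + salt + SECRET),
    }


params = make_params("hello", "5.0 (Windows NT 10.0; Win64; x64)", 1663209634236, 8)
print(params["salt"])  # 16632096342368
print(params["sign"])
```

In real use, now_ms would come from `int(time.time() * 1000)` and digit from `random.randint(0, 9)`, as in the function that follows.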
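So getting past the check is simple. Below is the function that produces the four values salt, sign, lts, and bv. The implementation is straightforward, so I won't explain it further.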
import hashlib
import random
import time


def salt_sign(e):
    navigator_appVersion = "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    t = hashlib.md5(navigator_appVersion.encode("utf-8")).hexdigest()
    r = str(int(time.time() * 1000))  # millisecond timestamp
    # parseInt(10 * Math.random(), 10) yields a digit from 0 to 9
    i = r + str(random.randint(0, 9))
    return {
        "ts": r,
        "bv": t,
        "salt": i,
        "sign": hashlib.md5(("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5").encode("utf-8")).hexdigest()
    }
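Formatting the returned JSON data
The response comes back as JSON. Since the goal is a translation tool, it makes sense to parse it into something easier to read.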
{
    'errorCode': 0,
    'translateResult': [[{'tgt': '你好', 'src': 'hello'}]],
    'type': 'en2zh-CHS',
    'smartResult': {'entries': ['', 'int. 喂,你好(用于问候或打招呼);喂,你好(打电话时的招呼语);喂,你好(引起别人注意的招呼语);<非正式>喂,嘿 (认为别人说了蠢话或分心);<英,旧>嘿(表示惊讶)\r\n', 'n. 招呼,问候;(Hello)(法、印、美、俄)埃洛(人名)\r\n', 'v. 说(或大声说)“喂”;打招呼\r\n'], 'type': 1}
}
It is parsed as follows:
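res = requests.post(url=url, data=data, headers=headers).json()
print(word + " translates to: " + res['translateResult'][0][0]['tgt'])
print("Translation type: " + res['type'])
# entries[0] is an empty string in the sample above, so skip blank entries;
# smartResult may also be missing from some responses
entries = [s.strip() for s in res.get('smartResult', {}).get('entries', []) if s.strip()]
for n, entry in enumerate(entries[:2], start=1):
    print("Usage ({}): {}".format(n, entry))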
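The parsing step can also be pulled into a pure function, so it can be checked against a sample response without touching the network. This is a sketch under a few assumptions: the name format_result is hypothetical, the demo uses a shortened copy of the sample response, and joining every segment in translateResult is my guess at how longer input is returned (the sample only shows a single segment).

```python
def format_result(word, res):
    """Turn a decoded response dict into a list of printable lines."""
    # Assumption: translateResult is a list of paragraphs, each a list of segments.
    tgt = "".join(seg["tgt"] for para in res["translateResult"] for seg in para)
    lines = [
        word + " translates to: " + tgt,
        "Translation type: " + res["type"],
    ]
    # smartResult may be absent; blank entries and trailing \r\n are dropped.
    entries = res.get("smartResult", {}).get("entries", [])
    usages = [entry.strip() for entry in entries if entry.strip()]
    for n, usage in enumerate(usages, start=1):
        lines.append("Usage ({}): {}".format(n, usage))
    return lines


# Shortened copy of the sample response shown above.
sample = {
    "errorCode": 0,
    "translateResult": [[{"tgt": "你好", "src": "hello"}]],
    "type": "en2zh-CHS",
    "smartResult": {
        "entries": ["", "int. 喂,你好\r\n", "n. 招呼,问候\r\n", "v. 打招呼\r\n"],
        "type": 1,
    },
}

for line in format_result("hello", sample):
    print(line)
```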
Final code
import hashlib
import random
import time

import requests


def salt_sign(e):
    # Mirrors the r(e) function in fanyi.min.js; e is the text to translate.
    navigator_appVersion = "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    t = hashlib.md5(navigator_appVersion.encode("utf-8")).hexdigest()
    r = str(int(time.time() * 1000))  # millisecond timestamp
    # parseInt(10 * Math.random(), 10) yields a digit from 0 to 9
    i = r + str(random.randint(0, 9))
    return {
        "ts": r,
        "bv": t,
        "salt": i,
        "sign": hashlib.md5(("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5").encode("utf-8")).hexdigest()
    }


def translate(word):
    url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
    r = salt_sign(word)
    data = {
        "i": word,
        "from": "AUTO",
        "to": "AUTO",
        "smartresult": "dict",
        "client": "fanyideskweb",
        "salt": r["salt"],
        "sign": r["sign"],
        "lts": r["ts"],
        "bv": r["bv"],
        "doctype": "json",
        "version": "2.1",
        "keyfrom": "fanyi.web",
        "action": "FY_BY_REALTlME"
    }
    headers = {
        "Cookie": "OUTFOX_SEARCH_USER_ID=-286220249@10.108.160.17;",
        "Referer": "http://fanyi.youdao.com/",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    }
    res = requests.post(url=url, data=data, headers=headers).json()
    print(word + " translates to: " + res['translateResult'][0][0]['tgt'])
    print("Translation type: " + res['type'])
    # entries[0] can be an empty string and smartResult may be missing, so filter first
    entries = [s.strip() for s in res.get('smartResult', {}).get('entries', []) if s.strip()]
    for n, entry in enumerate(entries[:2], start=1):
        print("Usage ({}): {}".format(n, entry))


if __name__ == '__main__':
    while True:
        try:
            word = input("Enter the word or short phrase to translate: ")
            translate(word)
        except Exception as e:
            print("Error:", e)