傅永源
Java crawler experts, please come in and help~ Thank you so much!

Here's the situation: I'm crawling this site: http://tieba.baidu.com/mo/q/checkurl?url=https%3A%2F%2Fauth.dxy.cn%2Faccounts%2Flogin&urlrefer=4bc43ffeca0add677fe2b9d41d16373f

So far all the downstream logic is done, but for headers.put("Cookie", ""); I have to go to the page every time, copy the cookie, paste it in there, and only then can I run the program and crawl the data. The requirement now is no manual intervention: it has to crawl automatically. So I need to log in automatically and obtain the cookie. The cookie I get after logging in on the page (in Google Chrome) looks like this:

Cookie:__asc=060e17db15aa2a899b0a81b001e; __auc=789c084915a59fdb8580135a791; __utmt=1; DRUGSSESSIONID=34EE460CF1254F6295136909C11520A4-n1; _gat=1; JUTE_BBS_DATA=30bbf27867b703a02b80e1ac7becfbb8aff0bce294145fdad02fedb7ae99e6381843225bafd700c0f9db4e482bdaf0f99e5130ce9b266abd077eb04920b96abbf3e93bd7bbeb3dce; _ga=GA1.2.282899927.1487568505; JUTE_SESSION_ID=dc5a17a0-ca5f-4b3b-a471-f829739ab75e; __utmt=1; __utma=17875052.282899927.1487568505.1488787707.1488787707.1; __utmb=17875052.1.10.1488787707; __utmc=17875052; __utmz=17875052.1488787707.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); CMSSESSIONID=41DC305C1614AFBEA4AF3A829576698C-n2; Hm_lvt_8a6dad3652ee53a288a11ca184581908=1487568483,1488787707; Hm_lpvt_8a6dad3652ee53a288a11ca184581908=1488787708; DXY_USER_GROUP=86; JUTE_SESSION=c5f211f2bc7d9474eff536ef071a255ddb49f6ecae2c5eddde965121920fe93fc0afa9d03d1c820140de2e1f8188c0a958ea713353a44e0948d132319d2fbd96b0e31f12a5cc75ee; __utma=129582553.633390142.1487568493.1488783828.1488787643.9; __utmb=129582553.5.10.1488787643; __utmc=129582553; __utmz=129582553.1487638797.2.2.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; Hm_lvt_d1780dad16c917088dd01980f5a2cfa7=1487725429,1488161294,1488348968,1488768294; Hm_lpvt_d1780dad16c917088dd01980f5a2cfa7=1488787718

I wrote a request myself following an online tutorial. It succeeded and returned 200 and OK... but the only cookies I got back were these:

5 = {BufferedHeader@1787} "Set-Cookie: route=6f2d12ae997a6e0b502275db208a27c1; Path=/"
6 = {BufferedHeader@1814} "Set-Cookie: JSESSIONID=0FFB4D352F307AC351230CD7F075E381-n2; Path=/; HttpOnly"

That's far fewer than the browser has, and when I tried using them they didn't work. I'm completely lost when it comes to crawlers. Could someone please help? If possible, could you write a code example and send it to [email protected]? That would be ideal. Thanks a million in advance~
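A likely explanation, offered as an assumption rather than a confirmed diagnosis: many of the browser cookies above (__utma, __utmz, _ga, _gat, Hm_lvt_*, Hm_lpvt_*) look like Google Analytics and Baidu Tongji cookies written by JavaScript, which a plain HTTP client never runs, and the login-related ones may be set by later redirects or by other dxy.cn subdomains that a single request never visits. A common approach is to keep one client with a persistent cookie jar for the whole session instead of copying Set-Cookie headers by hand. The sketch below swaps in the JDK's built-in java.net.http.HttpClient (Java 11+) with a java.net.CookieManager, rather than the Apache HttpClient the post appears to use. The login URL is the one from the post; the form field names (username, password), the class name DxyLoginSketch and its helper methods are hypothetical, and the real login may need hidden fields, a captcha, or JavaScript that this sketch does not handle.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical class name; a sketch, not the site's documented login flow.
public class DxyLoginSketch {

    // Encodes fields as an application/x-www-form-urlencoded request body.
    static String formBody(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Joins every stored cookie that matches `uri` into one "Cookie" header value,
    // usable with an existing headers.put("Cookie", ...) call.
    static String cookieHeaderFor(CookieManager manager, URI uri) {
        return manager.getCookieStore().get(uri).stream()
                .map(c -> c.getName() + "=" + c.getValue())
                .collect(Collectors.joining("; "));
    }

    // Builds a client that shares `manager`, posts the login form, and follows redirects.
    // The CookieManager records the cookies it accepts (ACCEPT_ALL here) from the login
    // response and from every redirect, so reusing this client sends them automatically.
    static HttpClient loginClient(CookieManager manager, URI loginUri,
                                  Map<String, String> fields) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(manager)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest login = HttpRequest.newBuilder(loginUri)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(fields)))
                .build();
        HttpResponse<String> resp = client.send(login, HttpResponse.BodyHandlers.ofString());
        System.out.println("login status: " + resp.statusCode());
        return client;
    }

    public static void main(String[] args) throws Exception {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "your-username"); // HYPOTHETICAL field name
        fields.put("password", "your-password"); // HYPOTHETICAL field name
        URI loginUri = URI.create("https://auth.dxy.cn/accounts/login");
        HttpClient client = loginClient(manager, loginUri, fields);
        // Reuse `client` for the crawl requests; cookies for other hosts are
        // only returned when you query the store with those hosts' URIs.
        System.out.println("Cookie: " + cookieHeaderFor(manager, loginUri));
    }
}
```

If you stay on Apache HttpClient 4.x, the same idea is one shared BasicCookieStore passed to HttpClients.custom().setDefaultCookieStore(...), with every request (login and crawl) going through that one client.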