
Why automatic proxy IP allocation fails when using a crawler proxy

2022-06-24 06:21:00 User 6172015

Recently a friend ran into a problem while using a crawler proxy: after sending requests through the proxy, each HTTP request was not automatically assigned a different proxy IP. Instead, all requests kept going out through the same proxy IP, which was only swapped for a new one after about 20 seconds of use. What causes this? The code the friend provided is as follows:

    #! -*- encoding:utf-8 -*-

    import requests
    import random
    import time

    # Target page to visit
    targetUrl = "http://httpbin.org/ip"

    # Target HTTPS page to visit
    # targetUrl = "https://httpbin.org/ip"

    # Proxy server (vendor site: www.16yun.cn)
    proxyHost = "t.16yun.cn"
    proxyPort = "31111"

    # Proxy authentication credentials
    proxyUser = "username"
    proxyPass = "password"

    proxyMeta = "http://%(user)s:%(pass)s@%(host)s:%(port)s" % {
        "host": proxyHost,
        "port": proxyPort,
        "user": proxyUser,
        "pass": proxyPass,
    }

    # Route both http and https traffic through the HTTP proxy
    proxies = {
        "http": proxyMeta,
        "https": proxyMeta,
    }

    # Set the Proxy-Tunnel value used to control IP switching
    tunnel = random.randint(1, 10000)

    headers = {
        'Connection': 'keep-alive',
        'Accept-Language': 'zh',
        "Proxy-Tunnel": str(tunnel)
    }

    for i in range(100):
        resp = requests.get(targetUrl, proxies=proxies, headers=headers)
        print(resp.status_code)
        print(resp.text)
        time.sleep(0.2)

After debugging and analysis, the code above has two main problems:

1. 'Connection': 'keep-alive' needs to be disabled

Keep-alive is an agreement between client and server at the protocol level. With keep-alive enabled, the server does not close the TCP connection after returning the response, and the client also keeps the connection open after receiving the response so it can be reused for the next HTTP request. Because the TCP connection stays open, the crawler proxy's automatic IP switching never triggers: one proxy IP keeps being used until its 20-second validity period expires, at which point the TCP connection is closed and a new proxy IP is assigned. A minimal sketch of the fix follows.
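One way to apply this fix, assuming the rest of the script stays the same (targetUrl, proxies, and tunnel as defined above): replace the keep-alive header with 'Connection': 'close' so the TCP connection is torn down after every response and the next request has to open a new connection through the proxy.

    # Close the TCP connection after each response instead of reusing it,
    # so the proxy can assign a different IP on the next request
    headers = {
        'Connection': 'close',
        'Accept-Language': 'zh',
        "Proxy-Tunnel": str(tunnel)
    }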

2. The tunnel parameter is set in the wrong place

The tunnel value is the parameter that controls proxy IP switching. The crawler proxy inspects the Proxy-Tunnel value of each request: HTTP requests with different values are randomly assigned different proxy IPs for forwarding, while requests with the same value are forwarded through the same proxy IP. To have every HTTP request go out through a different proxy IP, tunnel = random.randint(1, 10000) should be executed inside the for loop, so that each request carries a different tunnel value (see the sketch below).
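Putting the two fixes together, a corrected version of the request loop might look like the following sketch (the imports, proxy settings, and target URL are assumed to be the same as in the original script):

    for i in range(100):
        # A new Proxy-Tunnel value per request asks the proxy to assign a
        # different IP; 'Connection: close' prevents TCP connection reuse
        tunnel = random.randint(1, 10000)
        headers = {
            'Connection': 'close',
            'Accept-Language': 'zh',
            "Proxy-Tunnel": str(tunnel)
        }
        resp = requests.get(targetUrl, proxies=proxies, headers=headers)
        print(resp.status_code)
        print(resp.text)
        time.sleep(0.2)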


Copyright notice
This article was created by [User 6172015]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2021/07/20210722190024310s.html