A scheme for crawlers to collect public opinion data

2022-06-24 12:33:00 User 6172015

Simply put, a web crawler is a program that visits a website, either by requesting its pages or by calling its API, and retrieves data from it. The crawler extracts the required data from each page and saves it to a new document. Crawlers can collect many kinds of data, including files, pictures, and videos, but they must not be used for illegal purposes. In the era of Internet big data, web crawlers mainly serve search engines by supplying them with the most comprehensive and up-to-date data, but they are also general-purpose programs for collecting data from the Internet.

Web crawlers can also collect public opinion data from news sites, social media, forums, blogs, and similar sources. This is one of the common schemes for acquiring public opinion data: a crawler program, typically routed through proxy IPs, collects data from the relevant websites. Public opinion data can also be purchased on data trading markets or obtained from professional public opinion analysis teams. Generally speaking, though, those teams also run crawlers through proxy IPs to collect the data they analyze.

Short videos are now very popular, and TikTok and Kwai are the two mainstream short-video apps, so a crawler program can also collect data from TikTok and Kwai for public opinion analysis. The resulting statistics can be compiled into tables and delivered as a data report. The following code is a reference collection scheme that sends a request through an authenticated proxy:

using System;
using System.IO;
using System.Net;
using System.Text;

// Target page to visit
string targetUrl = "http://httpbin.org/ip";

// Proxy server (product website: www.16yun.cn)
string proxyHost = "http://t.16yun.cn";
string proxyPort = "31111";

// Proxy authentication information
string proxyUser = "username";
string proxyPass = "password";

// Set up the proxy server; the second argument bypasses the proxy for local addresses
WebProxy proxy = new WebProxy(string.Format("{0}:{1}", proxyHost, proxyPort), true);


ServicePointManager.Expect100Continue = false;

var request = WebRequest.Create(targetUrl) as HttpWebRequest;

request.AllowAutoRedirect = true;
request.KeepAlive = true;
request.Method    = "GET";
request.Proxy     = proxy;

//request.Proxy.Credentials = CredentialCache.DefaultCredentials;

request.Proxy.Credentials = new System.Net.NetworkCredential(proxyUser, proxyPass);

// Set up Proxy Tunnel
// Random ran = new Random();
// int tunnel = ran.Next(1, 10000);
// request.Headers.Add("Proxy-Tunnel", tunnel.ToString());


//request.Timeout = 20000;
//request.ServicePoint.ConnectionLimit = 512;
//request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.82 Safari/537.36";
//request.Headers.Add("Cache-Control", "max-age=0");
//request.Headers.Add("DNT", "1");


//String encoded = System.Convert.ToBase64String(System.Text.Encoding.GetEncoding("ISO-8859-1").GetBytes(proxyUser + ":" + proxyPass));
//request.Headers.Add("Proxy-Authorization", "Basic " + encoded);

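// Send the request and read the whole response body as UTF-8 text
using (var response = request.GetResponse() as HttpWebResponse)
using (var sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
{
    // htmlStr holds the fetched page content for later parsing and analysis
    string htmlStr = sr.ReadToEnd();
}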
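
The reporting step mentioned earlier, turning collected statistics into a table, could look roughly like the following. This is only a minimal sketch, assuming the collected posts are already available as plain strings. The `OpinionReport` class, its method names, and the idea of counting keyword mentions are illustrative assumptions, not part of the original scheme.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: the names and the keyword-counting approach are assumptions for illustration.
public static class OpinionReport
{
    // Counts how many posts mention each keyword (case-insensitive substring match).
    public static Dictionary<string, int> CountMentions(
        IEnumerable<string> posts, IEnumerable<string> keywords)
    {
        var postList = posts.ToList();
        var counts = new Dictionary<string, int>();
        foreach (var keyword in keywords)
        {
            counts[keyword] = postList.Count(
                p => p.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0);
        }
        return counts;
    }

    // Renders the counts as a simple CSV table, highest count first.
    // Assumes keywords contain no commas or quotes, so no CSV escaping is done.
    public static string ToCsv(Dictionary<string, int> counts)
    {
        var sb = new StringBuilder();
        sb.Append("keyword,mentions\n");
        var ordered = counts
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal);
        foreach (var pair in ordered)
        {
            sb.Append(pair.Key).Append(',').Append(pair.Value).Append('\n');
        }
        return sb.ToString();
    }
}
```

The text returned by `ToCsv` can be written to a file with `File.WriteAllText` and opened in a spreadsheet. A real report would likely need proper text extraction from the fetched HTML before any counting.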

Copyright notice
This article was written by [User 6172015]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2021/05/20210531191352495v.html