A scheme for collecting public opinion data with web crawlers
2022-06-24 12:33:00 【User 6172015】
Simply put, a web crawler is a program that accesses a website, or connects to its API, to obtain data. The crawler extracts the required data from web pages and saves it to a new document. Crawlers can collect many kinds of data, including files, images, and video, but they must not be used for anything illegal. In the era of Internet big data, crawlers mainly serve search engines by supplying them with comprehensive, up-to-date data; more generally, a crawler is any program that collects data from the Internet.
Web crawlers can also collect public opinion data from news sites, social media, forums, blogs, and similar sources. This is one of the common ways to acquire public opinion data. Typically, the crawler program collects data from the relevant websites through proxy IPs. Public opinion data can also be purchased on a data trading market or obtained from a professional public opinion analysis team, but those teams generally also gather their data with crawlers running through proxy IPs before analyzing it.
Short video is now very popular, and TikTok and Kwai are the two mainstream short-video apps, so a crawler can also collect data from them for public opinion analysis. The resulting statistics can be compiled into tables and delivered as a data report. The following collection code can serve as a reference:
using System;
using System.IO;
using System.Net;
using System.Text;

// Target page to visit
string targetUrl = "http://httpbin.org/ip";
// Proxy server (the product's official website is www.16yun.cn)
string proxyHost = "http://t.16yun.cn";
string proxyPort = "31111";
// Proxy authentication information
string proxyUser = "username";
string proxyPass = "password";
// Set up the proxy server; the second argument bypasses the proxy for local addresses
WebProxy proxy = new WebProxy(string.Format("{0}:{1}", proxyHost, proxyPort), true);
ServicePointManager.Expect100Continue = false;
var request = (HttpWebRequest)WebRequest.Create(targetUrl);
request.AllowAutoRedirect = true;
request.KeepAlive = true;
request.Method = "GET";
request.Proxy = proxy;
//request.Proxy.Credentials = CredentialCache.DefaultCredentials;
request.Proxy.Credentials = new NetworkCredential(proxyUser, proxyPass);
// Optional: set a random Proxy-Tunnel header
// Random ran = new Random();
// int tunnel = ran.Next(1, 10000);
// request.Headers.Add("Proxy-Tunnel", tunnel.ToString());
//request.Timeout = 20000;
//request.ServicePoint.ConnectionLimit = 512;
//request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.82 Safari/537.36";
//request.Headers.Add("Cache-Control", "max-age=0");
//request.Headers.Add("DNT", "1");
//String encoded = System.Convert.ToBase64String(System.Text.Encoding.GetEncoding("ISO-8859-1").GetBytes(proxyUser + ":" + proxyPass));
//request.Headers.Add("Proxy-Authorization", "Basic " + encoded);
// Send the request and read the whole response body as UTF-8 text
using (var response = (HttpWebResponse)request.GetResponse())
using (var sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
{
    string htmlStr = sr.ReadToEnd();
    // For http://httpbin.org/ip the body reports the IP address the server saw
    Console.WriteLine(htmlStr);
}
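The target page http://httpbin.org/ip replies with a small JSON object whose origin field is the IP address the server saw. One way to check that the proxy is in effect is therefore to read that field from htmlStr and compare it with your own address. The following is a minimal sketch, assuming a .NET version that provides System.Text.Json and C# top-level statements. ExtractOrigin is a hypothetical helper name that does not appear in the original code, and the sample body uses a documentation-only address rather than real output.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (an assumption, not part of the original code):
// returns the "origin" field of the JSON body served by http://httpbin.org/ip,
// or an empty string when the field is missing or is not a string.
static string ExtractOrigin(string json)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    if (doc.RootElement.TryGetProperty("origin", out JsonElement origin)
        && origin.ValueKind == JsonValueKind.String)
    {
        return origin.GetString() ?? string.Empty;
    }
    return string.Empty;
}

// Sample body using an address from the documentation range 203.0.113.0/24;
// in practice this would be htmlStr from the request above.
string sampleBody = "{\"origin\": \"203.0.113.7\"}";
Console.WriteLine(ExtractOrigin(sampleBody)); // prints 203.0.113.7
```

If the address reported for a real request differs from the machine's own public IP, the request most likely went out through the proxy.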