A scheme for crawlers to collect public opinion data
2022-06-24 12:33:00 【User 6172015】
Simply put, a web crawler is a program that connects to a website, through its pages or an API, to obtain data. The crawler extracts the required data from web pages and saves it to a new document. Crawlers can collect many kinds of data, including files, pictures, and videos, but they must not be used for illegal purposes. In the era of Internet big data, web crawlers mainly serve search engines by supplying comprehensive, up-to-date data; more generally, a crawler is any program that collects data from the Internet.
Web crawlers can also collect public opinion data from news sites, social media, forums, blogs, and similar sources. This is one of the common ways to acquire public opinion data. Typically, the crawler program collects data from relevant websites through proxy IPs. Public opinion data can also be purchased on a data trading market or obtained from a professional public opinion analysis team, but such teams generally also rely on crawlers with proxy IPs to collect the data they analyze.
Short video has become very popular, and TikTok and Kwai are the two mainstream short-video apps, so a crawler can also collect data from them for public opinion analysis. The resulting statistics can be compiled into tables and delivered as a data report. The following code is a sample collection setup for reference:
using System;
using System.IO;
using System.Net;
using System.Text;

// Target page to visit
string targetUrl = "http://httpbin.org/ip";
// Proxy server (product website: www.16yun.cn)
string proxyHost = "http://t.16yun.cn";
string proxyPort = "31111";
// Proxy authentication information
string proxyUser = "username";
string proxyPass = "password";
// Set up the proxy server (the second argument bypasses the proxy for local addresses)
WebProxy proxy = new WebProxy(string.Format("{0}:{1}", proxyHost, proxyPort), true);
ServicePointManager.Expect100Continue = false;
var request = WebRequest.Create(targetUrl) as HttpWebRequest;
request.AllowAutoRedirect = true;
request.KeepAlive = true;
request.Method = "GET";
request.Proxy = proxy;
//request.Proxy.Credentials = CredentialCache.DefaultCredentials;
request.Proxy.Credentials = new System.Net.NetworkCredential(proxyUser, proxyPass);
// Set the Proxy-Tunnel header (optional)
// Random ran = new Random();
// int tunnel = ran.Next(1, 10000);
// request.Headers.Add("Proxy-Tunnel", tunnel.ToString());
//request.Timeout = 20000;
//request.ServicePoint.ConnectionLimit = 512;
//request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.82 Safari/537.36";
//request.Headers.Add("Cache-Control", "max-age=0");
//request.Headers.Add("DNT", "1");
//String encoded = System.Convert.ToBase64String(System.Text.Encoding.GetEncoding("ISO-8859-1").GetBytes(proxyUser + ":" + proxyPass));
//request.Headers.Add("Proxy-Authorization", "Basic " + encoded);
using (var response = request.GetResponse() as HttpWebResponse)
using (var sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
{
    // Read the whole response body as UTF-8 text and print it
    string htmlStr = sr.ReadToEnd();
    Console.WriteLine(htmlStr);
}
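The article mentions compiling the collected statistics into tables for a report but does not show that step. The following is only a minimal sketch of one way it might look, not the author's implementation: it assumes each collected item has already been reduced to a (source, title) pair, and the `OpinionReport` class, the `BuildCsv` method, and the sample data are all hypothetical names introduced for illustration. It counts items per source and emits a simple CSV table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OpinionReport
{
    // Hypothetical helper: counts items per source and returns a CSV table
    // with a header row, sorted by count (descending) and then by source name.
    // Assumes source labels contain no commas, so no CSV quoting is done.
    public static string BuildCsv(IEnumerable<KeyValuePair<string, string>> items)
    {
        var rows = items
            .GroupBy(item => item.Key)
            .OrderByDescending(group => group.Count())
            .ThenBy(group => group.Key, StringComparer.Ordinal)
            .Select(group => group.Key + "," + group.Count());
        return string.Join("\n", new[] { "source,count" }.Concat(rows));
    }

    public static void Main()
    {
        // Hypothetical sample data standing in for crawled results.
        var items = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("news", "Headline A"),
            new KeyValuePair<string, string>("forum", "Thread B"),
            new KeyValuePair<string, string>("news", "Headline C"),
            new KeyValuePair<string, string>("blog", "Post D"),
        };
        Console.WriteLine(BuildCsv(items));
    }
}
```

In practice the (source, title) pairs would come from parsing each response body read in the request code, and the CSV text could be written to a file and opened in a spreadsheet to produce the report.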