
Blocking and intercepting spider crawling on Apache, IIS6 and IIS7 dedicated-IP hosts (applicable to VPS and virtual machine servers)

2022-06-28 03:04:00 wwwwestcn

If the visits come from a normal search engine spider, we do not recommend blocking it; otherwise the site's indexing and ranking on Baidu and other search engines will suffer, leading to losses such as lost customers. Consider first upgrading your virtual host plan to get more traffic, or upgrading to a cloud server (which has no traffic limit). For details see: http://www.west.cn/faq/list.asp?unid=626

1. Website Administration Assistant environment: http://www.west.cn/faq/list.asp?unid=650  Follow this guide to enable the pseudo-static (URL rewrite) component.

2. Manually built Windows 2003 + IIS environment: http://www.west.cn/faq/list.asp?unid=639  Follow this guide to load the pseudo-static component.

3. Then add the rules below to the configuration file that matches your system.

Rules file on Linux: .htaccess (create a .htaccess file by hand and place it in the site root directory)

<IfModule mod_rewrite.c>
RewriteEngine On
# Block spiders
RewriteCond %{HTTP_USER_AGENT} "SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu" [NC]
RewriteRule !(^robots\.txt$) - [F]
</IfModule>
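The `[NC]` flag makes the User-Agent match case-insensitive, and the `RewriteRule` negation keeps `/robots.txt` reachable. As a rough illustration (not part of the original article), the condition's matching behaviour can be sketched in Python, with the pattern copied verbatim from the rule above and `re.IGNORECASE` standing in for `[NC]`:

```python
import re

# Blocklist pattern copied from the RewriteCond above; re.IGNORECASE
# plays the role of Apache's [NC] flag.
BLOCKED = re.compile(
    r"SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|"
    r"jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|"
    r"Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|"
    r"perl|Python|Wget|Xenu|ZmEu",
    re.IGNORECASE,
)

def is_blocked(user_agent: str) -> bool:
    # RewriteCond performs an unanchored regex match on the header value,
    # so a substring search is the right analogue.
    return bool(BLOCKED.search(user_agent))

print(is_blocked("Mozilla/5.0 (compatible; SemrushBot/7~bl)"))   # True
print(is_blocked("curl/8.1.2"))                                  # True
print(is_blocked("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"))  # False
```

Any request whose User-Agent matches is answered with 403 Forbidden by the `[F]` flag.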

Rules file on Windows 2003 (IIS6): httpd.conf

# Block spiders
RewriteCond %{HTTP_USER_AGENT} (SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu) [NC]
RewriteRule !(^/robots\.txt$) - [F]

Rules file on Windows 2008 (IIS7): web.config

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="Block spider">
          <match url="(^robots\.txt$)" ignoreCase="false" negate="true" />
          <conditions>
            <add input="{HTTP_USER_AGENT}" pattern="SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|curl|perl|Python|Wget|Xenu|ZmEu" ignoreCase="true" />
          </conditions>
          <action type="AbortRequest" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
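All three rule sets share the same two-part logic: exempt `robots.txt` from the rule, then abort any remaining request whose User-Agent matches the blocklist. A minimal Python sketch of that decision (the UA pattern is abbreviated here purely for illustration, not the full list from the rules):

```python
import re

# Abbreviated UA blocklist for illustration only; ignoreCase="true" in the
# web.config condition corresponds to re.IGNORECASE.
UA_PATTERN = re.compile(r"SemrushBot|AhrefsBot|MJ12bot|curl|Wget", re.IGNORECASE)

def abort_request(url_path: str, user_agent: str) -> bool:
    """True when the rule would fire (IIS AbortRequest / Apache [F])."""
    # <match url="(^robots\.txt$)" negate="true" />: requests for
    # robots.txt skip the rule entirely, so crawlers can still read it.
    if re.search(r"^robots\.txt$", url_path):
        return False
    # <add input="{HTTP_USER_AGENT}" pattern="..." />: block on UA match.
    return bool(UA_PATTERN.search(user_agent))

print(abort_request("robots.txt", "curl/8.1.2"))   # False: robots.txt exempt
print(abort_request("index.html", "curl/8.1.2"))   # True: blocked UA
```

Leaving `robots.txt` reachable matters: it is the one file a well-behaved crawler must be able to fetch to learn it is unwelcome.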

Corresponding blocking rules for Nginx

Add the following code inside the server block of the corresponding site's configuration file:

if ($http_user_agent ~ "Bytespider|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$")
{
  return 444;
}
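Two details are specific to the Nginx version: the pattern ends with `|^$`, which also catches requests that send an empty User-Agent header, and `return 444` is Nginx's special status that closes the connection without sending any response. A small Python sketch of the match (pattern copied from the `if` block; Nginx's `~` operator is case-sensitive, so no `IGNORECASE` flag is used):

```python
import re

# Pattern copied from the nginx "if" above. nginx's "~" is a
# case-sensitive regex match, hence no re.IGNORECASE here.
NGINX_BLOCKED = re.compile(
    r"Bytespider|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|"
    r"Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|"
    r"WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|"
    r"TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$"
)

def nginx_would_block(user_agent: str) -> bool:
    """True when nginx would answer with status 444
    (connection closed, no response sent)."""
    return bool(NGINX_BLOCKED.search(user_agent))

print(nginx_would_block(""))             # True: empty User-Agent matches ^$
print(nginx_would_block("Scrapy/2.11"))  # True
```

Because `~` is case-sensitive, a spider reporting `semrushbot` in lowercase would slip through this variant; switch `~` to `~*` in the Nginx config if case-insensitive matching is wanted.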

Note: by default these rules mostly block unknown or rogue spiders. To block additional spiders, add their User-Agent keywords to the pattern following the same format.

Link to the original text: https://www.west.cn/faq/list.asp?unid=820


Copyright notice
This article was created by [wwwwestcn]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/179/202206280121129168.html