Cutting the crap with robots.txt
You are probably here because your server is being hammered by irrelevant robots scanning the web pages on your server. These pests can dramatically reduce your server's performance and push up its load average. The effect is to introduce delays in serving pages to your customers, the people you actually want visiting your pages. Often this results in lost traffic and lost AdSense revenue.
From a bit of hunting around I've found a pretty good set of rules that should, in theory, block these pests. I can't promise every bot will obey them, but at least you have the names of the user agents you need to block.
I hope you find it useful!
This short article assumes you understand how to use robots.txt; its purpose is to give you a broad set of rules to block these nuisances.
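As a quick refresher on the format every entry below follows, a minimal robots.txt might look like this (adbeat_bot is simply one of the names taken from the list below):

# Block one specific crawler from the whole site
User-agent: adbeat_bot
Disallow: /

# Everyone else may crawl everything (an empty Disallow means no restriction)
User-agent: *
Disallow: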
# Adbeat ads
User-agent: adbeat_bot
Disallow: /

#AgentLinkSpammer
User-agent: AgentLinkSpammer
Disallow: /

# AhrefsBot ads
User-agent: AhrefsBot
Disallow: /
User-agent: AhrefsBot/4.0
Disallow: /

#aiHitBot Ukraine or Russia
User-agent: aiHitBot
Disallow: /
User-agent: aiHitBot/1.0
Disallow: /
User-agent: aiHitBot/1.1
Disallow: /

#Acoon Germany
User-agent: Acoon
Disallow: /

#Arachmo Japan
User-agent: Arachmo
Disallow: /

#Baiduspider China and Japan
User-agent: Baiduspider
Disallow: /
User-agent: Baiduspider+
Disallow: /
User-agent: Baiduspider+(+http://www.baidu.com/search/spider.htm)
Disallow: /
User-agent: Baiduspider/2.0;+http://www.baidu.com/search/spider.html
Disallow: /
User-agent: Baiduspider/2.0
Disallow: /
User-agent: +Baiduspider
Disallow: /
User-agent: +Baiduspider/2.0
Disallow: /
User-agent: +Baiduspider/2.0;++http://www.baidu.com/search/spider.html
Disallow: /
User-agent: Mozilla/5.0(compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
Disallow: /

#careerbot Germany
User-agent: careerbot
Disallow: /

#COMODOSpider/Nutch-1.2 United Kingdom
User-agent: COMODOSpider/Nutch-1.2
Disallow: /

#EasouSpider - China
User-agent: EasouSpider
Disallow: /

#Exabot/3.0 - France proxy scraper
User-agent: Exabot/3.0
Disallow: /

#Exalead proxy scraper France
User-agent: Exalead
Disallow: /
User-agent: ExaLead Crawler
Disallow: /

#Ezooms and DotBot
User-agent: ezooms
Disallow: /
User-agent: Ezooms/1.0
Disallow: /
User-agent: DotBot
Disallow: /
User-agent: Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot[at]gmail[dot]com)
Disallow: /

#findlinks/2.6 Germany http://wortschatz.uni-leipzig.de/findlinks
User-agent: findlinks/2.6
Disallow: /

#Java/1.6.0_04
User-agent: Java/1.6.0_04
Disallow: /

#JikeSpider China
User-agent: JikeSpider
Disallow: /

#KaloogaBot Netherlands contextual advertising
User-agent: KaloogaBot
Disallow: /

#Mail.RU_Bot/2.0 Russia
User-agent: Mail.RU_Bot/2.0
Disallow: /

#Mail.RU Russia
User-agent: Mail.RU
Disallow: /

#Mail.Ru Russia
User-agent: Mail.Ru
Disallow: /
User-agent: Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots
Disallow: /

#MJ12bot United Kingdom
User-agent: MJ12bot
Disallow: /

#MJ12bot/v1.4.3 United Kingdom
User-agent: MJ12bot/v1.4.3
Disallow: /

User-agent: moget
Disallow: /

#Ichiro Japan
User-agent: Ichiro
Disallow: /

#Ichiro 3.0 Japan
User-agent: Ichiro 3.0
Disallow: /

User-agent: NaverBot
Disallow: /
User-agent: Yeti
Disallow: /

#NetcraftSurveyAgent/1.0
User-agent: NetcraftSurveyAgent/1.0
Disallow: /

#OpenWebIndex/Nutch-1.6 Germany
User-agent: OpenWebIndex/Nutch-1.6
Disallow: /
User-agent: OpenWebIndex
Disallow: /

#panoptaStudyBot checks.panopta.com monitor
User-agent: panoptaStudyBot
Disallow: /
User-agent: checks.panopta.com
Disallow: /

#picsearch Sweden searches for pictures
User-agent: psbot
Disallow: /

#plukkie Dutch (botje.nl)/Belgium (botje.be)/France (botje.fr)/United Kingdom (botje.co.uk) search engine
User-agent: plukkie
Disallow: /

#SeznamBot Czech Republic
User-agent: SeznamBot
Disallow: /
User-agent: SeznamBot/1.0
Disallow: /
User-agent: SeznamBot/1.1
Disallow: /

#SeznamBot/3.0
User-agent: SeznamBot/3.0
Disallow: /

#SistrixCrawler Germany DE
User-agent: SistrixCrawler
Disallow: /
User-agent: Sistrix
Disallow: /
User-agent: SISTRIX Crawler
Disallow: /
User-agent: SISTRIX
Disallow: /

# Sogou
User-agent: sogou spider
Disallow: /
User-agent: Sogou web spider
Disallow: /

# Sosospider - China http://help.soso.com/webspider.htm
User-agent: Sosospider+
Disallow: /
User-agent: Sosospider
Disallow: /

#Sosospider/2.0 - China may not obey robots.txt
User-agent: Sosospider/2.0
Disallow: /

#360Spider China
User-agent: 360Spider
Disallow: /

#SurveyBot
User-agent: SurveyBot
Disallow: /

#Wada.vn Vietnamese Search/2.1
User-agent: Wada.vn
Disallow: /
User-agent: Wada.vn Vietnamese Search
Disallow: /
User-agent: Wada.vn Vietnamese Search/2.1
Disallow: /

#Yandex
User-agent: Yandex
Disallow: /
User-agent: Yandex/1.01.001
Disallow: /
User-agent: YandexBot/3.0-MirrorDetector
Disallow: /
User-agent: YandexImages/3.0
Disallow: /
User-agent: YandexSomething/1.
Disallow: /
User-agent: Yandex.com
Disallow: /
User-agent: YandexBot/3.0
Disallow: /

#YisouSpider China
User-agent: YisouSpider
Disallow: /

#YoudaoBot/1.0 China
User-agent: YoudaoBot/1.0
Disallow: /

#YoudaoBot China
User-agent: YoudaoBot
Disallow: /

#Zao - Japan
User-agent: Zao
Disallow: /
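If you would rather keep the file shorter, most crawlers that honour robots.txt accept several User-agent lines sharing one group of rules, so a condensed block such as this (names taken from the list above) behaves the same way as listing each bot separately:

User-agent: AhrefsBot
User-agent: MJ12bot
User-agent: SeznamBot
Disallow: /

The long-form list above is still handy, though, because the comments record where each bot comes from and why it was blocked.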