Bots visiting the Kaprekar Site
What is a bot/robot?
A robot is an automated program that accesses a web site and traverses it by following the links on its pages.
A visit by a bot on a web site is quite normal.
The most common robots on my site are search engine robots.
Robots are also referred to as Web Crawlers, or Spiders.
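Each bot below identifies itself through its User-Agent string, so bot visits can be spotted in a server's access log by matching those strings. A minimal sketch of that idea (the signature list is drawn from the bots on this page; the sample user-agent strings are just illustrations):

```python
# Minimal sketch: classify a visit as bot or human by matching known
# user-agent substrings. The signatures come from the bots described
# on this page; real logs would need a much longer list.

BOT_SIGNATURES = [
    "Baiduspider",
    "Googlebot",
    "ia_archiver",
    "msnbot",
    "Yahoo! Slurp",
    "Teoma",
]

def is_bot(user_agent: str) -> bool:
    """Return True if the user-agent string matches a known bot signature."""
    ua = user_agent.lower()
    return any(sig.lower() in ua for sig in BOT_SIGNATURES)

# Example user-agent strings as they might appear in an access log:
print(is_bot("Baiduspider+(+http://www.baidu.com/search/spider.htm)"))  # True
print(is_bot("Mozilla/5.0 (Windows NT 10.0) Firefox/89.0"))             # False
```

This is only substring matching; anything can forge a User-Agent string, so it identifies well-behaved bots rather than proving who the visitor really is.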
Baidu
Baidu.com is China's largest search engine. "Set up in 1999 in California's Silicon Valley, Beijing-based Baidu says it is China's most popular search engine, averaging tens of millions text searches a day in Chinese alone." [yahoo news]
User Agent: "Baiduspider+(+http://www.baidu.com/search/spider.htm)"
Trigger: I don't know where it got my site from!
GoogleBot
Googlebot is Google's web-crawling robot. It collects documents from the web to build a searchable index for the Google search engine.
User Agent: "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
Trigger: I submitted my site to Google for indexing using their Add URL page.
User Agent: "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"
Trigger: Got this googlebot user agent after adding Google Ads to my pages.
IBM Almaden Research Center
This robot is from the IBM Almaden Research Center. The site mentioned in the robot's user agent does not give any information about what this robot is for.
User Agent: "http://www.almaden.ibm.com/cs/crawler [c01]"
Trigger: They have visited my site only once; I don't know what the trigger was.
The Alexa crawler
OK, it's the Alexa crawler.
User Agent: "ia_archiver"
Trigger: I submitted my site to be archived using their Add URL link.
iSiloXC
iSiloX is the desktop application that converts content to the iSilo™ 3.x/4.x document format, enabling you to carry that content on your Palm OS® PDA, Pocket PC PDA, Windows® CE Handheld PC, or Windows® computer for viewing using iSilo.
iSiloXC is the command-line version of iSiloX.
User Agent: "iSiloXC/4.01 Windows/32"
Larbin
Larbin is a web crawler. It was initially developed for the XYLEME project in the VERSO team at INRIA.
User Agent: "larbin_2.6.3 larbin2.6.3@unspecified.mail"
ScanSoft
ScanSoft.com has no information on this robot. It's supposedly part of a research project at ScanSoft.
User Agent: "lmspider lmspider@scansoft.com"
Ask Jeeves/Teoma
The web crawler for the Teoma.com search engine.
User Agent: "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)"
Donut
Is this a bot? No idea!
User Agent: "Mozilla/4.0 (compatible; Donut : L 15; Windows 98;)"
P2P Crawler
This is the GPU Distributed Search Engine crawler.
User Agent: "Mozilla/4.0 (compatible; GPU p2p crawler)"
Girafabot
User Agent: "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; Girafabot; girafabot at girafa dot com; http://www.girafa.com)"
Freshmeat URL Validator
User Agent: "Mozilla/5.0 (compatible; fmII URL validator/1.1)"
Trigger: Used by freshmeat.net to validate the URLs I submit for my project entries there.
Yahoo! Slurp
User Agent: "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
Trigger: I submitted my site to their directory.
MSNBot
MSNBot is a prototype web-crawling robot developed by MSN Search (http://search.msn.com/).
User Agent: "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"
Trigger: I don't know how they heard of my site, but I don't mind!
W3C Markup Validation Service
User Agent: "W3C_Validator/1.305.2.137 libwww-perl/5.79"
Trigger: Agent for the W3C Markup Validation Service. Triggered whenever anyone (mostly me) tries to check whether my site is valid XHTML.
W3C Link Checker
User Agent: "W3C-checklink/3.9.2 [3.17] libwww-perl/5.79"
Trigger: Agent for the W3C Link Checker page. Triggered whenever anyone (I!!) tries to check the links on my site.
World Wide Web Offline Explorer
The wwwoffle program is a simple proxy server with special features for use with dial-up internet links. It lets you browse web pages and read them without having to remain connected.
There has been only one hit registered using this user agent on my site.
User Agent: "WWWOFFLE/2.7e"
Trigger: Anyone accessing my site using the wwwoffle program.