Bots visiting the Kaprekar Site

What is a bot/robot

A robot is an automated program that visits a web site and works its way through it by following the links on its pages.
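The link-following step described above can be sketched in a few lines of Python. This is only a rough illustration of the idea, not how any of the bots listed here actually work. The class name `LinkCollector`, the function `extract_links`, and the sample page are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones so they can be visited next.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return every link found in the given HTML, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page content, for illustration only.
page = '<p><a href="/bots.html">Bots</a> <a href="http://example.com/">Elsewhere</a></p>'
print(extract_links("http://example.org/index.html", page))
```

A real robot would fetch each collected link in turn, remember which pages it has already seen, and repeat the same extraction on every new page.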

A visit by a bot to a web site is quite normal.

The most common robots on my site are search engine robots.

Robots are also referred to as Web Crawlers, or Spiders.

Baidu

Baidu.com is China's largest search engine. "Set up in 1999 in California's Silicon Valley, Beijing-based Baidu says it is China's most popular search engine, averaging tens of millions text searches a day in Chinese alone." [Yahoo News]

User Agent: "Baiduspider+(+http://www.baidu.com/search/spider.htm)"
Trigger: I don't know where it got my site from!

Googlebot

Googlebot is Google's web-crawling robot. It collects documents from the web to build a searchable index for the Google search engine.

User Agent: "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
Trigger: I submitted my site to Google for indexing using their add URL page.

User Agent: "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"
Trigger: This Googlebot user agent started showing up after I added Google Ads to my pages.

IBM Almaden Research Center

This robot is from the IBM Almaden Research Center. The site mentioned in the robot's user agent does not give any information about what this robot is for.

User Agent: "http://www.almaden.ibm.com/cs/crawler [c01]"
Trigger: They have visited my site only once. I don't know what the trigger was.

The Alexa crawler

OK, it's the Alexa crawler.

User Agent: "ia_archiver"
Trigger: I submitted my site for archiving using their add URL link.

iSiloXC

iSiloX is the desktop application that converts content to the iSilo™ 3.x/4.x document format, enabling you to carry that content on your Palm OS® PDA, Pocket PC PDA, Windows® CE Handheld PC, or Windows® computer for viewing using iSilo™.

iSiloXC is the command-line version of iSiloX.

User Agent: "iSiloXC/4.01 Windows/32"

Larbin

Larbin is a web crawler. It was initially developed for the XYLEME project in the VERSO team at INRIA.

User Agent: "larbin_2.6.3 larbin2.6.3@unspecified.mail"

ScanSoft

ScanSoft.com has no information on this robot. It's supposedly part of a research project at ScanSoft.

User Agent: "lmspider lmspider@scansoft.com"

Ask Jeeves/Teoma

The web crawler for the Teoma.com search engine.

User Agent: "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)"

Donut

Is this a bot? No idea!

User Agent: "Mozilla/4.0 (compatible; Donut : L 15; Windows 98;)"

P2P Crawler

This is the GPU Distributed Search Engine crawler.

User Agent: "Mozilla/4.0 (compatible; GPU p2p crawler)"

Girafabot

User Agent: "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; Girafabot; girafabot at girafa dot com; http://www.girafa.com)"

Freshmeat URL Validator

User Agent: "Mozilla/5.0 (compatible; fmII URL validator/1.1)"
Trigger: Used by freshmeat.net to validate the URLs I submit for my project entries there.

Yahoo! Slurp

User Agent: "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
Trigger: I submitted my site to their directory.

MSNBot

MSNBot is a prototype web-crawling robot developed by MSN Search (http://search.msn.com/).

User Agent: "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"
Trigger: I don't know how they heard of my site, but I don't mind!

W3C Markup Validation Service

User Agent: "W3C_Validator/1.305.2.137 libwww-perl/5.79"
Trigger: Agent for the W3C Markup Validation Service. Triggered whenever anyone (mostly me) checks whether my site is valid XHTML.

W3C Link Checker

User Agent: "W3C-checklink/3.9.2 [3.17] libwww-perl/5.79"
Trigger: Agent for the W3C Link Checker page. Triggered whenever anyone (me!!) checks the links on my site.

World Wide Web Offline Explorer

The wwwoffle program is a simple proxy server with special features for use with dial-up internet links. It lets you browse web pages and read them without having to remain connected.

Only one hit with this user agent has been registered on my site.

User Agent: "WWWOFFLE/2.7e"
Trigger: Anyone accessing my site using the wwwoffle program.
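To spot these visitors in a log, the user agent strings listed above can be matched by a distinctive substring. The following is a minimal sketch under that assumption. The table `KNOWN_BOTS` and the function `identify_bot` are names invented for this example, and the substrings are simply picked from the user agents recorded on this page. Donut is left out, since it is not clear that it is a bot at all.

```python
# Distinctive substrings taken from the user agent strings listed on this page.
# Order matters: the Mediapartners-Google user agent also contains "googlebot"
# in its URL, so it has to be checked before the plain Googlebot entry.
KNOWN_BOTS = [
    ("Baiduspider", "Baidu"),
    ("Mediapartners-Google", "Mediapartners-Google"),
    ("Googlebot", "Googlebot"),
    ("almaden.ibm.com/cs/crawler", "IBM Almaden Research Center"),
    ("ia_archiver", "Alexa crawler"),
    ("iSiloXC", "iSiloXC"),
    ("larbin", "Larbin"),
    ("lmspider", "ScanSoft"),
    ("Ask Jeeves/Teoma", "Ask Jeeves/Teoma"),
    ("GPU p2p crawler", "P2P Crawler"),
    ("Girafabot", "Girafabot"),
    ("fmII URL validator", "Freshmeat URL Validator"),
    ("Yahoo! Slurp", "Yahoo! Slurp"),
    ("msnbot", "MSNBot"),
    ("W3C_Validator", "W3C Markup Validation Service"),
    ("W3C-checklink", "W3C Link Checker"),
    ("WWWOFFLE", "World Wide Web Offline Explorer"),
]


def identify_bot(user_agent):
    """Return the name of the first known bot matching the user agent, or None."""
    ua = user_agent.lower()
    for needle, name in KNOWN_BOTS:
        if needle.lower() in ua:
            return name
    return None


print(identify_bot("msnbot/0.11 (+http://search.msn.com/msnbot.htm)"))
```

Substring matching is crude, and any program can send whatever user agent it likes, so a match only says what the visitor claims to be.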