Here’s a summary of the steps to prevent AboutUs.Org from stealing content from your site.

Block their robot
Add the following lines to your robots.txt (in the “root” folder of your website):

User-agent: AboutUsBot
Disallow: /
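
If you want to sanity-check the rule before uploading it, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows how a well-behaved parser reads the two lines above; robots.txt is advisory, so it says nothing about whether a given bot actually obeys it. The name `build_parser`, the path `/index.html`, and the agent name `SomeOtherBot` are placeholders of mine, not anything from AboutUs.

```python
from urllib.robotparser import RobotFileParser

# The two lines from the post, exactly as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: AboutUsBot
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "/index.html" and "SomeOtherBot" are illustrative placeholders.
aboutus_allowed = parser.can_fetch("AboutUsBot", "/index.html")
other_allowed = parser.can_fetch("SomeOtherBot", "/index.html")

print("AboutUsBot allowed:", aboutus_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

The point of checking a second agent is to confirm that the rule is scoped to AboutUsBot only and does not shut out every other crawler.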

Block their IP range
In your .htaccess file (if you’re on Apache) add the following lines:

deny from 66.249.16.

It appears that they now respect Robots.txt, and as Boris pointed out, there are some useful services in that address space.

Block the bot’s user agent
If you do user agent blocking, block the bot’s user agent:

Mozilla/5.0 (compatible; AboutUsBot/0.9; +
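
If your site happens to be served by a Python application rather than plain Apache, user agent blocking might look something like the sketch below. This is an assumption on my part about your setup, not a recommendation from AboutUs: `is_blocked_agent` and `block_bots` are hypothetical names, and the check matches on the `AboutUsBot` token instead of the full string, since the user agent shown above is cut off.

```python
# Lower-cased tokens to look for in the User-Agent header.
BLOCKED_TOKENS = ("aboutusbot",)


def is_blocked_agent(user_agent):
    """Return True if the User-Agent header contains a blocked token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_TOKENS)


def block_bots(app):
    """Wrap a WSGI app so that blocked user agents get a 403 response."""
    def middleware(environ, start_response):
        if is_blocked_agent(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the real application.
        return app(environ, start_response)
    return middleware
```

You would wrap your existing WSGI application once, for example `application = block_bots(application)`. Bear in mind that a user agent string is trivial to fake, so this only stops bots that identify themselves honestly.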

Block the Domaintools IP range
AboutUs.Org uses Domaintools services to generate thumbnail images of site content, so block their IP range too:

deny from 66.249.4.

(I’ve actually blocked the entire 66.249.x.x address space, just to be safe!)
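
To get a feel for how much wider the blanket block is, here is a rough sketch with Python's standard `ipaddress` module. It assumes the usual reading of Apache's partial-address syntax, where `deny from 66.249.16.` covers 66.249.16.0/24 and `deny from 66.249.` covers 66.249.0.0/16. The helper `is_denied` and the sample addresses are made up for illustration; the sample is not a claim about who owns any particular address.

```python
import ipaddress

# The two narrow ranges named in the post, written in CIDR form.
ABOUTUS_RANGE = ipaddress.ip_network("66.249.16.0/24")      # deny from 66.249.16.
DOMAINTOOLS_RANGE = ipaddress.ip_network("66.249.4.0/24")   # deny from 66.249.4.
# The "just to be safe" blanket block.
BLANKET_RANGE = ipaddress.ip_network("66.249.0.0/16")       # deny from 66.249.

NARROW_RANGES = [ABOUTUS_RANGE, DOMAINTOOLS_RANGE]


def is_denied(address, networks):
    """Return True if the address falls inside any of the given networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


narrow_total = sum(network.num_addresses for network in NARROW_RANGES)
print("addresses in the two narrow ranges:", narrow_total)
print("addresses in the blanket range:", BLANKET_RANGE.num_addresses)

# A hypothetical address elsewhere in 66.249.x.x, unrelated to either range.
sample = "66.249.66.1"
print("narrow rules deny sample:", is_denied(sample, NARROW_RANGES))
print("blanket rule denies sample:", is_denied(sample, [BLANKET_RANGE]))
```

The two narrow rules cover 512 addresses between them, while the blanket rule covers 65,536, so anything else living in 66.249.x.x gets locked out along with the bot.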

Unfortunately, these steps only help if you’ve not already been indexed. If you have been indexed and had clearly copyrighted content stolen, there is one thing you can try: contact his upstream host.

Report the site to his upstream host

Once I get confirmation that they are the upstream provider involved, I’ll be recommending that anyone who has had content stolen by AboutUs.Org contact this upstream host.


5 thoughts on “Blocking

  1. Tom,

    >(I’ve actually blocked the entire 66.249.x.x address space, just to be safe!)

    You’re blocking some Google-Boxes too. e.g., which crawls my site frequently.

    Even without getting hit by the AboutUsBot, you get listed. It seems, that it uses the
    The entry from is OK to me, but I’ll block their Bot anyway to prevent them from gathering more information.


  2. >

    OK, after reading their Bot-FAQ I noticed that the information comes from Alexa.
    But I don’t know how they build the screenshot of my site. Alexa has none and their Bot did not visit me.


  3. Thanks Boris

    As they now claim to respect Robots.txt files, I’m going to remove the range block – especially in light of the fact that the Googlebot is using a similar address space.

  4. I think people who want to block are luddites. About six or seven years ago a lot of people were up in arms about Google too. But I guess their sites aren’t around much these days to see, are they?

    I’ve deleted the URL from this comment as it served no purpose other than to generate pagerank – the linked page belonged to a commercial organisation. This blog is not a source of free advertising.
