Wowee. Ever since I used a blogging package to handle all the fancy blogging features for me (RSS feeds, comments, auto-archiving, stats, etc.) I have come to the realization that if a medium can be spammed, it will be spammed. It's really disgusting the stuff I have to block from my blog every 2-3 days.
The stats on the right-hand side (top referers, recent referers) are auto-generated. When someone visits my site by clicking on a link from Google, then Google will become a recent referer and will also receive a point as far as ranking is concerned. So if I get a lot of hits from Google, it may become the top referer. Why is this useful? Well, if you like my blog, maybe you can find other blogs you like by going to some that refer to mine. Plus it's neat for me to see where people are coming from.
Unfortunately, spammers have jumped all over this. They can hack their browsers, or write a program that pretends to be a browser, to visit my site with a bogus referring link. For example, if Spamford McGhee wanted people to come to his site selling the Billy Bass, he could make a program that makes it look like he's visiting paultastic.com from billybass.com. Make it run a couple hundred times a day, and voila! A link shows up in my blog to Spamford's site, because billybass.com has become the top referrer! Do this on enough websites, and the search engines will boost billybass.com's page rank because of all these sites which link back to it.
The blogging package I use, b2evo, has some nice anti-spam tools. For example, I can delete and block any reference to "billybass.com" on my site. Billybass will disappear forever from my blog and will never return. Even cooler yet, b2evo has a centralized database of blocked website keywords that the users share with each other. A guy in Romania might not get spammed by billybass.com today, but when he updates his local list of spammers, he won't ever get spammed by that particular keyword.

Unfortunately, this method only works if I visit every day and prune my spam links regularly. I don't have time for this!
So Google's come up with a wonderful idea. It will ignore any links that have
rel="nofollow"
in them. What does this mean? Well, spammers will get no page rank benefit from Google (and most likely the other search engines) for spamming your site. They can spam all they want, but what's the point?
Well, today I logged in and there were about 20 sites that were strangling out all the legitimate referrers to my site. I won't name them, so as not to promote their page rank, but it took me 10 minutes to clean them all out. As I write this, my web server is getting hammered by the little spam programs that are pretending to visit my site from these spammer domains. I was mad enough to do something about it.
First, I wanted to implement the "nofollow" feature, so my site will never benefit these spammers again. I visited here and here, which tell you how to edit b2evo to put in the "nofollow" attribute. It was only about 8 edits total, and now all the links on my site will not actually help the spammers...even if they slip through the cracks.
Going through my logs, I noticed that there were four IP addresses that accounted for the 20 spammer domain links I was getting today. So, after researching the problem (hosts.allow does not work for Apache on FreeBSD unless it's compiled with tcp wrappers), I found that the best way was to use mod_access.
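If you want to do the same kind of digging, here is a rough sketch of how to count which hosts are behind a given spam referrer. The function name hosts_for_referrer is made up for this example, and it assumes your access log uses the combined format shown in this post, where the client host is the first field. Like my scripts, it matches the pattern anywhere on the log line, not just in the referrer field.

```shell
#!/bin/sh
# Hypothetical helper (not part of b2evo or Apache).
# Usage: hosts_for_referrer PATTERN LOGFILE
# Prints "count host" for every client host whose log line contains PATTERN,
# busiest host first. Assumes the client host is the first field of each line.
hosts_for_referrer() {
  # -F treats PATTERN as a fixed string; -e protects patterns starting with "-"
  grep -F -e "$1" "$2" |
    awk '{ hits[$1]++ } END { for (h in hits) print hits[h], h }' |
    sort -rn
}
```

Running something like hosts_for_referrer billybass.com /usr/local/www/logs/paultastic_access.txt should make it obvious when a handful of hosts account for most of the junk.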
I added the following lines to my httpd.conf file:
<Directory "/usr/local/www/paultastic4/blogs">
Order Allow,Deny
Allow from all
Deny from marketscore.com
Deny from block.alestra.net.mx
Deny from 207.127.0.2
Deny from 170.224.224.149
</Directory>
Those are the actual domains and IPs that were spamming me. Then I did an apache graceful restart, so it would honor the new settings:
$> apachectl graceful
And tada! The spammers won't even get to talk to my blog anymore. So if they're spamming with 20 more domains tomorrow, I don't even have to re-prune the new spam domains.
How to tell if it's working
Well, a great way is to temporarily add a Deny line with the IP address of another computer you control. Go to that machine and see if you can still pull up a page from the blog. If the block is working, you'll get a Forbidden error instead.
Better yet, check your apache logs. Before:
host-148-244-150-58.block.alestra.net.mx - - [05/Mar/2005:12:25:30 -0600] "GET /blogs/index.php HTTP/1.0" 200 41930 "http://www.billybass.com" "Mozilla/4.0 (compatible; MSIE 4.0; Symbian OS/1.1.0)"
This guy requested /blogs/index.php and got a 200 code. That's the HTTP code meaning the request succeeded. The number after 200 is the number of bytes served up. The following field is the referring site (billybass.com!). After that is the browser the guy is using.
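To make that field breakdown concrete, here is a small sketch that pulls those pieces out of a log line. The function name parse_access_line is just something I made up for illustration, and it assumes the request, referrer and browser fields are wrapped in double quotes as in the line above.

```shell
#!/bin/sh
# Hypothetical helper: reads combined-format log lines on stdin and prints
# the client host, status code, byte count and referrer for each one.
parse_access_line() {
  # Splitting on double quotes gives:
  #   $1 = host, ident, user, date   $2 = request line
  #   $3 = " status bytes "          $4 = referrer      $6 = browser
  awk -F'"' '{
    split($1, head, " ")   # head[1] is the client host
    split($3, resp, " ")   # resp[1] is the status, resp[2] the bytes served
    printf "host=%s status=%s bytes=%s referer=%s\n", head[1], resp[1], resp[2], $4
  }'
}
```

Feeding it the "before" line above prints host=host-148-244-150-58.block.alestra.net.mx status=200 bytes=41930 referer=http://www.billybass.com.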
Here's what I see after I implemented the Apache httpd.conf fixes:
host-148-244-150-58.block.alestra.net.mx - - [20/Mar/2005:16:56:48 -0600] "GET /blogs/index.php HTTP/1.0" 403 288 "http://www.billybass.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5) Gecko/20031007 Firebird/0.7"
As you can see, it's coming from the same IP address (this is an actual spammer address), 15 days later. The major difference now is that I'm serving up an HTTP 403 status code--which means forbidden! It's working. This guy cannot spam me anymore, which should cut down on the number of spam entries in my blog....hopefully.
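If you'd rather not eyeball the log, a sketch like this tallies the status codes a single host has been getting. The name status_counts is hypothetical, and it assumes the combined log format above with a normal three-part request line, which puts the status code in the ninth whitespace-separated field.

```shell
#!/bin/sh
# Hypothetical helper: tally HTTP status codes served to one client host.
# Usage: status_counts HOST LOGFILE
# Assumes the status code is field 9, which holds for combined-format lines
# whose request looks like "GET /path HTTP/1.0".
status_counts() {
  awk -v host="$1" '
    $1 == host { seen[$9]++ }
    END { for (code in seen) print code, seen[code] }
  ' "$2" | sort
}
```

Once a host is blocked, new requests from it should only add to its 403 count while the 200 count stops growing.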
I'm having more and more dudes spam me. Those four domains cut out a lot of spam hits, but they're still rolling in. For those of you running Linux/Unix/BSD, I came up with a few shell scripts to help automate this:
/usr/local/bin/paulsGrabSpamIPs:
#!/usr/local/bin/bash
# -e and the quotes keep a pattern like "-gambling-" from being read as a grep option
grep -e "$1" /usr/local/www/logs/paultastic_access.txt | awk '{print $1}' | sort | uniq >> /root/denyIPs.txt
NOTE: the "grep" line is all on one line.
/usr/local/bin/paulsGrabSpamIPs2:
#!/usr/local/bin/bash
cat /root/denyIPs.txt | awk '{print "Deny from " $1}' | sort | uniq > /root/denyIPsHttpdConf.txt
NOTE: the "cat" line is all on one line.
For example:
$> paulsGrabSpamIPs billybass.com
$> paulsGrabSpamIPs "-gambling-"
The first command will put all the IP addresses that referenced billybass.com into /root/denyIPs.txt. The second one will grab all the IP addresses and hosts that referenced the phrase "-gambling-" into /root/denyIPs.txt. Next we run
$> paulsGrabSpamIPs2
This command takes all of the past and current IP addresses in /root/denyIPs.txt and writes them to /root/denyIPsHttpdConf.txt. They will be sorted, with duplicate entries stripped out. In addition, each one will have the phrase "Deny from " in front of it, so you only need to paste the lines into your httpd.conf file. It will look like this:
<Directory "/usr/local/www/paultastic4/blogs">
Order Allow,Deny
Allow from all
Deny from 134.75.217.55
Deny from 193.159.244.70
Deny from 160.109.67.43
Deny from 164.58.25.34
Deny from 170.224.224.149
Deny from 193.95.70.74
Deny from 195.229.241.184
Deny from 195.229.241.187
Deny from 196.203.64.2
Deny from 200-207-168-133.dsl.telesp.net.br
Deny from planetlab2.ru.is
Deny from pncccache4.palaunet.com
Deny from ppp-203.144.197.194.revip.asianet.co.th
Deny from tgvarna.pro-lan.net
Deny from webshield.sulanet.net
Deny from xdsl-264.lubin.dialog.net.pl
</Directory>
Of course, now all you have to do is an apachectl graceful to restart Apache gracefully. This will reload the configuration file so you can block all those stinkin' spammers. Good luck!
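Since you're pasting generated lines into httpd.conf, it may be worth checking the syntax before restarting. Here is a minimal sketch, assuming apachectl is on your PATH; the function name reload_apache_safely is made up for this example. It relies on apachectl's configtest command, which checks the configuration files and reports any syntax errors.

```shell
#!/bin/sh
# Hypothetical wrapper: only do a graceful restart if the config parses cleanly.
reload_apache_safely() {
  if apachectl configtest; then
    apachectl graceful
  else
    echo "httpd.conf failed the syntax check; not restarting" >&2
    return 1
  fi
}
```

If one of the pasted Deny lines is bad, configtest should complain about it and the running server is left alone.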
Syntax error on line 3755 of spammers.httpd.conf:
The specified IP address is invalid.
Syntax error on line 24090 of spammers.httpd.conf:
The specified IP address is invalid.