MZL & Novatech TrafficStatistic Website

Article: How to protect your web site against referer spam

According to our own logs and recent reports in blogs, referrer misuse is increasing. TrafficStatistic News has already reported on referral spam several times. This article gives webmasters some practical tips to protect themselves against being fooled by referral spam.

How referral spam works

When a browser requests a web page, it transmits not only the URL to fetch but also some additional information in the request header. Have a look at yours:

GET /articles/referrer_spam_protection.html HTTP/1.1
Accept: application/xhtml+xml,text/html;q=0.9,text/plain;
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept-Encoding: gzip
Accept-Language: en-us,en;q=0.5
Cache-Control: no-cache
Connection: close
Host: www.trafficstatistic.com
Pragma: no-cache
User-Agent: CCBot/1.0 (+http://www.commoncrawl.org/bot.html)

You might also have sent a Referer header line similar to this one, as about 50% of our visitors do:

Referer: http://www.trafficstatistic.com/pages/article_webhosting_industry.html

This Referer line tells us where you came from, so it can be assumed that the referring page contains a link to this page. The Referer line goes into the log of our web server, where a log analysis program can pick it up and build a statistic of all the pages our visitors came from. This statistic is usually displayed as an HTML page that renders each referer as a link, and some people even make it public. A webmaster, or any visitor if the statistic is public, might then be curious about a new page that links to us and visit it to see what the linking site looks like.

In effect, the site named in the Referer line receives a visit from our admin. If the statistic is public, it may also receive visits from the public and a better ranking from search engines, because the link from the referer list increases its link popularity.
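The log analysis step described above can be sketched in a few lines of Python. This is only an illustration: it assumes the Apache "combined" log format, which the article does not name, and the three sample log lines are invented. It shows why a naive referer statistic is so easy to fool: a forged Referer is counted exactly like a genuine one.

```python
import re
from collections import Counter

# Apache "combined" log format (an assumption; the article does not name the format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def count_referers(lines):
    """Count Referer values over log lines; '-' means no referer was sent."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("referer") not in ("", "-"):
            counts[match.group("referer")] += 1
    return counts

# Hypothetical sample lines, invented for illustration.
SAMPLE_LOG = [
    '192.0.2.1 - - [07/Sep/2004:10:00:00 +0200] "GET /articles/referrer_spam_protection.html HTTP/1.1" 200 5120 "http://www.trafficstatistic.com/pages/article_webhosting_industry.html" "Mozilla/5.0"',
    '192.0.2.2 - - [07/Sep/2004:10:00:05 +0200] "GET /articles/referrer_spam_protection.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [07/Sep/2004:10:00:09 +0200] "GET / HTTP/1.1" 200 2048 "http://spam.example/" "SpamTool/1.0"',
]

referer_counts = count_referers(SAMPLE_LOG)
for referer, hits in referer_counts.most_common():
    print(hits, referer)
```

Nothing in the log line distinguishes the invented spam.example entry from the genuine referer, which is why the countermeasures below rely on additional signals.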

That is how referral spam works: people simply request a web page and send a Referer header containing the URI of their own page. This process is easily automated with a program or script that works through a list of web sites and requests each of them. A spammer can thus request millions of web sites with a single run of the program, gaining link popularity in search engines and visits from people who follow the links in the referer lists. Recently a Windows-based tool for mass generation of referral spam appeared, so referral spam can be expected to rise to a so far unknown level in the near future. Update: Elliott Back from Cornell University wrote a Free Referer Spamming Reffy Clone based on .NET, so the genie is out of the bottle. Referral spam is now free for all the Windows dummies around the globe.

Techniques usable for referral spamming

In principle, any web technique that can open a TCP connection to port 80 could be used for referral spam, but there are some practical limitations. Most browsers still have no configuration option to set the referer manually before requesting a web page. That might change quickly: Internet Explorer seems to have an option to prevent referers from being sent at all, which unfriendly surfers might use to promote their own web sites while surfing other people's sites. With Mozilla, it would be possible to compile a custom build from the publicly available sources that sends a customized referer. So far, however, I haven't met any automated procedure that uses browsers to spam our referers, apart from some people who browse through proxies in order to misuse referer headers. What mostly remains are programs and scripts in Perl, PHP or any other language able to open TCP connections.

Most programs built with such techniques won't load any images found on the web page, and all of them have in common that they cannot interpret JavaScript embedded in a web page, because interpreting JavaScript is a really tough programming task. Programs that request many web sites also need a good internet connection, which will usually be a server with a fixed IP.

Another limitation is that I haven't met any spamming tool able to generate a link to the spammed page on the site advertised in the referer. This problem will probably be the hardest one for referral misuse to overcome. Calling a million web sites with a false referer would mean that the referral spammer had to create a million links on the web site he is advertising, though even that limit might be broken by specially prepared pages using techniques like HTML meta tag redirects.

Using the limitations of client techniques to prevent spamming

There is no way to avoid delivering a web page to a referral spammer unless you know his IP address in advance, because there is no practical way to detect referral spam at the moment a page is delivered. But a lot can be done to make referral spamming useless or even harmful for the spammer.

Blocking IPs of well-known referral spammers

This option is easy to use if you can configure your firewall, for example when you have a root account on a dedicated Linux server. TrafficStatistic, for example, uses the following line to block requests from www.adressendeutschland.de:

# www.adressendeutschland.de
iptables -A INPUT -p tcp -s 213.239.194.170 -j DROP


The advantage of such blocking is that it also prevents the harvesting of email addresses from our web sites and causes no server load or traffic on our machine. The disadvantages are that the IP might be transferred to somebody else in the future without our knowing about it, so that we would block somebody we don't want to block, and that the block no longer works when the bot comes from another IP.
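When the list of blocked addresses grows, it may be easier to keep it as data and generate the firewall commands from it. The following Python sketch is one possible way to do that, not the procedure TrafficStatistic uses. The function name `iptables_rules` and the second list entry are hypothetical; the generated command has the same form as the iptables line shown above.

```python
import ipaddress

def iptables_rules(blocklist):
    """Return iptables commands for (label, address) pairs, skipping invalid IPv4 addresses."""
    rules = []
    for label, address in blocklist:
        try:
            ip = ipaddress.IPv4Address(address)
        except ValueError:
            continue  # not a valid IPv4 address, so no rule is emitted for it
        rules.append(f"# {label}")
        rules.append(f"iptables -A INPUT -p tcp -s {ip} -j DROP")
    return rules

BLOCKLIST = [
    ("www.adressendeutschland.de", "213.239.194.170"),  # entry taken from the article
    ("typo.example", "300.1.2.3"),                      # hypothetical invalid entry
]

rules = iptables_rules(BLOCKLIST)
print("\n".join(rules))
```

Keeping the label next to each address makes it easier to review the list later, which helps with the disadvantage mentioned above that an IP may change hands.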

Building referer statistics based on beacon images

This is the core of preventing referral spam. TrafficStatistic counts and evaluates referers based on 1-pixel images with a specially prepared dynamic URI, which are loaded by the client.

/entry/stats.php?page=/articles/referrer_spam_protection.html
&space=news
&pos=4
&width=250
&height=250
&type=google
&merchant=
&product=channel_1
&net=google
&jsversion=1.5
&framed=no
&javad=yes
&colordpt=32
&screenwidth=1280
&screenheight=1024


/entry/stats.php is a PHP script that just returns a transparent 1-pixel GIF image. So that it does not disturb the web site layout at all, it is loaded as the background image of a table. Referers of page requests that do not load these images are not evaluated as page hit referrers. In fact, we evaluate page hits that do not load such images as bot hits using log file analysis, so we track hits without image loads anyway, just differently.
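The source of stats.php is not shown in this article. As a rough sketch of what such a beacon endpoint has to do, the following Python function records the query parameters and returns a 1-pixel transparent GIF. The names `handle_beacon` and `beacon_hits` are hypothetical, and the in-memory list merely stands in for a real statistics store.

```python
import base64
from urllib.parse import parse_qs

# A commonly used 1x1 transparent GIF, stored base64-encoded.
TRANSPARENT_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

beacon_hits = []  # in-memory stand-in for a real statistics store

def handle_beacon(query_string, referer=None):
    """Record one beacon request and return (headers, body) for the response."""
    parsed = parse_qs(query_string, keep_blank_values=True)
    params = {key: values[0] for key, values in parsed.items()}
    beacon_hits.append(
        {"page": params.get("page", ""), "params": params, "referer": referer}
    )
    headers = {
        "Content-Type": "image/gif",
        "Content-Length": str(len(TRANSPARENT_GIF)),
        "Cache-Control": "no-cache",  # ask clients to request the beacon on every page view
    }
    return headers, TRANSPARENT_GIF

headers, body = handle_beacon(
    "page=/articles/referrer_spam_protection.html&space=news&pos=4&merchant=&screenwidth=1280"
)
print(headers["Content-Type"], len(body), "bytes")
```

Only requests that reach this function are counted as page hits by a client that loads images; everything else stays in the ordinary server log for the bot statistics.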

Javascript referers

TrafficStatistic combines beacon images with JavaScript.

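<script type="text/javascript"><!--
// jsstats is expected to be defined earlier on the page and to hold the
// script-detected parameters (for example "&screenwidth=1280&screenheight=1024").
document.write('<img width="1" height="1" src="/entry/stats.php'
  + '?page=/articles/referrer_spam_protection.html'
  + '&space=news'
  + '&pos=4'
  + '&width=250'
  + '&height=250'
  + '&type=google'
  + '&merchant='
  + '&product=channel_1'
  + '&net=google' + jsstats
  + '" style="position:absolute;top:0px;left:0px">');
//--></script>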


This is a very efficient method to make sure that a human loaded the page we are building referrer statistics for. We haven't noticed any bot able to pretend that it is human. The error introduced into the stats by sorting out humans whose clients cannot run JavaScript or load images is marginal. We very rarely find a referer in our referral stats that doesn't have a link to us. Using JavaScript also has the advantage that you get information about your visitors, like screen width and screen height, which you can't get without client-side scripting such as JavaScript.
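On the server side, a beacon request built by the script can be recognized by the parameters that only the script appends. The following Python sketch assumes, based on the example URI shown earlier, that these are jsversion, screenwidth and screenheight. The function name is hypothetical, and the check is a heuristic rather than proof that a human was present.

```python
from urllib.parse import parse_qs

# Parameters that, in the article's example URI, are appended by the script.
SCRIPT_PARAMS = ("jsversion", "screenwidth", "screenheight")

def looks_script_generated(query_string):
    """Return True if the beacon query carries plausible script-supplied values."""
    params = parse_qs(query_string)
    try:
        values = [float(params[name][0]) for name in SCRIPT_PARAMS]
    except (KeyError, ValueError):
        return False  # a parameter is missing or not numeric
    return all(value > 0 for value in values)

print(looks_script_generated(
    "page=/x&net=google&jsversion=1.5&screenwidth=1280&screenheight=1024"
))
print(looks_script_generated("page=/x&net=google"))
```

A client that fetches the page without executing the script never produces these parameters, so its referer would not enter the referrer statistics.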

Do not show your web site stats to the public

We at TrafficStatistic believe that web site traffic statistics and details are very private data, so we see no reason to publish them. If a media kit has to be prepared, it can be made public using aggregated data; to prove the figures to potential advertising customers, we can decide to give them access to our raw statistics data while we are negotiating with them. Publishing referers as links is like an invitation for a spammer to get a link to his site, and with it a better Google ranking and fresh traffic. This incentive can be dramatically reduced by not linking to the referring pages in a referral list.
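If a referral list has to be shown at all, the referers can be rendered as escaped plain text instead of links. A minimal Python sketch, with a hypothetical function name and invented sample data:

```python
from html import escape

def render_referer_list(referer_counts):
    """Render referers as an HTML list of escaped plain text, without any links."""
    rows = sorted(referer_counts.items(), key=lambda item: (-item[1], item[0]))
    items = [f"<li>{escape(referer)} ({hits})</li>" for referer, hits in rows]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

# Hypothetical sample data, invented for illustration.
html_list = render_referer_list({
    "http://spam.example/?a=1&b=<x>": 3,
    "http://www.example.org/links.html": 5,
})
print(html_list)
```

Because the output contains no anchor elements, a listed page gains no link popularity from the list, and escaping keeps markup smuggled into a Referer header from being interpreted by the browser.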

Checking back your link on the referring page

If you have a scripting-enabled web site, this is a really nice way to make the spammers angry and to assure yourself that you list only referrers that really link to you. A check-back has the disadvantage that your site has to bear the additional traffic it causes, but the advantage that you can be sure a referrer displayed in your referrer stats really has a link to you. If many webmasters did this, it would have fatal consequences for the referral spammer: when he spams a million web sites, his own web site receives a lot of check-back traffic, so his traffic bill increases dramatically because of his spamming.

However, it is not as simple as it seems. Just inserting a few PHP lines into your page like:

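// Accept the referer only if the referring page mentions our host name.
// Note: the remote page is fetched synchronously, which delays page delivery.
$referer = "";
if (!empty($_SERVER['HTTP_REFERER'])) {
  $page = @file_get_contents($_SERVER['HTTP_REFERER']);
  $pattern = "/" . preg_quote($_SERVER['SERVER_NAME'], "/") . "/i";
  if ($page !== false && preg_match($pattern, $page)) {
    $referer = $_SERVER['HTTP_REFERER'];
  }
}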


wouldn't be very clever, since your page would then always be displayed only after the referring page had been loaded. However, this script snippet might give you an idea of how you could verify your referrers using a check-back in a separate process.
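One way to move the check-back into a separate process is to collect the referers first and verify them later in a batch job. The following Python sketch illustrates that idea under several assumptions: all function and class names are hypothetical, the referring page is searched for an actual link to our host rather than for the bare host name, and the demo at the end uses invented pages instead of real network requests.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def fetch_page(url, timeout=10):
    """Default fetcher: download at most 200 KB of the referring page."""
    with urlopen(url, timeout=timeout) as response:
        return response.read(200_000).decode("utf-8", errors="replace")

def host_of(url):
    """Return the lower-cased host name of url, or None if it cannot be parsed."""
    try:
        return urlsplit(url).hostname
    except ValueError:
        return None

def links_to_host(referer, our_host, fetch=fetch_page):
    """Return True if the referring page contains a link to our_host."""
    try:
        if urlsplit(referer).scheme not in ("http", "https"):
            return False
        page = fetch(referer)
    except (OSError, ValueError):
        return False  # unreachable or malformed: treat as unverified
    collector = LinkCollector()
    collector.feed(page)
    return any(host_of(href) == our_host.lower() for href in collector.hrefs)

def verify_pending(pending_referers, our_host, fetch=fetch_page):
    """Check each distinct referer once and return the verified ones, sorted."""
    return [r for r in sorted(set(pending_referers)) if links_to_host(r, our_host, fetch)]

# Demo with invented pages, so that no real request is sent.
FAKE_PAGES = {
    "http://friend.example/links.html":
        '<a href="http://www.trafficstatistic.com/articles/referrer_spam_protection.html">article</a>',
    "http://spam.example/":
        '<a href="http://spam.example/buy">buy</a> www.trafficstatistic.com',
}

def fake_fetch(url):
    if url not in FAKE_PAGES:
        raise OSError("page not available")
    return FAKE_PAGES[url]

verified = verify_pending(
    ["http://spam.example/", "http://friend.example/links.html",
     "http://friend.example/links.html", "http://gone.example/"],
    "www.trafficstatistic.com",
    fake_fetch,
)
print(verified)
```

Since the Referer value is supplied by the client, the sketch accepts only http and https URLs and caps the download size. Run periodically from a scheduled job, it keeps page delivery independent of the speed of the referring sites.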

Bloggers: Take only pingback and trackback referers into your referral link list

Blogging is almost a synonym for automated check-back of referral links. The typical techniques used in blogs, pingback and trackback, verify the link on the referring page; XML-RPC or trackback is also used to announce the availability of a new referring link to each other. So it might be rather easy to feed the referers from the web server logs into the pingback interface to verify the existence of an inbound link before displaying it in a publicly available referer link list. Or simply don't take any referers into your link list that were not provided by pingback or trackback. It shouldn't be a big deal to do this.

Other resources related to referer spam

  • Proposal for a solution to referrer spam: Using MT-Blacklist and other blacklists to filter spamming URLs

Date: September 7th, 2004. Last updated: February 2nd, 2005

Author: Marcel Bartels, http://www.trafficstatistic.com

Editing and/or reproduction of this article is permitted under the condition that a reference to this original article at http://www.trafficstatistic.com/articles/referrer_spam_protection.html is given.
    © 2004-2005 MZL Billing Services & Novatech Ltd. All rights reserved.