Google Analytics & "Spam Data" - Removing Automated & Ghost Referrals From Your Analytics Reports
Seeing a sudden, unexpected and potentially dramatic rise in your traffic or other unexplained changes in patterns of traffic or engagement within Google Analytics? Even if nothing appears untoward we'd advise you to check your referrals and hostname reports to see whether you've been affected by either spam or ghost referrals.
Referrers such as semalt.com, ranksonic.info, social-buttons.com or buttons-for-website (and many others) can generate large numbers of referral sessions within your Google Analytics reports and seriously interfere with your traffic figures and overall engagement statistics. You may well find traffic from these and similar sites within the All Referrals report (which appears under Acquisition / All Traffic) - regardless of whether these are services you ever signed up for.
Sometimes you may see fairly dramatic spikes in your traffic, alerting you to a problem. In many cases, however, the volume of sessions generated may stay low enough to remain below the radar and only be identified after some investigation. Even when each source is small, the aggregate effect of multiple spam referrals can add up. Small businesses and sites need to beware: whilst an increase of a few hundred sessions may be a tiny blip for a large site with tens of thousands of visits a month, spam referrals can seriously distort your figures if your site has a lower volume of traffic.
Within the All Referrals report look out for unusual patterns of traffic from specific referring sites. You may sometimes see a large spike in traffic from a site on a single day, with nothing generated thereafter for some time. Look out also for referrals showing a very high percentage of new sessions (often 100%), typically combined with a bounce rate at or approaching 100% (although this is not always the case). Any referrals matching these criteria warrant further investigation (although be careful and don't automatically visit the suspect site - do some research via a search engine first).
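The criteria above can be sketched as a quick check over an exported copy of the All Referrals report. This is only a minimal illustration: the field names (source, pct_new_sessions, bounce_rate), the thresholds and the sample rows are all assumptions made for the example, not anything Google Analytics provides in this form.

```python
def flag_suspicious_referrals(rows, min_new=0.99, min_bounce=0.95):
    """Return referral sources whose share of new sessions AND bounce rate
    are both at or above the given (assumed) thresholds."""
    flagged = []
    for row in rows:
        if row["pct_new_sessions"] >= min_new and row["bounce_rate"] >= min_bounce:
            flagged.append(row["source"])
    return flagged


# Hypothetical rows, as if exported from the All Referrals report.
rows = [
    {"source": "semalt.com", "sessions": 120,
     "pct_new_sessions": 1.0, "bounce_rate": 1.0},
    {"source": "partner.example", "sessions": 85,
     "pct_new_sessions": 0.62, "bounce_rate": 0.41},
]

suspects = flag_suspicious_referrals(rows)
print(suspects)
```

Anything flagged this way is only a candidate for investigation; as noted, some spam referrals do not show a 100% bounce rate, so a check like this will miss some.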
You should also check your event tracking reports (available within Behaviour / Top Events) as we've recently encountered spam data appearing here (via ghost referrals). Event tracking has to be specifically set up for your website and is not available by default, so if there's something you don't recognise then check with your web team.
Automated Spam & Bot Filtering
Google rolled out a bot and spider filtering tool in 2014. You configure this at a view settings level (via a tick box) and need to enable this for each view you wish it to apply to.
Spam traffic is currently a growing problem within Analytics. Ticking the bot filtering option ("Exclude all hits from known bots and spiders") within your view settings still allows some automated referrals to make their way through to your reports, so you'll need to take further action, e.g. by setting up exclusion filters to remove them or by taking one of the steps outlined below.
It's recommended that you also check your "Audience / Technology / Network" report and then change the "primary dimension" (link below the graph) to "Hostname" to check for anything suspicious - the hostname is the domain from which your content was viewed or tracking was triggered.
If you see unexpected results here (e.g. darodar.com) these may be from ghost referrals - this traffic never actually visited your site but tracking was triggered via the Google Analytics "Measurement Protocol". An include filter (telling Google to only include your domain and other relevant domains) can help limit such ghost referrals. For a more comprehensive (but more technical) solution, LunaMetrics have described a technique using a cookie and a custom dimension.
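As a rough sketch of how such an include filter behaves, the snippet below builds a regular expression from a list of valid domains and tests some hostnames against it. The domains used here (example.com, shop.example.org) and the helper name are purely illustrative assumptions; substitute the hostnames that legitimately serve your own tracking code, and test any pattern in a test view before relying on it.

```python
import re


def build_hostname_pattern(valid_domains):
    """Build a regex matching each valid domain and any of its subdomains.

    The domain must appear at the start of the hostname or directly after
    a dot, and must run to the end of the hostname.
    """
    parts = [r"(^|\.)" + re.escape(domain) + "$" for domain in valid_domains]
    return "|".join(parts)


# Hypothetical list of domains that legitimately carry your tracking code.
pattern = re.compile(build_hostname_pattern(["example.com", "shop.example.org"]))

for hostname in ["www.example.com", "shop.example.org", "darodar.com", "(not set)"]:
    keep = bool(pattern.search(hostname))
    print(hostname, "-> include" if keep else "-> filtered out")
```

Anchoring the pattern to the end of the hostname is a deliberate choice in this sketch: it prevents a lookalike such as example.com.some-other-site.net from slipping through.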
When adding a filter, always make sure you keep a master view with no filters applied so that no data is lost if you make a mistake. You might also wish to create a test view and apply the filter to it before rolling it out to your main views.
As the number of sites generating automated referrals and ghost referrals appears to be growing (we've seen a big increase over the last few months) and site sources change you'll also need to be proactive about this. Having performed an initial audit to identify and exclude any suspicious referrals from your reports you may then want to set up some alerts within the Intelligence Events section of your primary analytics view.
What if we have already been affected?
If you are preparing reports based upon historic data that includes referral spam you'll need to use advanced segments to filter this data out, as view filters only apply to data collected after they are added. Identify the hostnames and referrals you wish to exclude and then create an advanced segment to exclude traffic by source / hostname. (Note that if there are a lot of suspect hostnames you may instead want to use an include statement listing only your valid hostnames.)
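To illustrate the effect of such a segment on historic figures, here is a minimal sketch working on exported rows rather than inside Analytics itself. The row structure, source names, hostnames and session counts are all invented for the example; it combines an exclude on source with an include on hostname, as suggested above.

```python
def exclude_spam(rows, spam_sources, valid_hostnames):
    """Keep only rows whose source is not a known spam source and whose
    hostname is one of the valid hostnames."""
    return [
        row for row in rows
        if row["source"] not in spam_sources
        and row["hostname"] in valid_hostnames
    ]


# Hypothetical historic data exported with source and hostname dimensions.
rows = [
    {"source": "google", "hostname": "www.example.com", "sessions": 400},
    {"source": "semalt.com", "hostname": "www.example.com", "sessions": 60},
    {"source": "darodar.com", "hostname": "darodar.com", "sessions": 45},
    {"source": "newsletter", "hostname": "www.example.com", "sessions": 95},
]

clean_rows = exclude_spam(
    rows,
    spam_sources={"semalt.com"},
    valid_hostnames={"www.example.com"},
)

reported_total = sum(row["sessions"] for row in rows)
clean_total = sum(row["sessions"] for row in clean_rows)
print("Reported sessions:", reported_total)
print("Sessions after excluding spam:", clean_total)
```

In this made-up data set the crawler referral is caught by the source exclusion, whilst the ghost referral is caught by the hostname include even though its source was never added to the exclusion list.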
Ultimately this is a problem that Google will need to address (and we are confident it will), or using Analytics is going to become a lot more problematic. There is a danger, though, that the problem could get worse (and potentially more complex) in the meantime. For now we strongly advise that GA users are proactive about identifying whether they are affected and take steps to ensure the integrity of their data and their reporting.