    Why MSN Referrer Spam Will Never Work, and How to Defeat It


    Before this whole referrer spam issue, I had nothing but respect for MSN/Live search. Search engine spam had a much harder time ranking in their algorithm. Its biggest weakness was that it was slow to list legitimate sites and took a long time to rank them, and that’s acceptable. It is honestly a good search engine (for the most part). But this LIVSOP referrer spam is a brute-force tactic for a problem that needs finesse. Given the chance, I guarantee I could make a cloaking detector that would work ;-). But hey, I don’t work there, so here we go.

    Why Won’t This Work?
    Long story short: it’s too recognizable, and it has caused too much uproar. Think about it like this. If you’re in a war and there’s a sniper in an unknown position, you don’t send as many troops as you can over to look for him. You use a sniper. Stealth is the best way to defeat stealth.
    Their method leaves so many footprints that it has no chance. Will it bust the current set of cloaking sites out there? Possibly. But everyone has heard of this now, and everyone is updating their cloakers. My own will be released within a week (a complete software overhaul, actually).

    So, How Can I Defeat Their New Crawler?

    1. IP-Based Detection - At present, most Microsoft referrer spam seems to come from the 65.55.165.* block. I’ll scan my logs for more later. A cloaking site can simply add that IP range to its filters. The range will probably expand eventually, though, so the other checks listed here are worth implementing, if not now, then later. Alternatively, load a list of all the MSN/Live/Microsoft IP ranges (and there are a LOT of them) and add those ranges to your bot list.
    2. LIVSOP Footprint - I don’t know about you, but I don’t think I’ve ever gotten legitimate traffic with the word “LIVSOP” in its referrer string. You could just filter anything containing it and treat it as a bot.
    3. Rank Checking - For now, they appear to be using only single-word keywords. The vast majority of cloaking sites don’t rank for many (if any) single-word keywords, aside from perhaps the site name itself. So whenever you see a Live search referral, especially one that includes LIVSOP, have a quick curl script run the same search, expanding the results to the top 50 or 100. If you’re not in the top 100 on whichever data center (DC) you’re checking, chances are you’re not in the top 10 that a typical Live.com visit would require. So if you don’t find your URL in those results, voila: you’ve found yourself a bot.
    4. Use the Uproar to Your Advantage (dumb idea, but it would work) - Right now, people are pretty pissy about this whole referrer spam thing. So if you get a LIVSOP referral, search Google for the IP plus “referrer spam microsoft”. Chances are someone has bitched about that IP and named it. More than one result? Bot.
    5. Track the Incoming Live.com/LIVSOP Keywords - If one IP mysteriously keeps showing up with multiple single-word searches, chances are… you’ve found a bot.
    6. Sneaky Links - Drop links on your site that no reasonable person would find, such as the “.” after “Privacy Policy”, and use them as a bot trap.
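    The checks above can be rolled into one small request classifier. What follows is a minimal Python sketch, not the code behind my own cloaker, and it rests on a few assumptions: your log is already parsed into IP, path and referrer; Live.com referrers carry the search phrase in a `q` parameter; and the trap path `/trap-dot/` and the threshold of two single-word queries per IP are hypothetical values you would tune. The rank check (3) and the Google complaint search (4) need network fetches, so here they are reduced to helpers you feed with results you scraped yourself.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network
from urllib.parse import parse_qs, urlencode, urlparse

# Check 1: the block named in the post. Add other Microsoft ranges as you find them.
SUSPECT_NETWORKS = [ip_network("65.55.165.0/24")]

# Check 6: hypothetical trap path, e.g. the page behind the hidden "." link.
TRAP_PATHS = {"/trap-dot/"}

# Check 5: assumed limit on distinct single-word Live queries from one IP.
MAX_SINGLE_WORD_QUERIES = 2


def live_query(referrer):
    """Return the search phrase from a live.com referrer (assumed to use q=), else None."""
    parsed = urlparse(referrer)
    host = parsed.hostname or ""
    if host != "live.com" and not host.endswith(".live.com"):
        return None
    terms = parse_qs(parsed.query).get("q")
    return terms[0] if terms else None


def in_top_results(result_urls, my_host, top_n=100):
    """Check 3: True if my_host is an exact hostname match among the first top_n URLs.
    Fetching and scraping the results page is left to you."""
    return any((urlparse(u).hostname or "") == my_host for u in result_urls[:top_n])


def complaint_search_url(ip):
    """Check 4: build a Google search URL for the IP plus the complaint phrase."""
    return "https://www.google.com/search?" + urlencode(
        {"q": f'"{ip}" referrer spam microsoft'})


class BotDetector:
    """Applies the cheap, log-only checks (1, 2, 5, 6) to each request."""

    def __init__(self):
        self.single_word_queries = defaultdict(set)  # ip -> distinct one-word queries
        self.trapped = set()  # IPs that have followed a trap link

    def is_bot(self, ip, path, referrer=""):
        # Check 1: request comes from a known Microsoft range.
        if any(ip_address(ip) in net for net in SUSPECT_NETWORKS):
            return True
        # Check 6: once an IP hits a trap link, it stays flagged.
        if path in TRAP_PATHS:
            self.trapped.add(ip)
        if ip in self.trapped:
            return True
        # Check 2: LIVSOP footprint anywhere in the referrer (case-insensitive).
        if "LIVSOP" in referrer.upper():
            return True
        # Check 5: one IP cycling through many single-word Live searches.
        query = live_query(referrer)
        if query and len(query.split()) == 1:
            seen = self.single_word_queries[ip]
            seen.add(query.lower())
            if len(seen) > MAX_SINGLE_WORD_QUERIES:
                return True
        return False
```

    Checks 1, 2, 5 and 6 run on every hit and cost next to nothing, so it makes sense to save the slower rank and complaint lookups for Live referrals the cheap checks haven’t already settled.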

    What’s the Problem with This Technique They’re Using?

    1. It dirties up my logs and throws off my user tracking statistics (yes, I’m watching you ;-)). If it were standard referrer spam, whatever, that’s fine: my referring-site statistics are easy to debug and not as closely checked as my search engine statistics. But this inserts fake search terms into my database, and I hate cleaning it.
    2. I have cloaking sites. For obvious reasons, this is a problem. But whatever, I can forgive this one.
    3. It ignores robots.txt and fetches everything: external JavaScript in restricted directories, images, all of it.
    4. It’s a bandwidth leech. If I put up a big image (some on my other sites are up to 1 MB), I don’t want a bot draining my bandwidth on it. I disallowed it in robots.txt for a reason. If you’re not a real person, you’re sure as hell not going to click my ads, so don’t sap my bandwidth.
    5. I don’t get enough traffic from Live.com to make this crap worth my while. I’ve gotten VERY few clicks from Live.com; we’re talking MAYBE 3. Meanwhile, I’ve gotten 150 different spam referrers from Microsoft. That makes roughly 98% of my Live traffic their referrer spam (150 of about 153 visits), or 50 times the amount of real traffic.
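    Points 3 and 4 are easy to put a number on from your own logs. Below is a minimal Python sketch built on the standard library’s `urllib.robotparser`: it totals the bytes each suspected bot pulled from paths your robots.txt disallows. The robots.txt rules, the site URL `http://example.com` and the pre-parsed `(ip, path, bytes_sent)` log format are placeholders. Only IPs you already classify as bots are counted, because robots.txt governs crawlers, and real browsers legitimately load disallowed images and scripts.

```python
from collections import Counter
from urllib.robotparser import RobotFileParser

# Placeholder rules; substitute your real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/big/
Disallow: /js/
"""


def robots_violations(log_entries, bot_ips, robots_txt=ROBOTS_TXT,
                      site="http://example.com"):
    """Return a Counter of bytes each bot IP fetched from disallowed paths.

    log_entries: iterable of (ip, path, bytes_sent) tuples.
    bot_ips: IPs already classified as bots (e.g. by the checks above).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # ensure can_fetch() treats the rules as loaded
    wasted = Counter()
    for ip, path, size in log_entries:
        # Only bots are bound by robots.txt, so skip everyone else.
        if ip in bot_ips and not parser.can_fetch("*", site + path):
            wasted[ip] += size
    return wasted
```

    The per-IP totals tell you exactly how much bandwidth went to requests robots.txt asked crawlers not to make.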

    4 Responses to “Why MSN Referrer Spam Will Never Work, and How to Defeat It”

    1. matthewk says:

      *EDITED TO PRESERVE THE STEALTH*

    2. admin says:

      MatthewK, can you drop me some contact info? I’ve got a little info on this, but not enough to write up the article yet I think. Maybe. But yeah, drop a comment with real IM/contact info, and I’ll kill it while it’s awaiting moderation.

    3. xHydra says:

      f^&k stealth, it’s 2008 and we are still having the same problem. M$ Live search is krap.

      I am blocking them from all my sites until they reform their strategy. For every 100 spam referrals I get one real visitor.

      Sorry for the language, but I have had it with this blatant arrogance from M$.

    4. Ross C Brown says:

      I’m in the process of adding up bandwidth used by the good folk over at MSN/Live.

      Gonna bill Microsoft for it.

    © Slightly Shady SEO, All Rights Reserved. Scrape me, and I will eat your soul.