    Are We Giving Google too Much Credit?

    Ok. This is a debate that’s been raging in my mind for a few days. Google seems to have multiple personalities, with abilities to match. On the one hand, they are completely naive. On the other hand, they are capable of bringing down massive networks in a matter of seconds. So how much credit should they be getting here?
    Note: This is obviously not everything they use. There is no mention of spammy backlinks, etc. So this post is mostly about their detection of on-site blackhat (BH) methods.

    What Caused Me To Wonder
    The wondering actually started when I read the Google quality raters guidelines. I was completely stunned by how basic a lot of the things they were going after were, and how simple they would be to prevent. And yet Google, with its billions of dollars in capital, was struggling with these?

    How I Believe it Works
    I believe their automated “spam” identification serves essentially as an automatic version of a manual report. They have to have their employees look to see if something is actually breaking TOS! For some things, alright, I understand. For many though, this is insane. It’s not scalable, it’s not technologically adept, and quite frankly it is easily identified. It sounds like they’re not even giving their reviewers a starting point by telling them what a page was flagged for…

    Some Choice Excerpts

    1. Recognizing Hidden Text and Hidden Links
      Apply Ctrl-A (the keyboard shortcut for Select All) to the page and then scroll through it. This technique
      may expose text or links that are hidden from the human eye.

      With both of these examples, you should apply Ctrl-A to the page and scroll down on the page.
      Really? Google has trouble identifying people using the same background color as their font color? Here’s a hint. If they’re using a background image, run through its pixels and find the median and average color. Compare that to the font color. MAGIC. They’re using the Mozilla rendering engine, so this REALLY isn’t that hard. For the most part, it shouldn’t even require examining the source code.
    2. Minute text is not always exposed using Ctrl-A. Be suspicious of horizontal lines or bars on the page.
      Sometimes they contain hidden text. Use the techniques above to check for it.
      Yet again, issues with fonts. Unable to detect a font size? Bear in mind that their even including this means Google probably (at least initially) still treats text at something like a 2px font size as significant.
    3. If you suspect a Sneaky Redirect has taken place, you should check “who is” the registrant (or owner) of
      the two domains. If the registrant is the same, the redirect is not sneaky.
      Hear that, ladies and gents? That’s the roflcopter right there. I can register a domain in Matt Cutts’ name if I want, and have no issues. I always assumed there’d be much more to this.
    4. Recognizing Scraped Content
      You can copy a snippet of text (a sentence or part of a sentence) and paste it in the search box to see if
      you can find its source. You will sometimes discover that the text was copied from Wikipedia or one of the
      other sites mentioned above, or you may find that the text exists on many, many web pages.
      Behold. Everyone has long worried about the notorious “dupe content” filters leading to them being flagged as spam and insta-banned. Apparently, Google uses the same techniques as your average vigilante whitehat for detecting spam.
    5. The list goes on and on. I don’t feel like re-reading the entire thing to find more. But epic failures at detecting 100% iframes, affiliate links, and other such lovelies make me really question their true ability at detection.
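
    To make the color and font-size complaints from items 1 and 2 concrete, here is a rough sketch of the kind of check I mean. It assumes a renderer has already handed over the font color, the computed font size, and a sample of background pixels. The function names and both thresholds (a color distance of 30 and a 4px minimum) are made up for illustration; nothing here comes from the guidelines or from Google.

```python
from statistics import mean, median


def summarize_background(pixels):
    """Return (average, median) RGB colors for a list of (r, g, b) pixels.

    The median is taken per channel, which is a simplification rather
    than a true "median color".
    """
    channels = list(zip(*pixels))  # [(all r), (all g), (all b)]
    avg = tuple(mean(c) for c in channels)
    med = tuple(median(c) for c in channels)
    return avg, med


def color_distance(a, b):
    """Plain Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def looks_hidden(font_rgb, background_pixels, font_px,
                 color_threshold=30.0, min_font_px=4.0):
    """Flag text whose color blends into the background or is tiny.

    color_threshold and min_font_px are hypothetical cutoffs.
    """
    avg, med = summarize_background(background_pixels)
    closest = min(color_distance(font_rgb, avg),
                  color_distance(font_rgb, med))
    too_close = closest < color_threshold
    too_small = font_px < min_font_px
    return too_close or too_small


if __name__ == "__main__":
    white_background = [(255, 255, 255)] * 10
    print(looks_hidden((255, 255, 255), white_background, 12.0))  # white on white
    print(looks_hidden((0, 0, 0), white_background, 12.0))        # black on white
    print(looks_hidden((0, 0, 0), white_background, 2.0))         # 2px text
```

    Plain RGB distance is crude (it ignores how people actually perceive contrast), but even this would give a reviewer a starting point for why a page got flagged.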

    Google Has a Double Edged Sword
    Google has a bit of an issue here. This setup gives blackhats and others two different chances to fool them. You either fool the bot (or the person submitting the spam report) or you fool the person reviewing your site. The manual step is necessary for some things, to prevent sites from getting banned just because they have a quirky CSS style (though I hold that much of this should be detected automatically, and without question). It seems to me that they have long considered their manual reviews flawless, or close to it. Those days may be ending soon. Google’s huge reliance on these human reviewers is a big weakness. Humans are easy to fool.

    (Insert Evil Blackhat Laughter Here)
    Manual reviewers are easier to detect than one would think. Or at least my initial research is showing that. It’s the first time I’ve ever been anxious for my next domain to get banned. It’s not perfect yet (not by a long shot) but it’s decent.
    I think I’ve caught one, but have yet to verify that for sure. Also, I need to work on a way to show a whitehat (WH) page when I find one. For now, I just rickrolled one. Heh. I hope that wasn’t a potential customer instead.

    XMCP out.

    PS: I’m on twitter now. I gave in to the cult. Add me if you’d like. But please, not just to ask a question or get advice. Use twitter how it was intended :) Otherwise, I’ll just stop using it or something.
    PPS: A new SEO experiment will begin soon. My rankings have stabilized, as have my readership and my traffic. So soon SlightlyShadySEO will be….SEOed. Heh, a long time coming, wasn’t it?

    10 Responses to “Are We Giving Google too Much Credit?”

    1. Required says:

      One question though: what makes you think that pdf is actually the real “Google quality raters guidelines”?

    2. admin says:

      1)No benefit to anyone for faking them on their own.
      2)Totally not Google’s style to fake such things.
      3)A tiny verification I did on a single fact. But a very important one.

    3. Cure Dream says:

      Google represents the new trend in “artificial intelligence” — you might call it “artificial stupidity”; use parallel processing to apply simple algorithms to large amounts of data. The more data you’ve got, the easier it is to seem intelligent.

      The crux of manual evaluation is “is this a page that a searcher wants to find?” and not “does this page use dishonest technique X?” Many reputable sites use questionable techniques because their competitors do too.

      It might not seem scalable, but Google and MSN currently have the resources to do a lot of manual reviews. Certainly they investigate reports of spam sites quickly and deal with them harshly.

    4. TheMadHat says:

      @Req I got the same document from 2 independent sources as well.

      @SS you mean I no longer get to link to ?p=244 pages anymore? Damn, you’re making my life too easy.

      My favorite in that doc is “If you think it might be spam, label it as spam”. Basically our future online presence relies on some nutjob making 12 cents an hour who might or might not even understand the subject of the page in question.

    5. Bogdan says:

      I have just browsed through that pdf you’re talking about, but it’s hard to believe it’s something official.

      “If you think it might be spam, label it as spam”. Probably in the future, this will filter out the spam only for the labeler.

      Shady, consider yourself followed.

    6. Doubtful says:

      This document seems very shady to me. If this is from the mouth of Google, why can’t you find it when you search for “Google quality raters guidelines”? Why is it hosted on mauriziopetrone.com, sourced on vizualbod.com, and yet cannot be found on said website? And the biggest question: why isn’t there at least one Google logo on this “Google” document? Some things to think about.

    7. admin says:

      @Doubtful: It’s a LEAK. So it was not intended to be released. Hence the no branding, and being hosted elsewhere.

    8. Gab "SEO ROI" Goldenberg says:

      Shady, I think you’re BSing us. When I searched for “Google secret sauce” and “Matt Cutts’ secret spam verification tools” nothing remotely like this showed up. Pffft, who do you think you’re fooling?

      More seriously, my call is that after a document got leaked in 2005, they take care not to give reviewers (who are contractors, as I understand it, not regular employees… and even if they were employees, they’d be low-level) sensitive stuff. Google seems more intelligent - because they are. There’s more advanced stuff going on, imho, and the rank-n-file don’t have access. The concept of risk management. Same reason Matt Cutts doesn’t have access to the whole algo (or so he claims).

    9. gInsider says:

      Google quality raters guidelines is fake, because it is not from google, but from a smaller company.

    10. Google Search Sucks says:

      Just want to add my two cents. First, one of my co-workers was a Google employee who did manual review, and he is a bright guy. His job was to find “new” spam techniques. Second… I found this strange manual review posting on CraigsList

    © Slightly Shady SEO, All Rights Reserved. Scrape me, and I will eat your soul.