Examining a Manual Review By Google
Ok, so during the same log checking that I did yesterday, I came across a few manual reviews by Google. So I thought I’d share the love. I was surprised by a few things.
The Basics of a Manual Review
Not much is known concretely about manual reviews. Manual reviews are Google’s acknowledgment that anything algorithmic will have flaws. Google is known to employ thousands of people across the globe for it. Some seem to be random, and some seem to be the result of a certain “flag” or report that a site set off. The manual review I stumbled across today was one that appears to be centered around cloaking and [possibly] link spam. I was fortunate they were using a Google IP for this. Not sure why.
The [Partial] Log (Scrubbed for identifying information, of course)
Note: I was stupid, and blocked by individual IPs, not ranges. So these people DID get redirected. Damn. They’re old logs, so many of the things they are doing have been fixed.
| 76305 | 74.125.16.2 | USER | Affiliate Redirect | http://www.google.com/search?q=KEYWORDS+I+RANK+FOR+HERE |
| 80155 | 74.125.63.33 | Snooper | 404 | |
| 80555 | 74.125.16.2 | USER | Forced PPC Redirect | http://www.google.com/search?q=abc |
| 97396 | 74.125.16.2 | No Search Snooper | 404 | http://www.LINKSPAMSITE.org/index.php?db=freeboard&actKey=read&no=NOT_TELLING |
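The mistake in the note above (blocking single IPs instead of ranges) is easy to avoid in code. Here is a minimal sketch, assuming the whole 74.125.0.0/16 block belongs to Google (that is the range a commenter quotes further down, not something I have verified beyond the Whois lookups). The names `GOOGLE_BLOCK` and `in_block` are made up for illustration.

```python
import ipaddress

# Assumption: 74.125.0.0 - 74.125.255.255 expressed as one CIDR block.
GOOGLE_BLOCK = ipaddress.ip_network("74.125.0.0/16")


def in_block(ip, block=GOOGLE_BLOCK):
    """Return True if the dotted-quad string `ip` falls inside `block`."""
    return ipaddress.ip_address(ip) in block


# The two IPs from the scrubbed log above.
log_ips = ["74.125.16.2", "74.125.63.33"]
matches = [ip for ip in log_ips if in_block(ip)]
```

Matching on the network means a visit from a neighbouring address in the same block gets caught too, which a per-IP list would miss.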
Examining The Log (Overview)
- The IP I’m basing this off of is 74.125.16.2. The other IP was brought in since it was the only other IP on that block of IPs that accessed any site I have.
- A quick Whois lookup confirms that both IPs are indeed owned by Google.
- The Google search for ABC seems like a prerequisite cloaking check. It shows up in many logs I have, but is not always followed by the subsequent visits.
- I eliminated the times from the logs, but typically visits were 15 seconds to 1 minute apart.
Examining the Specific Attacks
- http://www.LINKSPAMSITE.org/index.php?db=freeboard&actKey=read&no=NOT_TELLING
Possibly the result of a manual spam report done by the webmaster. Used as a prerequisite to the various cloaking checks. Since this person got 404ed, I suspect that led them on to do the next tests.
- http://www.google.com/search?q=abc
Probably an automatic check done for any site where cloaking is suspected. But I’ve had whitehat sites receive this too. I’m thinking it’s done automatically on spam-reported domains prior to review, and then kind of randomly thrown around to sites in high-risk niches.
The concept behind this is that some cloaking bypasses its IP checks for anything with a search query (especially JavaScript-based cloaking).
The second trick to this (the part that I fell for) is that it uses an IP not used for crawling, so it’s not on every bot list. It also has no reverse DNS entry, so it throws off cloakers that rely on reverse DNS checks.
- No Referrer
This one doesn’t show up all the time. I think it may have been a fluke. Alternatively, it could be another check to see if the webmaster is filtering traffic to only allow people with search engine referrers past. These showed up primarily from the second IP, which I included only because it sits on the same net block. Crawls from that IP were still not very common though, so I doubt it’s a standard crawler.
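For anyone combing their own logs, the three referrer patterns above can be sorted out with a few lines of Python. This is a rough sketch, not what I actually run: the function name `classify_referrer` and the four labels are hypothetical, and it only looks at the referrer string, nothing else in the log line.

```python
from urllib.parse import urlparse, parse_qs


def classify_referrer(referrer):
    """Label one referrer string as 'no_referrer', 'abc_probe', 'search' or 'other'."""
    if not referrer:
        return "no_referrer"
    parsed = urlparse(referrer)
    host = parsed.hostname or ""
    is_google = host == "google.com" or host.endswith(".google.com")
    if is_google and parsed.path == "/search":
        # parse_qs decodes '+' to spaces and returns a list per key.
        query = parse_qs(parsed.query).get("q", [""])[0]
        return "abc_probe" if query.strip().lower() == "abc" else "search"
    return "other"
```

Run over the referrers in the log above, the `q=abc` hit comes back as `abc_probe`, the keyword search as `search`, the link spam URL as `other`, and an empty referrer as `no_referrer`.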
Why Do I Think That This is a Manual Review?
- The timing between the page loads was much slower than a bot.
- It used real keywords I ranked for in its fake search.
- It used the referrer of a backlink that actually existed.
- While this could be automated, it would be very hard to separate the data usefully without a hand check.
- The search?q=abc test occurs so frequently vs. the rest of them, it seems almost like a filtering mechanism for hand checks.
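The timing point is the easiest one to check mechanically. Below is a minimal sketch, assuming you still have timestamps (in seconds) for each hit from one IP. I scrubbed mine, so the numbers in the example are invented, and the 15 to 60 second window is just the typical spacing I mentioned earlier, not a hard rule. `gaps` and `looks_human_paced` are hypothetical helper names.

```python
def gaps(timestamps):
    """Return the seconds between consecutive hits, oldest first."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_human_paced(timestamps, min_gap=15, max_gap=60):
    """True if there are at least two hits and every gap falls in the window."""
    spacing = gaps(timestamps)
    return bool(spacing) and all(min_gap <= g <= max_gap for g in spacing)


# Invented example data: seconds since the first hit from one IP.
reviewer_like = [0, 20, 65, 110]
bot_like = [0, 1, 2, 3]
```

Pacing alone proves nothing, since a throttled bot would pass too, so treat it as one signal next to the keyword and referrer clues above.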
Is There Anything That Can Be Done About Manual Reviews?
In short, no. We can possibly delay them. But it will somehow come back to bite us on the arse. That’s fine though. The manual checking is not scalable. As the net grows, it becomes less and less likely to continue on this scale.
Short term, we can filter the obvious things. Block a few who break the given rules of our site, or perhaps even set up an e-mail alert for when a review is about to be handed down. It appears there’s still a significant amount of lag time between the review and the processing of the results, so it’s not too bad for now. But with as many reviewers as likely exist, there’s no practical way to identify them if they’re using outside IPs.
Well, maybe not. I’m working on a thing or two, but I doubt it will ever be efficient to the point where it’s useful.
Back to the logs then I suppose.
-XMCP





















February 4th, 2008 at 5:24 pm
I have been dealing with the abc query for a while. I am unsure if it is a manual review. I have had them hit 70 websites within a matter of minutes.
This is one of those things that makes me want to just take every Google IP and add them to my bot tables.
Manual or not I can’t see why one Google IP should see different info than another.
If it allows my sites to survive another week or two that is just bonus money.
The IP you reference above 74.125.16.2 is part of the Google range 74.125.0.0 - 74.125.255.255.
I had asked you in the finding stealth IPs part 2 area if you can think of a reason not to just add that whole range to your bot list.
I know it is over 60,000 IPs but…
Dennis
February 4th, 2008 at 6:00 pm
Excellent post as always, thanks for sharing!
February 5th, 2008 at 12:58 am
Why are they doing manual reviews in the first place? Are these third world workers doing this?
February 5th, 2008 at 12:54 pm
I have these IP’s from today, all hitting a new bunch of cloaked sites
72.14.195.217 - Mozilla/4.0
72.14.193.161 - Mozilla/4.0
72.14.193.67 - Mozilla/4.0
74.125.16.68 - Mozilla/4.0
66.249.85.83 - Mozilla/4.0
66.249.66.229 - Googlebot/2.1
66.249.70.237 - Googlebot/2.1
66.249.72.212 - Googlebot/2.1
66.249.84.67 - Mozilla/4.0
They’ve been hitting the same page at the same time it batches of 3
72.14.193.67
66.249.85.83
66.249.84.67
February 5th, 2008 at 12:55 pm
Sorry, should read in batches of 3
February 5th, 2008 at 1:23 pm
Hmmmm. I have a client that was having some serious landing page relevancy issues. I utilized a Google contact of mine to request a manual review and it worked quite well. I might be able to reproduce exactly which day that review was done on and then ask the client to pull the server logs. It was about a year ago though . . . would that be useful? In other words, would it be worth my effort and relationship equity to make such a request of my client?
Let me know . . . good in-depth article. I just added you to my iGoogle and Outlook 2007 RSS.
Brent D. Payne
February 5th, 2008 at 2:02 pm
74.125.16.68 just saw a site of mine, though it’s 100% clean. That said, I started up an ad campaign today, which has already generated 30K+ impressions, so it might be tied to that.
February 5th, 2008 at 2:02 pm
My site is clean I mean.
February 6th, 2008 at 12:35 am
@Dennis: I don’t add all of them to the same list, as I’d like to deal with unique, human visitors differently than I would a bot. Not much sense in showing them random arse content.
@online marketing: They do reviews to check for people [like myself] who largely ignore their terms of service and do naughty things that their algorithms can’t catch/confirm on their own.
@Karl: Thanks a LOT! I’ll comb through my own logs as well for those.
@Brent: Glad to have you aboard! And I wouldn’t sweat it, unless you’re cloaking or generating large amounts of content. A manual review probably won’t impact landing page relevance so much.
@Gab: Any referrer attached to that log?
February 6th, 2008 at 6:54 am
Just to be sure, what you want to do is:
- show to the crawlers the black pages
- show to the reviewers/users the white pages
???
February 7th, 2008 at 12:52 pm
Excellent article on how Google works……I would enjoy reading future articles on this subject.
February 7th, 2008 at 10:43 pm
I am getting odd searches from IP 66.249.85.129 ff-in-f129.google.com a few days apart. Using: “www.google.com/search?hl=en&q=site%3Awww.mydomain keyword”
The keyword is different each time.
February 14th, 2008 at 9:08 am
We were paid a visit by 72.14.193.67 today. A simple google.com site:our.domain
We’re in a pretty competitive market though - I guess someone is unpleased with our ranking (it’s very good.)
Thanks for your article. Very insightful!
February 25th, 2008 at 10:50 pm
Hmm…or maybe we’re over-paranoid?
February 25th, 2008 at 11:08 pm
@altek: Nope. Blackhat sites are primarily nailed by manual review. And do you know of any Googlebots that use referrers?
September 8th, 2008 at 2:52 pm
Ok, most of this went over my head…but, I’ve been checking my logs lately and have had several hits by a GBot and today this:
No referring link
Host Name ff-in-f83.google.com
IP Address 66.249.85.83 [Label IP Address]
Country United States
Region California
City Mountain View
ISP Google Inc
Returning Visits 0
Visit Length 1 min 40 secs
VISITOR SYSTEM SPECS
Browser MSIE 6.0
Operating System Windows 2000
Resolution Unknown
Javascript Enabled
Navigation Path
Date Time WebPage
8th September 2008 12:20:27 PM No referring link
What does it mean?
Thanks!!!
September 11th, 2008 at 2:45 pm
More hits, averaging 3 a day now…
Mountain View
and Westwood
A search on the IPs identifies them as being Google.
September 17th, 2008 at 6:31 pm
This may be a visit from the Google Accelerator prefetch proxy