True Experimentation: Finding Search Engine Rules On Your Own
There’s a lot of misinformation out there. Some of it is recycled garbage and the author knows it. Some of it is misconceptions. Some of it is outdated. Everyone’s guilty of putting out some of it; as SEOs we’re more or less shooting in the dark trying to figure out the intricacies of an algorithm that is fiercely protected and constantly updated.
Also (and this is important), there are occasionally glitches in the Google algorithm (for example, nofollowed links passing anchor text power).
So sometimes we have to experiment, and throw SEO knowledge we previously held dear out the window.
Why is Experimenting with the Search Engines So Difficult?
Any scientist will tell you a proper experiment requires that you have only one variable. Unfortunately, we cannot do this. You can’t directly clone sites to test, as the duplicate content algorithm will throw off your statistics. However, if you modify the content for each test site, you have added in a new variable, and the entire experiment starts to lose its reliability. So how can we go about testing the search engines without a sufficient amount of control?
What Variables Can we Control?
- Anchor text
- Template Design
- Internal/Outbound Links
- Number of Pages
- Link Velocity - Link Velocity is how quickly your site gains links, and how many appear each day. It’s best to try and achieve an upward slope, followed by a downward slope.
Variables we Can Tentatively Control
- When and Where the Links to our Various Sites Appear - We can’t determine where our links will stick, and their internal standings. But by being careful about where we try and gather our links from, we can gain a degree of control.
- Content/Keyword Density - We’ll cover this more in depth later, so we can get similar content that should rank the same, without setting off dupe content filters. And by the way, I know keyword density is a crock of shit in its normal context; you’ll see what I mean later.
- Domain Names and File Names - Obviously, we can’t have multiple sites using the same domain and expect them to evaluate properly.
- How Quickly Links are Discovered - Even if we drop links in the same locations, we have trouble getting Google to find them at the same time. However, we can help this.
Variables we Cannot Control
- Google’s magic dance that no one really understands (err the unknown elements of the algorithm)
How to Control our Variables the Best We Can
How well we can control these variables will make or break our experiment. In reality, we control the majority of the variables in some fashion, but without proper control on these, the data is worthless.
- When and Where the Links to our Various Sites Appear
- Unless you are testing the effects of different links on your site, this is a doozy. You may have to sacrifice backlink quality for this control. It’s important that any link created is easily replicable, and that we attempt to control the number of internal links linking to the page with our backlink.
- Even things like digg are not perfect for this, as the articles showing up on different “upcoming” pages can impact how much juice gets passed to our specific backlink page. However, by using sites like digg/propeller for the experiment, we gain a large benefit. They have SO many outbound links on any given page that they pass the same amount of juice to pretty much any page (not very much). By submitting using the same tags/category/user, this variable becomes much more controlled.
- If you’re only testing 1 or 2 sites, you can get away with directory submissions, but you lose a bit of reliability. Submitting 2 nearly identical sites to the same directory in the same day will almost assuredly get one link accepted, and one denied. So a 1-2 day gap is needed. This affects our domain age/backlink age in Google’s eyes, and throws off the results. It’s necessary in some cases though.
- Content/Keyword Density - Probably the most important and overlooked variable in experiments like this
- Make a list of any and all acceptable uses of your keyphrases, or anything topic-specific. Especially verbs. Note how many times you’ll use them per page, and overall content length.
For example, let’s say you use the line “We provide SEO services to businesses in the Detroit/Metro area, and throughout Northern Ohio.” The words underlined are the words that must appear together, and the exact same number of times. The words in italics are the words that need to stay the same relative distance from these underlined phrases. Within a sentence or two. In addition, we try and keep all of this in the same order.
So if we were to write a permutation of this sentence, it might be: “Only one group provides SEO services to businesses all around the Detroit-Metro area, and in the area of Northern Ohio.”
By using somewhat generic phrasing, we can dodge the duplicate content filters, while keeping our important key phrases in the proper amounts, areas, and orders.
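If you're writing a lot of these permutations, it can help to check them mechanically. Here is a minimal Python sketch of that idea. The phrase list is only a guess at which phrases you would pin down, and `phrase_profile` / `same_profile` are made-up names for illustration. It treats "-" and "/" as spaces, and it only checks how many times each phrase appears and in what order; it does not check the relative distance of the surrounding words.

```python
import re


def phrase_profile(text, phrases):
    """Return (counts, order) of the key phrases found in text.

    counts maps each phrase to how many times it occurs;
    order lists the phrases in the order they appear.
    Matching is case-insensitive and treats '-' and '/' as spaces.
    """
    normalized = re.sub(r"[-/]", " ", text.lower())
    normalized = re.sub(r"\s+", " ", normalized)
    hits = []
    for phrase in phrases:
        needle = re.sub(r"[-/]", " ", phrase.lower())
        for match in re.finditer(re.escape(needle), normalized):
            hits.append((match.start(), phrase))
    hits.sort()  # sort by position in the text
    counts = {phrase: 0 for phrase in phrases}
    for _, phrase in hits:
        counts[phrase] += 1
    order = [phrase for _, phrase in hits]
    return counts, order


def same_profile(text_a, text_b, phrases):
    """True if both texts use the phrases the same number of times, in the same order."""
    return phrase_profile(text_a, phrases) == phrase_profile(text_b, phrases)


# Assumed key phrases for the example sentence above.
phrases = ["SEO services", "Detroit Metro", "Northern Ohio"]
original = ("We provide SEO services to businesses in the Detroit/Metro area, "
            "and throughout Northern Ohio.")
variant = ("Only one group provides SEO services to businesses all around the "
           "Detroit-Metro area, and in the area of Northern Ohio.")
print(same_profile(original, variant, phrases))
```

Run it on every page pair before you launch; a single extra mention of a key phrase on one test site is exactly the kind of stray variable we're trying to kill.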
- Domain Names and File Names
- Perhaps filenames shouldn’t be listed here, as we do control them completely. But domain names? Yeah. Launching multiple sites on one domain (via subdomains) is unacceptable, as no one is 100% sure how domains vs. subdomains are handled. So we need domains that will provide no benefit or penalty. Try just gibberish letters on the same TLD when you’re testing a new variable. Nowadays, having a keyword in the domain provides you with a significant bump, so we can’t take that risk. In reality, avoiding vowels might be a good idea, just to avoid accidentally containing a small word someone might search for.
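Gibberish names like that are easy to generate. The sketch below is one way to do it in Python, under a few assumptions of mine: the name `gibberish_domains` is made up, the 10-letter length and the `com` TLD are arbitrary defaults, and "y" is treated as a vowel to be safe. It does not check whether a domain is actually available to register.

```python
import random
import string

# Consonants only, so a name can't accidentally spell a searchable word.
# Treating "y" as a vowel is an assumption made here to be safe.
CONSONANTS = [c for c in string.ascii_lowercase if c not in "aeiouy"]


def gibberish_domains(count, length=10, tld="com", seed=None):
    """Return `count` unique vowel-free domain names on the same TLD."""
    rng = random.Random(seed)  # pass a seed to get a repeatable list
    names = set()
    while len(names) < count:
        label = "".join(rng.choice(CONSONANTS) for _ in range(length))
        names.add(f"{label}.{tld}")
    return sorted(names)


for domain in gibberish_domains(3, seed=1):
    print(domain)
```

Keeping the length and TLD identical across all test sites means the domain itself is one less thing that differs between them.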
- How Quickly Links are Discovered
- Obviously, we need to submit these links at the same time to the same places.
- If you’re submitting to social news sites, make sure you use the same tag, categories, and user names. First and foremost, because different tags hold different PR (link juice). Secondly, because when the search engine crawls those categories/tags/user profile, it will see all the links at once.
- If you know of links that you can’t control in the method above, consider submitting them directly to Google, or pinging the URL if they have RSS.
- Template Design - Oh yes. There’s more here than you think
- Google handles different link locations differently. A footer link for example, in theory is not weighted as heavily as one in the header, or a link that (somehow) appears in the content part of every page on a site. So when designing your templates, make sure that the links to the various locations are in the same areas.
- Same link frequency: Internal Link Juice is a complex and interesting beast. It is absolutely imperative that each site has the same basic structure, with the same number of internal links to each page, from the same locations. With the same anchor text. There’s really no way to avoid this.
- Avoid altering how you display the menu. Complicated menus such as Javascript, or drop-down CSS, may set off Google filters about hidden CSS text (aka ghetto cloaking), or may not get crawled at all. Static HTML is the friend of experimentation.
- Does your template have titles/subtitles? Check to see if they use h1/h2. If they do, you’re going to want to make sure all the templates you use do.
- Number of Pages
- Google judges sites based on number of pages. Odd as that sounds. Ensure they’re the same, unless this is what you’re testing.
- One important note: Just because you create 3 pages of content does not mean that there ARE 3 pages of content. If you’re using WordPress, there are many, many more. Remember that each tag, each category, and each entry in the archives creates a new page that could offset your careful balance.
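The template checks above (same h1/h2 usage, same internal links with the same anchor text) can be roughly automated. What follows is a minimal sketch using only Python's standard `html.parser`; the class and function names are invented for illustration, and it assumes your test sites use the same relative URLs for matching pages. It does not look at where on the page a link sits (header, footer, content), so that part still needs an eyeball.

```python
from collections import Counter
from html.parser import HTMLParser


class StructureParser(HTMLParser):
    """Collects h1/h2 counts and (href, anchor text) counts from one page."""

    def __init__(self):
        super().__init__()
        self.headings = Counter()
        self.links = Counter()
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self.headings[tag] += 1
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor = " ".join("".join(self._text).split())
            self.links[(self._href, anchor)] += 1
            self._href = None


def structure(html):
    """Return (heading counts, link counts) for one page of HTML."""
    parser = StructureParser()
    parser.feed(html)
    parser.close()
    return parser.headings, parser.links


def same_structure(html_a, html_b):
    """True if both pages use the same headings and the same links/anchor text."""
    return structure(html_a) == structure(html_b)


page_a = '<h1>Title</h1><a href="/about.html">About us</a><p>text one</p>'
page_b = '<h1>Other</h1><p>different text</p><a href="/about.html">About us</a>'
print(same_structure(page_a, page_b))
```

Feed it the rendered HTML of every page on each test site, including the tag, category, and archive pages a CMS generates for you, so the page counts get compared along the way.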
Remember. What separates a real SEO from the stumble-exchanging circle jerk that infests our lower level is the desire to try something new, and to figure things out on your own. It’s also what moves this industry forwards. With that said….get ranking!
Regards,
XMCP
PS: If you missed it yesterday while you were with your family or other craziness, we still have a lot of Yahoo Directory Listed Domains that are up for grabs. And this time I fixed it so it’s harder to just copy/paste the whole list into a bulk-registerer and swipe em all up. Spread the love, and share it with your friends.
December 26th, 2007 at 11:09 am
How to Design a Proper Search Engine Experiment: Finding Google’s Rules on Your Own!…
In today’s world, search engines maintain a high level of secrecy. Having such a lack of control also makes it incredibly hard to design a proper experiment. However, we have more control than we give ourselves credit for, and a properly designed experi…
December 26th, 2007 at 1:58 pm
Note how the sandbox throws a monkey wrench in all this.
When you’re playing the game of “throw up pages quickly and get them ranked for long tail queries”, you can see results in a few days and quickly use feedback to improve your process.
When you’re trying to rank for competitive queries, you might need to wait for a year to rank — you don’t need to optimize against today’s Google algorithms, but rather optimize against what Google will be running in a year.
They get to design an algorithm that gets good results on the web sites we were making a year ago — no wonder why they’re rolling in the $$.
December 26th, 2007 at 5:45 pm
Heh this is a good point. But remember, the sandbox doesn’t ALWAYS show up. A lot of times you can skip it. But yes, interaction with the sandbox does torch our luck. I never thought of that perspective on why the sandbox exists though. It makes sense. A lot of sense.
December 27th, 2007 at 4:54 am
You are fucking incredible. How old are you and what college do you go to? Probably some of the best articles I’ve ever read. Don’t even closely compare to Eli’s, which for the most part were half-bullshit. Very nice.
I’m not referring to this post, but your other articles in general.
How accurate is the link structure you discussed for blackhat sites (based on experimental design or theory?, ie main page links to 100 backlinked subpages, etc etc).
December 27th, 2007 at 8:20 am
Thank you very much! I’m 19, and still an undergrad at Michigan State.
The link structure I talked about before is actually my current setup, and appears to work excellently. Everything is open to tweaks of course, but it seems to work quite well. I need to mess with the dynamic linking part still (this code is kind of rapidly expanding), but the tests thus far have gone incredibly well.
Glad to hear from ya!
December 27th, 2007 at 3:22 pm
@Vaqif — Not to diminish XMCP’s contribution, but don’t go badmouthing Eli — he’s not a bullshitter.
True, many people ‘try’ ideas from Eli’s blog and don’t get results as good as Eli claims. For instance, consider the original Black Hole SEO article. Many people have tried following his instructions and gotten no links at all. That’s because Eli didn’t spell out exactly where the scrapers are looking and what they are looking for.
On the other hand, if you’re doing a vigorous job of promoting your sites in various channels, you’re most likely getting scraper links.
Let’s consider a little example where names have been changed to protect the guilty.
Suppose you post a news story to a social bookmarking site like Mixx that has the word “flying” in the title. You might find there is a site with a name like “flying.my-splog-empire.mx” that sucks up the text of the Mixx post and stuffs it into a stock WordPress install (didn’t even change the template). He’s even kind enough to send you a trackback so you know that he did it… Too bad he gets rejected by Akismet.
A little investigation shows that he has at least 100 of these things running on the same IP address.
At the moment I’m happy to get a little “force multiplier” from him — I can’t complain about getting two or three links for the effort of one.
If, on the other hand, you do a little investigation, you could certainly find a situation where you could feed a scraper enough content to generate hundreds of links an hour… Like Eli says. But you gotta think a bit and work a bit.
A lot of Eli’s stuff is like that — once you start thinking about it, and using it, you discover deeper and deeper truths.
The other day I came across a Sucka who started a porn affiliate site with Wordpress, did one referrer spam to get it a backlink from a .gov site, and he thinks he’s Hugh Hefner… Suckas like that aren’t going to make $50 a day following instructions from ANY blog — there’s a reason why they are poor and “The Rich Jerk” has all the money.
December 27th, 2007 at 4:59 pm
Alright so it’s not mod_rewrite. You know I feel like shit for asking. In fact don’t answer, nvm.
January 2nd, 2008 at 4:24 pm
[…] of a computer after a long break, staring at the screen trying to figure out what I’m doing.
January 4th, 2008 at 2:25 am
[…] True Experimentation: Finding Search Engine Rules On Your Own […]
January 29th, 2008 at 4:35 pm
[…] have thoroughly enjoyed SlightlyShadySEO’s post about designing SEO experiments. If you haven’t read it yet, go do that now, it gives quite a comprehensive list of all the […]
March 23rd, 2008 at 7:00 pm
[…] is all over the place lately. Firstly there is the excellent post by our old favorite XMCP about all the things to keep in mind when setting up an SEO test (and my follow up post). Then there is an interesting test about Google preference for different […]