Know Your Algorithms: Google, Yahoo, MSN, Ask.com, and Others
Many people spend a lot of time analyzing search engine algorithms, myself included. For that reason, I decided to put up an entry describing each search engine’s algorithm as I understand it. Any corrections or additions are welcome in the comments.
General
There are only a few “real” search engines out there, with web crawlers continually scanning the internet. A series of duplicate search engines springs off of these, returning identical or mostly identical results. The US-based crawlers that still exist and browse the internet are as follows: Google, Yahoo/Inktomi, Ask.com, MSN/Live, and Gigablast.
- Ask Jeeves/Ask.com - There’s not a lot of information to be had about this search engine, and few people care.
- Ask Jeeves originally survived on two things. First, staff manually found the answers to frequently asked questions. Beyond that, secondary rankings were determined by how many clicks each site got for the given search term. This was spammed out of existence.
- The second incarnation of Ask, the current one, is based on the “Teoma” engine, later rebranded as “Expert Rank”.
- The original Teoma engine basically identified the topic of a page, then examined the inbound/outbound links from that page to determine whether the linked pages covered a similar topic. This eventually creates “clusters” of information on a given category.
- This can be useful. It means that by becoming part of that “cluster”, you can gain rank. So find the sites that rank for your topic. Analyze their links to and from pages with similar content, and attempt to replicate them. After you’re established within the cluster, create alternative sites on the same topic, and use them to expand the cluster to include your sites. Perhaps give a link to another site in the cluster, to more firmly ingrain your site in it.
- Also bear in mind that this setup makes purchased links on sites unrelated to your own borderline useless. Be careful when buying links or link spamming.
- Ask is a slow and selective crawler. To compensate for that (it leaves them borderline unable to pick up current or viral sites), they put a weight on “freshness” of content. RSS feeds, frequent updates, and things of this nature may be useful. However, watch out for... the next bullet point.
- Ask, according to their own expert, examines “uniqueness” of content differently than most search engines. I can’t really speculate on that further, but consider it a heads up. If it was considered worth mentioning, it’s probably weighted rather severely.
- When deciding a page’s topic, it takes heavily into account keywords that are typically associated with each other. So “smoking” and “cigarettes” are likely to stick together. Keep this idea in mind when deciding on keywords to put into your content. Use the verbs most commonly associated with whatever noun you might be selling.
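To make the “cluster” idea a bit more concrete, here’s a rough Python sketch of how I picture it. To be clear, this is my own toy model, not Ask’s actual code: the keyword sets, the example domains, the Jaccard similarity measure, and the 0.25 threshold are all assumptions made up for illustration.

```python
from collections import deque


def jaccard(a, b):
    """Overlap between two keyword sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def topic_cluster(seed, keywords, links, threshold=0.25):
    """Collect pages linked (directly or indirectly) to `seed` that share its topic.

    `keywords` maps page -> set of keywords (hypothetical topic signal).
    `links` is a list of (source, destination) pairs.
    """
    # Inbound and outbound links are both examined, so treat links as undirected.
    neighbours = {}
    for src, dst in links:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)

    cluster = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for other in neighbours.get(page, ()):
            if other in cluster:
                continue
            # Only pages whose topic resembles the seed's topic join the cluster.
            if jaccard(keywords[seed], keywords.get(other, set())) >= threshold:
                cluster.add(other)
                queue.append(other)
    return cluster


# Hypothetical example data.
keywords = {
    "a.example": {"hoodia", "diet", "weight"},
    "b.example": {"hoodia", "diet", "pills"},
    "c.example": {"poker", "casino"},
    "d.example": {"diet", "weight", "loss"},
}
links = [
    ("a.example", "b.example"),
    ("b.example", "c.example"),
    ("b.example", "d.example"),
    ("c.example", "d.example"),
]

print(sorted(topic_cluster("a.example", keywords, links)))
```

With the toy data above, the poker site stays out of the cluster even though a cluster member links to it, which is the same reason off-topic purchased links would do little for you here.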
- Yahoo/Inktomi/Altavista
- Altavista initially provided THEIR results to Yahoo. Of course, this changed. Eventually, Altavista was bought out by Overture, which was in turn bought out by Yahoo. Altavista and Yahoo are nearly identical to this day.
- Yahoo is revamping their algorithm as we speak, so this information is subject to change.
- Higher-ups at Yahoo pioneered the concept of “Spam Mass”. In my opinion, this is probably one of the most important things to note. The research (and my own experience) indicates that it is probably a very important part of the algorithm.
- Spam Mass is a mathematical measure used to dynamically classify, based on inbound/outbound links, whether a page is attempting to artificially inflate its rankings. This means, if you’re a link spammer, for the love of god, ensure you have at least somewhat unique links to spam. If they’re whored out, they’re probably worthless to Yahoo.
- Yahoo appears to be less sensitive than Google to consistent anchor text. Variation is not needed.
- While many disagree with me, I contend two major points about Yahoo.
- Domain age/time indexed is among the primary considerations. When I’ve purchased pre-existing domains, I’ve done far better than with fresh domains.
- Yahoo hates me, specifically.
- Considering the algo is in the process of being updated, I’m really not going to say too much more about it.
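For the curious, here’s a minimal sketch of the spam mass idea as I understand it from the published research: compute PageRank twice, once with the normal random jump and once with the jump restricted to a hand-picked set of trusted pages, then see what share of a page’s rank does not come from the trusted set. The graph, page names, trusted set, damping factor, and function names below are hypothetical, and this is a simplification, certainly not Yahoo’s production algorithm.

```python
def pagerank(links, jump, damping=0.85, iterations=50):
    """Power-iteration PageRank with an arbitrary random-jump vector.

    `links` is a list of (source, destination) pairs; `jump` maps every page
    to its random-jump weight. Rank held by pages with no outlinks is simply
    dropped, which is good enough for a sketch.
    """
    nodes = list(jump)
    out = {n: [dst for src, dst in links if src == n] for n in nodes}
    rank = dict(jump)
    for _ in range(iterations):
        new = {n: (1 - damping) * jump[n] for n in nodes}
        for n in nodes:
            if out[n]:
                share = damping * rank[n] / len(out[n])
                for dst in out[n]:
                    new[dst] += share
        rank = new
    return rank


def relative_spam_mass(links, nodes, trusted):
    """Share of each page's PageRank that is NOT explained by the trusted core."""
    uniform = {n: 1 / len(nodes) for n in nodes}
    core = {n: (1 / len(nodes) if n in trusted else 0.0) for n in nodes}
    p = pagerank(links, uniform)    # ordinary PageRank
    p_core = pagerank(links, core)  # rank flowing only from trusted pages
    return {n: (p[n] - p_core[n]) / p[n] for n in nodes}


# Hypothetical graph: a trusted news site links to a shop,
# while two link-farm pages prop up a spam page and each other.
nodes = ["news", "shop", "farm1", "farm2", "spam"]
links = [
    ("news", "shop"),
    ("farm1", "spam"),
    ("farm2", "spam"),
    ("farm1", "farm2"),
    ("farm2", "farm1"),
]
mass = relative_spam_mass(links, nodes, trusted={"news"})
for page in nodes:
    print(page, round(mass[page], 2))
```

In this toy graph the spam page gets all of its rank from the link farm, so its relative mass is at the maximum, while the shop earns a good part of its rank from the trusted site. That’s the intuition behind wanting links from places the rest of the spam crowd hasn’t already hammered.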
- MSN/Live - The Microsoft brain children, possessing about 20% of the market. However, this is not to be underestimated, as this tends to be a 20% that is gullible, quick to click ads, and quick to purchase. Marketing to Live is difficult. They’re still working out the kinks in the algorithm, so it’s unpredictable as all hell.
- They place a lot of emphasis on keywords within the domain. For example, the top result for “hoodia” is hoodia.com, for “viagra” it is viagra.com, and for “mortgage” it is mortgage.com.
- From there, it appears to spread out to include almost exclusively places that have the key term in the domain, or in the file name.
- Something I noticed is that it appears to match keywords to the URL on a percentage basis. For example, no one ranking for “mortgage” who didn’t have “mortgage” in their domain name had anything longer than “mortgage.html” for their file name. So no keyword stuffing in URLs (how-to-get-a-cheap-mortgage.html) if you want to rank on MSN.
- This appears to be yet another search engine that emphasizes domain age.
- Once MSN/Live settles into a consistent algo, I’ll give this an update. But for right now, they move like a hallucinating crack whore: quickly, and in unpredictable directions. But oh well, maybe some day, “she” will straighten her life out (more about this wayyy in the future. It’s coming.)
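If you want to eyeball that keyword-to-URL percentage on your own pages, something like the following Python sketch would do. The formula (keyword characters divided by filename characters, ignoring hyphens and underscores) is purely my guess at what a “percentage calculation” could look like; the function name and example URLs are made up, and nobody outside Microsoft knows the real math.

```python
import posixpath
from urllib.parse import urlparse


def keyword_share(url, keyword):
    """Fraction of the filename (minus extension and separators) made up of the keyword."""
    path = urlparse(url).path
    name = posixpath.splitext(posixpath.basename(path))[0].lower()
    letters = name.replace("-", "").replace("_", "")
    if not letters:
        return 0.0
    occurrences = letters.count(keyword.lower())
    return occurrences * len(keyword) / len(letters)


# Hypothetical URLs for illustration.
print(keyword_share("http://example.com/mortgage.html", "mortgage"))  # 1.0
print(keyword_share("http://example.com/how-to-get-a-cheap-mortgage.html", "mortgage"))  # about 0.36
```

By this rough measure the short filename is all keyword, while the stuffed one is mostly filler, which lines up with what I saw in the rankings.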
- Google! - Who didn’t see this coming? I’ll cover the basics, but information on this is everywhere. Everywhere.
- Based on incoming links - Not just quantity, but quality is assessed, and so is the quality of the other sites that the site linking to you links to.
- Yes, they keep track of how quickly you get links. If your site comes in with a particular “Bang!”, consistent marketing and consistent link adding are necessary afterward. The line between fitting the profile of a viral site and fitting the profile of a link spam site is a very, very fine one.
- Put your keywords in your domain name. Barring that, put them in the filename. This is not as heavily weighted as on live.com, but the effect is still present.
- If you must buy links, do it in a low-key fashion. Avoid anyone who publicly advertises, and try to get an agreement worked out about how many links they’re going to sell. Google is better at nailing bought links than it is at nailing link spam.
- If you’re going to interlink your sites, use different IP addresses, and preferably different blocks of 256 addresses (x.x.x.0-x.x.x.255). Sounds crazy, but it matters.
- Put your primary keywords in the title. Put your secondary keywords in an h1 tag. Nuff said.
- PageRank thins out the further a page is from the base directory. Try to get your links on a page that is directly linked to by the external internet, not just by internal links. Each page gets its own standing. By the same token, however, at least one internal link is helpful. The more the better. “Virgin” pages (with no internal links) that DO have external links are generally dynamic, and Google probably knows that.
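On the IP block point above, here’s a quick Python sketch for checking whether any of your interlinked sites share the same x.x.x.0-x.x.x.255 block (a /24 in network terms). The site names and addresses are placeholders from the documentation ranges, and the function names are my own; swap in your real hosting details.

```python
import ipaddress
from collections import defaultdict


def group_by_block(site_ips):
    """Group sites by the /24 block (x.x.x.0-x.x.x.255) their IP address falls in."""
    blocks = defaultdict(list)
    for site, ip in site_ips.items():
        # strict=False lets us pass a host address and get its enclosing network.
        block = ipaddress.ip_network(f"{ip}/24", strict=False)
        blocks[str(block)].append(site)
    return dict(blocks)


def shared_blocks(site_ips):
    """Return only the blocks that host more than one of your sites."""
    return {block: sites for block, sites in group_by_block(site_ips).items() if len(sites) > 1}


# Placeholder sites and addresses for illustration.
sites = {
    "site-a.example": "192.0.2.10",
    "site-b.example": "192.0.2.200",
    "site-c.example": "198.51.100.7",
}
print(shared_blocks(sites))
# {'192.0.2.0/24': ['site-a.example', 'site-b.example']}
```

Anything that shows up in the output is a pair (or more) of sites you’d want to move apart before interlinking them.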
Hope this has been useful. Obviously, there’s a lot more information to be had. But honestly, I’m not typing out 200 pages for y’all. A lot of this is based off my own experience, and I believe it to be accurate. As always, any corrections, additions, or general comments are welcome.
All the best!





















December 21st, 2007 at 12:16 pm
[…] The Algorithm I’ve discussed this before, but I’ll rehash it […]
February 8th, 2008 at 12:11 pm
Ah c’mon, Yahoo loves (sop-it-with-a-biscuit love) exact keyword matches in the URL, content, and title tags. If that doesn’t work, kick-start it with a very minimal sponsored search. Let it run out and wait. It will come back like an ex looking for a booty call.
February 8th, 2008 at 12:19 pm
@Jim: Since this writing, I’ve officially made Yahoo my bitch.
Actually, when I fail at ranking a niche in Google properly (or the sites get banned), I mysteriously appear in Yahoo ranking quite well. In one of my favorite niches, I currently control 4 of the top 10 on Yahoo.