Alright guys. This is not a list of search engine ranking factors. Think of it as a list of potential factors, and of how the web looks from an overhead view; how the algorithms can see it. Things here may or may not affect rankings today. But even for those that don’t, look for them to be integrated soon. This is not search engine specific. Just possibilities.
Whenever there’s a big shift in the search results, you need to know what the search engines see, and what they could possibly be changing, to try and find the source of your troubles or success. Your own list of potential factors may be different from mine (mine is heavily biased towards the type of sites I build), but whatever. Figured I’d put it out there anyways.
- Site Ownership/Identification
- Hosting/IPs – Websites obviously need to be hosted. It’s one of the things that restricts the building of truly solid link farms. Each IP has a few identifiable pieces of information: its C-class block (the x.x.x.0–x.x.x.255 range it sits in), the company that IP block is registered to, and obviously the IP itself.
- Registrar/WhoIs – No, they can’t access private whois information, even though Google is a registrar. Either way, private whois won’t exactly help you out on reinclusion requests. We also know for sure this information is archived to an extent (hence the issues with domain flipping that have arisen lately). In terms of connecting sites together, registrar can be significant. For example, this blog uses 1and1 (don’t get me started) private registration. Let’s say I have 2 sites on hostgator. With hundreds of registrars, and roughly 256^4 possible IPs, statistically, what do you think are the chances of 2 domains with private whois residing within 5 IPs of each other linking to each other? Astronomically low. This isn’t built into the algorithm yet as far as I know, but keep it in mind.
- Linking by IP – The chances of 2 IPs linking back and forth to each other from different domains frequently is extraordinarily low unless the webmaster owns both servers. Once again, it’s not known whether this is a factor yet.
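To make the C-class point concrete, here’s a minimal Python sketch (my own illustration, not anything a search engine has published) of the check an algorithm could run: two IPv4 addresses share a C-class block when their first three octets match.

```python
from ipaddress import ip_address

def same_c_class(ip_a: str, ip_b: str) -> bool:
    """Two IPv4 addresses are in the same C-class block (/24)
    if everything but the last octet is identical. Shifting the
    32-bit integer form right by 8 bits drops that last octet."""
    return int(ip_address(ip_a)) >> 8 == int(ip_address(ip_b)) >> 8

# Both in 208.97.177.0/24:
print(same_c_class("208.97.177.5", "208.97.177.200"))  # True
# Different third octet, so a different C-class block:
print(same_c_class("208.97.177.5", "208.97.178.5"))    # False
```

Flagging a pair of interlinking domains on the same /24 is cheap to compute at index scale, which is exactly why the pattern is risky.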
- Outbound Linking
- Outbound Linking and Niche “base” Sites
This is the theory that fixed Ask.com’s search engine. Look at the SEO sphere. There’s a distinct circle of sites that link between each other. You’ll see links going back and forth between me, SEO ROI, and SEOMoz, and from there out to the rest of the SEO sphere. You’ll also see that links coming into me have a tendency to go out to wickedfire and bluehat seo. This forms an intricate web of authority. The sites linked to most in the pyramid become the top sites. Sites at the bottom become more eligible to become major players when linked to from someone at the top. Most likely quite similar to Google’s “authority”, though perhaps not as niche specific as Ask’s once was.
- Outbound Linking to Spammy Sites
(Read: Linking to anywhere but Google or Wikipedia)
- Use of NoFollow – I don’t for a second buy that Google is currently penalizing sites as being “SEOed” based on nofollow. That would defeat their end goal for nofollow. But it’s still excellent for spotting certain CMSes, like WordPress.
- Inbound Links
- Paid Links
- Whored Out Links: Examine the place you purchased your links from. Are there any other obviously paid links? Is your relevant link sitting next to a Viagra link? If so, congrats: there’s a chance someone has reported the site for selling links.
- Common Text: Contextual links rock. But is the article syndicated across 400 different identical blogs?
- Common Location: Is the link in the footer? Rumor has it that GoogleBot uses the Gecko rendering engine. This means that yes, they can tell where links are located.
- Spammed Links
- Similar Text – Though it’s unlikely this is a current factor, is there similar text around your spammed links? Perhaps a common username? Remember Google’s social API, which attempts to link social profiles together. It’s not a stretch to say this already does, or someday could, work on forums and social news sites.
- Common CMS/Footprinting – For link spam purposes, most target sites are found via common footprints left by their CMS. But that also means there are footprints for the search engines to discover. Or perhaps they’ve already classified, and discounted, links from super-spammable software.
- Overspammed Locations – Think “guestbooks”: ancient BBS implementations that haven’t seen a legitimate post in over 6 years. In the past, these have sped up indexing. Nowadays, while that remains true, I’ve noticed they have, if anything, a negative impact on rankings. So consider your links. Is there anything that just screams “link spam” about their locations?
- Link Building Metrics
- Link Velocity – The speed at which links are gathered, and how consistent that speed is. Ideally, a graph of when new links pop up should look like a bell curve, levelling out at some point on the way down.
- Link Temporization – Certain links get removed by admins for link spam. The percentage of these is ideally low, as natural sites do not have a large percentage of their links removed.
- Link Location – Web 2.0 is a game of temporary bumps. Frontpaging on a site like Digg creates a powerful link. But as soon as that link goes off the front page, some of the power is lost. Get too deep into the social media arena, and this can get messy. Beyond that, tags get syndicated (wordpress.com, for example, keeps a feed of different tags), social bookmarks get syndicated and later bumped off… there’s a lot that can change.
- Keyword Variance – If all of your links have 100% identical anchor text… something is amiss. Google seems to be decent at spotting this already. Yahoo, not as good. Live? Well, I don’t need to go there.
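One plausible way to quantify anchor text variance is the Shannon entropy of the anchor distribution; zero entropy is the 100%-identical case above. This is my own illustration of the concept, not a metric any engine has confirmed using.

```python
from collections import Counter
from math import log2

def anchor_entropy(anchors: list[str]) -> float:
    """Shannon entropy (in bits) of an anchor-text distribution.
    0.0 means every inbound link uses identical anchor text;
    higher values mean a more natural-looking mix."""
    counts = Counter(anchors)
    total = len(anchors)
    return -sum((c / total) * log2(c / total) for c in counts.values())

spammy  = ["buy viagra"] * 100
natural = (["SlightlyShadySEO"] * 50 + ["slightly shady seo"] * 30
           + ["this blog"] * 15 + ["click here"] * 5)

print(anchor_entropy(spammy) == 0.0)          # True
print(round(anchor_entropy(natural), 2))      # 1.65
```

An engine wouldn’t need anything this literal, of course; the point is that a skewed anchor distribution is a measurable, machine-readable signal.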
- Domain Trust – Domain trust (and what occurs when it is lost) has always been a bit hard for me to analyze. But look at the parasite domains that get hit hard, then at what happens after the fact. Is this changing? It has to, to a certain extent. But sites like Digg have been parasited hundreds of times, lost some trust, and yet still rank. It’s odd. Then other domains that have been parasited still rank for real key terms, but can’t be used to parasite again.
- Internal SEO vs. External – The search engines have always been attempting to balance the power of internal SEO (site structure, self-linking with proper anchor text, etc.) against external SEO (inbound links). This balance changes from time to time. The emphasis is also completely different from search engine to search engine. Live, for example, recently appears to be a sucker for internal links with a given anchor (or external links with identical anchors, no matter how many).
- Content Freshness – Google has a love-hate relationship with freshness. Showing recent news results requires ranking things that could not possibly have gathered too many external links yet. This really opens it up to spam. But at the same time, it’s what users are looking for. Figure out how it was handled, how it is being handled, and if there’s any difference.
- Internal Areas of Emphasis – Every search engine weights a few things internally to a different extent: domains (exact/partial match), subdomains, titles, <h#> tags, etc. The balance in importance of these can change from time to time. It’s a bit hard to detect specifically, but a good thing to keep an overall eye on.
This is by no means a complete list. But it’s a good starting place. Once again, I’ll say that not all of these appear to be in use now. These are just some things I mentally check through whenever I see a major flux in the search results. It’s far from an exact science.