
Tuesday, March 27, 2007

More sophisticated ranking algorithms

Google's PageRank, an example of a more sophisticated search ranking algorithm, rates page quality based upon the quantity and importance of incoming links.[6] PageRank estimates the likelihood that a given page will be reached by a web user who randomly surfs the web, following links from one page to another. In effect, this means that some links are stronger than others, as a higher-PageRank page is more likely to be reached by the random surfer.
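The random-surfer model can be sketched in a few lines of Python. This is a minimal illustration, not Google's implementation; the toy link graph, damping factor, and iteration count are assumptions chosen for the example.

    # A minimal sketch of the random-surfer PageRank computation.
    # The toy link graph, damping factor, and iteration count are
    # illustrative assumptions, not Google's actual data or parameters.
    links = {
        "A": ["B", "C"],
        "B": ["C"],
        "C": ["A"],
        "D": ["C"],
    }

    damping = 0.85  # probability the surfer follows a link rather than jumping
    n = len(links)
    rank = {page: 1.0 / n for page in links}  # start from a uniform distribution

    for _ in range(50):  # iterate until the ranks stabilize
        new_rank = {page: (1 - damping) / n for page in links}
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks)  # rank flows out along each link
            for target in outlinks:
                new_rank[target] += damping * share
        rank = new_rank

    for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")

In this toy graph, C collects links from three pages and ends up ranked highest, while A ranks second despite having only one incoming link, because that link comes from the strong page C. That is exactly the "some links are stronger than others" effect described above.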

The PageRank algorithm proved very effective, and Google came to be perceived as serving the most relevant search results. On the back of strong word of mouth from programmers, Google became a popular search engine. Because it weighed off-page factors such as PageRank and hyperlink analysis alongside on-page factors, Google was able to avoid the kind of manipulation seen in search engines that relied primarily upon on-page factors for their rankings.

Although PageRank was more difficult to game, webmasters had already developed link-building tools and schemes to influence the Inktomi search engine, and these methods proved similarly applicable to gaining PageRank. Many sites focused on exchanging, buying, and selling links, often on a massive scale. Thus an online industry emerged, focused on selling links designed to improve PageRank and link popularity.

To reduce the impact of link schemes, search engines have developed a wider range of undisclosed off-site factors for use in their algorithms. A search engine may use hundreds of factors in ranking the listings on its SERPs; the factors themselves and the weight each carries can change continually, and algorithms can differ widely. The four leading contextual search engines, Google, Yahoo, Microsoft, and Ask.com, do not disclose the algorithms they use to rank pages. Some SEOs have carried out controlled experiments to gauge the effects of different approaches to search optimization and have shared the results through online forums and blogs. SEO practitioners may also study patents held by various search engines to gain insight into the algorithms.
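Conceptually, such a ranking function blends many signals, each with its own weight. The sketch below is hypothetical: the signal names, weights, and page values are invented for illustration, and real engines combine hundreds of undisclosed factors in far more sophisticated ways.

    # Hypothetical sketch of blending ranking signals into a single score.
    # Signal names, weights, and page values are invented for illustration.
    weights = {
        "on_page_relevance": 0.4,  # e.g. query terms in title and body
        "link_popularity": 0.5,    # e.g. a PageRank-like score
        "freshness": 0.1,          # e.g. how recently the page changed
    }

    def score(signals):
        """Weighted sum of normalized signal values (each in 0..1)."""
        return sum(weights[name] * signals.get(name, 0.0) for name in weights)

    pages = {
        "example.com/a": {"on_page_relevance": 0.9, "link_popularity": 0.2, "freshness": 0.7},
        "example.com/b": {"on_page_relevance": 0.5, "link_popularity": 0.8, "freshness": 0.3},
    }

    for url in sorted(pages, key=lambda u: -score(pages[u])):
        print(f"{score(pages[url]):.2f}  {url}")

Because the weights are undisclosed and can change at any time, controlled experiments of the kind described above are one of the few ways practitioners can probe which signals matter.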

History

Origin: Early search engines

Webmasters and content providers began optimizing sites for search engines in the mid-1990s, as the first search engines were cataloging the early Web.

Initially, all a webmaster needed to do was submit a page, or URI, to the various engines, which would send a spider to "crawl" that page, extract links to other pages from it, and return information found on the page to be indexed.[1] The process involves a search engine spider downloading a page and storing it on the search engine's own server. A second program, known as an indexer, then extracts information about the page: the words it contains, where those words are located, any weight given to specific words, and all the links the page contains, which are placed into a scheduler for crawling at a later date.
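In outline, that crawl, index, and schedule cycle looks something like the following minimal Python sketch, which uses only the standard library. It illustrates the general process described above, not how any production search engine is built.

    # Minimal sketch of the crawl -> index -> schedule cycle described above.
    # Uses only the standard library; real crawlers add politeness rules,
    # deduplication, distributed storage, and much more.
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkAndTextParser(HTMLParser):
        """Collects hyperlinks and visible words from a single page."""
        def __init__(self):
            super().__init__()
            self.links, self.words = [], []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

        def handle_data(self, data):
            self.words.extend(data.split())

    def crawl(seed_url, max_pages=5):
        index = {}                     # url -> {word: [positions]}, a toy index
        scheduled = deque([seed_url])  # the "scheduler": pages to crawl later
        seen = {seed_url}
        while scheduled and len(index) < max_pages:
            url = scheduled.popleft()
            try:  # the "spider" step: download the page
                html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
            except OSError:
                continue  # skip unreachable pages
            parser = LinkAndTextParser()
            parser.feed(html)
            # The "indexer" step: record the words and where they occur.
            positions = {}
            for i, word in enumerate(parser.words):
                positions.setdefault(word.lower(), []).append(i)
            index[url] = positions
            # Extracted links go back into the scheduler for a later crawl.
            for link in parser.links:
                absolute = urljoin(url, link)
                if absolute.startswith("http") and absolute not in seen:
                    seen.add(absolute)
                    scheduled.append(absolute)
        return index

    # Example (requires network access):
    # pages = crawl("https://example.com/", max_pages=3)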

Site owners started to recognize the value of having their sites highly ranked and visible in search engine results, creating an opportunity for both "white hat" and "black hat" SEO practitioners. Indeed, by 1996, email spam could be found on Usenet touting SEO services.[2] The earliest known use of the phrase "search engine optimization" was a spam message posted on Usenet on July 26, 1997.[3]

Early versions of search algorithms relied on webmaster-provided information such as the keyword meta tag, or index files in engines like ALIWEB. Meta tags provided a guide to each page's content, but indexing pages based upon metadata proved less than reliable, because some webmasters abused meta tags by including irrelevant keywords to artificially increase page impressions for their website and to increase their ad revenue. (Cost per thousand impressions was at the time the common means of monetizing content websites.) Inaccurate, incomplete, and inconsistent metadata in meta tags caused pages to rank for irrelevant searches and to fail to rank for relevant searches.[4] Web content providers also manipulated a number of attributes within the HTML source of a page in an attempt to rank well in search engines.[5]
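For instance, a keyword meta tag sits in a page's HTML head, and an early metadata-driven indexer could read it with a few lines of code. The sketch below is illustrative: the sample page and its tag values are invented, and nothing forces the declared keywords to match the page's actual content, which is precisely the weakness described above.

    # Minimal sketch of reading the keyword and description meta tags,
    # roughly what early metadata-driven indexers relied on. The sample
    # HTML and its tag values are invented for illustration.
    from html.parser import HTMLParser

    sample_html = """
    <html><head>
      <title>Cheap Widgets</title>
      <meta name="keywords" content="widgets, cheap widgets, buy widgets">
      <meta name="description" content="The best place to buy widgets.">
    </head><body>...</body></html>
    """

    class MetaTagParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.meta = {}

        def handle_starttag(self, tag, attrs):
            if tag == "meta":
                attrs = dict(attrs)
                if "name" in attrs and "content" in attrs:
                    self.meta[attrs["name"]] = attrs["content"]

    parser = MetaTagParser()
    parser.feed(sample_html)
    print(parser.meta["keywords"])  # nothing stops these from being irrelevant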

By relying so much upon factors exclusively within a webmaster's control, early search engines suffered from abuse and ranking manipulation. To provide better results to their users, search engines had to adapt to ensure their results pages showed the most relevant search results, rather than unrelated pages stuffed with numerous keywords by unscrupulous webmasters. Search engines responded by developing more complex ranking algorithms, taking into account additional factors that were more difficult for webmasters to manipulate.