Initially, all a webmaster needed to do was submit the address of a page, or URI, to the various engines, which would send a spider to "crawl" that page, extract links to other pages from it, and return information found on the page to be indexed. In this process, a search engine spider downloads a page and stores it on the search engine's own server. A second program, known as an indexer, then extracts information about the page, such as the words it contains, where they are located, and any weight given to specific words. It also extracts all the links the page contains, which are placed into a scheduler for crawling at a later date.
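The indexing step described above can be illustrated with a minimal Python sketch. It is not how any particular search engine works: the names `PageIndexer`, `index_page` and `schedule` are hypothetical, the page is passed in as a string rather than downloaded, and word weighting is left out.

```python
from collections import deque
from html.parser import HTMLParser
import re


class PageIndexer(HTMLParser):
    """Hypothetical indexer: records word positions and outgoing links."""

    def __init__(self):
        super().__init__()
        self.positions = {}  # word -> list of word offsets within the page
        self.links = []      # href values in document order
        self._offset = 0     # running count of words seen so far

    def handle_starttag(self, tag, attrs):
        # Collect the target of every anchor element that has an href.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Record where each word occurs, counted in words from the page start.
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            self.positions.setdefault(word, []).append(self._offset)
            self._offset += 1


def index_page(html, schedule):
    """Index one already-downloaded page and queue its links for later crawling."""
    indexer = PageIndexer()
    indexer.feed(html)
    indexer.close()
    schedule.extend(indexer.links)
    return indexer.positions


# The scheduler is modelled here as a simple first-in, first-out queue.
schedule = deque()
page = (
    "<html><body><p>Search engines crawl pages.</p>"
    '<a href="/about">About pages</a></body></html>'
)
positions = index_page(page, schedule)
print(positions["pages"])  # word offsets at which "pages" occurs
print(list(schedule))      # links waiting to be crawled
```

A real crawler would also resolve relative links against the page's address and skip pages it has already visited; both are omitted here for brevity.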
Site owners started to recognize the value of having their sites highly ranked and visible in search engine results, creating an opportunity for both "white hat" and "black hat" SEO practitioners. Indeed, by 1996, spam touting SEO services could be found on Usenet. The earliest known use of the phrase "search engine optimization" was a spam message posted on Usenet on July 26, 1997.
At first, search engines were supplied with information about pages by the webmasters themselves. Early versions of search algorithms relied on webmaster-provided information such as the keyword meta tag, or index files in engines like ALIWEB. Meta tags provided a guide to each page's content. But indexing pages based on metadata proved unreliable, because some webmasters abused meta tags by including irrelevant keywords to artificially increase page impressions for their websites and thereby increase their ad revenue. Cost per thousand impressions was at the time the common means of monetizing content websites. Inaccurate, incomplete, and inconsistent metadata in meta tags caused pages to rank for irrelevant searches and to fail to rank for relevant ones. Search engines responded by developing more complex ranking algorithms, taking into account additional factors including:
- Text within the title element
- Domain name
- URL directories and file names
- HTML tags: headings, emphasized (<em>) and strongly emphasized (<strong>) text
- Term frequency, both in the document and globally, often misunderstood and mistakenly referred to as Keyword density
- On page keyword proximity
- On page keyword adjacency
- On page keyword sequence
- Alt attributes for images
- Text within NOFRAMES tags
- Web content development
- Sitemaps
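Two of the on-page factors listed above, term frequency and keyword proximity, lend themselves to a short illustration. The Python sketch below is an assumption-laden example rather than any engine's actual formula: the function names are hypothetical, and the smoothed inverse document frequency is just one common way to express how rare a term is across a whole collection.

```python
import math
import re


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequency(term, tokens):
    """Number of times `term` occurs in one document."""
    return tokens.count(term)


def inverse_document_frequency(term, corpus_tokens):
    """Smoothed, log-scaled rarity of `term` across the collection (an assumed form)."""
    containing = sum(1 for tokens in corpus_tokens if term in tokens)
    return math.log((1 + len(corpus_tokens)) / (1 + containing)) + 1.0


def min_proximity(term_a, term_b, tokens):
    """Smallest distance in words between two terms, or None if either is absent."""
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)


# A made-up three-document collection, for illustration only.
docs = [
    "cheap flights to paris and cheap hotels",
    "paris travel guide",
    "gardening tips for spring",
]
corpus = [tokenize(d) for d in docs]

tf_cheap = term_frequency("cheap", corpus[0])
idf_cheap = inverse_document_frequency("cheap", corpus)
idf_paris = inverse_document_frequency("paris", corpus)
gap = min_proximity("cheap", "hotels", corpus[0])

print(tf_cheap, round(tf_cheap * idf_cheap, 3), gap)
```

In this toy collection "cheap" appears in fewer documents than "paris", so it receives the larger global weight. A distance of 1 from `min_proximity` means the two terms are adjacent.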
No major search engine now states that it considers meta keywords in its ranking algorithm, as AltaVista did in the late 1990s. The value of meta keywords is, however, not readily known, because search engines keep their ranking methods secret. One could recommend the use of meta keywords in webpages, but there may be little value in doing so. Some sites nevertheless continue to use them; for example, the source code of this page shows that Wikipedia uses meta keywords. Most SEO experts, however, claim that the "description" tag is more important, and Yahoo! recommends it in its search indexing help page.
Web content providers also manipulated a number of attributes within the HTML source of a page in an attempt to rank well in search engines.
By relying so much upon factors exclusively within a webmaster's control, early search engines suffered from abuse and ranking manipulation. To provide better results to their users, search engines had to adapt to ensure their SERPs showed the most relevant search results, rather than unrelated pages stuffed with numerous keywords by unscrupulous webmasters. This led to the rise of a new kind of search engine.
More sophisticated ranking algorithms
Google's PageRank algorithm weights a page's importance based upon the quantity and quality of incoming links.[8] PageRank estimates the likelihood that a given page will be reached by a web user who randomly surfs the web, and follows links from one page to another. In effect, this means that some links are more valuable than others, as a higher PageRank page is more likely to be reached by the random surfer.
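The random-surfer idea can be sketched as a short power iteration in Python. This is a simplified illustration, not Google's implementation: the function name `pagerank`, the example graph, the damping factor of 0.85 and the fixed iteration count are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    `damping` is the assumed probability that the surfer follows a link
    rather than jumping to a random page.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the surfer equally likely to be on any page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a baseline share from random jumps.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked from A, B and D.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

In this example C ends up with the highest score because it has the most incoming links, and A ranks above B because its single link comes from the high-scoring C. This illustrates why a link from a higher PageRank page is worth more.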
The PageRank algorithm proved very effective, and Google began to be perceived as serving the most relevant search results. On the back of strong word of mouth from programmers, Google became a popular search engine. Off-page factors such as PageRank and hyperlink analysis were considered as well as on-page factors to enable Google to avoid the kind of manipulation seen in search engines focusing primarily upon on-page factors for their rankings.
Although PageRank was more difficult to game, webmasters had already developed link building tools and schemes to influence the Inktomi search engine, an earlier engine that used similar off-page factors, and these methods proved similarly applicable to gaining PageRank. Many sites focused on exchanging, buying, and selling links, often on a massive scale. An online industry thus emerged that focused on selling links designed to improve PageRank and link popularity, with links from higher-PageRank pages selling for more money.
Google and other search engines have, over the years, developed a wider range of off-site factors to use in their algorithms. The Internet was reaching a vast population of non-technical users who were often unable to use advanced querying techniques to reach the information they were seeking, and the sheer volume and complexity of the indexed data was vastly different from that of the early days. Combined with increases in processing power, this led search engines to begin developing predictive, semantic, linguistic and heuristic algorithms. Around the same time as the work that led to Google, IBM had begun work on the Clever Project, and Jon Kleinberg was developing the HITS algorithm.
A search engine may use hundreds of factors in ranking the listings on its SERPs. The factors themselves and the weight each carries can change continually, and algorithms can differ widely: a web page that ranks #1 in one search engine could rank #200 in another, or even on the same search engine a few days later.
Google, Yahoo, Microsoft and Ask.com do not disclose the algorithms they use to rank pages. Some SEOs have carried out controlled experiments to gauge the effects of different approaches to search optimization. Based on these experiments, often shared through online forums and blogs, professional SEOs attempt to form a consensus on what methods work best, although consensus is rarely, if ever, actually reached.
SEOs widely agree that the signals that influence a page's rankings include:
- Keywords in the title tag.
- Keywords in links pointing to the page.
- Keywords appearing in visible text.
- Link popularity.
- PageRank of the page (for Google).
- Keywords in the H1, H2 and H3 heading tags of the page.
- Links from one page to inner pages.
- Placing the punch line at the top of the page.
Many other signals, such as historical data, may affect a page's ranking; a number of these are indicated in patents held by various search engines.