The operators of search engines recognized quickly that some people from the webmaster community were making efforts to rank well in their search engines, and even manipulating the page rankings in search results. In some early search engines, such as Infoseek, ranking first was as easy as grabbing the source code of the top-ranked page, placing it on your website, and submitting a URL to instantly index and rank that page.
Due to the high value and targeting of search results, there is potential for an adversarial relationship between search engines and SEOs. In 2005, an annual conference named AirWeb was created to discuss bridging the gap and minimizing the sometimes damaging effects of aggressive web content providers.
Some more aggressive site owners and SEOs generate automated sites or employ techniques that eventually get domains banned from the search engines. Many search engine optimization companies, which sell services, employ long-term, low-risk strategies, and most SEO firms that do employ high-risk strategies do so on their own affiliate, lead-generation, or content sites, instead of risking client websites.
Some SEO companies employ aggressive techniques that get their client websites banned from the search results. The Wall Street Journal profiled a company that allegedly used high-risk techniques and failed to disclose those risks to its clients. Wired reported the same company sued a blogger for mentioning that they were banned. Google's Matt Cutts later confirmed that Google did in fact ban Traffic Power and some of its clients.
Some search engines have also reached out to the SEO industry, and are frequent sponsors and guests at SEO conferences and seminars. In fact, with the advent of paid inclusion, some search engines now have a vested interest in the health of the optimization community. All of the main search engines provide information/guidelines to help with site optimization: Google's, Yahoo!'s, MSN's and Ask.com's. Google has a Sitemaps program to help webmasters learn if Google is having any problems indexing their website and also provides data on Google traffic to the website. Yahoo! has Site Explorer that provides a way to submit your URLs for free (like MSN/Google), determine how many pages are in the Yahoo! index and drill down on inlinks to deep pages. Yahoo! has an Ambassador Program and Google has a program for qualifying Google Advertising Professionals.
Getting into search engines' databases
Today's major search engines, by and large, do not require any extra effort to submit to, as they are capable of finding pages via links on other sites.
However, Google and Yahoo offer submission programs, such as Google Sitemaps, for which an XML type feed can be created and submitted. Generally, however, a simple link from a site already indexed will get the search engines to visit a new site and begin spidering its contents. It can take a few days or even weeks from the acquisition of a link from such a site for all the main search engine spiders to begin indexing a new site, and there is usually not much that can be done to speed up this process.
Once the search engine finds a new site, it uses a crawler program to retrieve and index the pages on the site. Pages can only be found when linked to with visible hyperlinks. However, some search engines, such as Google, are starting to read links created within Flash.
Search engine crawlers may look at a number of different factors when crawling a site, and many pages from a site may not be indexed by the search engines until they gain more PageRank, links or traffic. Distance of pages from the root directory of a site may also be a factor in whether or not pages get crawled, as well as other importance metrics. Cho et al.[23] described some standards for those decisions as to which pages are visited and sent by a crawler to be included in a search engine's index.
A few search engines, such as Yahoo!, operate paid submission services that guarantee crawling for either a set fee or CPC. Such programs usually guarantee inclusion in the database, but does not guarantee specific ranking within the search results.
Blocking robots
Webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file in the root directory of the domain. Additionally, a page can be explicitly excluded from a search engine's database by using a meta tag specific to robots.
When a search engine visits a site, the robots.txt located in the root directory is the first file crawled. The robots.txt file is then parsed, and will instruct the robot as to which pages are not to be crawled. As a search engine crawler may keep a cached copy of this file, it may on occasion crawl pages a webmaster does not wish crawled.
Pages typically prevented from being crawled include login specific pages such as shopping carts and user-specific content such as search results from internal searches.
Classifications
SEO techniques are classified by some into two broad categories: techniques that search engines recommend as part of good design, and those techniques that search engines do not approve of and attempt to minimize the effect of, referred to as spamdexing. Most professional SEO consultants do not offer spamming and spamdexing techniques amongst the services that they provide to clients. Some industry commentators classify these methods, and the practitioners who utilize them, as either "white hat SEO", or "black hat SEO". Many SEO consultants reject the black and white hat dichotomy as a convenient but unfortunate and misleading over-simplification that makes the industry look bad as a whole. The comparison of white hat to black hat (spamdexing) methods is analogous to "positioning" compared to "guerilla marketing", with the latter spoiling the reputation of marketing as a whole.
Preferred "White hat" methods
An SEO tactic, technique or method is considered "White hat" if it conforms to the search engines' guidelines and/or involves no deception. As the search engine guidelines are not written as a series of rules or commandments, this is an important distinction to note. White Hat SEO is not just about following guidelines, but is about ensuring that the content a search engine indexes and subsequently ranks is the same content a user will see.
White Hat advice is generally summed up as creating content for users, not for search engines, and then make that content easily accessible to their spiders, rather than game the system. White hat SEO is in many ways similar to web development that promotes accessibility, although the two are not identical.
Spamdexing "Black hat" methods
"Black hat" SEO are methods to try to improve rankings that are disapproved of by the search engines and/or involve deception. This can range from text that is "hidden", either as text colored similar to the background or in an invisible or left of visible div, or by redirecting users from a page that is built for search engines to one that is more human friendly. A method that sends a user to a page that was different from the page the search engined ranked is Black hat as a rule. One well known example is Cloaking, the practice of serving one version of a page to search engine spiders/bots and another version to human visitors.
Search engines may penalize sites they discover using black hat methods, either by reducing their rankings or eliminating their listings from their databases altogether. Such penalties can be applied either automatically by the search engines' algorithms or by a manual review of a site.
One infamous example was the February 2006 Google removal of both BMW Germany and Ricoh Germany for use of deceptive practices.. Both companies, however, quickly apologized, fixed the offending pages, and were restored to Google's list
All text of this article available under the terms of the GNU Free Documentation License (see Copyrights for details).