Challenges faced by search engines

The Web is growing much faster than any present-technology search engine can possibly index. Many webpages are updated frequently, which forces the search engine to revisit them periodically.

The queries one can make are currently limited to searching for keywords, which may result in many false positives, especially with the default whole-page search. Better results might be achieved by using a proximity-search option with a search-bracket that limits matches to a paragraph or phrase, rather than matching random words scattered across large pages. Another alternative is an organic search engine, in which human operators do the research for the user.

Dynamically generated sites may be slow or difficult to index, or may produce excessive results, perhaps generating 500 times more webpages than average. For example, for a dynamic webpage which changes content based on entries inserted from a database, a search engine might be requested to index 50,000 static webpages for 50,000 different parameter values passed to that dynamic webpage. Many dynamically generated websites are not indexable by search engines; this phenomenon is known as the invisible web. There are search engines that specialize in crawling the invisible web by crawling sites that have dynamic content, require forms to be filled out, or are password protected.

Relevancy: sometimes the engine cannot find what the person is looking for. Some search engines do not rank results by relevance, but by the amount of money the matching websites pay. In 2006, hundreds of generated websites used tricks to manipulate a search engine into displaying them among the higher results for numerous keywords. This can lead to some search results being polluted with linkspam or bait-and-switch pages which contain little or no information about the matching phrases.
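
The difference between the default whole-page search and the proximity-search option mentioned above can be illustrated with a minimal Python sketch. It is not taken from any particular search engine: the function names (tokenize, whole_page_match, proximity_match) and the window parameter are hypothetical, and the sketch assumes single-word query terms and a search-bracket measured in words rather than in paragraphs or phrases.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def whole_page_match(text, terms):
    """Whole-page search: every term appears somewhere on the page."""
    words = set(tokenize(text))
    return all(term.lower() in words for term in terms)


def proximity_match(text, terms, window=10):
    """Proximity search: all terms occur within `window` consecutive words.

    `window` plays the role of the search-bracket (an assumed word count).
    """
    wanted = {term.lower() for term in terms}
    words = tokenize(text)
    # Positions of the words that are query terms, in page order.
    hits = [(i, w) for i, w in enumerate(words) if w in wanted]
    for start, (first_pos, _) in enumerate(hits):
        seen = set()
        # Collect the terms that fall inside the bracket opened at first_pos.
        for pos, word in hits[start:]:
            if pos - first_pos >= window:
                break
            seen.add(word)
        if seen == wanted:
            return True
    return False


# A large page where the two query words are unrelated and far apart.
scattered = "Jaguar dealers sell cars. " + "filler " * 50 + "The zoo feeds its big cats at noon."
# A short passage where the two query words belong together.
focused = "The jaguar at the zoo sleeps all day."
```

With these inputs, whole_page_match(scattered, ["jaguar", "zoo"]) accepts the page even though the two words are unrelated, which is the kind of false positive described above, while proximity_match rejects it with a 10-word bracket and still accepts the focused passage.
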
Such linkspam pushes the more relevant webpages further down in the results list, perhaps by 500 entries or more.

Secure pages (content hosted on HTTPS URLs) pose a challenge for crawlers, which either can't browse the content for technical reasons or won't index it for privacy reasons.

All text of this article available under the terms of the GNU Free Documentation License (see Copyrights for details).