Searching the Internet Effectively:
Search Engines
Search engines are good for concepts that are represented by clearly
defined terminology that appears only in relevant items. Because it takes
time for a search engine's "spider" to traverse the Web, its index will never
be entirely up to date.
Search engines vary in the number of pages that they cover (see Greg Notess's
estimates of size and Search
Engine Watch's 2004 update).
No search engine covers the entire Internet: different search engines cover
different parts of it, and differences in ranking mean that different
sites appear in the first few pages of results. A 2007 study showed that when a typical search was repeated across several search engines, almost 90% of the results were returned by only one search engine.
The Thumbshots
comparison of rankings is a good way to visualise how different search engines find different sites and rank them differently.
Search engines do not index all pages on the Web. In particular, search engine robots may not index:
- pages generated from databases
- pages in frames, pages generated by JavaScript, etc.
- pages that have a "robot exclusion" tag
- pages that aren't linked from other pages
- pages on subscription sites or databases
So much of the Internet is not covered by the standard search
engines; this is sometimes called "the invisible web". (Actually it's not
invisible: much of it is indexed by other search tools, e.g. directories.)
Search Features
Search engines have different methods of searching, and offer different features.
A good comparative table of features is provided by Greg Notess. However, search
engines change their features frequently.
Exploring a search engine: Google
http://www.google.com/
Google is one of the most popular search engines. It ranks search results
according to the number of links to each site, and whether the linking sites
are themselves well linked. This is similar
to using citation counts to evaluate the importance of print documents, and
appears to be one of the keys to its effectiveness.
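To make the idea of link-based ranking concrete, here is a toy sketch in the spirit of PageRank. It is not Google's actual algorithm: the four-page link graph, the damping factor of 0.85 and the function name `link_rank` are all assumptions made for illustration. A page scores highly when it is linked to by pages that themselves score highly.

```python
# Toy link-based ranking in the spirit of PageRank (illustrative only).
# Assumptions: the hypothetical link graph below and a damping factor of 0.85.

def link_rank(links, damping=0.85, iterations=50):
    """Return a score for each page; scores are shared out along links."""
    pages = list(links)
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score...
        new_scores = {p: (1.0 - damping) / n for p in pages}
        # ...plus a share of the score of each page that links to it.
        for page, outgoing in links.items():
            if outgoing:
                share = damping * scores[page] / len(outgoing)
                for target in outgoing:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for target in pages:
                    new_scores[target] += damping * scores[page] / n
        scores = new_scores
    return scores


# Hypothetical mini-web: each key links to the pages in its list.
web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
    "hub": ["a"],
}

scores = link_rank(web)
for page, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

Page "a" has fewer incoming links than "hub", but because one of them comes from the well-linked "hub" it still finishes far ahead of "b" and "c", which no page links to.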
Relevancy
Most search engines present results ranked by relevancy. How relevancy
is determined varies according to the search engine, but in general relevancy
reflects:
- Frequency with which your search terms appear in the document (giving
more weight to terms that are uncommon in the database)
- How close together your search terms appear in the document
- Whether your search terms appear in the title, metadata or first
few paragraphs of the document
- How many sites link to the document
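The factors listed above can be combined into a single score. The sketch below is a hypothetical scoring function, not any search engine's real formula: the three sample pages, the weights (`w_freq`, `w_title`, `w_prox`, `w_links`) and the logarithmic weighting used for "uncommon" terms are all assumptions chosen for illustration.

```python
import math

# Hypothetical mini-collection: title, body text and number of inbound links.
DOCS = {
    "d1": {"title": "Windmill pollution study",
           "body": "noise pollution from windmills and visual pollution",
           "inlinks": 3},
    "d2": {"title": "Windmills of Holland",
           "body": "the history of windmills in holland",
           "inlinks": 10},
    "d3": {"title": "Air quality report",
           "body": "pollution levels in cities and pollution control",
           "inlinks": 1},
}


def tokens(text):
    return text.lower().split()


def idf(term, docs):
    """Weight a term more heavily the fewer pages in the collection contain it."""
    containing = sum(1 for d in docs.values()
                     if term in tokens(d["title"]) or term in tokens(d["body"]))
    return math.log((1 + len(docs)) / (1 + containing)) + 1


def min_gap(words, terms):
    """Smallest distance in words between two different query terms, or None."""
    hits = [(i, w) for i, w in enumerate(words) if w in terms]
    gaps = [j - i for (i, a), (j, b) in zip(hits, hits[1:]) if a != b]
    return min(gaps) if gaps else None


def score(query, doc, docs, w_freq=1.0, w_title=2.0, w_prox=1.0, w_links=0.5):
    # The weights are hypothetical; real engines tune many more signals.
    terms = set(tokens(query))
    body, title = tokens(doc["body"]), tokens(doc["title"])
    s = 0.0
    for t in terms:
        weight = idf(t, docs)
        s += w_freq * body.count(t) * weight     # frequency, scaled by rarity
        if t in title:
            s += w_title * weight                # bonus for appearing in the title
    gap = min_gap(body, terms)
    if gap is not None:
        s += w_prox / gap                        # terms closer together score higher
    s += w_links * math.log(1 + doc["inlinks"])  # more inbound links score higher
    return s


query = "pollution windmills"
ranked = sorted(DOCS, key=lambda name: score(query, DOCS[name], DOCS), reverse=True)
print(ranked)
```

For the query "pollution windmills", page d1 ranks first because it contains both terms, close together, with one of them in the title, even though d2 has more inbound links.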
Using Relevance effectively
- The more relevant words that you put into the initial search, the better
the relevancy is likely to be.
- The total number of hits is less important than which items are in the
top 20-50.
- Use the most specific words possible, and don't include "stop words"
such as "the", "and", etc.
- Think about words that will appear on a relevant site.
- The hits presented in the list may not include the site you're looking for.
However, they may LINK to the site you're looking for.

Try a search for "pollution aspects of windmills" on Google:
start with "pollution windmills", then add more terms to see the effect
on your search results.
Relevancy ranking generally works well for sites with specific names, e.g. the
name of an organisation. However:
- where a name is shared by several organisations, e.g. "Victoria
University", add other terms, e.g. "Wellington"
- the quality of results may be improved by adding the name of a reputable
organisation; compare "obesity surgery" with "obesity surgery national
institutes health"
Excluding/including terms
If you enter several terms, Google will search for all of them (implied
AND). To specify that you don't want pages containing a particular term, put "-"
in front of the term.
This is useful where a search term may have different meanings. For example,
"cycle" can mean the activity of bicycling or (particularly in the
US) a motorcycle. Compare searches for:
cycle
cycle -motorcycle
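A minimal sketch of how implied AND and the "-" operator filter results, assuming a hypothetical set of three pages and simple whole-word matching (real search engines also handle stemming, punctuation and much more):

```python
def matches(query, text):
    """Implied AND: every plain term must appear; '-term' must not appear."""
    words = set(text.lower().split())
    for term in query.lower().split():
        if term.startswith("-"):
            if term[1:] in words:      # excluded term present: reject the page
                return False
        elif term not in words:        # required term missing: reject the page
            return False
    return True


# Hypothetical pages, matched on whole words only.
pages = {
    "p1": "cycle touring holidays by bicycle",
    "p2": "motorcycle cycle maintenance guide",
    "p3": "the water cycle explained",
}

for q in ["cycle", "cycle -motorcycle"]:
    hits = [name for name, text in pages.items() if matches(q, text)]
    print(q, "->", hits)
```

Excluding "motorcycle" drops p2, but the water-cycle page p3 still matches: excluding one unwanted meaning does not rule out others.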
Phrases and adjacency
In Google, as in most search engines, enclosing a phrase in quotes searches
for that exact phrase. This is particularly useful when a phrase contains very
common words. For example, compare on Google:
just in time management
"just in time management"
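The difference between an unquoted search and a quoted phrase can be sketched as two matching rules. The function names and sample pages are hypothetical, and matching is on whole, lower-cased words only:

```python
def has_all_words(query, text):
    """Unquoted search: every word must appear somewhere, in any order."""
    words = text.lower().split()
    return all(w in words for w in query.lower().split())


def has_phrase(query, text):
    """Quoted search: the words must appear together, in this exact order."""
    q = query.lower().split()
    words = text.lower().split()
    return any(words[i:i + len(q)] == q for i in range(len(words) - len(q) + 1))


# Hypothetical pages.
pages = [
    "just in time management of inventory",
    "time management tips: just in case you are late",
]
query = "just in time management"
for page in pages:
    print(f"{page!r}: words={has_all_words(query, page)}, phrase={has_phrase(query, page)}")
```

The second page contains all four words, so it matches the unquoted search, but not in the right order, so it fails the phrase search.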
Alternative terms
Often the same concept may be described by several different words. For
instance, "pension" and "superannuation" describe the same concept.
To search for alternatives like these on Google, use the OR operator (which must
be in capitals). For example, to search for widows' pensions in New Zealand:
widows pension OR pensions OR superannuation zealand
Use the OR operator to include alternative spellings too, e.g.:
images christmas OR xmas decorations
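The way OR groups alternatives can be sketched as follows, assuming (as the example query implies) that OR joins only the terms immediately on either side of it, and that the resulting groups are combined with the implied AND. The parser and sample pages are hypothetical; only upper-case OR is treated as an operator.

```python
def parse_query(query):
    """Split a query into groups of alternatives.

    Terms joined by an upper-case OR form one group; the groups are then
    combined with an implied AND.
    """
    groups = []
    join_next = False
    for token in query.split():
        if token == "OR":
            join_next = True
        elif join_next and groups:
            groups[-1].append(token.lower())   # add an alternative to the last group
            join_next = False
        else:
            groups.append([token.lower()])     # start a new AND-ed group
            join_next = False
    return groups


def matches_query(query, text):
    """A page matches if it contains at least one term from every group."""
    words = set(text.lower().split())
    return all(any(term in words for term in group) for group in parse_query(query))


query = "widows pension OR pensions OR superannuation zealand"
print(parse_query(query))
for page in ["superannuation for widows in new zealand",   # hypothetical pages
             "widows pensions in australia"]:
    print(page, "->", matches_query(query, page))
```

So the example query is read as widows AND (pension OR pensions OR superannuation) AND zealand.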
It can also be useful to search by:
- date
- domain (e.g. exclude commercial sites in the .com domain)
- language
- parts of the page, e.g. title, metadata
- links to a site, e.g. to find sites that link to a relevant site
How can you do these types of search in Google? One way is to use the "advanced search" option, which builds the search for you.
Caching
Google also keeps ("caches") a copy of each page as it was when its robot indexed
it; this can be useful for sites that have become inaccessible. You can access the copy by following the "Cached" link in the results list.

Exercises
Try these searches on Google:
- Government policies on promoting elite sport
- Emigration from Tonga
- Regulatory regimes for ensuring building quality
Google has a number of specialist features that are accessible through Soople.
Google has even had a
song written about it.
- Searches should be narrow. Since search engines index most words
on pages (unlike a library catalogue), there is little need for broad searches. E.g.
if we want information about aperture control in digital cameras, "aperture
control digital cameras" is better than "digital cameras",
since the search engine will pick up any mentions of "aperture control".
- If the topic breaks into different aspects, repeated narrow searches are
better than one broad search.
- Searches should be accurate, e.g. check the spelling of search terms (US/British).
- Evaluation of located sources is important (see later).
Other selected search engines
http://search.yahoo.com/
Yahoo! started life as a directory, then linked to search engines such as Google.
However, Yahoo! now has its own search engine. Try a search on both Yahoo! and
Google, and compare the results.
http://www.bing.com/
Bing (previously Live Search) is Microsoft's Web search engine. It suggests related searches, and provides a snapshot of text from each page when you place the cursor next to the item in the search result. The advanced search option allows you to dynamically build a search.
http://www.ask.com
The Ask.com search engine is the result of a merger of two "search engines":
- AskJeeves: a database of common queries. It accepts natural-language
queries and matches them to the database.
- Teoma: a search engine that offers interesting ways of refining a search

Check Ask.com for information about
- how to convert a quarter acre into hectares.
- palmtop computers (look at "narrow your search" and "expand your search")
http://www.exalead.com/
Offers some useful search features, including truncation, phonetic search, and a NEAR operator. It also gives suggestions for narrowing your search.
http://www.wolframalpha.com/
Wolfram|Alpha indexes sites that contain numeric or other structured information, and attempts to answer questions about this data. For example, "height Mount Cook" gives the altitude, and calculates the likely air temperature and pressure at the summit.
http://scholar.google.co.nz/
Google Scholar indexes research material (e.g. reports, e-journals, conference papers) on the web, and identifies citations to other research material. The display includes the number of times each document has been cited, and searches also retrieve references to print documents that have been cited in research documents on the web.
Metasearch engines
Metasearch engines, or combined search engines, search across several search
engines. Since no one search engine covers the whole Web, using a metasearch
engine makes sense if:
- Your topic can be expressed in a simple search that can be executed
easily across several search engines
- The number of items likely to be found is small
- You want your search to be as complete as possible
One drawback is that you can't use the special features
of a particular search engine, and you may be shown only the first few results from
each search engine. Also, some search engines don't allow access by metasearch engines.
Selected metasearch engines are:
http://clusty.com/
Clusty (formerly Vivisimo) returns results from several search engines, and clusters them by
common terms.
http://www.metacrawler.com/
Metacrawler collates results from the different search engines, and presents
a combined list, with duplicates removed.
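The source does not say how MetaCrawler combines its lists, so the sketch below uses one common approach as an assumption: take the first hit from each engine, then the second, and so on (round-robin), skipping any page already seen. The result lists use placeholder example domains, and the rule for deciding that two URLs are the same page (ignoring the scheme, a leading "www." and a trailing slash) is a simplification.

```python
from itertools import zip_longest
from urllib.parse import urlsplit


def normalise(url):
    """Hypothetical duplicate test: ignore the scheme, a leading 'www.',
    the case of the host name and a trailing slash on the path."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(result_lists):
    """Round-robin merge: 1st hit from each engine, then 2nd, and so on,
    keeping only the first copy of each page."""
    seen = set()
    combined = []
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is None:          # this engine has run out of results
                continue
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                combined.append(url)
    return combined


# Placeholder result lists from two hypothetical engines.
engine_a = ["http://www.example.com/allowances", "http://example.org/students"]
engine_b = ["https://example.com/allowances/", "http://example.net/help",
            "http://example.org/students"]

for url in merge_results([engine_a, engine_b]):
    print(url)
```

Interleaving by rank means each engine's top hits reach the combined list early, which suits the point above that you may be shown only the first few results from each engine.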

Compare a search on Clusty and MetaCrawler for information on "student
allowances"
Search Engine Resources:
Some further reading on search engines:
- Search Engine Showdown / Greg Notess. http://www.searchengineshowdown.com/
  Surveys search engines from a reference librarian's point of view.
- Search Engine Watch / Danny Sullivan. http://searchenginewatch.com/
  A good overall and up-to-date directory, though oriented to helping web
  managers get their sites "noticed" rather than helping searchers.
- Google Guide
  An interesting interactive tutorial on Google, with sections for novices, experts and teens.
Last updated 19 June 2009 by Alastair Smith