American Society for Indexing
INDEXING THE WEB
Back-of-the-Book Style Indexing
Indexed Sites
Metadata and Web Indexing
Subject Tree Indexing
Search Engine Technologies
Indexing the Web is not a simple task. Three different kinds of indexing
are evolving to meet the informational needs of Web users:
a back-of-the-book style of hard-coded index links within a Web site, subject
trees of reviewed sites, and search engines. Members of ASI's Web Indexing SIG are dedicated to excellence in this specialized area of indexing.
Some organizations are seeing that including indexes on their web sites is just as important as including indexes in books and online manuals. We've seen some good and some bad, some computer-generated, some obviously not constructed by professional
indexers, and some professionally prepared. In any case, all site owners should be commended for recognizing the need for an index. We'd like to share some interesting indexes with you, and information about how search engine indexing works. Have a look and see what value these indexes add! This list will be changing from time to time, so be sure to bookmark, print, download, or save by other means the ones to which you think you'll refer later.
Back-of-the-Book Style Web Indexing
Many web sites opt to provide a search function for the site. While this
is certainly better than nothing, users encounter the same problems in
that scenario as they do in other full-text database searching. The major
problem is, of course, relevancy of items found via the search. For example,
on a software publisher's site, a search for a product called Home Office
ends up retrieving every document with the word "office" in it, because
the word "home" appears at the end of every page. If there is a site index, you
can go directly to the "H" section and find the one relevant page, thus
saving time for other projects. Not only will an index weed out
such irrelevant items, but of the many relevant ones, sub-headings give
users a clue as to which are more likely to answer their questions.
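The Home Office problem described above can be sketched in a few lines of Python. This is only an illustration: the page URLs, page text, and index headings are hypothetical, and the search function is a deliberately naive full-text match rather than any particular site's search engine.

```python
import re

# Hypothetical pages on a software publisher's site. Every page ends with a
# "Home" navigation link, as in the scenario described above.
PAGES = {
    "/products/home-office.html": "Home Office suite overview and pricing. Home",
    "/support/office-hours.html": "Support office hours and contact details. Home",
    "/jobs/index.html": "Openings at our Boston office. Home",
}

# A hand-built site index (also hypothetical): each heading points only to
# the page or pages an indexer judged relevant.
SITE_INDEX = {
    "Home Office (product)": ["/products/home-office.html"],
    "Jobs": ["/jobs/index.html"],
    "Support hours": ["/support/office-hours.html"],
}


def full_text_search(query):
    """Return every page that contains all of the query words anywhere."""
    words = query.lower().split()
    hits = []
    for url, text in PAGES.items():
        page_words = set(re.findall(r"[a-z]+", text.lower()))
        if all(word in page_words for word in words):
            hits.append(url)
    return sorted(hits)


def index_lookup(heading):
    """Return the pages listed under one heading of the site index."""
    return SITE_INDEX.get(heading, [])


print(full_text_search("Home Office"))        # all three pages match
print(index_lookup("Home Office (product)"))  # only the product page
```

The full-text search returns all three pages because each one contains both words somewhere, while the index lookup returns the single page an indexer chose for that heading.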
The sites listed below are merely a selection of sites with interesting indexes
that we have happened to run into. The descriptions are written by those
submitting the site suggestion. Sites are listed here for educational
purposes only. The American Society for Indexing does not endorse the information
content of these sites.
NOTE ON SUBMISSIONS: We welcome any suggestions from
users about sites to add. Suggested URLs must be accompanied by (1) instructions
on how to get to the index from the site's home page, and (2) a description
of what is useful or unusual about the index. Please remember that
we wish to show actual indexes, not mere collections of links related to a certain topic.
Indexed Sites
- UNIXhelp for Users
- This online manual includes both a browsable index and a keyword-searchable
index. Select "Manual Index" from the menu on the home page.
- U.S. Census Bureau
- To reach the index, click "Access Tools" on the home page, then "Subjects
A-Z". This index covers quite a bit of text. Topics such as "Homeownership",
"Household Economic Statistics", and "Households and Families" are only united
because they all begin with the letter H. There are also entries like "Housing:
-- Statistics -- Starts", where the two "sub-headings" are links, but the
main heading is not. Entries like "Indicators: -- Economic" are not flipped,
so there is no entry for "Economic indicators" (nor are there other items
under "Indicators").
- U.S. Government Printing Office
- This alphabetical hyperlinked index is similar to a simple back-of-the-book index but without cross-references.
- Writer's Block
- Writer's Block is a quarterly web magazine that publishes insightful and
entertaining articles for and about "Canadians in the writing trade." You
can reach the index by clicking the "Index" item on the right side of the
main menu across the top. The index is fully hypertext-linked. Because the
magazine is updated quarterly, the index is a living document.
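The U.S. Census Bureau description above notes that an entry such as "Indicators: -- Economic" is never flipped to give "Economic indicators". As a rough sketch of what that double-posting involves, the hypothetical helper below adds an inverted entry for each heading and sub-heading pair. The function name and data layout are assumptions made for illustration, and a mechanical flip like this will not always produce a natural phrase, which is one reason such entries are usually reviewed by an indexer.

```python
def double_post(entries):
    """Add an inverted entry for each (main heading, sub-heading) pair.

    `entries` is a list of (main, sub) tuples; an empty string for `sub`
    means the main heading has no sub-heading.
    """
    posted = set(entries)
    for main, sub in entries:
        if sub:
            # "Indicators -- Economic" also gets "Economic indicators".
            posted.add((f"{sub} {main.lower()}", ""))
    return sorted(posted)


for main, sub in double_post([("Indicators", "Economic")]):
    print(main if not sub else f"{main} -- {sub}")
```

With the single sample entry, the listing contains both "Economic indicators" and "Indicators -- Economic", so a user looking under either E or I finds the topic.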
Metadata and Web Indexing
The META tag in HTML was intended to give search engines hints
about web page content. Abuse of the tag has run rampant, however:
webmasters try to raise the relevancy of a page artificially by larding its META
tags with terms unrelated to the actual content of the page. Most
commercial search engines now assign very little weight to text
found in META tags.
In response, movements to standardize META tag content have emerged. Corporations
and governmental bodies with many web sites often develop a public portal to their
web content. They can improve
search results for users by the careful use of structured META tags to guide their on-site search
engines. Indexers can apply their analysis skills to creating these structured tags.
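To make the idea of structured META tags concrete, here is a minimal sketch of how an on-site search tool might read them, using Python's standard html.parser module. The sample page, its tag values, and the function and class names are hypothetical. The "DC." prefix follows the common convention for embedding Dublin Core elements in HTML, but nothing here is a definitive implementation of any particular search engine.

```python
from html.parser import HTMLParser

# Hypothetical page with indexer-written META tags.
SAMPLE_PAGE = """
<html><head>
<title>Household Economic Statistics</title>
<meta name="DC.title" content="Household Economic Statistics">
<meta name="DC.subject" content="households; income; economic indicators">
<meta name="keywords" content="households, income">
</head><body>...</body></html>
"""


class MetaCollector(HTMLParser):
    """Collect name/content pairs from the META tags of one page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            # Tag names are compared case-insensitively.
            self.meta[name.lower()] = content


def extract_meta(html_text):
    """Return a dict of META tag names (lowercased) to their content."""
    collector = MetaCollector()
    collector.feed(html_text)
    return collector.meta


print(extract_meta(SAMPLE_PAGE)["dc.subject"])
# households; income; economic indicators
```

An on-site engine that trusts its own organization's tags could weight these fields heavily, which is exactly what public search engines can no longer afford to do.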
Here are links about metadata, metatags and web page indexing.
- Digital Object Identifier System
- The Digital Object Identifier (DOI) is a system for identifying and exchanging
intellectual property in the digital environment. It provides a framework for managing
intellectual content, for linking customers with content suppliers, for facilitating
electronic commerce, and enabling automated copyright management for all types of media.
Using DOIs makes managing intellectual property in a networked environment much easier
and more convenient, and allows the construction of automated services and transactions
for e-commerce.
- Dublin Core Metadata Initiative
- The Dublin Core Metadata Initiative is an open forum engaged in the development
of interoperable online metadata standards that support a broad range of purposes and
business models. DCMI's activities include consensus-driven working groups, global
workshops, conferences, standards liaison, and educational efforts to promote widespread
acceptance of metadata standards and practices.
- ITtoolbox.com
- This commercial site has an excellent collection of resource links on metadata as
it applies to database
design as well as structured meta tags and web indexing. Includes publications and software tools.
- How To Use Meta Tags
- Search Engine Watch explains meta tags, including their limitations.
- US Government Information Locator Service (GILS)
- The goal of the Global Information Locator Service is to make it
easier for people to find all of the information they need. GILS is an open standard for
searching basic information descriptions. Such descriptions may be inserted into Web
documents with tools like TagGen, generated from databases with tools like MetaStar and
Microsoft Access, or edited by catalogers and just stored as documents. Based on the
ISO 23950 search standard, GILS includes the most commonly understood concepts by which
people worldwide find information sources in libraries--concepts like Title, Author,
Publisher, Date, and Place.
Subject Tree and Reviewed Site Indexes
Some Web search tools review each site with human eyes and brains to decide
which categories and keywords fit the site, and then index it accordingly.
An example would be Yahoo,
where hordes of people are building an index to the Web, which is also
searchable by a search engine.
Search Engine Technologies
The vast majority of indexing on the Web is automatic, with a high level
of retrieval and a low rate of relevancy. Most indexers feel that the precision
rate most search engines provide is just not as good as true indexing.
But as search engine technologies become more sophisticated, we should
see some changes in the frustration level of people using these tools.
Most search engines actually search an index, a list of terms that robots
return from their voyages. Indexes could be manipulated or constructed
for these engines to use, especially on an Intranet, by careful use of
the META tag. This is an area that indexers should be researching and understanding,
so that we can index for these engines.
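The "list of terms" that a search engine consults is commonly built as an inverted index, which maps each term to the pages that contain it. The sketch below assumes a tiny, hypothetical set of already-fetched pages in place of a real robot, and all function names are illustrative. It shows why such an engine retrieves on word matches alone.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what a robot brings back.
CRAWLED_PAGES = {
    "/a.html": "Census housing statistics",
    "/b.html": "Housing starts by region",
    "/c.html": "Economic indicators",
}


def tokenize(text):
    """Split text into lowercase word and number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the pages that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return sorted(results)


index = build_index(CRAWLED_PAGES)
print(search(index, "housing"))         # ['/a.html', '/b.html']
print(search(index, "housing starts"))  # ['/b.html']
```

On an intranet, terms taken from carefully written META tags could be added to the same structure alongside, or instead of, the words of the page text.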
You can see a version of a search engine working with a carefully constructed
set of indexes if you have Windows 95 and any of the Microsoft products
that feature the Answer Wizard. The topics in these help systems were indexed
in a special way that would help this engine bring them up with natural
language queries, with a weighted order. You have to understand the compiling
engine and the searching engine in order to index for it.
Below are some good sources for information on how search engines work,
and the current state of search engine technology.
Search Engines
What they are, how they work, and practical suggestions for getting
the most out of them.
Search Engine Watch
Web searching tips, listing of all the major search engines and meta search
engines, kid-safe searches, tests and ratings of search engines, search engine
technology and news. Also contains the current issue of an e-zine on search engine news and technology; subscribers can search an archive of back issues.
Complexity in Indexing Systems - Abandonment and Failure: Implications for Organizing the Internet, by Bella H. Weinberg
A paper presented at the 1996 ASIS conference by one of ASI's leading
indexers.
Mind Maps: Hot New Tools Proposed for Cyberspace Librarians, by Nancy Humphreys
This article, which appeared in Searcher, takes the back-of-the-book index in a new direction.
W3 Search Engines
This page, maintained by the Centre Universitaire d'Informatique (CUI) in Switzerland,
collects, categorizes and recommends search engines for many specific types of information.
Why On-Site Searching Stinks
A fascinating study done by User Interface Engineering. They measured successful task
performance using site-based full-text search engines, with dismal results.