The TechnoPhile

August 1999

 

Metadata: Dublin Core for Dummies

A monthly column on new technologies for Library and Information work.

TechnoPhile appears in print form in the LIANZA newsletter Library Life.

Alastair Smith
School of Communications and Information Management 
Victoria University of Wellington 
Alastair.Smith@vuw.ac.nz

References:

  • Dublin Core Metadata Initiative. http://purl.oclc.org/dc/
  • Lawrence, S.; Giles, C.L. (8 July 1999). Accessibility and Distribution of Information on the Web. Nature 400(6740): 107-109. [Summary at http://www.wwwmetrics.com/]

Most of us involved with the World Wide Web have come across the terms "Metadata" and "Dublin Core". To many of us the terms sound very important but also tend to make our eyes glaze over. I’ve tried to treat this condition by browsing those bright yellow volumes in the bookstore that promise instant enlightenment for the comprehension-challenged: PCs for Dummies, Brain Surgery for Dummies, etc. However, Dublin Core for Dummies doesn’t seem to have made it into the Dummies inventory yet, so this month I’ll have a try myself.

What is metadata? The term was originally used by database administrators in the 1960s, who realised that to organise large quantities of data they needed descriptions of the data. In effect they were repeating a discovery that librarians had made a few centuries earlier: if you’ve got a lot of books to look after, it’s useful to have a catalogue to help you find them. Metadata is "structured data about data", that is, information that describes a piece of information. Metadata can be kept outside the item it describes, as with a library catalogue record, or inside the item, as with a cataloguing in publication (CIP) record. When we discuss metadata for web pages, we generally mean the information included in an HTML page by means of the <meta> tag, which the designers of HTML thoughtfully provided because they anticipated that we’d need it.

The advantages of metadata are fairly obvious to librarians: if we have standardised descriptions of documents on the Web, giving the title, author, subject, etc., we should be able to locate documents more readily. The problem is that the HTML <meta> tag isn’t particularly standardised. Authors can put just about anything into a meta tag, and less scrupulous sites try to attract users by loading their pages with meta tags containing popular search terms, even when those terms bear little relation to the topic of the site. As a result of this abuse, many search engines actually ignore metadata.

The Dublin Core initiative is coordinated by OCLC in Dublin, Ohio, with the aim of developing a common set of elements to describe Internet and other information resources. The set is meant to be simple enough that untrained people can add descriptions to their own web pages, yet flexible enough that new kinds of information can be included in the scheme. The current version of the Dublin Core format has 15 elements, three of which are experimental:

1. Title: The title given by the creator of the resource.
2. Creator: The person or organization primarily responsible for creating the intellectual content of the resource.
3. Subject: Keywords or phrases that describe the subject or content of the resource.
4. Description: A textual description of the content of the resource.
5. Publisher: The entity responsible for making the resource available in its present form.
6. Contributor: A person or organization who made a secondary contribution to the resource, e.g. an editor, transcriber or illustrator.
7. Date: The date the resource was made available in its present form.
8. Resource Type: The category of the resource, e.g. home page, novel, poem, working paper, technical report, essay, dictionary.
9. Format: Used to identify the software, and possibly hardware, that might be needed to display or operate the resource.
10. Resource Identifier: A string or number used to uniquely identify the resource, e.g. a URL or ISBN.
11. Source: Used to uniquely identify the work from which this resource was derived.
12. Language: The language(s) of the intellectual content of the resource.
13. Relation (experimental): The relationship of this resource to other resources.
14. Coverage (experimental): The spatial and/or temporal characteristics of the resource.
15. Rights (experimental): A link to a copyright notice, etc.
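Here is an example of how Dublin Core metadata might look when incorporated in the <HEAD> section of an HTML document. Note that the tags use shortened element names: Resource Type becomes DC.Type and Resource Identifier becomes DC.Identifier.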
    <META NAME="DC.Title" CONTENT="The Brain of Katherine Mansfield"> 
    <META NAME="DC.Creator" CONTENT="Manhire, Bill"> 
    <META NAME="DC.Type" CONTENT="text"> 
    <META NAME="DC.Date" CONTENT="1988"> 
    <META NAME="DC.Format" CONTENT="text/html"> 
    <META NAME="DC.Identifier" CONTENT="http://www.het.brown.edu/people/easther/brain/">
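The tags above follow a simple pattern: each element becomes a <META> tag whose NAME is "DC." followed by the element name, and whose CONTENT is the value. For a web project with many pages, the tags could be generated from a record rather than typed by hand. What follows is a minimal Python sketch under stated assumptions: the function name dc_meta_tags and the dictionary layout are hypothetical, the set of valid names is the 15 elements listed above (in the short forms Type and Identifier used in the example), and values are escaped with the standard library's html.escape. A value may also be a list, since Dublin Core permits an element to be repeated (e.g. for several creators).

```python
from html import escape

# The 15 Dublin Core elements, in the short form used in the example tags
# (e.g. "Type" for Resource Type, "Identifier" for Resource Identifier).
DC_ELEMENTS = {
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
}


def dc_meta_tags(record):
    """Return <META> tags for a dict mapping DC element names to values.

    Hypothetical helper for illustration. A value may be a string or a
    list of strings, in which case the element is repeated.
    """
    lines = []
    for element, value in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element!r}")
        values = value if isinstance(value, list) else [value]
        for v in values:
            # Trim stray whitespace and escape quotes and ampersands so the
            # value is safe inside a double-quoted HTML attribute.
            content = escape(v.strip(), quote=True)
            lines.append(f'<META NAME="DC.{element}" CONTENT="{content}">')
    return "\n".join(lines)


record = {
    "Title": "The Brain of Katherine Mansfield",
    "Creator": "Manhire, Bill",
    "Type": "text",
    "Date": "1988",
    "Format": "text/html",
    "Identifier": "http://www.het.brown.edu/people/easther/brain/",
}
print(dc_meta_tags(record))
```

Run as written, this prints six tags, one per line, in the same form as the example above. Rejecting unknown names such as "Author" is a deliberate design choice: it keeps a typing slip from silently producing a tag that no Dublin Core-aware tool would recognise.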
Dublin Core is not prescriptive about the format of the different elements. For instance, no common thesaurus or set of subject headings is specified for the Subject element. This is deliberate: Dublin Core is meant to be applied by people who aren’t familiar with an all-encompassing scheme such as Library of Congress Subject Headings (LCSH), and it leaves indexers in a specialised subject area free to use the subject terms that suit them.

You can check whether a web site has Dublin Core metadata by viewing the HTML source code; most Web browsers, such as Netscape, have an option for doing this. However, you are unlikely to see many examples. According to a recent survey published in Nature by Lawrence and Giles, only 0.3% of web sites include Dublin Core metadata. Why is such an obvious boon ignored by the bulk of web authors? In practice, web pages can be made by anyone, and most web authors don’t have the time or inclination to create metadata. Even the Dublin Core web pages at OCLC don’t contain DC metadata!
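Viewing the source works well for one page at a time. To check a batch of pages, the same test can be automated. Below is a minimal sketch using Python's standard html.parser module. It assumes, as in the example earlier, that Dublin Core elements appear as <meta> tags whose name begins with "DC." (matched here without regard to case). The class and function names are hypothetical, the sample page is made up for illustration, and fetching pages from the Web is left out.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect Dublin Core <meta> tags (NAME beginning with "DC.").

    Hypothetical helper for illustration; assumes the "DC.Element"
    naming convention shown in the example tags.
    """

    def __init__(self):
        super().__init__()
        self.elements = []  # (element, content) pairs, in page order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META NAME=...>
        # and <meta name=...> are treated alike.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            content = (attributes.get("content") or "").strip()
            self.elements.append((name[3:], content))


def dublin_core_metadata(html_text):
    """Return the Dublin Core (element, content) pairs found in a page."""
    parser = DCMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.elements


page = """<HTML><HEAD>
<TITLE>The Brain of Katherine Mansfield</TITLE>
<META NAME="DC.Title" CONTENT="The Brain of Katherine Mansfield">
<META NAME="DC.Creator" CONTENT="Manhire, Bill">
<META NAME="keywords" CONTENT="sample, keywords">
</HEAD><BODY>...</BODY></HTML>"""

found = dublin_core_metadata(page)
print(found)
print("Has Dublin Core metadata:", bool(found))
```

The plain "keywords" tag in the sample page is skipped because its NAME lacks the DC. prefix, so only the Title and Creator pairs are reported. This mirrors the distinction drawn earlier between free-form meta tags and Dublin Core elements.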

So what is the future of Dublin Core? As librarians, we should promote the use of Dublin Core metadata in the Web projects we’re involved in. In practice, however, it may be neither practicable nor even desirable for every web page to get the full Dublin Core metadata treatment, any more than publishers provide CIP records for every page of a book. Two kinds of pages really need full metadata. The first is the "entry" page to a web site: the page that needs to be discovered by someone looking for information, and from which, if necessary, the other pages at the site can be found. The second is the pages in subject gateways such as the Librarians’ Index to the Internet, BUBL Link and OMNI, which describe web resources; in practice, these services are starting to use Dublin Core metadata in their databases.
