The TechnoPhile, August 1999
Metadata: Dublin Core for Dummies
A monthly column on new technologies for Library and Information work. Technophile appears in print form in the LIANZA newsletter Library Life.
Alastair Smith
What is metadata? The term was originally used by database administrators in the 1960s who realised that to organise large quantities of data, they needed to have descriptions of the data. In effect they were repeating the discovery that librarians had made a few centuries earlier, that if you’ve got a lot of books to look after, it’s useful to have a catalogue to help you find them.

Metadata is "structured data about data": information that describes a piece of information. Metadata can be created outside of the item, as a library catalogue record; or it can be inside the item, as a cataloguing in publication (CIP) record. When we’re discussing metadata for web pages, we’re generally thinking of the information that is included in HTML pages defined by the <meta> tag, thoughtfully included by the designers of HTML who anticipated that we’d need it.

The advantages of metadata are fairly obvious to librarians: if we have standardised descriptions of documents on the Web, defining the title, author, subject etc, it should help us to locate documents more readily. The problem is that the HTML <meta> tag isn’t particularly standardised. Authors can put just about anything into a meta tag, and indeed less scrupulous sites try to attract users by loading up their pages with meta tags containing popular search terms, even if they bear little relation to the topic of the site. As a result of this abuse, many search engines actually ignore metadata.

The Dublin Core initiative is coordinated by OCLC in Dublin, Ohio, with the aim of developing a common set of elements that describe Internet and other information resources. The aim is to keep the set simple, so that untrained people can add descriptions to their web pages, and flexible enough that new kinds of information can be included in the scheme. The current version of the Dublin Core format has 15 elements, three of which are experimental.
In an HTML page, a Dublin Core element such as the title is recorded in a meta tag, for example:

<META NAME="DC.Title" CONTENT="The Brain of Katherine Mansfield">

Dublin Core is not prescriptive about the format of the different elements. For instance, no common thesaurus or set of subject headings is suggested for the subject element. This is deliberate: the aim is that Dublin Core will be applied by people who are not familiar with an all-encompassing scheme such as LCSH, and it leaves indexers in a specific subject area free to use subject terms that suit them.

You can check whether a web site has Dublin Core metadata by viewing the HTML source code (most Web browsers such as Netscape have an option for doing this). However, you are unlikely to see many examples. According to a recent survey published in Nature by Lawrence and Giles, only 0.3% of web sites include Dublin Core metadata. Why is such an obvious boon ignored by the bulk of web authors? In practice web pages can be made by anyone, and most web authors don’t have the time or inclination to create metadata. Even the Dublin Core web pages at OCLC don’t contain DC metadata!

So what is the future of Dublin Core? As librarians, we should promote the use of Dublin Core metadata in Web projects that we’re involved in. However, it may not be practicable or even desirable for all web pages to have the full Dublin Core metadata treatment, any more than publishers provide CIP records for every page of a book.

There are two kinds of pages that really need full metadata. One type is the "entry" page to a web site, which is the page that needs to be discovered by someone looking for information; if necessary, other pages at the site can be found from that entry page. The other type is pages in subject gateways such as Librarians Index to the Internet, BUBL Link, OMNI etc, that describe web resources, and in practice these kinds of service are starting to use Dublin Core metadata in their databases.
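For readers who would rather not scan HTML source by eye, the check for Dublin Core metadata mentioned earlier can be automated. The following is only a minimal sketch in Python using the standard library's html.parser module. The names DublinCoreFinder and find_dublin_core are invented for this illustration, and the sketch assumes that Dublin Core elements are written as meta tags whose name begins with "DC.", as in the DC.Title example.

```python
from html.parser import HTMLParser


class DublinCoreFinder(HTMLParser):
    """Collect <meta> tags whose name starts with "DC." (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.elements = []  # list of (name, content) pairs, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...>
        # and <meta name=...> are treated the same way.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        # Assumption: Dublin Core elements use a "DC." prefix in the name.
        if name.lower().startswith("dc."):
            self.elements.append((name, attributes.get("content") or ""))


def find_dublin_core(html_source):
    """Return the Dublin Core (name, content) pairs found in an HTML string."""
    finder = DublinCoreFinder()
    finder.feed(html_source)
    return finder.elements


sample = (
    '<html><head>'
    '<META NAME="DC.Title" CONTENT="The Brain of Katherine Mansfield">'
    '</head></html>'
)
print(find_dublin_core(sample))
```

An empty list means the page carries no meta tags with a "DC." name, which, given the survey figure quoted above, is the likely result for most pages.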