
The Future of Publishing on the Web?

How often do you need access to electronic information, be it an article, an image, a bibliography or a video, that you know is available on the web, and yet you have no idea where it is located? Even if you do manage to find it, how do you know whether it is the current version and what the copyright implications are for using any of the content? In the fast-changing world of content production, keeping track of the location and ownership of electronic files is a constant headache. Wouldn't it be really useful if you could link to the specific content you are searching for, and also be notified of the copyright implications of its use, with a single keystroke? Well, that is the challenge facing researchers involved in the development of Digital Object Identifier-based technologies.

The concept of a Uniform Resource Name

The research project we have been conducting at the University of Nottingham is to investigate the wider issues surrounding descriptive metadata, persistent identifiers, and reference linking of documents. The technologies we are investigating apply to all documents, including those in Word, HTML or XML, but we have a particular interest in making them work for document collections in Adobe Acrobat PDF.

The idea of a Uniform Resource Locator (URL), or hyperlink, is well known from the World Wide Web. The URL essentially says where on the web a target resource is located. What is not so familiar is the notion of a Uniform Resource Name (URN), which denotes a target resource by a unique and persistent name. Thus if the name 'davesdocument' has been uniquely registered to someone, for worldwide use, it could be included in a Word document, or a PDF, or directly in an HTML tag, to indicate the desired target document to be retrieved. It would be the job of the next-generation 'Semantic Web', envisaged by Tim Berners-Lee in a recent Scientific American article, to find out whereabouts on the web 'davesdocument' is actually located. And if this could be implemented, many of the infuriating 'Error: 404 Not Found' messages of today's web would be eliminated at a stroke - the URN system would automatically keep track of any document movements and could even give you a choice of several places from which the document could be retrieved.
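
The difference between naming and locating can be made concrete with a small sketch. What follows is only an illustration under stated assumptions: the NameRegistry class, its methods, and the example.org addresses are all hypothetical, and a real URN system would be a distributed service rather than an in-memory table.

```python
from dataclasses import dataclass, field


@dataclass
class NameRegistry:
    """Toy in-memory registry mapping persistent names to current locations.

    Hypothetical illustration only; not part of any real URN implementation.
    """
    locations: dict[str, list[str]] = field(default_factory=dict)

    def register(self, name: str, url: str) -> None:
        """Record one more location from which the named document can be fetched."""
        self.locations.setdefault(name, []).append(url)

    def move(self, name: str, old_url: str, new_url: str) -> None:
        """Update a location after the document moves; the name itself is unchanged."""
        urls = self.locations[name]
        urls[urls.index(old_url)] = new_url

    def resolve(self, name: str) -> list[str]:
        """Return every known location for the name (empty if unregistered)."""
        return list(self.locations.get(name, []))


registry = NameRegistry()
registry.register("davesdocument", "http://example.org/papers/dave.pdf")
registry.register("davesdocument", "http://mirror.example.net/dave.pdf")

# The document moves, but anything that cites the name needs no change.
registry.move(
    "davesdocument",
    "http://example.org/papers/dave.pdf",
    "http://example.org/archive/dave.pdf",
)
locations = registry.resolve("davesdocument")
```

A document that embeds the name 'davesdocument' rather than a URL keeps working after the move, because only the registry entry is updated.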

For various technical and political reasons URNs are not part of today's web. It's not too difficult to see that a full implementation would need a total overhaul of the internet Domain Name resolution schemes and, preferably, a new generation of smart hardware routers as well. A nagging worry in any implementation of URNs is how scalable the architecture would be if hundreds of millions of web users decided to use them. This is just one of the reasons for slow and cautious progress by the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF), the authorities who would be in charge of any large-scale deployment. There would also be considerable problems in registering the URNs and in making users take on the responsibility of keeping them persistent (i.e. available at all times) and up to date (always leading to the correct object).

The Digital Object Identifier initiative

Despite these notes of caution, certain URN-related pilot projects are under way and, at the moment, the Digital Object Identifier (DOI) initiative is one of the most highly developed persistent identifier schemes available. Created and controlled by the International DOI Foundation (IDF), the DOI is a globally unique, persistent, and actionable identifier for digital objects of any form (see http://www.doi.org).

The DOI initiative was started by the Association of American Publishers and, to date, it has been adopted by a number of publishing companies (the IDF is now courting interest from organisations dealing with other forms of media). It's not difficult to see the payoff for academic and technical publishers in adopting the DOI. It was agreed at an early stage that, at the very minimum, a DOI should be capable of providing some basic structured metadata about the target object, much the same as an entry in a telephone directory. This enables users to check that the entity they have identified is actually the one they are looking for. The reliability and accuracy of the DOI give it the ability to become a form of 'electronic ISBN/ISSN' but with the added value that the metadata it delivers can focus on publisher issues - primarily, rights management and electronic commerce. The advantages of negotiating inter-publisher rights, and conducting e-commerce, via a unique identifier (the DOI-administered 'handle') for the document or 'work' in question are plainly evident.

Benefits of the Digital Object Identifier System

  • Provides an extensible framework for managing intellectual content in any form at any level of granularity.

  • Links customers with content suppliers.

  • Facilitates electronic commerce.

  • Enables automated copyright management for all types of media.

Source: The International DOI Foundation

The Handle Network

The underlying architecture of the DOI system is the Handle Network, a project of the Corporation for National Research Initiatives (CNRI) in Reston, Virginia. The handle system provides a highly flexible, scalable network of persistent identifiers, of which the DOI is a subset. Unlike a URL, which resolves only to a single document on a single server, a handle resolves to a series of name-value pairs which can be used to store data of various types. This ability to store significant amounts of metadata against the identifier is a powerful enabling tool, and allows features such as multiple resolution (the resolution of a single identifier to multiple copies or versions of a document). CNRI make available a handle server, written in Java, which is freely downloadable from their web site. This server resolves handles locally, within the user's site, wherever possible, and passes any handles it cannot resolve to the server at CNRI. The CNRI server can then either resolve the request itself or pass it on to other registered handle servers.
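
The resolution behaviour just described can be sketched in a few lines. This is a minimal model under stated assumptions, not the CNRI software or its API: the HandleServer class, the handle strings such as 1234.5/paper-01, the type names, and the URLs are all hypothetical.

```python
from typing import Optional

# A handle record modelled as an ordered list of (type, value) pairs.
HandleRecord = list[tuple[str, str]]


class HandleServer:
    """Toy stand-in for a handle server; not the real CNRI software."""

    def __init__(self, records: dict[str, HandleRecord],
                 parent: Optional["HandleServer"] = None) -> None:
        self.records = records
        self.parent = parent  # server to ask when a handle is unknown locally

    def resolve(self, handle: str) -> Optional[HandleRecord]:
        """Try the local store first, then pass the request up the chain."""
        if handle in self.records:
            return self.records[handle]
        if self.parent is not None:
            return self.parent.resolve(handle)
        return None


# Hypothetical central server holding a handle the local site does not know.
global_server = HandleServer({
    "9999.1/remote-paper": [("URL", "http://example.com/remote.pdf")],
})

# Hypothetical local server; two URL values give 'multiple resolution'.
local_server = HandleServer({
    "1234.5/paper-01": [
        ("URL", "http://example.org/paper-01.pdf"),
        ("URL", "http://mirror.example.net/paper-01.pdf"),
        ("TITLE", "An Example Paper"),
    ],
}, parent=global_server)

local_hit = local_server.resolve("1234.5/paper-01")
forwarded_hit = local_server.resolve("9999.1/remote-paper")
missing = local_server.resolve("0000.0/unknown")
```

The first lookup is answered locally, the second is forwarded to the parent server, and the third returns None because no server in the chain holds the handle.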

The CrossRef project

Many believe that the technologies and concepts used in the DOI project can be applied with equally positive results to the processes of resource discovery and reference linking. The CrossRef project (http://www.crossref.org) is making some progress in this respect, establishing a service for collaborative reference linking within the constraints of the DOI. The aim is very simple: journal publishers will increasingly move towards making all the bibliographic references at the end of an article 'live' by embedding behind each one a DOI, leading to metadata, rather than a URL. However, if this technology is to become widely adopted by librarians and end-users then there will be a need for freely available search engines delivering DOIs (if they exist) which lead, in turn, to high quality metadata. Moreover, one needs an assurance that the whole architecture will be scalable and will not become totally dependent on a cluster of servers running white-hot at CNRI.

Progress of the EPRG project

In our project, within the EPRG at Nottingham (http://www.ep.cs.nott.ac.uk), we are investigating what a DOI-like system of persistent identifiers can bring to resource discovery and reference linking. We are using the raw Handle System that forms the basis of the DOI. This allows us greater freedom of experimentation in our test systems and frees us from the commercial responsibilities embodied in issuing true DOIs. It is important to note, however, that while we do not conform fully to the DOI protocols, everything learned or achieved in our project is applicable, in theory, to DOIs.

Our test-bed is a corpus of almost 200 journal papers from the journal Electronic Publishing - Origination, Dissemination and Design. These papers, published from 1988 to 1995, are stored in PDF format - the result of the EPRG's CAJUN project sponsored by John Wiley Ltd, which ran from 1993 to 1998.

Because our test domain consists of PDF files, we are making use of Adobe Acrobat's plug-in API to create handle software for Acrobat. The storage capability of the handle system allows us to move nearly all of the metadata relating to a particular document into the document identifier itself. In this respect, we use our local handle server as a globally accessible database. A document's identifier can be resolved to URLs, Dublin Core metadata, referenced identifiers of cited documents, or all of these at once.
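
As a rough illustration of what 'all of these at once' might look like, the sketch below treats the data held against one handle as a list of typed values and filters it by type. The type names (URL, DC.title, DC.creator, CITES), the handles, and the values are assumptions made for this example, not the scheme actually used in the project.

```python
# Hypothetical data held against one paper's handle: (type, value) pairs.
record = [
    ("URL", "http://example.org/epodd/paper-01.pdf"),
    ("DC.title", "An Example Paper"),
    ("DC.creator", "A. Author"),
    ("CITES", "1234.5/paper-02"),
    ("CITES", "1234.5/paper-03"),
]


def values_of(record, type_prefix):
    """Return the values whose type starts with the prefix, in stored order."""
    return [value for kind, value in record if kind.startswith(type_prefix)]


def dublin_core(record):
    """Collect the 'DC.' entries into a dict keyed by element name."""
    prefix = "DC."
    return {
        kind[len(prefix):]: value
        for kind, value in record
        if kind.startswith(prefix)
    }


urls = values_of(record, "URL")       # where copies of the paper live
metadata = dublin_core(record)        # descriptive metadata
cited = values_of(record, "CITES")    # handles of the papers it cites
```

One lookup returns the whole record; the caller then picks out locations, metadata, or citations as needed.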

This external storage of metadata means that the only thing a particular PDF document needs to 'know' in order to retrieve its own metadata is its own handle identifier. Through the plug-in, it can then resolve this handle and retrieve the data. This approach also means that an existing PDF file need only be changed once - to insert a reference to its handle - and thereafter it is part of the system, and the metadata can be maintained without any changes to the PDF file itself.

Our final aim is to produce a fully functional prototype that provides a completely integrated linking experience within Adobe Acrobat software. A user will be able to select a reference from the 'References' section of any of our journal papers and will then be presented with a number of options. Minimally, this will include the option to view the metadata for the cited reference. Many other options, such as links to multiple full-text versions of the document, to reviews, or (in the case of book citations) links to amazon.com, would not be difficult to implement.
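
The menu of options could be assembled along the following lines. This is a sketch only: the reference_options function, the option labels, and the sample store are hypothetical, and the real prototype would sit behind Acrobat's plug-in API rather than run as a standalone script.

```python
def reference_options(cited_handle, resolve):
    """Build the options offered when a user selects one cited reference.

    `resolve` is any callable that maps a handle to a list of
    (type, value) pairs, or to None when the handle is unknown.
    """
    record = resolve(cited_handle)
    if record is None:
        return []  # nothing can be offered for an unresolvable handle
    # Viewing the metadata is the minimum that is always offered.
    options = [("View metadata", cited_handle)]
    # Each stored location becomes a separate full-text option.
    for kind, value in record:
        if kind == "URL":
            options.append(("Open full text", value))
    return options


# Hypothetical store standing in for a handle server.
store = {
    "1234.5/paper-02": [
        ("URL", "http://example.org/epodd/paper-02.pdf"),
        ("URL", "http://mirror.example.net/epodd/paper-02.pdf"),
        ("DC.title", "A Cited Paper"),
    ],
}

menu = reference_options("1234.5/paper-02", store.get)
empty_menu = reference_options("0000.0/unknown", store.get)
```

Further option types, such as links to reviews, would only need another branch in the loop keyed on a different value type.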

The practices we are using here are experimental, but it is not difficult to picture a world where resource discovery would become almost trivial because 'everything is linked to everything else'. It seems that DOI-based technologies are beginning to make their mark within the communities of publishing, libraries and academic research. Whether and when they become commonplace over the web as a whole will depend on IETF and W3C taking their cue from today's pilot projects and being prepared to put in the investment to make World Wide URNs a reality.

 
