Thursday, February 5, 2015

Identifier persistence

I agree with Madam Librarian that the article "A Policy Checklist for Enabling Persistence of Identifiers" was convoluted and difficult to follow in places. For my own understanding, I'll try to summarize here the basic points I got out of it. Maybe this will help others in the class as well.

The article presents a set of numbered questions, but then proceeds to address them in a completely different order while attempting to map them to a checklist, which is numbered in yet another style. For the sake of sanity, I will just briefly summarize the article's main points in the order presented, and present examples where it seems useful.

What should I identify persistently?  Analyze your resources, decide which ones can be consistently identified in some way, and then prioritize these identifiable resources.  There will probably be many items that have identifiers, but only a subset of these will require persistence; typically these would represent key access points for your user community.

What steps should I take to guarantee persistence?  This is best handled through policies, supported by automation of processes. Information management should be decoupled from identifier management. In practice, this means that information within a system is identified and managed using local keys, such as the URL for a journal article. However, the identifiers for this same information that are shared with outside entities, such as indexing services and library databases, should be indirect identifiers, which can be updated when necessary in a way that is invisible to users.

An example of this is an article DOI.  The hypothetical article "36 Tips for Awesomeness" (local identifier) is published in the Spring 2006 issue of Fabulous Journal, given a URL of www.fabulousjournal.com/36_Tips_for_Awesomeness (local identifier), and assigned a DOI of 10.8992/fj.1234 (persistent identifier).  Over the next few years, the journal is bought by another publisher and all content is moved to www.awesomepublisher.com/journal/(ISSN)1000-0001. A year or two after that, the new publisher merges Fabulous Journal with Really Cool Journal, requests a new ISSN, and moves all content to www.awesomepublisher.com/journal/(ISSN)2002-200X. Awesome Publisher has good policies for persistence and updates the DOI record to point to the new location with each change.  This is the result:

10.8992/fj.1234 initially points to www.fabulousjournal.com/36_Tips_for_Awesomeness

10.8992/fj.1234 then points to www.awesomepublisher.com/journal/(ISSN)1000-0001/36_Tips_for_Awesomeness

10.8992/fj.1234 currently points to www.awesomepublisher.com/journal/(ISSN)2002-200X/36_Tips_for_Awesomeness

As long as the services that refer to this article use the DOI instead of the article URL, it will remain accessible despite the changes going on in the background.
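
For anyone who thinks better in code, here is a rough sketch of that indirection in Python. It is only an illustration of the idea, not how the DOI system actually works: the Resolver class and its register, update, and resolve methods are names I made up, and the DOI and URLs are the hypothetical ones from the example above.

```python
class Resolver:
    """Toy lookup table mapping a persistent identifier to its current URL."""

    def __init__(self):
        self._targets = {}

    def register(self, pid, url):
        # A persistent identifier is assigned once and never reused.
        if pid in self._targets:
            raise ValueError(f"{pid} is already registered")
        self._targets[pid] = url

    def update(self, pid, url):
        # Only the target changes; the identifier itself stays the same.
        if pid not in self._targets:
            raise KeyError(pid)
        self._targets[pid] = url

    def resolve(self, pid):
        return self._targets[pid]


resolver = Resolver()
doi = "10.8992/fj.1234"  # hypothetical DOI from the example

resolver.register(doi, "www.fabulousjournal.com/36_Tips_for_Awesomeness")
# The journal is bought by another publisher.
resolver.update(doi, "www.awesomepublisher.com/journal/(ISSN)1000-0001/36_Tips_for_Awesomeness")
# The journal is merged and gets a new ISSN.
resolver.update(doi, "www.awesomepublisher.com/journal/(ISSN)2002-200X/36_Tips_for_Awesomeness")

print(resolver.resolve(doi))
```

Anything that stored the DOI keeps working after both moves, because only the table entry changed. Anything that stored the first URL is now holding a dead link.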

What technologies should I use to guarantee persistence?  Whichever ones work best with your existing technology and workflow. It's more important that the process works seamlessly and with minimum effort than it is to commit to one specific technology, no matter what That One IT Guy in your division says.

How long should identifiers persist?  As long as is appropriate, but make sure that you (1) don't promise what you can't deliver (no one can actually guarantee "forever") and (2) are up-front about it ("provisions are in place to guarantee persistence for a minimum of 30 years beyond the online publication date" or "this link will expire in 7 days").

What do you mean by "persistent"?  The article explains that there are degrees of persistence, and breaks them down into a list (I'll use the same article example to explain).

Persistence of Name or Association:
 (1) The title "36 Tips for Awesomeness" will always be associated with that specific article on awesome tips--it won't suddenly be associated with an article on cattle diseases.
 (2) The article may continue to be referred to in various places as www.fabulousjournal.com/36_Tips_for_Awesomeness even though that URL no longer works. In other words, the association persists in unmaintained places outside the control of the resource owner.
 (3) The article will always be associated with DOI 10.8992/fj.1234, whether or not the publisher updates the DOI information when the article changes location.

Persistence of Service:
 (1) Retrieval:  Can the item still be obtained over the guaranteed time period?  In the case of our article, two of the three listed URLs would eventually fail to retrieve the article, but the DOI should continue to work, resulting in retrieval of the article no matter where it is hosted.
 (2) Resolution:  A URL may resolve without resulting in a successful retrieval. For example, the original author of our 36 tips might get into a copyright dispute with the new publisher, resulting in the article being taken down.  In this case, the publisher might arrange for the URL to resolve to a page with the basic metadata for the article and a brief note about the missing content.  If the URL instead results in a "page not found" error, then it lacks persistent resolution.

Whether a service guarantees retrieval or only resolution is an important distinction and should be clearly stated.  Both are valuable, but they are not the same promise.
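
The difference between the two can be sketched as three possible outcomes of looking up an identifier. This is my own simplification, not something from the article: the Outcome names, the classify function, and the record layout (a dict with "metadata" and "content" keys) are all assumptions made for illustration.

```python
from enum import Enum


class Outcome(Enum):
    RETRIEVED = "retrieved"          # the item itself came back
    RESOLVED_ONLY = "resolved only"  # a page about the item came back
    NOT_FOUND = "not found"          # nothing came back


def classify(record):
    """Classify a lookup result (hypothetical record layout)."""
    if record is None:
        return Outcome.NOT_FOUND
    if record.get("content") is not None:
        return Outcome.RETRIEVED
    return Outcome.RESOLVED_ONLY


available = {"metadata": {"title": "36 Tips for Awesomeness"}, "content": "full text"}
taken_down = {"metadata": {"title": "36 Tips for Awesomeness"}, "content": None}

print(classify(available))   # Outcome.RETRIEVED
print(classify(taken_down))  # Outcome.RESOLVED_ONLY
print(classify(None))        # Outcome.NOT_FOUND
```

A policy that promises retrieval is only satisfied by the first outcome. A policy that promises resolution is satisfied by the first two, and the "page not found" case fails both.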

Persistence of Accountability:
This is mostly for archival purposes. Is some kind of metadata maintained that gives the history of who has created and edited a specific record?

TL;DR:  Persistent identifier policies in an information management environment should clearly outline the following:  which identifiers will be persistent, how persistence will be maintained, how long the user can expect persistence to last, and whether persistence guarantees access to a specific item (retrieval) or guarantees access to (at minimum) information about that item (resolution).


2 comments:

  1. Thanks Nikki! This post gives (in my opinion) a much better breakdown of the information in Nicholas's article. I'm glad you were able to make sense of the organization of the article as it does have some helpful information.

  2. Good work, Nikki! Just raising the issue of persistence should at least raise an eyebrow for professional librarians and archivists. A lot of issues to keep tabs on!!
