Wednesday, February 18, 2015

Expanding linked data: Why isn't this done?

JPow asks an excellent question in his LS blog:  Why can't citation data (cited and cited by) be included in surrogate records?

I'm going to talk about this in terms of discovery layers rather than online catalogs, because online catalogs are typically very basic, MARC-based software that is fading rapidly into the background of library systems.

I think that JPow's concept could be implemented using a combination of CrossRef, Web of Science, and Scopus -- a terrific idea, and one that is implemented already to some degree, in some discovery systems, but definitely not to any universal extent.

What's the catch? I suspect that it's mostly about money and proprietary data. Citation indexes like Web of Science and Scopus are very expensive to maintain, and very big money makers for the companies that own them. They are willing to share basic indexing and citation metadata with the discovery services, but part of the agreement is that libraries must ALSO have licensed the databases separately before they are allowed to see results from those databases included in the discovery layer, and even then much of the "secret sauce" of citation tracing and other advanced functionality isn't included. (In fairness, quite a bit of this wouldn't easily translate to the simpler discovery interface.)

What I haven't seen implemented yet is CrossRef, and that has interesting potential. One catch is that CrossRef linking tends to live in the full text of articles, in the references section. That section would perhaps have to be separated out in some way and included in the metadata stored by the discovery service. I think that's possible, though I don't know if any systems are doing it currently. Authentication could be the other tricky piece, since CrossRef links go directly through the DOI. This isn't a huge issue for on-campus users (who are generally authenticated by IP address), but directing off-campus users through the right hoops (proxy server, Shibboleth, etc.) is a potential hurdle.
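To make the off-campus hurdle a little more concrete, here is a rough Python sketch of how a discovery layer might decide whether to hand a user a plain DOI link or one routed through a proxy. Everything specific in it is an assumption for illustration: the campus IP range, the proxy prefix (modeled loosely on an EZproxy-style login URL), and the function name `doi_link` are all hypothetical, and a real system would more likely lean on its existing link resolver or Shibboleth setup.

```python
import ipaddress
from urllib.parse import quote

# Hypothetical values: substitute your own campus ranges and proxy prefix.
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
PROXY_PREFIX = "https://proxy.example.edu/login?url="


def doi_link(doi: str, client_ip: str) -> str:
    """Return a DOI link, routed through the proxy for off-campus users."""
    # Build the resolver URL, escaping unusual characters but keeping "/".
    target = "https://doi.org/" + quote(doi, safe="/")
    addr = ipaddress.ip_address(client_ip)
    # On-campus users are already authenticated by IP, so link directly.
    if any(addr in net for net in CAMPUS_NETWORKS):
        return target
    # Off-campus users go through the proxy login first.
    return PROXY_PREFIX + target
```

With those made-up values, an on-campus address gets the bare `https://doi.org/...` link, while any other address gets the same link with the proxy prefix in front of it.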

I did check my library's discovery system (ProQuest Summon) and found that it offers our users "People who read this journal article also read" article links attached to records it gets from Thomson Reuters Web of Science. On the other hand, it doesn't offer any extra links for records it gets from Elsevier (ScienceDirect and Scopus). We see the Web of Science information because we've separately licensed those indexes from Thomson Reuters, and that means Summon is "allowed" to show us those records. We don't see the citation links from Scopus because we haven't licensed that product, so Summon isn't allowed to present any results from that dataset. I also find it interesting that Web of Science appears to share usage-based metadata but is not sharing citation-based metadata; I'm guessing maybe they see that as potentially too cannibalistic to their own service.

So, the short answer? JPow is asking for a rabbit, yes, and it's not from a hat, but from a deep and twisty rabbit hole. I don't think it's asking too much, though I do think it would be expensive.

2 comments:

  1. Right, this is more or less what I expected to be the case. Although I think the expense would certainly be worth it.

  2. Thanks for the wonderful response to JP, Nikki!!

    I would just echo your points about the for-profit entities that are in the organizing business (basically, the periodical database companies who are now selling discovery services). They will continue to complicate matters as they work through their business plan and profitability issues....
