Thursday, March 5, 2015

Building controlled vocabularies

I'm a huge fan of using controlled vocabularies for even relatively small tasks.  I've been known to hand out journal spreadsheets to subject specialists with renewal decision options locked into drop-down menus to discourage their use of creative, difficult-to-interpret language.  And I still grumble about the co-workers who designated items bought with a small pool of special funds with the following notes in our ILS order records: "One-time funds," "One time $," "1-time funds," "1 time $," "1X funds," and "1X $."  Seriously, try finding all of those variations with nothing but a text-string-based search function!
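To show what cleaning up that kind of mess looks like after the fact, here is a minimal sketch in Python. It assumes the order notes can be exported as plain strings; the pattern, the function name `normalize_fund_note`, and the term `ONE_TIME_FUNDS` are all made up for illustration and cover only the six spellings quoted above.

```python
import re

# Hypothetical pattern: "one" or "1", then "-time", " time", or "X",
# then "funds" or "$". Built only from the six variants listed above.
ONE_TIME = re.compile(r"\b(?:one|1)(?:[- ]?time|x)\s*(?:funds|\$)", re.IGNORECASE)

def normalize_fund_note(note: str) -> str:
    """Return the controlled term if the note matches, else the note unchanged."""
    return "ONE_TIME_FUNDS" if ONE_TIME.search(note) else note

variants = ["One-time funds", "One time $", "1-time funds",
            "1 time $", "1X funds", "1X $"]
normalized = {normalize_fund_note(v) for v in variants}
```

All six variants collapse to one term, but the pattern only catches spellings someone has already noticed. A drop-down menu avoids the problem at the point of entry.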

Needless to say, I was quite interested in this article on how to build controlled vocabularies. The first thing I learned was that, while "controlled vocabulary" is often used generically, there is a specific hierarchy described by different terms.  A controlled vocabulary on its own is just a specific list of terms where only terms from that list can be used for certain purposes. A simple example is a list of library locations--"REF," "Children's," "YA," etc.--that are consistently used in the catalog. A taxonomy is a controlled vocabulary too, but it has a hierarchical structure of parent/child relationships between the terms, suggesting that it is likely larger and more complex. The list of library locations could be part of a larger taxonomy that is all the controlled vocabularies in the ILS--item locations, fund codes, patron classes, and so on. A thesaurus is even more complicated--like a taxonomy, only with more relationships. Think of LCSH with its various relationships--not just broader term and narrower term (parent/child) but also related term, and use/used for (non-preferred term/preferred term).
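The three levels map fairly naturally onto three data structures. The sketch below is one possible way to model them in Python, not something taken from the article. The location codes come from the example above, while the names `PARENT`, `ThesaurusEntry`, and `resolve`, and the "Automobiles" entry, are illustrative assumptions rather than actual LCSH records.

```python
from dataclasses import dataclass, field
from typing import Optional

# 1. Controlled vocabulary: a flat list of allowed terms.
LOCATIONS = {"REF", "Children's", "YA"}

def is_valid_location(code: str) -> bool:
    return code in LOCATIONS

# 2. Taxonomy: the same terms plus parent/child links (child -> parent).
PARENT = {
    "REF": "Item locations",
    "Children's": "Item locations",
    "YA": "Item locations",
    "Item locations": "ILS vocabularies",
    "Fund codes": "ILS vocabularies",
    "Patron classes": "ILS vocabularies",
}

def ancestors(term: str) -> list:
    """Walk up the hierarchy from a term to the root."""
    path = []
    while term in PARENT:
        term = PARENT[term]
        path.append(term)
    return path

# 3. Thesaurus: hierarchy plus related and non-preferred terms.
@dataclass
class ThesaurusEntry:
    preferred: str
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)
    used_for: list = field(default_factory=list)  # non-preferred synonyms

# Illustrative entry only; not copied from LCSH.
ENTRIES = [
    ThesaurusEntry(
        preferred="Automobiles",
        broader=["Motor vehicles"],
        related=["Trucks"],
        used_for=["Cars"],
    )
]

def resolve(term: str, entries=ENTRIES) -> Optional[str]:
    """Map a preferred or non-preferred term to its preferred form."""
    for entry in entries:
        if term == entry.preferred or term in entry.used_for:
            return entry.preferred
    return None
```

For example, `ancestors("YA")` walks up through "Item locations" to "ILS vocabularies", and `resolve("Cars")` returns the preferred term "Automobiles".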

The article's advice for developing a controlled vocabulary (CV) can be condensed down to the following suggestions:

  1. Define the scope of the CV--how large and complicated does it need to be, and what does it actually need to encompass?
  2. Find good sources for vocabulary--representative content, subject matter experts, search logs (what search terms do your users consistently use?), and existing taxonomies. Consider simply licensing an existing taxonomy if it satisfies your needs.
  3. Have a plan for keeping it updated. Things change--new technologies appear, and terminology changes over time.
  4. Gather terms using your subject matter experts and/or representative documents. Organize them into broad categories including parent/child, related term, and preferred/non-preferred terms. Use dedicated software to manage terms. Creating a graphic representation of the taxonomy may assist with review and categorization.
  5. Export the terms into a machine-readable language for better machine interpretation on the web.
  6. Review and validate the final product, and make sure to incorporate review and validation into the maintenance plan.
  7. Post the new CV to a registry or data warehouse where others can make use of it.
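Step 5 is the most abstract of these, so here is a rough sketch of what an export might look like. The summary above does not name a format; this example assumes SKOS serialized as Turtle, which is one common choice for publishing vocabularies on the web. The base URI `http://example.org/vocab/`, the term dictionaries, and the helper names are all hypothetical, and a real project would more likely use dedicated vocabulary software or an RDF library than hand-built strings.

```python
import re

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/vocab/"  # hypothetical base URI

def slug(label: str) -> str:
    """Turn a label into a URI-safe identifier, e.g. 'Item locations' -> 'item-locations'."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")

def literal(label: str) -> str:
    """Quote a label as an English-tagged Turtle string literal."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@en'

def to_turtle(terms: list) -> str:
    """Serialize term dictionaries as SKOS concepts in Turtle."""
    blocks = [f"@prefix skos: <{SKOS_NS}> ."]
    for term in terms:
        parts = [
            f"<{BASE}{slug(term['pref'])}> a skos:Concept",
            f"    skos:prefLabel {literal(term['pref'])}",
        ]
        for alt in term.get("alt", []):        # non-preferred terms
            parts.append(f"    skos:altLabel {literal(alt)}")
        for parent in term.get("broader", []):  # parent terms
            parts.append(f"    skos:broader <{BASE}{slug(parent)}>")
        blocks.append(" ;\n".join(parts) + " .")
    return "\n\n".join(blocks) + "\n"

# Illustrative terms reusing the library locations example.
terms = [
    {"pref": "Item locations"},
    {"pref": "YA", "alt": ["Young Adult"], "broader": ["Item locations"]},
]
turtle = to_turtle(terms)
```

The point is only that preferred labels, non-preferred labels, and parent links each become an explicit, machine-readable statement that other systems can consume.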

2 comments:

  1. We would get along great in a library. Going back to middle school, I was always told that when outlining a topic there had to be at least two subtopics; having only one was not allowed. This stuck with me, and when attempting to clean up the catalog for the implementation of a discovery layer, I found lots of item types that were empty or had only one item in them. It was while working through those issues that I read Everything is Miscellaneous (before I started at SLIS), and things started to become clear. If every data set shared the same controlled vocabulary, linked data and searches would be easier; however, regional individuality would be lost. The hope is that all the terms could be linked to the same authority file and then it would all come together.

  2. Good post!

    I would remove just one word, and that word is "final" (as in "final product"). CVs should never be portrayed as final, as they are hungry animals that require constant care and feeding forever and ever ... kinda like a puppy who has puppies etc etc etc :)
