For the purposes of setting up this open notebook, I have an assigned category (data science).
The category collects all of my unique blog entries into one collection:
To organize information further, WordPress allows the use of tags. Tags are separated with commas.
Unfortunately, I do not have a controlled vocabulary from which to work.
As a student of information science, I am naturally interested in controlled vocabularies, and especially in automated metadata annotation, as is DataONE in general.
A project that has caught my attention, and that might prove useful, is <https://www.nescent.org/sites/hive/Main_Page>
You may notice nescent in the URL: NESCent is the National Evolutionary Synthesis Center. The HIVE project is also funded by the Institute of Museum and Library Services (IMLS) and has some involvement with the UNC iSchool.
Helping Interdisciplinary Vocabulary Engineering (HIVE) is an IMLS-funded project involving the Metadata Research Center (MRC) at the School of Information and Library Science, University of North Carolina at Chapel Hill, and the National Evolutionary Synthesis Center (NESCent) in Durham, North Carolina. The two-and-a-half-year project is demonstrating the HIVE model for dynamically integrating multiple controlled vocabularies. A recent extension includes HIVE-ES (España), HIVE in Spanish.
HIVE is an automatic metadata generation approach that dynamically integrates discipline-specific controlled vocabularies encoded with the Simple Knowledge Organization System (SKOS), a World Wide Web Consortium (W3C) standard. HIVE will assist content creators and information professionals with subject cataloging and will provide a solution to the traditional controlled vocabulary problems of cost, interoperability, and usability.
What I like about HIVE is that it lets me search across vocabularies, including the controlled vocabulary formerly established by the NBII Program and Cambridge Scientific Abstracts and now called the USGS Biocomplexity Thesaurus.
What’s neat is that I can copy and paste the text of this blog into a Word document, run it through HIVE using a controlled vocabulary of my choosing, and get back some keywords, including “data,” “metadata,” and “vocabulary.” I’ll use “metadata” and “vocabulary” on this post as a first step toward building my own controlled vocabulary.
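To make the idea of pulling keywords from a controlled vocabulary concrete, here is a minimal Python sketch. It is not how HIVE or KEA work internally; it only counts how often each vocabulary label appears in a piece of text. The `VOCABULARY` dictionary, the `suggest_terms` function, and the sample sentence are all made up for illustration, with each preferred term mapped to a few labels loosely in the spirit of SKOS preferred and alternate labels.

```python
import re
from collections import Counter

# Hypothetical mini-vocabulary (an assumption for illustration only):
# preferred term -> every label that should count toward that term.
VOCABULARY = {
    "metadata": ["metadata"],
    "vocabulary": ["vocabulary", "vocabularies", "thesaurus", "thesauri"],
    "data": ["data", "dataset", "datasets"],
}


def suggest_terms(text, vocabulary, top_n=3):
    """Return up to top_n preferred terms, ranked by how often their labels occur."""
    lowered = text.lower()
    counts = Counter()
    for preferred, labels in vocabulary.items():
        for label in labels:
            # Whole-word match, so "data" is not counted inside "metadata".
            pattern = r"\b" + re.escape(label) + r"\b"
            counts[preferred] += len(re.findall(pattern, lowered))
    # Drop terms that never occurred in the text.
    return [term for term, n in counts.most_common(top_n) if n > 0]


sample = (
    "Metadata and controlled vocabularies help describe data. "
    "A thesaurus is one kind of vocabulary; good metadata makes data easier to find."
)
print(suggest_terms(sample, VOCABULARY))
```

A purely lexical match like this has no sense of context, which is one reason a term can be suggested in a sense the author never intended.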
Figure 1. Selecting available thesauri, and choosing text input (in this case, the URL to this blog, albeit prior to adding results of this discussion).
Figure 2. Output from Keyphrase Extraction Algorithm Tool.
Try it out at http://hive.nescent.org/indexing.html.
Would be nice if this were all automated, but I’ll take what I can get.
Obviously there is a bit of humor in seeing the AGROVOC agricultural thesaurus (Agricultural Vocabulary) suggest that I’m talking about bee hive management, or the USGS Biocomplexity Thesaurus suggest that I’m interested in electric generators (as opposed to keyphrase generators). That said, this blog entry is perhaps not the best text for the Keyphrase Extraction Algorithm (KEA) to work with.
For my master’s thesis, I am optimistic about incorporating the HIVE tool into my research on ontologically derived metadata annotation.