Our goal is to create a crowd-sourced, open-platform ontology for metadata that could apply to just about any field or subject matter (intellectual property in general). The key innovation is that the set of terms and their relationships will evolve according to people’s needs in a social ecosystem. We will provide a service that (a) allows users to input metadata with all required terms (according to our ontology) and export it in a controlled format (e.g. XML), and (b) lets them propose new terms when they’re needed. Stable ontological terms will emerge in two ways:
Vernacular: users propose new terms and definitions for those terms. These will be discussed and voted on in the manner of Stack Overflow.
Canonical: we should allow an expert to declare a term in the ontology.
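To make the two paths concrete, here is a minimal sketch of how a term could become stable either through community votes (vernacular) or through an expert declaration (canonical). The class names, the net-vote scoring, and the PROMOTION_THRESHOLD value are all hypothetical; the design does not yet specify how many votes a proposal needs.

```python
from dataclasses import dataclass, field

# Hypothetical promotion threshold: net votes a vernacular proposal needs
# before it is treated as a stable term. Not specified in the design.
PROMOTION_THRESHOLD = 10


@dataclass
class Term:
    name: str
    definition: str
    canonical: bool = False  # True when an expert declared the term
    upvotes: int = 0
    downvotes: int = 0

    def vote(self, up: bool) -> None:
        """Record one Stack Overflow-style vote on the proposal."""
        if up:
            self.upvotes += 1
        else:
            self.downvotes += 1

    @property
    def score(self) -> int:
        return self.upvotes - self.downvotes

    def is_stable(self) -> bool:
        """Stable if an expert declared it (canonical) or if community
        voting has pushed its net score past the threshold (vernacular)."""
        return self.canonical or self.score >= PROMOTION_THRESHOLD


@dataclass
class Ontology:
    terms: dict[str, Term] = field(default_factory=dict)

    def propose(self, name: str, definition: str) -> Term:
        """Vernacular path: any user may propose a term."""
        term = Term(name, definition)
        self.terms[name] = term
        return term

    def declare(self, name: str, definition: str) -> Term:
        """Canonical path: an expert declares a term directly."""
        term = Term(name, definition, canonical=True)
        self.terms[name] = term
        return term

    def stable_terms(self) -> list[str]:
        return sorted(n for n, t in self.terms.items() if t.is_stable())


onto = Ontology()
onto.declare("Format", "The file format of the work.")
lic = onto.propose("License", "The terms under which the work may be reused.")
for _ in range(12):
    lic.vote(up=True)
onto.propose("Colour", "The dominant colour of the work.")
print(onto.stable_terms())  # ['Format', 'License']
```

Keeping canonical and vernacular terms in one structure means the rest of the system only ever asks whether a term is stable, not how it got there.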
As the ontology will be entirely user-driven, a number of pathologies could develop. Redundancy: two terms in the ontology have the same semantic meaning. Irrelevance: a term does not apply to a particular subject matter but is nonetheless required by the ontology. What is needed is a way of controlling the evolution of the ontology in an automated way. I propose simple semantics that help avoid these issues and maintain a sound ontology as the registry evolves. Part of my approach prescribes how new terms are proposed.
First, here’s the essential specification of what our system should accomplish. From now on, "term" describes a class of metadata, e.g. Type, Format, Creator, etc. "Instance" will be used for a particular value described by a term; e.g. JPEG, plain_text, and PDF are instances of Format. We want a function, call it ExportMetadata(), that returns the set of only those (term, instance) pairs that pertain to the subject matter of the user’s data, according to our ontology. Part of this interface is a way to propose new terms systematically.
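A minimal sketch of the filtering ExportMetadata() is meant to perform, followed by export to XML as the controlled format. It assumes, purely for illustration, that the ontology records for each term the set of subject areas it pertains to; the APPLICABILITY table, the subject labels, and the XML element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical ontology fragment: each term maps to the subject areas it
# pertains to. The real ontology would be crowd-sourced; these entries are
# illustrative only.
APPLICABILITY = {
    "Type": {"image", "text", "software"},
    "Format": {"image", "text"},
    "Creator": {"image", "text", "software"},
    "Resolution": {"image"},
}


def export_metadata(subject, pairs):
    """Sketch of ExportMetadata(): keep only the (term, instance) pairs whose
    term pertains to the user's subject matter according to the ontology.
    Pairs whose term is unknown to the ontology are dropped."""
    return {(term, instance) for term, instance in pairs
            if subject in APPLICABILITY.get(term, set())}


def to_xml(subject, pairs):
    """Serialize the filtered pairs into a simple XML document."""
    root = ET.Element("metadata", subject=subject)
    for term, instance in sorted(pairs):
        ET.SubElement(root, "field", term=term).text = instance
    return ET.tostring(root, encoding="unicode")


user_pairs = [
    ("Type", "article"),
    ("Format", "plain_text"),
    ("Creator", "A. Smith"),
    ("Resolution", "300dpi"),  # pertains to images, not text
]
kept = export_metadata("text", user_pairs)
print(sorted(kept))
# [('Creator', 'A. Smith'), ('Format', 'plain_text'), ('Type', 'article')]
print(to_xml("text", kept))
```

Scoping terms by subject matter in this way is one possible guard against the irrelevance pathology: a term is only ever required where the ontology says it applies.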
A stated ambition of this project is to use natural language processing (NLP) techniques to analyze the definitions of proposed terms. The proposed ontology semantics allow us to resolve the relationships between terms in order to determine computationally whether there is a collision. Of course, I don’t think there’s any way to do this without human input, but it provides a way to flag obvious redundancies. There are other ways in which we want to use NLP, but we still need to work through those cases.
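As a rough stand-in for the NLP analysis (not the method the project will actually use), the sketch below flags possible redundancies by comparing the content words of two definitions with Jaccard similarity. The stop-word list and the 0.5 threshold are arbitrary assumptions, and any flag is only a candidate for human review.

```python
import re
from itertools import combinations

# Small illustrative stop-word list; a real system would use an NLP library.
STOP_WORDS = {"a", "an", "the", "of", "or", "and", "to", "in", "is",
              "which", "by", "that", "was"}


def content_words(definition):
    """Lower-case the definition and keep alphabetic words that are not stop words."""
    return {w for w in re.findall(r"[a-z]+", definition.lower())
            if w not in STOP_WORDS}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_redundancies(definitions, threshold=0.5):
    """Return (term, term, score) triples whose definitions overlap at least
    `threshold`. Flags are candidates for human review, not automatic merges."""
    words = {term: content_words(text) for term, text in definitions.items()}
    flagged = []
    for t1, t2 in combinations(sorted(words), 2):
        score = jaccard(words[t1], words[t2])
        if score >= threshold:
            flagged.append((t1, t2, round(score, 2)))
    return flagged


definitions = {
    "Creator": "The person or organization that made the work.",
    "Author": "The person or organization who made the work.",
    "Format": "The file format of the work.",
}
print(flag_redundancies(definitions))  # [('Author', 'Creator', 0.8)]
```

A bag-of-words overlap like this only catches near-verbatim definitions; synonyms or paraphrases would need richer NLP, which is why human input stays in the loop.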
In my opinion, the proposed design is a feasible nine-week project and will enable the evolution of a sound, complete ontology. The metadata work group is meeting in Chicago next week to finalize the system architecture, the front-end design, and the use cases we wish to cover.
Attached is the full document I submitted to my mentor, John. I’ll add his notes to the design and post an update.