Collaborative Knowledge Integration: Enabling Semantic Exchange of Research
Life sciences R&D is collaborative and information intensive,
and its primary goal is the creation of valuable knowledge
assets. Realizing value from a project almost invariably
requires recording the work as well as some form of communication
or exchange of information products associated with the
research. Researchers typically report certain information
about experiments: aims, background information, methods,
samples, results, data analysis, discussion, and conclusions.
The collection of information that constitutes a record
of a complete experiment often includes content in paper
lab notebooks, files and documents stored on personal and
shared computers, and data managed in LIMS, document management,
and database systems. Significant advances have been made
in systems that support access, integration, and mining
of these data.
Complete records for an experiment generally extend beyond
a single data set or format to include documents, diverse
scientific data types, analyses, annotations, and reports,
plus metadata that record authorship, date stamps, digital
signatures, and other information. Although certain scientific
data components of an experiment may now be described using
informatics standards, there is no public standard enabling
description, exchange, and reuse of a complete experiment.
In fact, the only method for sharing an entire experimental
record may be to send an email consisting of numerous file
attachments, plus text to relate the contents. A scientist
may readily interpret this information; however, because
it lacks the machine-readability needed for the semantic
exchange and software interoperability of research information,
the content cannot be integrated into collaborative
knowledge networks and diverse computing infrastructures.
Thus, a lack of information standards for sharing research
records is a significant barrier to generating and managing
knowledge assets.
There now exists both a significant community need and an
opportunity to develop an information encoding framework
for the semantic capture, management, exchange, and reuse
of records of experimental research. The goal is a public
domain object model that provides a self-describing, layered,
ontological basis for encoding the contents of an experiment
syntactically, structurally, and semantically. I will review
the components of a complete experiment record that must
be captured for machine-readable exchange using a minimal
Portable Experiment Format. I will then discuss requirements
and strategies for developing a public domain Interoperable
and Extensible Research Exchange (INTER-X™)
Framework to support scientific collaboration. Achievement
of these objectives relies heavily upon use of technologies
and concepts that are now available. These include IT infrastructure
(XML, RDF, OWL), general standards and representations for
biomedical research content (e.g. STMML, LSID, and standards
from NCI and NCBI), and domain specific life sciences data
standards, ontologies, and controlled vocabularies (e.g.
CML, BSML, GO, MAGE, BioPAX, SBML, HapMap, NCI Ontology,
SNOMED, HL7, and many others).
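
As a rough illustration of what a minimal machine-readable experiment record could look like, the Python sketch below bundles record-level metadata with references to the component files and serializes the whole record as XML. Every class, element, and attribute name here (ExperimentRecord, Component, "experiment", "component", "mediaType", and so on) is a hypothetical choice made for this example. None of it is taken from the Portable Experiment Format or the INTER-X Framework, and a real encoding would also need the semantic layers (RDF, OWL, domain ontologies) discussed above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Component:
    """One part of an experiment record (hypothetical structure)."""
    kind: str        # role in the record, e.g. "methods", "results", "analysis"
    media_type: str  # format of the referenced content
    href: str        # location of the content (file name, URL, or identifier)


@dataclass
class ExperimentRecord:
    """Hypothetical container for a complete experiment record."""
    title: str
    author: str
    date: str
    components: list = field(default_factory=list)

    def to_xml(self) -> str:
        """Serialize metadata and component references to an XML string."""
        root = ET.Element("experiment")
        meta = ET.SubElement(root, "metadata")
        ET.SubElement(meta, "title").text = self.title
        ET.SubElement(meta, "author").text = self.author
        ET.SubElement(meta, "date").text = self.date
        for c in self.components:
            ET.SubElement(root, "component", kind=c.kind,
                          mediaType=c.media_type, href=c.href)
        return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> ExperimentRecord:
    """Rebuild an ExperimentRecord from the XML produced by to_xml()."""
    root = ET.fromstring(text)
    meta = root.find("metadata")
    record = ExperimentRecord(
        title=meta.findtext("title"),
        author=meta.findtext("author"),
        date=meta.findtext("date"),
    )
    for el in root.findall("component"):
        record.components.append(
            Component(el.get("kind"), el.get("mediaType"), el.get("href"))
        )
    return record
```

Compared with an email carrying loose attachments, the point of such a structure is that a receiving program can round-trip the record (from_xml(record.to_xml())) and discover which file holds the methods, the results, or the analysis without a person reading explanatory text.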