PREreview of A maturity model for catalogues of semantic artefacts
- CC BY 4.0
The authors provide a definition for ‘semantic artefact catalogue (SAC)’ and suggest a maturity model for evaluating SACs that is based on a set of criteria with multiple sub-criteria. Their approach is aligned with the FAIR Guiding Principles and the ongoing efforts of the EOSC Task Force of Semantic Interoperability to address interoperability challenges. The authors apply their maturity model to a set of 26 SACs and evaluate them.
I enjoyed reading the paper and think that the maturity model represents an important contribution to the overall goal of increasing the semantic interoperability of (meta)data. I also think that the criteria (i.e., dimensions) and sub-criteria are well chosen and provide a good overview for a given SAC. However, I also see some points for potential improvement:
1) I think the paper would benefit from a clear definition of ‘semantic artefact’. In the introduction, the authors characterize a ‘semantic artefact’ as “a machine-actionable and machine-readable formalization of a conceptualization, enabling sharing and reuse by humans and machines, that may have a broad range of formalisations, from loose sets of terms, taxonomies, thesauri to higher-order logic constructs, vocabularies and ontologies”. I agree that vocabularies/terminologies of the various sorts mentioned here represent very important semantic artefacts. However, (meta)data schemata and formats, term mappings, and schema crosswalks are missing from this list. Clarifying that semantic artefacts also include the latter is very important, because their role in semantic interoperability is often overlooked, leaving the impression that semantic interoperability is ‘only’ a matter of providing controlled vocabularies that must be used by everyone. Semantic interoperability, however, comprises much more: it includes terminological, schematic, logical, and ontological interoperability (https://arxiv.org/abs/2301.04202), each with its own set of requirements.
2) My second point is related to the first. Reading the paper, I had the impression that only vocabularies/terminologies were considered, and that all the other types of semantic artefacts mentioned above were ignored. What is missing in the maturity model is a characterization of the different types of semantic artefacts that a catalogue covers or documents. This could be an additional dimension with the different types of artefacts as sub-criteria.
3) Vocabularies/terminologies (including ontologies etc.) only support terminological interoperability, and if different vocabularies/terminologies exist for the same types of entities, term mappings must be provided to guarantee terminological interoperability. Sometimes, those term mappings are provided within the vocabularies/terminologies themselves (e.g., via owl:sameAs), but this is not ideal. It would be better to provide term mappings as stand-alone semantic artefacts. This should be mentioned in the paper and should be evaluated by the maturity model.
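To illustrate what I mean by a stand-alone term-mapping artefact, here is a minimal Python sketch. It is only an illustration under my own assumptions: the identifiers `vocabA:...` and `vocabB:...`, the `TermMapping` class, and the `translate` helper are hypothetical and are not taken from the paper or from any catalogue. The only real names are the SKOS mapping properties `skos:exactMatch` and `skos:closeMatch`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermMapping:
    """One mapping statement kept outside both vocabularies."""
    subject_id: str   # term in the source vocabulary
    predicate: str    # mapping relation, e.g. a SKOS mapping property
    object_id: str    # term in the target vocabulary


# Hypothetical mapping set; all identifiers are made up for illustration.
MAPPINGS = [
    TermMapping("vocabA:0001", "skos:exactMatch", "vocabB:1234"),
    TermMapping("vocabA:0002", "skos:closeMatch", "vocabB:5678"),
]


def translate(term_id, mappings, predicate="skos:exactMatch"):
    """Return all target terms linked to term_id by the given relation."""
    return [
        m.object_id
        for m in mappings
        if m.subject_id == term_id and m.predicate == predicate
    ]


print(translate("vocabA:0001", MAPPINGS))
```

Because the mapping set lives apart from both vocabularies, it can be versioned, catalogued, and assessed as an artefact in its own right, which is what a maturity model would need in order to evaluate it.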
4) (Meta)data schemata provide schematic interoperability, and if different schemata for the same type of information exist, schema crosswalks must be provided for schematic interoperability. Schema crosswalks are another type of semantic artefact that should be mentioned in the paper and should also be evaluated by the maturity model.
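A schema crosswalk can be pictured in a similarly simple way. The following Python sketch assumes two hypothetical flat metadata schemata; the field names and the `apply_crosswalk` helper are invented for illustration, and real crosswalks usually also need value conversions and structural rules that are left out here.

```python
# Hypothetical crosswalk from a source schema to a target schema.
# All field names are illustrative assumptions.
CROSSWALK = {
    "title": "dc_title",
    "creator": "dc_creator",
    "issued": "dc_date",
}


def apply_crosswalk(record, crosswalk):
    """Rename the fields of record according to crosswalk.

    Returns the converted record and the list of source fields
    for which the crosswalk defines no target field.
    """
    converted = {}
    unmapped = []
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped.append(field)
        else:
            converted[target] = value
    return converted, unmapped


source_record = {"title": "Example dataset", "creator": "A. Author", "pages": 12}
converted, unmapped = apply_crosswalk(source_record, CROSSWALK)
print(converted)
print(unmapped)
```

Reporting the unmapped fields makes the coverage of a crosswalk visible, which is the kind of property a catalogue could document for this type of artefact.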
5) Logical interoperability of (meta)data requires the use of the same logical framework, such as OWL/description logic. Semantic artefacts should specify whether they apply a logical framework and, if so, which one. This information should be covered by the maturity model.
6) Another candidate catalogue for evaluation: TIB Terminology Service (https://service.tib.eu/ts4tib/index). Following the maturity model, Felix Engel has evaluated the TIB Terminology Service (only the ‘checks’ mentioned):
Me (custom vocabulary, primary metadata, human readable, machine readable)
Op (customised oss, open model, open contribution)
Qu (curation by owner only)
Av (no restrictions)
St (catalog statistics, resource statistics)
Go (3rd party, description, rules)
Co (read only)
Su (organization, management board, (research) projects)
Te (REST API, web search GUI)
Tr (documented curation, automatic curation)
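For reuse, the evaluation above could also be recorded in a machine-readable form. The following Python sketch simply transcribes the list into a dictionary keyed by the dimension codes used above. The variable and function names are my own, and the tally of checks per dimension is only a convenience for comparison; it is not the scoring method of the paper.

```python
# Transcription of the TIB Terminology Service evaluation listed above.
# Keys are the dimension codes as given in the list; values are the checks.
TIB_TS_EVALUATION = {
    "Me": ["custom vocabulary", "primary metadata", "human readable", "machine readable"],
    "Op": ["customised oss", "open model", "open contribution"],
    "Qu": ["curation by owner only"],
    "Av": ["no restrictions"],
    "St": ["catalog statistics", "resource statistics"],
    "Go": ["3rd party", "description", "rules"],
    "Co": ["read only"],
    "Su": ["organization", "management board", "(research) projects"],
    "Te": ["REST API", "web search GUI"],
    "Tr": ["documented curation", "automatic curation"],
}


def checks_per_dimension(evaluation):
    """Count the recorded checks for each dimension (a plain tally)."""
    return {dimension: len(checks) for dimension, checks in evaluation.items()}


for dimension, count in checks_per_dimension(TIB_TS_EVALUATION).items():
    print(dimension, count)
```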
List of additional comments:
p. 2, last paragraph (see also p. 4, 2.2, last paragraph): If a semantic artefact is machine-actionable, it must also be machine-readable. The current wording gives the impression that machine-actionability and machine-readability are independent criteria, which is not the case.
p. 4, last paragraph (spelling): in the context of the H2020
p. 8, 4 Results: I would not characterize BioPortal or EBI OLS as ‘metadata catalogues’.
p. 10, Technology (d): “alignment – a service to align (part of) semantic artefacts that might be used within a catalog.” It is not entirely clear what that means.
p. 14, 2nd paragraph (spelling): In the future, we aim at
p. 14, 2nd paragraph (spelling): Similarly, the work under development in
I am a member of the EOSC Task Force Semantic Interoperability, but I was involved neither in writing the paper nor in discussing its topics.