Recent Papers

This page is dedicated to abstracts of recent papers that pertain to LinkedMusic's ideals.


Declerck, Thierry, Thomas Troelsgård, and Sussi Olsen. 2023. “Towards an RDF Representation of the Infrastructure Consisting in Using Wordnets as a Conceptual Interlingua between Multilingual Sign Language Datasets.” In Proceedings of GWC2023, 8. Donostia / San Sebastian, Basque Country.

The authors present ongoing work dealing with a Linked Data compliant representation of infrastructures using wordnets for connecting multilingual Sign Language data sets. They built on already existing RDF and OntoLex representations of Open Multilingual Wordnet (OMW) data sets and work done by the European EASIER research project on the use of the CSV files of OMW for linking glosses and basic semantic information associated with Sign Language data sets in two languages: German and Greek. In this context, they started the transformation into RDF of a Danish data set, which links Danish Sign Language data and the wordnet for Danish, DanNet. The final objective of their work is to include Sign Language data sets (and their conceptual cross-linking via wordnets) in the Linguistic Linked Open Data cloud.


Ansovini, Daniela, Kelli Babcock, Tanis Franco, Jiyun Alex Jung, Karen Suurtamm, and Alexandra Wong. 2022. “Knowledge Lost, Knowledge Gained: The Implications of Migrating to Online Archival Descriptive Systems.” KULA: Knowledge Creation, Dissemination, and Preservation Studies 6 (3):1–19.

Migrating archival description from paper-based finding aids to structured online data reconfigures the dynamics of archival representation and interaction. This paper considers the knowledge implications of transferring traditional finding aids to Discover Archives, a university-wide implementation of Access to Memory (AtoM) at the University of Toronto. Migrating and translating varied descriptive practices to conform to a single system that is accessible to anyone, anywhere, shifts both where and how users interface with archives and their material. The paper reflects on how different sets of knowledge are reorganized, and sometimes lost, in these shifts. The authors explore the extent to which lost knowledge can be drawn back into archival interactions via rich metadata that documents contexts and relationships embedded within Discover Archives and beyond. Internal user experience design (UXD) research on Discover Archives highlights a gap between current online descriptions and users' habitual expectations of web search and discovery. To help bridge this gap, the authors contributed to broader discovery nodes such as linked open “context hubs” like Wikipedia and Wikidata, which can supplement hierarchical description with linked metadata and visualization capabilities. These hubs can reintroduce rhizomatic and serendipitous connections, enabled by archivists, researchers, and larger sets of community knowledge, to the benefit of both the user and the archivist.

Arnold, Matthias. 2022. “Multilingual Research Projects: Non-Latin Script Challenges for Making Use of Standards, Authority Files, and Character Recognition.” Digital Studies / Le Champ Numérique 12 (1):36.

Academic research on digital non-Latin script (hereafter: NLS) research data can pose a number of challenges simply because the material comes from a region where the Latin alphabet was not used. Not all of these challenges are easy to spot. In this paper, the author introduces two use cases that demonstrate different aspects of the complex tasks that may be involved in working with NLS material. The first use case focuses on metadata standards used to describe NLS material. Taking VRA Core 4 XML as an example, the author shows where the project found limitations for NLS material and how it overcame them by expanding the standard. The second use case looks at the research data itself. Although full-text digitization of 20th-century western newspapers is usually no longer problematic, this is not the case for Chinese newspapers from the Republican era (1912–1949). A major obstacle here is the dense and complex page layout, which prevents OCR solutions from reaching the character recognition stage. The author's approach combines manual and computational methods such as crowdsourcing, pattern recognition, and neural networks to process the material more efficiently. The two use cases illustrate that data standards or processing methods that are established and stable for Latin-script material may not always be easily adapted to non-Latin script research data.

Ayala, Brenda Reyes, Qiufeng Du, and Juyi Han. 2022. “Detecting Content Drift on the Web Using Web Archives and Textual Similarity.” In Proceedings of the Workshops and Doctoral Consortium of the 26th International Conference on Theory and Practice of Digital Libraries, 9. Padua, Italy: CEUR Workshop Proceedings.

Content drift, which occurs when a website's content changes and moves away from the content it originally referenced, is a problem that affects both live websites and web archives. Content drift can also occur when a page has been hacked, its domain has expired, or the service has been discontinued. In this paper, the authors present a simple method for detecting content drift on the live web by comparing the titles of live websites to those of their archived versions. Their assumption was that the greater the difference between the title of an archived website and that of its live counterpart, the more likely it is that content drift has taken place. To test their approach, they first had human evaluators manually judge websites from three Canadian web archives to determine whether or not content drift had occurred. They then extracted the titles from all websites and used cosine similarity to compare the titles of the live websites to those of the archived websites. The approach achieved positive results, with an accuracy of 85.2%, a precision of 89.3%, a recall of 92.1%, and an F-measure of 90.7%. Simple methods such as this one can allow institutions or researchers to detect content drift quickly and effectively without needing many technological resources.
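The title-comparison step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes titles are compared as bag-of-words term-count vectors after lowercasing, and the 0.5 drift threshold is a hypothetical value. It also checks that the reported F-measure is consistent with the reported precision and recall, taking the F-measure as their harmonic mean.

```python
import math
import re
from collections import Counter


def tokenize(title: str) -> list[str]:
    """Lowercase a page title and split it into word tokens (an assumed normalization)."""
    return [tok for tok in re.split(r"\W+", title.lower()) if tok]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-count vectors of two titles (0.0 to 1.0)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def likely_drifted(archived_title: str, live_title: str, threshold: float = 0.5) -> bool:
    """Flag possible content drift when the titles are dissimilar (hypothetical threshold)."""
    return cosine_similarity(archived_title, live_title) < threshold


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(cosine_similarity("City of Ottawa - Parks", "City of Ottawa | Parks"))  # 1.0
    print(likely_drifted("City of Ottawa - Parks", "This domain is for sale"))  # True
    print(round(f_measure(89.3, 92.1), 1))  # 90.7
```

In practice the threshold would need to be tuned against manually judged examples such as those the authors collected.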

Bigelow, Ian. 2022. “Conducting the Opera: The Evolution of the RDA Work to the Share-VDE Opus and BIBFRAME Hub.” In Linking Theory and Practice of Digital Libraries, 335–50. Padua, Italy: Springer.

This paper examines recent developments in the use of the Resource Description and Access (RDA) Work, the BIBFRAME Hub, and the Share-VDE Opus (referred to collectively as Opera) in bibliographic description. These Opera are discussed both to capture the current state of developments in this area and to make recommendations for a path forward as these parallel developments converge. With a new version of the Official RDA, and many libraries working towards BIBFRAME implementation, this is a good point for reflection, now that the analysis of Opera conceptions has been further developed through initiatives at the Library of Congress (LC) and Share-VDE. Past discussions comparing the RDA and BIBFRAME models have focused on compatibility issues, but this paper attempts to frame the differences as part of a developmental trajectory.

Biswas, Russa, Yiyi Chen, Heiko Paulheim, Harald Sack, and Mehwish Alam. 2022. “It’s All in the Name: Entity Typing Using Multilingual Language Models.” In Proceedings of The Semantic Web: ESWC 2022 Satellite Events, 13384:59–64. Lecture Notes in Computer Science. Hersonissos, Crete, Greece: Springer International Publishing.

The entity-type information in Knowledge Graphs (KGs) of different languages plays an important role in a wide range of Natural Language Processing applications. However, the entity types in KGs are often incomplete. Multilingual entity typing is a non-trivial task if enough information is not available for the entities in a KG. In this work, multilingual neural language models are exploited to predict the type of an entity from only the name of the entity. The model has been successfully evaluated on multilingual datasets extracted from different DBpedia language chapters, namely German, French, Spanish, and Dutch.

Bulla, Luana, Maria Chiara Frangipane, Maria Letizia Mancinelli, Ludovica Marinucci, Misael Mongiovi, Margherita Porena, Valentina Presutti, and Chiara Veninata. 2022. “Developing and Aligning a Detailed Controlled Vocabulary for Artwork.” In New Trends in Database and Information Systems, 537–48. SWODCH: 2nd Workshop on Semantic Web and Ontology Design for Cultural Heritage. Turin, Italy: Springer.

Controlled vocabularies have proved to be critical for data interoperability and accessibility. In the cultural heritage (CH) domain, descriptions of artworks are often given as free text, making filtering and searching burdensome (e.g. listing all artworks of a specific type). Despite being multilingual and quite detailed, the Getty’s Art & Architecture Thesaurus, a de facto standard for describing artworks, has low coverage for languages other than English and sometimes does not reach the degree of granularity required to describe specific niche artworks. We build upon the Italian Vocabulary of Artworks, developed by the Italian Ministry of Cultural Heritage (MIC), and a set of free-text descriptions from ArCO, the knowledge graph of Italian CH, to propose an extension of the Vocabulary of Artworks and align it to the Getty’s thesaurus. Our framework relies on text matching and natural language processing tools to suggest candidate alignments between free text and terms, and between terms across vocabularies, with a human in the loop for validation and refinement. We produce 1,166 new terms (31% more than the original vocabulary) and 1,330 links to the Getty’s thesaurus, with an estimated coverage of 21%.

Canning, Erin, Susan Brown, Sarah Roger, and Kimberley Martin. 2022. “The Power to Structure: Making Meaning from Metadata Through Ontologies.” KULA: Knowledge Creation, Dissemination, and Preservation Studies, Metadata as Knowledge, 6 (3):1–15.

The Linked Infrastructure for Networked Cultural Scholarship project (LINCS) helps humanities researchers tell stories by using linked open data to convert humanities datasets into organized, interconnected, machine-processable resources. LINCS provides context for online cultural materials, interlinks them, and grounds them in sources to improve web resources for research. This article describes how the LINCS team is using the shared standards of linked data, and especially ontologies, to bring meaning mindfully to metadata through structure. The LINCS metadata (linked open data about cultural artifacts, people, and processes) and the structures that support it must represent multiple, diverse ways of knowing. The metadata must enable various means of incorporating contextual data and of telling stories with nuance and context, situated in and supported by data structures that reflect and make space for specificities and complexities. As it addresses specificity in each research dataset, LINCS is simultaneously working to balance interoperability, achieved through a level of generalization, with contextual and domain-specific requirements. The LINCS team’s approach to ontology adoption and use centers on intersectionality, multiplicity, and difference. The question of what meaning the structures being used will bring to the data is as important as what meaning is introduced by linking data together, and the project has built this premise into its decision-making and implementation processes. To convey an understanding of categories and classification as contextually embedded (culturally produced, intersecting, and discursive), the LINCS team frames them not as fixed but as grounds for investigation and starting points for understanding. Metadata structures are as important as vocabularies for producing such meaning.

Coladangelo, L. P., and Lynn Ransom. 2021. “Semantic Enrichment of the Schoenberg Database of Manuscripts Name Authority through Wikidata.” In Metadata and Semantic Research, 1537:5. Virtual: Springer.

This case study explored the semantic enrichment of name authority data from the Schoenberg Database of Manuscripts (SDBM), a database of manuscript provenance data. Informed by previous linked data and semantic enrichment research, the study used a test dataset of approximately 12,500 named entities to align and link to corresponding Wikidata items. By working with the Wikidata community on data property creation and using OpenRefine for reconciliation and batch editing, the authors linked approximately 9,000 SDBM names to Wikidata pages. The resulting linked dataset was tested using a series of data- and research-related SPARQL queries of interest to manuscript scholars and Schoenberg Institute staff. The results of the SPARQL test queries satisfactorily answered all but one of ten exploratory questions. Future research will focus on expanding the number of SDBM name authority entities linked to Wikidata, as well as using Wikidata as a linked data repository for other manuscript-related metadata projects.

Cui, Wen, Leanne Rolston, Marilyn Walker, and Beth Ann Hockey. 2022. “OpenEL: An Annotated Corpus for Entity Linking and Discourse in Open Domain Dialogue.” In Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), 12. Marseille, France: European Language Resources Association (ELRA).

Entity linking in dialogue is the task of mapping entity mentions in utterances to a target knowledge base. Prior work on entity linking has mainly focused on well-written articles such as Wikipedia, annotated newswire, or domain-specific datasets. The writers extend the study of entity linking to open domain dialogue by presenting the OpenEL corpus: an annotated multi-domain corpus for linking entities in natural conversation to Wikidata. Each dialogic utterance, in 179 dialogues over 12 topics from the original EDINA corpus, has been annotated for entities realized by definite referring expressions as well as anaphoric forms such as he, she, it, and they. OpenEL thus supports training and evaluation of entity linking in open-domain dialogue, as well as analysis of the effect of using dialogue context and anaphora resolution in model training. It can also be used for fine-tuning a coreference resolution algorithm. They also establish baselines for named entity linking in open domain conversation using several existing entity linking systems. The results demonstrate the remaining performance gap between the baselines and human performance, highlighting the challenges of entity linking in open-domain dialogue, and suggesting many avenues for future research using OpenEL.

Deng, Sai, Greta Heng, Amanda Xu, Lihong Zhu, and Xiaoli Li. 2022. “Enhance the Discovery and Interoperability of Culturally Rich Information: The Chinese Women Poets WikiProject.” Poster presented at the Chinese American Librarians Association (CALA) Annual Conference 2022, Washington DC, United States, June 25.

A group of Chinese American librarians from several institutions formed a WikiProject team, the Chinese Culture and Heritage group, in 2020 to study Wikidata. The group hoped to explore the potential of Wikidata, contribute to the diversity of data in Wikidata, which is increasingly used in libraries’ discovery systems, and seek collaboration opportunities. Its primary focus is to create and enhance Wikidata items that showcase Chinese culture and heritage information. This poster presents an overview of the Chinese Women Poets WikiProject, the group’s first project, which uses OpenRefine and Pywikibot to enhance the Wikidata entries of over 4,000 Chinese women poets. In addition, the presenters discuss the challenges and benefits of the project as well as their future work.

Dias, Mariana, and Carla Teixeira Lopes. 2022. “Mining Typewritten Digital Representations to Support Archival Description.” In Proceedings of the Workshops and Doctoral Consortium of the 26th International Conference on Theory and Practice of Digital Libraries, 6. Padua, Italy: CEUR Workshop Proceedings.

Linked Data is used in various fields as a new way of structuring and connecting data, and cultural heritage institutions have been using it to improve archival descriptions and promote findability. In EPISA, a research project on this topic, the authors propose using the contents of the digital representations associated with archival objects to assist archivists in their description tasks. More specifically, they aim to extract information from the digital representations that is useful for an initial population of the ontology, which should then be validated or edited by the archivist. They first apply optical character recognition to convert the digital representations to a machine-readable format. They then use ontology-oriented programming to identify and instantiate ontology concepts using neural networks and contextual embeddings.

Dobriy, Daniil, and Axel Polleres. 2022. “Analysing and Promoting Ontology Interoperability in Wikibase.” In Wikidata ’22: Wikidata Workshop at ISWC 2022, 7. Online.

Wikibase, the open-source software behind Wikidata, is increasingly popular among third-party Linked Data publishers. However, the platform’s unique data model reduces its interoperability with the existing Semantic Web standards and tools that underlie Linked Data as codified by the Linked Data principles. In particular, Wikibase’s data model also hinders the direct reuse of ontologies and vocabularies in a manner compliant with Semantic Web standards and Linked Data principles. To address this, we first compare the Wikibase data model to the established RDF data model. Second, we enumerate a series of challenges for importing existing ontologies into Wikibase. Third, we present practical solutions to these challenges and introduce a tool for importing and reusing ontologies within Wikibase. The paper thus aims to promote ontology interoperability in Wikibase and, by doing so, to contribute to a higher degree of interlinkage between Wikibase instances and Linked Open Data.

Doub, Bolton. 2022. “Documenting a Move Using Archival Description: Tools for Bridging the Gaps Between Physical and Intellectual Control.” Journal of Western Archives 13 (1):19.

Following the move of approximately 40,000 linear feet of archival material between offsite storage facilities, the University of Southern California (USC) Libraries began a project to document these holdings’ new locations using ArchivesSpace. This case study explores a combination of tools–including the ArchivesSpace API, Python scripts, OpenRefine, and spreadsheet applications–that the USC Libraries used to batch-edit and create container data in ArchivesSpace following the move. The paper discusses the challenges and shortcomings of these tools for editing particular forms of legacy data entered into USC’s instance of ArchivesSpace long before the move. When the creators of this past description prioritized the work of establishing intellectual control (describing the informational content of archival resources) using methods that neglected descriptive prerequisites for the future maintenance of physical control (tracking the physical locations of archival holdings), the tools outlined in this paper were less effective in editing that legacy data.

Fell, Michael, Elena Cabrio, Maroua Tikat, Franck Michel, Michel Buffa, and Fabien Gandon. 2022. “The WASABI Song Corpus and Knowledge Graph for Music Lyrics Analysis.” Language Resources and Evaluation, July.

The authors present the WASABI Song Corpus, a large corpus of songs enriched with metadata extracted from music databases on the Web and derived from the processing of song lyrics and from audio analysis. Given that lyrics encode an important part of a song’s semantics, the paper focuses on the methods the authors proposed for extracting relevant information from lyrics, such as their structural segmentation, their topics, the explicitness of their content, the salient passages of a song, and the emotions conveyed. The corpus contains 1.73M songs with lyrics (1.41M unique lyrics) annotated at different levels with the output of these methods. The corpus labels and the provided methods can be exploited by music search engines and music professionals (e.g. journalists, radio presenters) to better handle large collections of lyrics, enabling intelligent browsing, categorization, and recommendation of songs. The authors demonstrate the utility and versatility of the WASABI Song Corpus in three concrete application scenarios. Alongside the corpus, they present the work done to transition the dataset into a knowledge graph, the WASABI RDF Knowledge Graph, and show how this will enable an even richer set of applications.

Gayo, Jose Emilio Labra. 2022. “WShEx: A Language to Describe and Validate Wikibase Entities.” ArXiv:2208.02697, Computer Science, August, 12.

Wikidata is one of the most successful Semantic Web projects. Its underlying Wikibase data model departs from RDF by including features such as qualifiers, references, and built-in datatypes. These features are serialized to RDF for content negotiation, RDF dumps, and the SPARQL endpoint. Wikidata adopted the entity schemas namespace, using the ShEx language to describe and validate the RDF serialization of Wikidata entities. In this paper, the author proposes WShEx, a language inspired by ShEx that directly supports the Wikibase data model and can be used to describe and validate Wikibase entities. The paper presents the abstract syntax and semantics of the WShEx language.

Georgiadis, Haris, Agathi Papanoti, Elena Lagoudi, Georgia Angelaki, Nikos Vasilogamvrakis, Alexia Panagopoulou, and Evi Sachini. 2022. “Enriching the Greek National Cultural Aggregator with Key Figures in Greek History and Culture: Challenges, Methodology, Tools and Outputs.” In Linking Theory and Practice of Digital Libraries. Vol. 13541. Lecture Notes in Computer Science. Padua, Italy: Springer International Publishing.

Since 2015, the Greek cross-domain Cultural Data Aggregator, a service developed by the National Documentation Centre in Greece (EKT), has collected a growing number of digitised Cultural Heritage Objects (CHOs), currently 800,000, from 73 cultural institutions. Addressing metadata heterogeneity in order to provide advanced search, browsing, and filtering options to users has been a key target from the start. Controlled linked data vocabularies for item types, historical periods, and themes have been developed over the past years and are used for the semantic enrichment of the CHOs’ metadata. In this paper, the authors present the challenges, methodology, and tools used over the past 2 years to enrich the aggregated CHOs’ metadata with person entities from a Linked Data vocabulary, created for this purpose, of over 8,200 Greek persons who left a mark on history, society, science, letters, and art. This latest development allows all the works relating to a person to be interlinked and semantically enriched, adds significant browsing and search functionality to the portal, and therefore opens new horizons for Greek social sciences and humanities (SSH) research.

Gillis-Webber, Frances, and C. Maria Keet. 2022. “A Review of Multilingualism in and for Ontologies.” ArXiv:2210.02807, Artificial Intelligence, October, 22.

The Multilingual Semantic Web has been in focus for over a decade. Multilingualism in Linked Data and RDF has seen substantial adoption, but whether the same holds for ontologies has been unclear since the last review 15 years ago. One of the design goals for OWL was internationalisation, with the aim that an ontology be usable across languages and cultures. Much research on improving multilingual ontologies has taken place in the meantime, and multilingual linked data could presumably make use of multilingual ontologies. This review therefore seeks to (i) elucidate and compare the modelling options for multilingual ontologies, (ii) examine extant ontologies for their multilingualism, and (iii) evaluate ontology editors for their ability to manage a multilingual ontology.

Green, Alex, and K. Faith Lawrence. 2022. “The Shock of the New: Testing the Pan-Archival Linked Data Catalogue with Users.” In Proceedings of the Workshops and Doctoral Consortium of the 26th International Conference on Theory and Practice of Digital Libraries, 7. Padua, Italy: CEUR Workshop Proceedings.

The UK National Archives’ goal is to re-imagine archival practice, pioneer new approaches to description and build a new linked data catalogue. The Pan-Archival Catalogue will bring together into one management system descriptions of both physical and digital records from a variety of sources within the organization. This report briefly describes the users’ feedback on aspects of the new data model when first shown in the new editorial interface and as part of business processes.

Guzman, Allyssa, Albert A. Palacios, and Ryan Sullivant. 2022. “White Paper for Enabling and Reusing Multilingual Citizen Contributions in the Archival Record.”

This project arose from the growing consensus that representation has not been enough to diversify the digital cultural record. Rather, as digital humanists, archivists, and librarians have pointed out, representation without the participation of non-Anglophone and minority groups has recreated historical exclusions, which now stem from a lack of technological or multilingual resources that facilitate access to and engagement with materials in other languages (Priani Saisó et al. 6; Caswell et al.; Bow and Hepworth; Risam). The message in these critiques is resounding: the field needs to promote and support cultural and linguistic diversity. Part 1 of this project gave non-English-literate communities an avenue to meaningfully engage with and contribute to the Digital Humanities through the interface internationalization and translation of an open-source digital scholarship platform, FromThePage. Part 2 enhanced FromThePage’s collection management capabilities and exports to facilitate the development of workflows for preserving and reusing collaborative scholarship.

Haller, Armin, Axel Polleres, Daniil Dobriy, Nicolas Ferranti, and Sergio J. Rodriguez Mendez. 2022. “An Analysis of Links in Wikidata.” In ESWC 2022: The Semantic Web, 21–38. Crete, Greece: Springer.

Wikidata has become one of the most prominent open knowledge graphs (KGs) on the Web. Relying on a community of users with different expertise, this cross-domain KG is directly related to other data sources. This paper investigates how Wikidata is linked to other data sources in the Linked Data ecosystem. To this end, the authors adapt previous definitions of ontology links and instance links to the terminological part of the Wikidata vocabulary and perform an analysis of the links in Wikidata to external datasets and ontologies from the Linked Data ecosystem. As a side effect, this reveals insights on the ontological expressiveness of meta-properties used in Wikidata. The results of this analysis show that while Wikidata defines a large number of individuals, classes and properties within its own namespace, they are not (yet) extensively linked. They discuss reasons for this and conclude with some suggestions to increase the interconnectedness of Wikidata with other KGs.

Han, Sooyeon, and JongGyu Han. 2022. “Case Study on an Integrated Interoperable Metadata Model for Geoscience Information Resources.” Geoscience Data Journal, 16.

The article covers the creation of a metadata schema that promotes interoperability and characterizes historically collected geoscience data. Although the article comes from outside the music domain, it shows the steps taken to develop a switching-across methodology, in which multiple metadata schemas are mapped through a common intermediate (switching) schema, and gives an example of the methodology in use.

Hansson, Karin, and Anna Näslund Dahlgren. 2022. “Choice, Negotiation, and Pluralism: A Conceptual Framework for Participatory Technologies in Museum Collections.” Computer Supported Cooperative Work (CSCW) 31 (4):603–31.

In an era of big data and fake news, museums’ collection practices are particularly important democratic cornerstones. Participatory technologies such as crowdsourcing or wikis have been put forward as a means to make museum collections more open and searchable, motivated by a desire for efficiency but also as a way to engage the public in the development of a more diverse and polyphonic heritage. However, there is a lack of a nuanced vocabulary to describe participatory technologies in terms of democracy. Without a deeper understanding of how technology shapes the overall structures, there is a risk that the tools instead undermine democratic ambitions.

Huang, Zhaoyan, and Tao Xu. 2022. “Research on Knowledge Management of Intangible Cultural Heritage Based on Linked Data.” Edited by R. Mo. Mobile Information Systems 2022 (August):1–14.

At present, the protection of intangible cultural heritage is receiving increasing attention from all levels of society. Intangible cultural heritage is a treasure of national culture: it is an indispensable part of Chinese civilization, embodies that civilization’s accumulated wisdom, and represents the country’s soft power. Ontology and linked data technology provide a new method and implementation path for organizing and managing intangible cultural heritage knowledge. In this paper, intangible cultural heritage knowledge is organized semantically using linked data methods, with the aim of using the structure of linked data to express resource data of differing structures in a structured manner. The paper first introduces the significance and background of the research and analyzes related research in China and abroad. Second, it introduces the relevant concepts of linked data, analyzes and sorts out the elements and semantic relationships of knowledge in the field of intangible cultural heritage, and designs and constructs an ontology model of intangible cultural heritage knowledge. Finally, based on linked data technology, it studies the process of organizing intangible cultural heritage knowledge and constructing a linked data set, including key steps such as entity-to-RDF conversion, entity association, linked data storage, and publication. Applying linked data technology to the organization and management of intangible cultural heritage knowledge can promote the standardization of intangible cultural heritage knowledge management and is of great significance for protecting and passing on China’s intangible cultural heritage.
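The entity-to-RDF and entity-association steps can be illustrated with a small sketch that serializes one entity record as N-Triples using only the Python standard library. It is an assumption-laden example, not the paper's pipeline: the `http://example.org/ich/` namespace, the record layout, and the class and property names (`TraditionalTheatre`, `practisedIn`) are hypothetical; only `rdf:type` and `rdfs:label` are standard W3C terms.

```python
# Hypothetical namespace, for illustration only; not taken from the paper.
BASE = "http://example.org/ich/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes, and newlines for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def entity_to_ntriples(entity: dict) -> list[str]:
    """Turn one entity record into N-Triples lines: its type, its label, and its links."""
    subject = f"<{BASE}{entity['id']}>"
    label = escape_literal(entity["label"])
    lang = entity.get("lang", "en")
    lines = [
        f"{subject} <{RDF_TYPE}> <{BASE}{entity['type']}> .",
        f'{subject} <{RDFS_LABEL}> "{label}"@{lang} .',
    ]
    # Entity association: each link becomes a triple pointing at another entity's IRI.
    for prop, target in entity.get("links", {}).items():
        lines.append(f"{subject} <{BASE}{prop}> <{BASE}{target}> .")
    return lines


if __name__ == "__main__":
    # A hypothetical record; field names are illustrative assumptions.
    record = {
        "id": "kunqu",
        "type": "TraditionalTheatre",
        "label": "Kunqu Opera",
        "links": {"practisedIn": "suzhou"},
    }
    print("\n".join(entity_to_ntriples(record)))
```

A production pipeline would more likely use an RDF library and an established ontology, then load the output into a triple store for storage and publication.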

Kesäniemi, Joonas, Mikko Koho, and Eero Hyvönen. 2022. “Using Wikibase for Managing Cultural Heritage Linked Open Data Based on CIDOC CRM.” In New Trends in Database and Information Systems, 550–57. SWODCH: 2nd Workshop on Semantic Web and Ontology Design for Cultural Heritage. Turin, Italy: Springer.

This paper addresses the problem of non-expert users maintaining a CIDOC CRM-based knowledge graph (KG). We present a practical method using Wikibase and specific data input conventions for creating and editing linked data that can be exported as CIDOC CRM-compliant RDF. Wikibase is proven, maintained software for generic KG maintenance, with a fixed but flexible data model and an easy-to-use user interface. It runs the collaboratively edited Wikidata KG as well as an increasing number of domain-specific services. The proposed solution introduces a set of data input conventions for Wikibase that can be used to generate CIDOC CRM-compliant RDF without programming. The process relies on these data input rules combined with generic mapping implementations and metadata stored as part of the KG. We argue that this convention-over-coding approach makes the system more approachable and maintainable for users who want to adhere to CIDOC CRM principles but are not ontology experts. As a preliminary evaluation of the proposed solution, the paper presents an example of managing Cultural Heritage data in the military history domain and discusses the limitations of the approach.

Khan, Huda, Claire DeMarco, Christine Fernsebner Eslao, Steven Folsom, Jason Kovari, Simeon Warner, Tim Worrall, and Astrid Usong. 2022. “Using Linked Data Sources to Enhance Catalog Discovery.” KULA: Knowledge Creation, Dissemination, and Preservation Studies, Metadata as Knowledge, 6 (3):1–26.

This article explores how linked data sources and non-library metadata can support the open-ended discovery of library resources. The authors also consider which experimental methods are best suited to improving library catalog systems. They provide an overview of the questions driving their discovery experiments with linked data, a summary of their usability findings, and their design and implementation approach. In addition, they situate the discussion of their work within the larger framework of library cataloging and curation practices.

Klose, Annamarie C., Scott Goldstein, and Morris S. Levy. 2022. “Numismatics & Bibliographic Description: How Rutgers University Libraries Described Coins with MODS.” Journal of Library Metadata 22 (1–2). Routledge:75–104.

Realia pose challenges when utilizing bibliographic metadata standards. Rutgers University Libraries, in collaboration with Rutgers University’s Classics Department, created a large digital library collection of ancient Roman coins in RUcore, Rutgers University’s Community Repository. RUcore records use the Metadata Object Description Schema (MODS) for descriptive metadata, along with many custom fields. Therefore, it was necessary to adapt numismatic description to fit this structure. During the planning stage of the project, Numismatic Description Standard (NUDS), a numismatic database standard implemented and maintained by the American Numismatic Society (ANS), and VRA Core, an art-centered XML metadata standard created by the Visual Resources Association, provided valuable insights. However, this project faced challenges in terms of interoperability and time constraints that required altering the team’s approach to this unique set of resources in a digital library environment. Key issues were encoding B.C.E. dates in a machine-readable format for optimal searching and browsing, developing local controlled vocabularies, providing subject access to the iconography on coins, and the research-intensive work of metadata description. This article provides “how to” information, as well as a critical analysis of lessons learned and opportunities for improvement as the linked data landscape has changed both bibliographic and numismatic description.

Kumar, T K Gireesh, and Praseetha Gireesh. 2022. “Towards a National Collection: Metadata Aggregation of Digital Cultural Heritage.” In CALIBER 2022: Varanasi, UP, 16. 32. Varanasi, UP, INDIA: INFLIBNET Centre.

In the past few decades, communities have embraced digital technologies to gather, record, and organize their collections systematically and to enhance their discoverability, which empowers diverse audiences worldwide. Galleries, Libraries, Archives, and Museums (GLAMs), cultural organizations, and memory institutions have gone beyond physical display and engaged in transforming the online exploration of cultural artifacts and historical archives by harnessing innovative technologies. Cultural content in digitized form is an important resource, and providing online access to it enhances its visibility, which can in turn contribute to the economic growth of the country. Digital technology can also be used to bring multiple cultural heritage collections together for better visualization and accessibility. However, standards-based and compatible technologies for the digitization process are essential for uniformity and interoperability, so that the challenges associated with discoverability and use of resources by potential audiences can be eliminated. Cultural heritage aggregators are platforms that can address these discoverability difficulties by enabling search across multiple cultural heritage collections and improving access to their contents. This study gives an overview of various Indian initiatives to safeguard the country’s cultural heritage assets, including its major digital initiatives in this direction, and highlights the need for an Indian aggregator platform for a national collection of cultural heritage.

Kuys, Gerard, and Ansgar Scherp. 2022. “Representing Persons and Objects in Complex Historical Events Using the Event Model F.” Journal of Open Humanities Data 8 (0). Ubiquity Press:22.

The digital representation and publishing of human history on the web has so far been stuck at the digital unlocking of collections of historical items. Those collections are described by metadata intended mostly for curation and findability, metadata that are put on the web in the manner of a library catalog. Usually, there is little on ‘aboutness’. This is unfortunate, as modern knowledge representations and web-based interactive presentation techniques offer ample opportunities for a more complex representation of, and richer interaction with, digital history. In this contribution, we argue for a history in digital data that is treated for what it is: an interpretation, like all history, while remaining traceable to its information carriers.

Lee, Mihwa. 2022. “BIBFRAME’s Application Method for Reflecting LRM in Linked Data.” International Federation of Library Associations and Institutions (IFLA).

The Library Reference Model (LRM), which replaces FRBR, is a new conceptual model for constructing linked data in the library community. BIBFRAME, as an encoding format, should be revised to reflect LRM. This research proposes a BIBFRAME application profile that maps, in practice, LRM attributes such as the representative expression attributes and the manifestation statement.

Lisena, Pasquale, Albert Meroño-Peñuela, and Raphaël Troncy. 2022. “MIDI2vec: Learning MIDI Embeddings for Reliable Prediction of Symbolic Music Metadata.” Edited by Mehwish Alam, Davide Buscaldi, Michael Cochez, Francesco Osborne, Diego Reforgiato Recupero, Harald Sack, et al. Semantic Web 13 (3):357–77.

An important problem in large symbolic music collections is the low availability of high-quality metadata, which is essential for various information retrieval tasks. In this work, the writers propose MIDI2vec, a new approach for representing MIDI files as vectors based on graph embedding techniques. Their strategy consists of representing the MIDI data as a graph, including the information about tempo, time signature, programs, and notes. Next, they run and optimize node2vec for generating embeddings using random walks in the graph. They demonstrate that the resulting vectors can successfully be employed for predicting the musical genre and other metadata such as the composer, the instrument, or the movement. Their proposal has real-world applications in automated metadata tagging for symbolic music, for example in digital libraries for musicology, datasets for machine learning, and knowledge graph completion.

Loukachevitch, Natalia, Pavel Braslavski, Vladimir Ivanov, Tatiana Batura, Suresh Manandhar, Artem Shelmanov, and Elena Tutubalina. 2022. “Entity Linking over Nested Named Entities for Russian.” In Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), 4458–66. Marseille, France: European Language Resources Association (ELRA).

In this paper, the writers describe entity linking annotation over nested named entities in the recently released Russian NEREL dataset for information extraction. The NEREL collection (Loukachevitch et al., 2021) is currently the largest Russian dataset annotated with entities and relations. The paper describes the main design principles behind NEREL’s entity linking annotation, provides its statistics, and reports evaluation results for several entity linking baselines. To date, 38,152 entity mentions in 933 documents are linked to Wikidata. The NEREL dataset is publicly available:

Malin, Yonatan, Christina Crowder, Clara Byom, and Daniel Shanahan. 2022. “Community Based Music Information Retrieval: A Case Study of Digitizing Historical Klezmer Manuscripts from Kyiv.” Transactions of the International Society for Music Information Retrieval 5 (1). Ubiquity Press:208–21.

In this article the authors provide a case study in the datafication of historical handwritten manuscripts, which diversifies the repertoire, approaches, demographics, and institutional partnerships of MIR. The Kiselgof-Makonovetsky Digital Manuscript Project (KMDMP) is a community-based project to digitize music and text, teach, and make music from facsimiles of manuscripts held by the Vernadsky National Library of Ukraine. The corpus comprises 850 high-resolution photographs of handwritten music manuscripts and catalog pages, with a total of around 1,300 melodies. Much of the music was collected by pioneering Belarusian ethnographer Zusman Kiselgof among Jewish communities in the ‘Pale of Settlement’ (mostly in modern Ukraine and Belarus) during the An-Ski Expeditions of 1912–1914. The repertoire is mixed, combining typical Jewish dance and non-dance genres, European society and folklore dance music, and a relatively small quantity of songs and liturgical chant settings. The project simultaneously encodes music in formats accessible to computational musicology and enhances a creative musical community and deeply valued heritage. We introduce the project in dialogue with a recent article by Georgina Born on diversity in the field of MIR; present the material, issues for datafication, and results thus far; describe project elements that enhance musical community; demonstrate the diversity of participants with respect to age, gender, nationality, and profession; outline implications for MIR and computational ethnomusicology; and suggest new funding models and partnerships in support of cultural heritage documentation, preservation, continuity, and analysis.

Mandal, Sukumar. 2022. “Integration of Linked Open Data Authorities with OpenRefine: A Methodology for Libraries.” Library Philosophy and Practice (e-Journal), no. 7195 (May):11.

The primary purpose of this paper is to explore how linked open data authorities can be integrated with OpenRefine to provide easy access to related metadata for data cleaning and updating in a modern integrated library system. The integration process and methods are based on the APIs of reconciliation repositories collected from web resources. The integrated framework is designed and developed with OpenRefine techniques and components based on RDF, CSV, SPARQL, and Turtle scripts, and relies on Java and the Apache Web Server to run OpenRefine on the Ubuntu platform. Using this framework, the author explored data cleaning and the import of bibliographic metadata from multiple linked authorities such as Open Library, ORCID, VIAF, VIAF BNF, Library of Congress Authorities data, and Wikidata. The framework makes it possible to fetch related authority links that enhance advanced services in a modern library management system. Library carpentry and data carpentry are therefore essential concepts for building a dynamic integrated interface for library professionals.

Marrero, Monica, and Antoine Isaac. 2022. “Implementation and Evaluation of a Multilingual Search Pilot in the Europeana Digital Library.” In Linking Theory and Practice of Digital Libraries, 13541:93–105. Lecture Notes in Computer Science. Padua, Italy: Springer International Publishing.

Europeana, a digital library that aggregates content from libraries, archives and museums from all around Europe, offers search functionality using the metadata of more than 62 million objects. However, in most cases, this data is only available in one language, while users come from countries with different languages. Europeana’s strategy for the improvement of multilingual experiences includes the design and implementation of a multilingual information retrieval system based on the translation of queries and metadata to English. As a first development in this context, we have implemented a pilot applying query translation to English for the Spanish version of the website in order to surface results that have English metadata associated with them. We conducted an evaluation to assess the performance of this pilot and identify issues. The good performance rates observed allowed us to take the pilot to production, and the issues identified led to a list of specific actions, which should be addressed to the extent possible before the application of a wider multilingual information retrieval system.

McKenna, Lucy, Christophe Debruyne, and Declan O’Sullivan. 2022. “Using Linked Data to Create Provenance-Rich Metadata Interlinks: The Design and Evaluation of the NAISC-L Interlinking Framework for Libraries, Archives and Museums.” AI & SOCIETY, January, 27.

Linked data (LD) have the capability to open up and share materials, held in libraries, archives, and museums (LAMs), in ways that are restricted by many existing metadata standards. Specifically, LD interlinking can be used to enrich data and to improve data discoverability on the Web through interlinking related resources across datasets and institutions. However, there is currently a notable lack of interlinking across leading LD projects in LAMs, impacting the discoverability of their materials. In this article, LAM Linked Data projects and services were reviewed, including the Library of Congress, The German National Library, and the French National Library. Six Linked Data interlinking tools were also reviewed (AgreementMaker, LogMap, LinkItUp, The SILK Link Discovery Framework, The LIMES Link Discovery Framework for Metric Spaces, and the OpenRefine RDF Extension). The research also describes the Novel Authoritative Interlinking for Semantic Web Cataloguing in Libraries (NAISC-L) interlinking framework. Unlike existing interlinking frameworks, NAISC-L was designed specifically with the requirements of the LAM domain in mind. NAISC-L supports the linking of related resources across datasets and institutions, thereby enabling richer and more varied search queries, and can thus be used to improve the discoverability of materials held in LAMs.

Meedin, Nadeera, Maneesha Caldera, Suresha Perera, and Indika Perera. 2022. “A Novel Annotation Scheme to Generate Hate Speech Corpus through Crowdsourcing and Active Learning.” International Journal of Advanced Computer Science and Applications 13 (11):9.

The number of user-generated posts is growing exponentially with the growth of social media usage. Posts that promote violence against, or have the primary purpose of inciting hatred against, individuals or groups on the basis of specific attributes pose a daunting problem. Because posts are published in multiple languages and in various multimedia forms, social media platforms find it challenging to moderate them before they reach their audience, and assessing whether a post is hate speech is complicated by subjectivity. Social media platforms lack the contextual and linguistic expertise, as well as the social and cultural insight, to identify hate speech accurately. Research on detecting hate speech in English-language social media content, using machine learning algorithms and various crowdsourcing platforms, is under way. However, workers on these platforms are not available from countries such as Sri Lanka. The lack of a workforce with the necessary skills, and of suitable annotation schemes, shows that further research on low-resource language annotation is essential. This research proposes a crowdsourcing approach to label and annotate social media content in Sri Lanka, in order to generate corpora of words and phrases for identifying hate speech with machine learning algorithms. This paper summarizes the data annotated with the proposed scheme: 52,646 Facebook posts, comments, and replies to comments from public Sri Lankan Facebook user profiles, pages, and groups; 45,000 unlabeled tweets retrieved with 996 Twitter search keywords; and 45,000 YouTube video instances. Of the Facebook, Twitter, and YouTube posts, 9%, 21%, and 14%, respectively, were identified as containing hate content. In addition, the posts were categorized as offensive or non-offensive, and the hate targets (individuals or groups), along with the corpus associated with them, were identified and are presented in this paper.
The proposed annotation scheme could be extended to other low-resource languages to identify hate speech corpora. Using a well-implemented crowdsourcing platform together with the proposed annotation scheme, it will be possible to find more subtle patterns through human judgment and filtering, and to take preventive measures to create a better cyberspace.

Mountantonakis, Michalis, and Yannis Tzitzikas. 2022. “How Your Cultural Dataset Is Connected to the Rest Linked Open Data?” In TMM_CH 2021: Transdisciplinary Multispectral Modelling and Cooperation for the Preservation of Cultural Heritage, 136–48. Communications in Computer and Information Science. Athens, Greece: Springer International Publishing.

More and more publishers are creating and uploading their data as open digital data, and this is also the case in the Cultural Heritage (CH) domain. To facilitate data interchange, integration, preservation, and management, publishers tend to create their data as Linked Open Data (LOD) and connect them with existing LOD datasets in the popular LOD Cloud, which contains over 1,300 datasets (including more than 150 CH datasets). Because of the large number of available LOD datasets, finding in real time all the datasets that have commonalities (e.g., common entities) with a given dataset is not trivial. However, connecting these datasets can be of primary importance for several tasks: answering more queries, and answering them more completely (e.g., for better understanding our history); enriching the information about a given entity (e.g., a book, a historical person, an event); estimating the veracity of data; and so on. For this reason, we present a research prototype, called ConnectionChecker, which receives a LOD dataset as input, computes and shows its connections to hundreds of LOD Cloud datasets through the LODsyndesis knowledge graph, and offers several measurements, visualizations, and metadata for the given dataset. We describe how one can exploit ConnectionChecker for one’s own dataset, and we provide use cases for the CH domain using two real linked CH datasets: a) a dataset from the National Library of the Netherlands, and b) a dataset about World War I from the Universities of Aalto and Helsinki.

Nunes, Sérgio, Tiago Silva, Cláudia Martins, and Rita Peixoto. 2022. “EPISA Platform: A Technical Infrastructure to Support Linked Data in Archival Management.” In Proceedings of the Workshops and Doctoral Consortium of the 26th International Conference on Theory and Practice of Digital Libraries, 11. Padua, Italy: CEUR Workshop Proceedings.

In this paper we describe the EPISA Platform, a technical infrastructure designed and developed to support archival records management and access using linked data technologies. The EPISA Platform follows a client-server paradigm, with a central component, the EPISA Server, responsible for storage, reasoning, authorization, and search; and a frontend component, the EPISA ArchClient, responsible for user interaction. The EPISA Server uses Apache Jena Fuseki for storage and reasoning, and Apache Solr for search. The EPISA ArchClient is a web application implemented using PHP Laravel and standard web technologies. The platform follows a modular architecture, based on Docker containers. We describe the technical details of the platform and the main user interaction workflows, highlighting the abstractions developed to integrate linked data in the archival management process. The EPISA Platform has been successfully used to support research and development of linked data use in the archival domain in the context of the EPISA project.

Pal, Anjan, and Parthasarathi Mukhopadhyay. 2022. “Fetching Automatic Authority Data in ILS from Wikidata via OpenRefine.” SRELS Journal of Information Management 59 (6):353–62.

Authority data is vital for effective library and information services. It serves a major purpose in realizing the collocation function of library catalogues and indexes. Unfortunately, authority control has been neglected in library catalogues and other bibliographic databases in India. This paper demonstrates how authority data can be fetched automatically from Wikidata, a sibling project of Wikipedia. For this purpose, SPARQL queries are formulated to retrieve from Wikidata the names of persons of Indian origin, along with their dates and places of birth. The collected datasets are processed and implemented as MARC21-based authority data in KOHA, an open-source library management software. The paper also shows how the library and information science community can use these free, open-source platforms to gather, organize, and share data, and how they enhance retrieval efficiency.

Perera, Treshani. 2022. “Project Management Strategies for Managing Metadata in Institutional Recordings Collections – A Case Study.” Music Reference Services Quarterly, July, 1–22.

This paper covers project management decisions, workflows, and practical strategies adopted by a music-cataloging librarian while managing an academic institutional recordings collection. The paper is not intended to serve as a go-to resource for managing metadata in institutional recordings collections; rather, it presents a practical approach to managing time, resources, and personnel while meeting institutional priorities, from the perspective of the project manager tasked with the organization, metadata management, processing, and preservation of the physical collection. It also covers project management strategies for creating a collection inventory, which was later expanded into a full-level metadata collection during COVID-19 remote work.

Petrovski, Aleksandar. 2022. “A Bilingual English-Ukrainian Lexicon of Named Entities Extracted from Wikipedia.” In Conference on Language Technologies & Digital Humanities 2022, 7. Ljubljana, Slovenia.

This paper describes the creation of a bilingual English–Ukrainian lexicon of named entities, using Wikipedia as a source. The proposed methodology provides an inexpensive way to build multilingual lexicons without expertise in the target languages. The extracted named entity pairs have been classified into five classes, PERSON, ORGANIZATION, LOCATION, PRODUCT, and MISC (miscellaneous), using Wikipedia metadata. With this methodology, a large lexicon consisting of 624,168 pairs has been created. Classification quality was checked manually on 1,000 randomly selected named entities, yielding 97% precision and 90% recall.

Proutskova, Polina, Daniel Wolff, György Fazekas, Klaus Frieler, Frank Höger, Olga Velichkina, Gabriel Solis, et al. 2022. “The Jazz Ontology: A Semantic Model and Large-Scale RDF Repositories for Jazz.” Journal of Web Semantics 74 (October):100735.

The Jazz Ontology is a semantic model that addresses the challenges the domain of jazz poses due to musical content and performance specificities. The model builds strongly on the Music Ontology and utilizes datasets such as MusicBrainz, the Weimar Jazz Database, and LinkedJazz to build out the Ontology further. Some elements were modified, such as creating a shortcut between the Music Ontology Performance and Signal classes, and bypassing the abstract Sound concept and Recording event. For bands, the model uses a relationship that connects the band to its leader, and it relates Performers to a single Performance so that a band’s personnel can change from track to track. The ontology has been assessed by examining how well it supports describing and merging existing datasets and whether it facilitates novel discoveries in a music browsing application. The utility of the ontology is also demonstrated in a novel framework for managing jazz-related music information. This involves the population of the Jazz Ontology with the metadata from large-scale audio and bibliographic corpora (the Jazz Encyclopedia and the Jazz Discography). The resulting RDF datasets were merged and linked to existing Linked Open Data resources. These datasets are publicly available and are driving an online application used by jazz researchers and music lovers for the systematic study of jazz.

Putnam, Nathan. 2022. “VIAF and the Linked Data Ecosystem.” JLIS.it 13 (1). EUM-Edizioni Università di Macerata:196–202.

This article reviews the founding, current state, and potential future of VIAF®, the Virtual International Authority File. VIAF consists of an aggregation of bibliographic and authority data from over 50 national agencies and infrastructures, systems that follow different cataloging practices, and contain hundreds of languages. After a short history of the project, the results of surveys for implementers of linked data projects on the use of VIAF data provide suggestions for future use and sustainability.

Santschi, Stephanie. 2022. “Mapping Late Hokusai Research: Digitizing and Publishing Bilingual Research Data.” Digital Studies 12 (1). Open Library of Humanities:23.

The initiative “Late Hokusai: Thought, Technique, Society” took place at the British Museum (BM) and SOAS, University of London (2016–2019). As part of its activities, it built a linked-data platform prototype on ResearchSpace. The prototype offers a redesigned process for how museum researchers and users find, research with, discuss and expand bilingual data about early modern Japanese artist Katsushika Hokusai (1760–1849) and instigated a discussion about what a collaborative research platform for the Hokusai research community could look like. While Japanese resource specialists have long recognized the complexity of Japanese script as a challenge for multilingual research and collection platforms, the processes for and results of integrating Japanese source data into bi- or multilingual museum databases remained unsatisfactory. This paper revisits the challenges posed by “non-Latin script” (NLS) in museum databases in the case of the Hokusai research platform at the British Museum, which integrated Japanese and English languages. It localizes the issues arising from working with Japanese source data in the Latin script project environment and accompanies the museum researchers’ tasks regarding the correct input, rendering and display of the source script at each step: 1) object analysis, 2) registering NLS metadata, 3) processing NLS information and 4) visualizing LS and NLS information for general and specialist audiences. After assessing these practices, the paper critically reflects on selected approaches, successes, and shortcomings experienced while creating such a prototype. By sharing its experiences, the project hopes to aid prospective research projects on a similar path regarding project setup and documentation. Furthermore, it advocates sustainable research practices that support data reusability.

Storti, Emanuele. 2022. “Towards a Knowledge Graph Representation of FAIR Music Content for Exploration and Analysis.” In Proceedings of the Workshops and Doctoral Consortium of the 26th International Conference on Theory and Practice of Digital Libraries, 12. Padua, Italy: CEUR Workshop Proceedings.

This paper introduces the ontological model for a FAIR digital library of music documents that takes into account a variety of music-related information, including editorial information on documents and their production workflow, as well as score content and licensing information. The model is complemented with annotations (e.g., comments, fingering) on music documents produced by end users, which add a social layer to the framework and enable the building of user-centric music applications. As a result, a machine-understandable knowledge graph of music content is defined, which can be queried, navigated, and explored. On top of this, novel applications could be designed, such as semantic workplaces where music scholars and musicians can find, analyse, compare, annotate, and manipulate musical objects.

Szeto, Kimmy. 2022. “Ontology for Voice, Instruments, and Ensembles (OnVIE): Revisiting the Medium of Performance Concept for Enhanced Discoverability.” The Code4Lib Journal, no. 54 (August).

Medium of performance—instruments, voices, and devices—is a frequent starting point in library users’ search for music resources. However, content and encoding standards for library cataloging have not been developed in a way that enables clear and consistent recording of medium of performance information. Consequently, unless specially configured, library discovery systems do not display medium of performance or provide this access point. Despite efforts to address this issue in the past decade in RDA, MARC, and the linked data environment, medium of performance information continues to be imprecise, dispersed across multiple fields or properties, and implied in other data elements. This article proposes revised definitions for “part,” “medium,” “performer,” and “ensemble,” along with a linked data model, the Ontology for Voice, Instruments, and Ensembles (OnVIE), that captures precise and complete medium of performance data reflecting music compositional practices, performance practices, and publishing conventions. The result is an independent medium of performance framework for recording searchable and machine-actionable metadata that can be hooked on to established library metadata ontologies and is widely applicable to printed and recorded classical, popular, jazz, and folk music. The clarity, simplicity, and extensibility of this model enable machine parsing so that the data can be searched, filtered, sorted, and displayed in multiple, creative ways.

Tan, Mary Ann, Etienne Posthumus, and Harald Sack. 2022. “Audio Ontologies for Intangible Cultural Heritage.” In Proceedings of The Semantic Web: ESWC 2022 Satellite Events, 13384:171–75. Hersonissos, Crete, Greece: Springer International Publishing.

Cultural heritage portals often contain intangible objects digitized as audio files. This paper presents and discusses the adaptation of existing audio ontologies intended for non-cultural heritage applications. The resulting alignment of the German Digital Library-Europeana Data Model (DDB-EDM) with Music Ontology (MO) and Audio Commons Ontology (ACO) is presented.

Topham, Kate, Julian Chambliss, Justin Wigard, and Nicole Huff. 2022. “The Marmaduke Problem: A Case Study of Comics as Linked Open (Meta)Data.” KULA: Knowledge Creation, Dissemination, and Preservation Studies, Metadata as Knowledge, 6 (3):1–8.

Michigan State University (MSU) is home to one of the largest library comics collections in North America, holding over three hundred thousand print comic book titles and artifacts. Inspired by the interdisciplinary opportunity offered by digital humanities practice, a research collaborative linked to the MSU Library Digital Scholarship Lab (DSL) developed a Collections as Data project focused on the Comic Art Collection. This team extracted and cleaned over forty-five thousand MARC records describing comics published in Canada, Mexico, and the United States. In order to bridge digital humanities with the popular culture legacy of the institution, the MSU comics community turned to bibliographic metadata as a new way to leverage the collection for scholarly analysis. In October 2020, the Department of English Graphic Possibilities Research Workshop gathered a group of scholars, librarians, Wikidatians, and enthusiasts for a virtual Wikidata edit-a-thon. This project report will present this event as a case study to discuss how linked open metadata may be used to create knowledge and how community knowledge can, in turn, enrich metadata. They explore not only how the participants utilized the open-access tool Mix’n’match to connect the Comic Art Collection dataset to Wikidata and increase awareness of lesser-known authors and regional publishers missing from OCLC and Library of Congress databases, but how the knowledge of this community in turn revealed issues of authority control.

Zhang, Bohui, Filip Ilievski, and Pedro Szekely. 2022. “Enriching Wikidata with Linked Open Data.” ArXiv:2207.00143, Computer Science, August, 17.

Large public knowledge graphs, like Wikidata, contain billions of statements about tens of millions of entities, thus inspiring various use cases to exploit such knowledge graphs. However, practice shows that much of the relevant information that fits users’ needs is still missing in Wikidata, while current linked open data (LOD) tools are not suited to enriching large graphs like Wikidata. In this paper, the writers investigate the potential of enriching Wikidata with structured data sources from the LOD cloud. They present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation. They evaluate their enrichment method with two complementary LOD sources: a noisy source with broad coverage, DBpedia, and a manually curated source with a narrow focus on the art domain, Getty. Their experiments show that the workflow can enrich Wikidata with millions of novel, high-quality statements from external LOD sources. Property alignment and data quality are key challenges, whereas entity alignment and source selection are well supported by existing Wikidata mechanisms. They make their code and data available to support future work.


Aljalahmah, Saleh H., and Oksana L. Zavalina. 2021. “Information Representation and Knowledge Organization in Cultural Heritage Institutions in Arabian Gulf: A Comparative Case Study.” Journal of Information & Knowledge Management 20 (4):24.

This paper presents an exploratory study conducted with the goal of developing an understanding of the current state of information representation and knowledge organization in cultural heritage collections in Arabian Gulf countries and perspectives for future developments. This comparative case study focused on three institutions (an archive, an academic library, and a museum), including early adopters and leaders in digital archiving in the region. The mixed-methods research combined semi-structured interviews with in-depth comparative content analysis of metadata records that represent items in the institutions’ collections. Despite the limitations of the small-sample analysis, this exploratory case study makes a substantial contribution to research and practice. It is the first study to evaluate information representation and knowledge organization practices in cultural heritage collections of Arabian Gulf countries. This study can also inform the planning and implementation of a large-scale study of the state and perspectives of information representation and knowledge organization across digital and physical collections of libraries, museums, and archives in the region. Suggestions for future research are included. Practical implications of the study include empirical support for metadata training, for developing and documenting metadata creation guidelines and crosswalks, and for collecting and using feedback from users and knowledge management professionals to improve information representation and knowledge organization. Results also provide insights into the interoperability potential of metadata for future regional, national, and international aggregations of cultural heritage digital collections across the Arabian Gulf region.

Bianchini, Carlo, Stefano Bargioni, and Camillo Carlo Pellizzari di San Girolamo. 2021. “Beyond VIAF:” Information Technology and Libraries 40 (2):31.

This paper aims to investigate the reciprocal relationship between VIAF® and Wikidata and their possible roles in the semantic web environment. It deals with their data, their approach, their domain, and their stakeholders, with particular attention to identification as a fundamental goal of Universal Bibliographic Control. After examining interrelationships among VIAF, Wikidata, libraries, and other GLAM institutions, a double approach is used to compare VIAF and Wikidata: first, a quantitative analysis of VIAF and Wikidata data on personal entities, presented in eight tables; and second, a qualitative comparison of several general characteristics, such as purpose, scope, organizational and theoretical approach, data harvesting and management (shown in table 9). Quantitative data and qualitative comparison show that VIAF and Wikidata are quite different in their purpose, scope, organizational and theoretical approach, data harvesting, and management. The study highlights the reciprocal role of VIAF and Wikidata and their helpfulness in the worldwide bibliographical context and in the semantic web environment and outlines new perspectives for research and cooperation.

Boczar, Jason, Bonita Pollock, Xiying Mi, and Amanda Yeslibas. 2021. “Bridging the Gap.” Information Technology and Libraries 40 (4). Chicago, United States: American Library Association:1–15.

Due to COVID-19, many GLAM institutions saw an increase in materials going online. The University of South Florida Libraries used Linked Data technology to provide easy access to digital cultural heritage collections not only for scholarly communities but also for underrepresented user groups. The paper covers the challenges of putting information online, discusses Linked Data and the solutions it can provide, and proposes future work to further the effort.

Bonora, Paolo, and Angelo Pompilio. 2021. “Corago in LOD: The Debut of an Opera Repository into the Linked Data Arena.” EUM-Edizioni Università di Macerata, 54–72.

The paper examines the adoption of Semantic Web (SW) technologies and Linked Data (LD) principles to manage a knowledge base about opera. The Corago repository collects historical data and documentation about opera works, performances, and librettos from the 16th to the 20th century. The writers experimented with using semantic technologies to manage the repository’s knowledge, which is cataloged following the Functional Requirements for Bibliographic Records (FRBR) relational model. Cultural Heritage Knowledge Bases (CHKB) such as Corago could leverage SW and LD to overcome proprietary models and to introduce new information that better satisfies users’ requirements.

Dagher, Iman, and Denise Soufi. 2021. “Authority Control of Arabic Personal Names: RDA and Beyond.” Cataloging & Classification Quarterly 59 (2–3):260–80.

This paper discusses the basics of creating name authority records for Arabic personal names in accordance with Resource Description and Access instructions and Program for Cooperative Cataloging guidelines. A background into the use of romanization for non-Latin scripts in bibliographic and authority records is provided to establish the context. Issues with romanization that are particular to Arabic are addressed. Separate sections on modern and classical names provide an overview of the major challenges, and strategies to enhance discovery are outlined. The paper concludes with an examination of the possible benefits of identity management and other changes in the authority control landscape for names in non-Latin script.

Fafalios, Pavlos, Kostas Petrakis, Georgios Samaritakis, Korina Doerr, Athina Kritsotaki, Yannis Tzitzikas, and Martin Doerr. 2021. “FAST CAT: Collaborative Data Entry and Curation for Semantic Interoperability in Digital Humanities.” Journal on Computing and Cultural Heritage 14 (4):1–20.

FAST CAT is a web-based collaborative software system for assistive data entry and curation in Digital Humanities and other forms of empirical research. The system was designed with semantic interoperability in mind, which allows data to be exchanged with an unambiguous, shared meaning. The paper details the functionality and user interface offered by FAST CAT and showcases a use case from the SeaLiT project, which examines the economic, social, and demographic impacts of the introduction of steamboats in the Mediterranean area between the 1850s and the 1920s.

Faraj, Ghazal, and András Micsik. 2021. “Representing and Validating Cultural Heritage Knowledge Graphs in CIDOC-CRM Ontology.” Future Internet 13 (11). Multidisciplinary Digital Publishing Institute:277.

To unify access to multiple heterogeneous sources of cultural heritage data, many datasets have been mapped to the CIDOC-CRM ontology. CIDOC-CRM provides a formal structure and definitions for most cultural heritage concepts and their relationships. The COURAGE project includes historic data concerning people, organizations, cultural heritage collections, and collection items covering the period between 1950 and 1990. CIDOC-CRM therefore seemed the optimal choice for describing COURAGE entities, improving knowledge sharing, and facilitating the unification of the COURAGE dataset with other datasets. This paper introduces the results of semantically translating the COURAGE dataset to CIDOC-CRM. The mapping was implemented automatically according to predefined mapping rules. Several SPARQL queries were used to manually validate the migration process. In addition, multiple SHACL shapes were applied to validate the data and mapping models.

Filtz, Erwin, Sabrina Kirrane, and Axel Polleres. 2021. “The Linked Legal Data Landscape: Linking Legal Data across Different Countries.” Artificial Intelligence and Law 29 (4):485–539.

The European Union is working toward harmonizing legislation across Europe in order to improve the cross-border interchange of legal information. This goal is supported, for instance, by standards such as the European Law Identifier (ELI) and the European Case Law Identifier (ECLI), which provide technical specifications for Web identifiers and suggest vocabularies for describing metadata about legal documents in a machine-readable format. Notably, the ECLI and ELI metadata standards adhere to the RDF data format, which forms the basis of Linked Data and therefore has the potential to underpin a pan-European legal Knowledge Graph. Unfortunately, these specifications have so far been only partially adopted by EU member states. In this paper, the authors describe a methodology for transforming the existing legal information system used in Austria into such a legal knowledge graph, covering the steps from modeling nation-specific aspects, to populating the graph, and finally to integrating legal data from other countries through linked data. They demonstrate the usefulness of this approach with practical use cases from legal information search that could not previously be carried out in an automated fashion.

Frosterus, Matias, David Hansson, Maral Dadvar, Ilias Kyriazis, and Sofia Zapounidou. 2021. “6 Steps for Publishing Library Linked Open Data.” LIBER Linked Open Data Working Group.

Presented by the LIBER Linked Open Data (LOD) Working Group, this report looks at publishing LOD from a library perspective and argues why and how it should be employed. The authors present various aspects of the topic, introduce the different options available, and lay a foundation for further exploration at a later stage. The body of the document presents the six steps of LOD publication and explains each of them in depth.

Gal, Avigdor, Haggai Roitman, and Roee Shraga. 2021. “Learning to Rerank Schema Matches.” IEEE Transactions on Knowledge and Data Engineering 33 (8):3104–16.

Schema matching is at the heart of integrating structured and semi-structured data, with applications in data warehousing, data analysis recommendations, Web table matching, and more. Schema matching is known to be an uncertain process, and a standard method for overcoming this uncertainty presents a human expert with a ranked list of possible schema matches to choose from, known as top-K matching. In this work, the writers propose a learning algorithm that uses an innovative set of features to rerank a list of schema matches and improve the ranking of the best match. They provide a bound on the size of an initial match list, tying the number of matches to the desired level of confidence in finding the best match. They also propose using matching predictors as features in a learning task and tailor nine new matching predictors for this purpose. The proposed algorithm assists the matching process by presenting a high-quality set of alternative matches to a human expert. It also serves as a step towards eliminating human experts as decision makers in the matching process altogether. A large-scale empirical evaluation with a real-world benchmark shows the effectiveness of the proposed algorithmic solution.

Green, Ashlea M. 2021. “Metadata Application Profiles in U. S. Academic Libraries: A Document Analysis.” Journal of Library Metadata 21 (3–4). Routledge:105–43.

This paper describes a document analysis of 24 metadata application profiles (MAPs) used by academic libraries in the United States. The MAPs under study were collected from (a) the DLF AIG Metadata Application Profile Clearinghouse and (b) a Google search of .edu domains. Data collection and analysis took place between December 2020 and February 2021. While most of the MAPs under review provided metadata guidelines for digital collections, a small number were intended for institutional repositories or research data management. The study’s findings reveal MAP features and content, usage of controlled vocabularies and standards, and other characteristics pertaining to MAP document scope, contents, and format in this context. Together with the paper’s review of the literature, these findings should help metadata specialists and others involved in digital collection management gain insights useful in developing or revising their own metadata documentation. Further, the findings offer a current glimpse of metadata application practices among U.S. academic libraries generally.

Kalogeros, Eleftherios, Matthew Damigos, Michalis Sfakakis, Sofia Zapounidou, Aggeliki Drakopoulou, Costas Zervopoulos, Gerasimos Martinis, Christos Papatheodorou, and Manolis Gergatsoulis. 2021. “Digitizing, Transcribing and Publishing the Handwritten Music Score Archives of Ionian Islands Philharmonic Bands.” In Metadata and Semantic Research, 1537:388–99. Communications in Computer and Information Science. Cham: Springer International Publishing.

During the long history of the philharmonic bands in the Ionian Islands, valuable archives of handwritten music scores have been established. These archives consist of scores of original works created locally and adaptations of Western music works by Greek and other European composers. For the long-term preservation of the archives of seven philharmonic bands, the handwritten music scores were digitized and a significant number of them were transcribed into MusicXML. All of these archives were then integrated into and published as a single archive. These activities were part of the project “Preservation and Prominence of the Musical Heritage of the Region of Ionian Islands Prefecture through the management of the digital archives of the Philharmonic Orchestras of the Region.” This work presents the challenges, the workflows, and the system developed to achieve the objectives of the project.

Kern, Christopher Julian, Thomas Schäffer, and Dirk Stelzer. 2021. “Towards Augmenting Metadata Management by Machine Learning.” INFORMATIK 2021, 10.

Managing metadata is an important part of master data management. It is a complex, comprehensive, and labor-intensive task. This paper explores whether and how metadata management can be augmented by machine learning. The writers deduce requirements for managing metadata from the literature and from expert interviews. They also identify features of machine learning algorithms. They assess 15 machine learning algorithms to determine their contribution to meeting the requirements and the extent to which they can support metadata management. Supervised and unsupervised learning algorithms, as well as neural networks, have the greatest potential to support metadata management effectively. Reinforcement learning, however, does not seem well suited to augmenting metadata management. Using Support Vector Machines and the identification of metadata as an example, the writers show how machine learning algorithms can support metadata management.

Khan, Fahad, and Ana Salgado. 2021. “Modelling Lexicographic Resources Using CIDOC-CRM, FRBRoo and Ontolex-Lemon.” In Proceedings of the International Joint Workshop on Semantic Web and Ontology Design for Cultural Heritage Co-Located with the Bolzano Summer of Knowledge 2021, 12. Bolzano, Italy.

The article describes a new approach to the modeling and publication of lexicographic resources, including retro-digitized dictionaries, as linked data. This approach is based on the use of the CIDOC-CRM-aligned FRBRoo ontology together with the Ontolex-Lemon vocabulary and its follow-up lexicographic module, lexicog. After introducing the TEI-based distinction between different views on lexicographic resources, the writers discuss Ontolex-Lemon, CIDOC-CRM, and FRBRoo. Next, they look at some motivating use cases before introducing their approach. Finally, they model one of these use cases in more depth using this approach.

Megdiche, Imen, Franck Ravat, and Yan Zhao. 2021. “Metadata Management on Data Processing in Data Lakes.” In SOFSEM 2021: Theory and Practice of Computer Science, 12607:559–68. Lecture Notes in Computer Science. Bolzano-Bozen, Italy: Springer International Publishing.

A data lake (DL) is a Big Data analysis solution. A data lake stores not only data but also the processes that were carried out on these data. It is commonly agreed that data preparation and transformation take up most of a data analyst’s time. To improve the efficiency of data processing in a DL, the writers propose a framework that includes a metadata model and algebraic transformation operations. The metadata model ensures the findability, accessibility, interoperability, and reusability of data processes, as well as the data lineage of processes. Moreover, each process is described through a set of coarse-grained data-transforming operations that can be applied to different types of datasets. They illustrate and validate their proposal with a real medical use case implementation.


Falk, Patricia, and David R. Lewis. 2020. “A New Take on Cataloging Popular Music Recordings.” Cataloging & Classification Quarterly 58 (8):683–704.

Cataloging popular music audio formats such as compact discs (CDs) and LPs has always required different procedures from cataloging Western art music recordings. Bibliographic records and standards have changed during the past twenty years, and catalogers have switched from the Anglo-American Cataloguing Rules (AACR2) to Resource Description and Access (RDA) for cataloging materials. This article illustrates the changes made in popular music cataloging since the 2001 publication of Terry Simpkins’ article “Cataloging Popular Music Recordings.” Additional issues such as name authority and subject authority creation are also covered, as well as new codes and Machine-Readable Cataloging (MARC) tags used in bibliographic records.

Koch, Ines, Cristina Ribeiro, and Carla Teixeira Lopes. 2020. “ArchOnto, a CIDOC-CRM-Based Linked Data Model for the Portuguese Archives.” In Proceedings of the Digital Libraries for Open Knowledge: 24th International Conference on Theory and Practice of Digital Libraries, 12246:149–62. Lecture Notes in Computer Science. Lyon, France: Springer International Publishing.

Archives face great challenges due to the vast amounts of data they have to curate. New data models are required, and work is underway. The International Council on Archives is creating RiC-CM (Records in Context), and there is a long line of work in museums with CIDOC-CRM (the CIDOC Conceptual Reference Model). Both models are based on ontologies to represent cultural heritage data and link them to other information. The Portuguese National Archives holds a collection of over 3.5 million metadata records, described with the ISAD(G) standard. The archives are designing a new linked data model and a technological platform with applications for archive contributors, archivists, and the public. The current work extends CIDOC-CRM into ArchOnto, an ontology-based model for archives. The model defines the relevant archival entities and properties and will be used to migrate existing records. ArchOnto accommodates the existing ISAD(G) information and takes into account its implementation with current technologies. The model is evaluated with records from representative fonds. After testing on these samples, the model is ready to be populated through the semi-automatic transformation of the ISAD(G) records. The evaluation of the model and of the population strategies will proceed with experiments involving professional and lay users.

Patrício, Helena Simões, Maria Inês Cordeiro, and Pedro Nogueira Ramos. 2020. “From the Web of Bibliographic Data to the Web of Bibliographic Meaning: Structuring, Interlinking and Validating Ontologies on the Semantic Web.” International Journal of Metadata, Semantics and Ontologies 14 (2). Inderscience Publishers (IEL).

Bibliographic data sets have revealed good levels of technical interoperability observing the principles and good practices of linked data. However, they have a low level of quality from the semantic point of view, due to many factors: lack of a common conceptual framework for a diversity of standards often used together, reduced number of links between the ontologies underlying data sets, the proliferation of heterogeneous vocabularies, underuse of semantic mechanisms in data structures, “ontology hijacking” (Feeney et al., 2018), point-to-point mappings, as well as limitations of semantic web languages for the requirements of bibliographic data interoperability. After reviewing such issues, a research direction is proposed to overcome the misalignments found by means of a reference model and a superontology, using Shapes Constraint Language (SHACL) to solve the current limitations of RDF languages.

Pegoraro Santana, Igor André, Fabio Pinhelli, Juliano Donini, Leonardo Catharin, Rafael Biazus Mangolin, Yandre Maldonado e Gomes da Costa, Valéria Delisandra Feltrim, and Marcos Aurélio Domingues. 2020. “Music4All: A New Music Database and Its Applications.” In 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), 399–404. Rio de Janeiro, Brazil.

One of the goals of the music information retrieval (MIR) community is to research new methods and create new systems that can efficiently and effectively retrieve and recommend songs from large databases of music content. Despite the volume of research in the area, there is a lack of music databases to support this work, i.e., databases that meet some highly desirable requirements for research, such as a large number of music pieces, availability of the audio signal, and a great diversity of audio attributes. To contribute to the MIR community, the authors present Music4All, a new music database that contains metadata, tags, genre information, 30-second audio clips, lyrics, and more. They also exemplify some MIR tasks that may benefit from their database and compare it with other databases proposed in the literature.

Pramudyo, Gani Nur, and Muhammad Rosyihan Hendrawan. 2020. “Metadata Interoperability for Institutional Repositories: A Case Study in Malang City Academic Libraries.” In Digital Libraries at Times of Massive Societal Transition, 12504:359–67. Lecture Notes in Computer Science. Kyoto, Japan: Springer International Publishing.

The aim of this study is to understand, describe, and analyze metadata interoperability in three libraries: the Universitas Brawijaya Library, which used Brawijaya Knowledge Garden (BKG) and Eprints software; the University of Muhammadiyah Malang Library, which used Ganesha Digital Library (GDL) and Eprints software; and the Malang State Library, which used Muatan Lokal (Mulok) software. The study also discusses factors that support and inhibit metadata interoperability. It employed a qualitative case-study approach. The findings indicate that metadata interoperability can be achieved by using metadata crosswalks.

Proutskova, Polina, Anja Volk, Peyman Heidarian, and György Fazekas. 2020. “From Music Ontology towards Ethno-Music-Ontology.” In Proceedings of 21st International Society for Music Information Retrieval Conference, 923–31. Montréal QC Canada: ISMIR Press.

This paper presents exploratory work investigating the suitability of the Music Ontology, the most widely used formal specification of the music domain, for modeling non-Western musical traditions. Four contrasting case studies from a variety of musical cultures are analyzed: Dutch folk song research, the reconstructive performance of rural Russian traditions, contemporary performance and composition of Persian classical music, and recreational use of a personal world music collection. The authors propose semantic models describing the respective domains and examine the applications of the Music Ontology for these case studies: which concepts can be successfully reused, where they need adjustments, and which parts of the reality in these case studies are not covered by the Music Ontology. The variety of traditions, contexts, and modeling goals covered by the case studies sheds light on the generality of the Music Ontology and on the limits of generalization “for all musics” that could be aspired to on the Semantic Web.

Pugin, Laurent, and Claudio Bacciagaluppi. 2020. “An Analysis of Musical Work Datasets and Their Current Level of Linkage.” In 7th International Conference on Digital Libraries for Musicology, 32–39. Montréal QC Canada: ACM.

Music works are key concepts that present a powerful linkage potential, fully acknowledged in the fields of digital music libraries and digital musicology. They form an abstract connecting point for the entities referring to them, and large work datasets act as authority data that offer promising analysis and search potential. Digital music libraries and digital musicology research currently rely primarily on datasets created over the last decade, mostly from previously existing datasets such as bibliographic records. In this paper, the authors try to provide a better understanding of the content of some of the most important datasets available and evaluate their level of linking. They analyze two leading library datasets, namely those of the Bibliothèque nationale de France (BnF) and the Deutsche Nationalbibliothek (DNB), both available in RDF format, and look at how many works they contain, how these are distributed over time, and how they are distributed by composer. They compare the results with two datasets that have completely different backgrounds, namely those of the Petrucci Music Library (known as IMSLP) and MusicBrainz, two crowd-sourced projects. They evaluate the level of linking the two library datasets currently have with each other through the Virtual International Authority File (VIAF), as well as their current linking status with other libraries contributing to VIAF. They also evaluate the linking status the IMSLP and MusicBrainz projects currently have with each other and with other datasets.

Thiéblin, Elodie, Ollivier Haemmerlé, Nathalie Hernandez, and Cassia Trojahn. 2020. “Survey on Complex Ontology Matching.” Edited by Marta Sabou. Semantic Web 11 (4):689–727.

Simple ontology alignments, largely studied in the literature, link a single entity of a source ontology to a single entity of a target ontology. One limitation of these alignments, however, is their lack of expressiveness, which can be overcome by complex alignments. While various state-of-the-art surveys review matching approaches in general, to the best of the authors’ knowledge no study has addressed the specificities of the complex matching problem. In this paper, an overview of the different complex matching approaches is provided. The survey proposes a classification of complex matching approaches based on their specificities (i.e., type of correspondences and guiding structure). The evaluation aspects and limitations of these approaches are also discussed, and insights for future work in the field are provided.