The Virtual Language Observatory
CLARIN is a research infrastructure that aims to support researchers in the humanities and social sciences by making digital language resources and tools from all over Europe and beyond accessible through a single sign-on online environment. As partners in the Europeana Digital Service Infrastructure (DSI), Europeana and CLARIN are working together to embed cultural heritage content into CLARIN’s infrastructure. Since an initial pilot integration in 2017, CLARIN has regularly updated and extended the selection of cultural heritage objects it includes in its Virtual Language Observatory (VLO). This online search and discovery service focuses on the needs of scholars looking for language resources, and is integrated into the wider CLARIN infrastructure.
New resources for researchers
A key part of this integration is improving user access to online analysis and processing options for any resource found through the VLO. These options are available for a wide variety of cultural heritage resources 'harvested' through Europeana, ranging from Renaissance-era manuscripts and digitised newspapers to historical children’s books and oral history recordings.
In April 2019, we wrote about the first resource integration. We showed a powerful example of how people can process a language resource directly from their browser with a few clicks after discovering it. At that point, about 135,000 records had been sourced from Europeana and included in the VLO. Since then, we have carried out two additional iterations of selection and integration, resulting in over 275,000 records from Europeana, more than from any other individual provider of metadata records currently in the VLO. Below, we present two additional examples of resources that are currently available, and demonstrate how they can be processed further.
‘O kimmeryjskich pomnikach w Krymie’
'O kimmeryjskich pomnikach w Krymie' is a Polish book from 1882, provided by the Federacja Bibliotek Cyfrowych as a PDF, with its full text available through OCR (optical character recognition). As the animation below shows, someone using the VLO can explore processing options by selecting a link to an individual file and processing it with the Language Resource Switchboard. For this record, a variety of interesting natural language processing tools are available, most of them provided by the Polish CLARIN-PL consortium.