Full-Text Resource Processing Training Workshop

This training workshop from Europeana and CLARIN will demonstrate the use of Jupyter notebooks to education professionals working in an academic context.

About

Jupyter notebooks are an excellent introduction to the essential principles of large-scale data processing, whether in a local environment or in computational environments available in today's cloud-based ecosystems for open science. In this workshop, organised by CLARIN ERIC and Europeana Research, participants will discover how Jupyter notebooks can be used to explore and process textual resources with publicly available natural language processing (NLP) tools. We will use resources from the Europeana Newspapers Collection, comprising full-text content from more than 60,000 historical newspaper issues from eight countries covering 19 different languages. CLARIN centres offer a variety of NLP tasks, such as named entity recognition, topic modelling and part-of-speech tagging, as services that can be applied to text resources.

In this training workshop, we demonstrate the use of Jupyter notebooks to education professionals working in an academic context and provide them with initial hands-on experience in adapting and extending pipelines for NLP processing of text resources. Participants are guided through Jupyter notebooks that use metadata to select and pre-process resources, run an NLP task on the selected resources, and further process and present the results. In the interactive part of the workshop, participants learn how to adapt existing notebooks and discover how to tweak and extend a notebook to suit a specific study or research question.

Europeana and CLARIN benefit from a long-standing partnership that, over the years, has led to the harvesting of over 200,000 Europeana items into the CLARIN Virtual Language Observatory. The next goal is to make text resources readily available for linguistic analysis and processing in research and higher education contexts. To this end, resources from the Europeana Newspapers Collection will be catalogued in the SSHOC Open Marketplace along with processing examples and other training material.

This training workshop is organised by Twan Goosen (CLARIN ERIC), Michał Gawor (CLARIN ERIC), Iulianna van der Lek-Ciudin (CLARIN ERIC) and Alba Irollo (Europeana Foundation) in the context of the Europeana DSI-4 project.

Register

Registration for the event is limited to 16 participants. A waiting list will be available for those who register after all places have been booked. Registered participants will be asked to confirm their attendance and will receive further instructions a few days before the event.

Registration closes on 13 June 2022. Register now.

Target audience

Education professionals in academia with an interest in, but limited experience with, programmatic approaches to NLP using Jupyter notebooks and online text resources

Prerequisites

Basic understanding of computer programming concepts
There is no need to install anything on your own computer, and there are no special hardware or software requirements for taking part.

Learning outcomes

By the end of this workshop, participants will know how to:

Use the basic features of Jupyter notebooks: accessing and logging in, running individual steps and understanding what they do, editing code in a notebook, and adapting an existing notebook (copy, edit)
Use metadata to select and pre-process text resources available within the processing environment
Execute NLP tasks with local resources and remote services
Post-process and present the results of NLP tasks
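
To give a flavour of the three processing steps listed above (selecting resources by metadata, running an NLP task, and presenting the results), here is a minimal Python sketch of the kind of code a notebook cell might contain. It is an illustration only and is not taken from the workshop notebooks: the records, field names and function names are all hypothetical, and simple word counting stands in for a real NLP task such as named entity recognition.

```python
import re
from collections import Counter

# Hypothetical sample records, invented for this sketch.
# They are NOT real Europeana data and the field names are assumptions.
ISSUES = [
    {"id": "issue-1", "language": "de", "year": 1895,
     "text": "Der Kaiser besucht Wien. Wien feiert."},
    {"id": "issue-2", "language": "fr", "year": 1901,
     "text": "Le président arrive à Paris."},
    {"id": "issue-3", "language": "de", "year": 1910,
     "text": "Berlin und Wien verhandeln."},
]


def select_issues(issues, language, first_year, last_year):
    """Step 1: use metadata (language, year) to select resources."""
    return [
        issue for issue in issues
        if issue["language"] == language
        and first_year <= issue["year"] <= last_year
    ]


def count_tokens(text):
    """Step 2: a stand-in 'NLP task' that lowercases and counts word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def summarise(issues, top_n=3):
    """Step 3: aggregate per-issue counts and keep the most frequent tokens."""
    totals = Counter()
    for issue in issues:
        totals.update(count_tokens(issue["text"]))
    return totals.most_common(top_n)


if __name__ == "__main__":
    selected = select_issues(ISSUES, language="de", first_year=1890, last_year=1915)
    for token, count in summarise(selected):
        print(f"{token}: {count}")
```

In a real notebook, the stand-in counting step would be replaced by a call to a local NLP library or a remote service, while the selection and presentation steps keep the same overall shape.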