2 minutes to read Posted on Wednesday October 26, 2022

Updated on Monday November 6, 2023

Academic Research Open-data

Johanna Monti

Associate Professor, University of Naples L'Orientale

Georgia Evans

Senior Editorial Officer, Europeana Foundation

Team using Europeana data win prize at the EU Datathon 2022

The UNIOR NLP Research Group from the University of Naples L'Orientale were recently awarded a prize in the open data competition EU Datathon 2022 for an entry built on the Europeana.eu dataset. We hear from Johanna Monti, chief scientist of the group, about the app they developed.

The EU Datathon is an annual competition which provides ‘a chance for open data enthusiasts and application developers from around the world to demonstrate the potential of open data, get international visibility for their innovative ideas and compete for their share of the total prize fund of €200,000 and the Public Choice Award.’ Participants are invited to make use of data.europa.eu, the official portal for European data, managed by the Publications Office of the European Union.

The Europeana.eu dataset, which aggregates metadata from the approximately 4,000 cultural heritage institutions that provide content to Europeana, was published on data.europa.eu earlier this year, so proposals and apps designed for the competition could also draw on it. As an official partner of the competition, Europeana invited researchers, university professors and students from Social Sciences and Humanities, and Computer and Information Science to take part in the EU Datathon.

After two rounds of pre-selection among 156 entries from 38 countries, a team that is developing an app based on the Europeana.eu dataset was one of the 12 finalists and was awarded a prize of 7,000 euros under Challenge Number 4: ‘A Europe Fit for the Digital Age’ at the award ceremony that took place in Brussels on 20 October 2022. The team is composed of Professor Johanna Monti; researcher, Maria Pia di Buono; and two PhD students, Gennaro Nolano and Giulia Speranza. Johanna Monti tells us about the experience.

Can you tell us about the app that you developed and the process of creating it?

We developed Maggie, a real-time chatbot that functions as a virtual assistant to help people access and discover European cultural content. People can interact with Maggie through natural language questions and ask about European cultural heritage.

The main idea behind Maggie is to use Artificial Intelligence (AI) and Natural Language Processing (NLP) methodologies to develop a user-centric app which facilitates access to and discovery of multilingual cultural content. The intended audience of Maggie is very diverse; the app tailors content to users’ knowledge and interests to satisfy different information needs, from students to experts.

Maggie is the result of more than a decade of research activities which began in 2012 with our very first experiments in Cross-Language Information Retrieval on Cultural Heritage. After that, several milestones marked our way to Maggie, including the establishment of the UNIOR NLP Research Group of the University of Naples L'Orientale in 2016, and several projects from 2019 until 2021, including the SMACH Project (Semantic Multilingual Access to Cultural Heritage), the ArchaeoTerm project which offers a resource of archaeological terms available within the framework of the YourTerm CULT project, and the NEAT (Named Entities in Archaeological Texts) project.

Why did you decide to use the Europeana.eu dataset?

Our research group has always been committed to making cultural content easily accessible for everyone, by developing systems and applications for cultural heritage. In this sense, we have already used European open data (in the form of data from the Europeana website) in several works, all aimed at improving the current state of the art in Natural Language Processing tasks for better access to cultural heritage content.

In all these cases, the core of the data we used was open data retrieved through the Europeana Search API, which makes it easy for aggregated data to be accessed and reused, while also ensuring the high quality of the data and their multilinguality. In previous experiments, much of the information described by the Europeana Data Model (such as data about localisation, authors and themes) was not used. To develop Maggie, we fully exploited the rich source of information offered by Europeana, as we aimed to develop a more specific Natural Language Processing task.
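As a rough illustration of what querying the Europeana Search API can look like, here is a minimal Python sketch that only builds a request URL. The endpoint and the `wskey`, `query` and `rows` parameter names are assumptions based on Europeana's public API documentation and should be checked against the current version; `build_search_url` and the placeholder key are hypothetical and are not taken from the team's code.

```python
from urllib.parse import urlencode

# Assumed Search API endpoint; verify against the current Europeana API docs.
SEARCH_ENDPOINT = "https://api.europeana.eu/record/v2/search.json"


def build_search_url(query, api_key, rows=10):
    """Build a Search API request URL for a free-text query (hypothetical helper)."""
    params = {
        "wskey": api_key,  # personal API key, assumed parameter name
        "query": query,    # free-text search terms
        "rows": rows,      # number of results requested
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder; no request is sent here.
url = build_search_url("Vesuvius painting", api_key="YOUR_API_KEY", rows=5)
print(url)
```

Fetching that URL with any HTTP client should return JSON describing the matching records, provided the key is valid.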

The EU Datathon encourages use of open data sets. Why is openness of data important to your research and app?

Open data ensures reproducibility and transparency in research. The availability of such data represents a way to encourage knowledge sharing and cooperation in scientific communities. Most of our research efforts take advantage of open data from several sources. This is the case for our app, Maggie. Without open data from Europeana and data.europa.eu, we couldn’t have developed Maggie. We extract information about each artwork made available through Europeana, such as its author, creation date and so on, and we aggregate the information about its geolocation from the GeoDataset of data.europa.eu.
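The extract-and-aggregate step described here could be sketched along the following lines in Python. This is not the team's implementation: the field names (`title`, `dcCreator`, `year`, `country`) are assumptions about how a Europeana search result item is shaped, the `geo_lookup` table is a hypothetical stand-in for the GeoDataset, and the sample record and coordinates are invented placeholders.

```python
def first(values):
    """Return the first element of a list-valued field, or None if absent."""
    return values[0] if values else None


def summarise_item(item):
    """Pick out a few descriptive fields from one search result item."""
    return {
        "id": item.get("id"),
        "title": first(item.get("title")),
        "author": first(item.get("dcCreator")),  # assumed field name
        "year": first(item.get("year")),         # assumed field name
        "place": first(item.get("country")),     # assumed field name
    }


def add_geolocation(record, geo_lookup):
    """Attach coordinates from a place-name lookup table, or None if unknown."""
    return {**record, "coordinates": geo_lookup.get(record["place"])}


# Invented sample data, for illustration only.
sample_item = {
    "id": "/000/example",
    "title": ["View of Naples"],
    "dcCreator": ["Unknown artist"],
    "year": ["1850"],
    "country": ["Italy"],
}
geo_lookup = {"Italy": (41.9, 12.5)}  # rough placeholder coordinates

enriched = add_geolocation(summarise_item(sample_item), geo_lookup)
print(enriched)
```

Keeping extraction and geolocation as separate functions means either data source can be swapped without touching the other.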

Why did you decide to enter the EU Datathon competition?

It was a big challenge for us as we tried to gather all our previous efforts in one single application which could help people easily access European cultural content in today’s digital age. However, it also represented an opportunity to step outside pure academic research and commit to a proof of concept which goes beyond the prototype stage, towards something which might actually be used in a real-world situation; all while making use of state-of-the-art methodologies, resources and tools in Natural Language Processing and Artificial Intelligence.

What advice would you give others entering a competition like this?

Joining competitions which promote the use of open data is a way of supporting the implementation, spread, and adoption of such data. It also contributes to the improvement and maintenance of datasets which, due to the amount of data and sources, are difficult to manage, clean, and test. The results of these types of competitions have a real impact on society, directly related to the possibility of improving the quality of life of citizens, by making information and knowledge about the society they live in accessible and readily available. Our advice to researchers is to get out of their comfort zone, and to combine the rigour of research with the creativity of the design process, thinking of the beneficial impact on society as the final objective.

Find out more about Maggie