EuropeanaTech x AI: eScriptorium

About

How can Artificial Intelligence (AI) and Machine Learning (ML) help us to enrich and research cultural heritage collections? How can ML models be adapted to the complexities of historical and cultural material they have not been trained on? Which biases can we reveal in historical collections? How can we gain new perspectives by using AI to research cultural heritage data? The EuropeanaTech x AI webinar series discusses these questions with you and presents the projects and people reflecting on exactly these matters.

This session presents eScriptorium, demonstrating its practical use through a number of case studies while also raising some conceptual issues and challenges around HTR and AI for historical texts and scripts. The eScriptorium project aims to provide a user-friendly and truly open platform for the automatic, semi-automatic and manual transcription of texts from digitised images. Based on current Machine Learning techniques, it includes trainable models for layout analysis and automatic transcription, as well as (coming soon!) trainable reading order. Users can also segment layout and transcribe manually in a very ergonomic fashion and (coming soon!) annotate text and images. Built on the Kraken HTR engine, it is designed from the start to support a very wide range of languages and scripts, and it is currently being used for languages including French, medieval and neo-Latin, ancient Greek, Arabic, Hebrew, Syriac, Old Vietnamese, Old Javanese, Classical Chinese and Georgian. The system is free and open-source software that anyone with the necessary hardware can install. Trained models can also be freely exported, shared and reused. This brings significant advantages for sustainability and reduces the training time needed, which in turn cuts both financial and environmental costs.

This event took place on 2 July 2021 and was followed by a social session on WonderMe to close the EuropeanaTech x AI series.

Speakers

Peter Stokes (EPHE-PSL), ‘eScriptorium for transcribing rare and historical scripts and languages’
Daniel Stökl Ben Ezra (EPHE-PSL), ‘Applying eScriptorium to Hebrew and other RTL-script fragments, manuscripts and books at scale’
Simon Gabay (University of Geneva), ‘Integrating eScriptorium: a pipeline for OCRising historical documents’

Resources

Find out more about the EuropeanaTech x AI webinar series.
Join the EuropeanaTech Community to find out more about research and development for European digital cultural heritage, network with peers and hear about relevant events, resources and opportunities.