Europeana Common Culture webinar: Semantic Enrichment Strategy at SearchCulture.gr

About

Aggregators deal with a mass of heterogeneous data coming from multiple cultural heritage institutions. In the process of cleaning, harmonising, enriching and organising data, they follow different strategies. How this data is handled by each institution plays a decisive role in the (multilingual) search and discovery functions they can offer to their users.

This webinar explores the methodology followed by the Greek National Aggregator SearchCulture.gr for its data enrichment processes, and shares the advantages and disadvantages of the approach and the lessons learned along the way. Participants in the webinar also compare and discuss other enrichment strategies and gather input which will inform future strategies.

This webinar took place on 23 June 2020, 11.00 - 12.00 CEST.

Speakers

Haris Georgiadis, SearchCulture.gr Technical Lead (Head of e-Services Department, National Documentation Center)
Agathi Papanoti, Data Ingestion Specialist (National Documentation Center)

Resources

The slides of the webinar can be found here.

Questions from participants

How do you treat the source metadata schema? Do you ask the content provider/curator to put the relevant keywords and information in dc:type? And then he/she operates the tool?

Most of our providers document their content in DC (oai_dc, ese, edm) and expose it via OAI-PMH. For those that don’t use a specific schema, we perform an initial metadata mapping to the EDM elements before the aggregation (our infrastructure supports web crawling and scraping from website pages and thus allows us to do that). The enrichment tool is operated by the EKT curator.
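The initial mapping step for providers without a specific schema could look roughly like the sketch below. It is only an illustration: the source field names, the FIELD_MAP table and the map_to_edm function are hypothetical and are not taken from the SearchCulture.gr infrastructure. Only the target names (dc:title, dc:creator, dc:type, dc:subject) are standard Dublin Core elements that EDM reuses.

```python
# Hypothetical mapping from scraped field names to DC elements used in EDM.
# The keys on the left are invented for this example.
FIELD_MAP = {
    "title": "dc:title",
    "author": "dc:creator",
    "kind": "dc:type",
    "keywords": "dc:subject",
}


def map_to_edm(raw):
    """Return (mapped, unmapped) dictionaries for one scraped record."""
    mapped, unmapped = {}, {}
    for key, value in raw.items():
        values = value if isinstance(value, list) else [value]
        # Keep only non-empty strings, trimmed of surrounding whitespace.
        values = [v.strip() for v in values if isinstance(v, str) and v.strip()]
        if not values:
            continue
        target = FIELD_MAP.get(key.strip().lower())
        if target is None:
            unmapped[key] = values  # left aside for a curator to review
        else:
            mapped.setdefault(target, []).extend(values)
    return mapped, unmapped


# Invented sample record, as it might come out of a scraper.
scraped = {"Title": " Amphora ", "keywords": ["pottery", ""], "shelf": "B12"}
mapped, unmapped = map_to_edm(scraped)
```

Fields that have no entry in the table are returned separately rather than dropped, so that a curator can decide how to map them.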

Do you want to enrich Place and Actors (persons or organisation)? If not why not? If ‘yes’ how would you do it?

Both Place and Agent enrichments are imminent developments for us. We aim to enrich spatial metadata with GeoNames or Wikidata. We will also conduct enrichments based on dc:creator, dc:contributor and dc:subject using an extensive file of agents (historical figures, actors, politicians, etc.).

Do you foresee a future where machine learning can be used to automate the manual mappings even further? If so, do you have projects or R&D planned in this area? Who else is working in this area?

We could use machine learning techniques in the future in combination with the rest of our methodology. However, the following factors will hinder machine learning, because it will be difficult to feed it with enough corpus data for each vocabulary subject:

The large variety of subjects and material
The large number of subjects in our two subject vocabularies (approx. 2000)
Poor original documentation

Do you allow providers to review and possibly approve your enrichments?

So far this has not happened. The EKT enrichments are presented in distinct fields and the original metadata is not altered. However, we are open to providers’ suggestions.
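The idea of keeping enrichments in distinct fields can be sketched as follows. The record layout (a "metadata" key for the provider's values and an "enrichments" key for added ones) and the add_enrichment function are assumptions made for this example, not the actual SearchCulture.gr data model.

```python
import copy


def add_enrichment(record, field, values):
    """Return a copy of record with values stored under a separate key.

    The provider's original metadata is never modified.
    """
    enriched = copy.deepcopy(record)
    target = enriched.setdefault("enrichments", {}).setdefault(field, [])
    for value in values:
        if value not in target:  # avoid duplicate enrichment values
            target.append(value)
    return enriched


# Invented sample record; "vase" and "vessel" are placeholder terms.
original = {"metadata": {"dc:type": ["vase"]}}
enriched = add_enrichment(original, "type", ["vessel"])
```

Because the enrichment lives next to the source values instead of replacing them, it can be reviewed, corrected or removed later without touching what the provider supplied.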

When you link a historical period to specific dates, how do you deal with the issue that a period may apply to different regions with different chronologies? Examples: Bronze Age in the Aegean, Ottoman period in Crete.

Some periods have a strict local scope (e.g. Minoan, Cycladic and Helladic periods) and as a result their year ranges tend to overlap. We call those periods relative. The rest of the periods cover the entirety of Hellenic territory and are less debatable with respect to their timespans. We call those absolute. In our vocabulary, absolute periods that share the same parent have neither overlaps nor gaps, and every relative period has at least one absolute ancestor. When the provider has set a “relative” time period for their content, we follow their lead. When we can’t be sure about the locality, we assign the corresponding “absolute” period.
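The two vocabulary rules and the fallback described above can be expressed as a small sketch. Everything in it is an assumption for illustration: the Period class, the function names, the reading of "corresponding absolute period" as the nearest absolute ancestor, and above all the year ranges, which are placeholders and not the values of the EKT vocabulary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Period:
    label: str
    start: int  # year; negative values stand for BCE
    end: int    # year at which the next sibling period starts
    absolute: bool
    parent: Optional[str] = None


def absolute_siblings_are_contiguous(periods):
    """Absolute periods with the same parent must not overlap or leave gaps."""
    groups = {}
    for p in periods.values():
        if p.absolute:
            groups.setdefault(p.parent, []).append(p)
    for siblings in groups.values():
        ordered = sorted(siblings, key=lambda p: p.start)
        for earlier, later in zip(ordered, ordered[1:]):
            if earlier.end != later.start:
                return False
    return True


def nearest_absolute(periods, label):
    """Walk up the hierarchy to the first absolute period (itself included)."""
    seen = set()
    current = periods.get(label)
    while current is not None and current.label not in seen:
        if current.absolute:
            return current
        seen.add(current.label)  # guards against cycles in the data
        current = periods.get(current.parent) if current.parent else None
    return None


def relative_periods_are_anchored(periods):
    """Every relative period needs at least one absolute ancestor."""
    return all(
        nearest_absolute(periods, p.label) is not None
        for p in periods.values()
        if not p.absolute
    )


def assign_period(periods, label, locality_known):
    """Keep the provider's period unless it is relative and locality is unsure."""
    period = periods[label]
    if period.absolute or locality_known:
        return period.label
    fallback = nearest_absolute(periods, label)
    return fallback.label if fallback else period.label


# Placeholder year ranges, chosen only to exercise the checks.
PERIODS = {
    "Bronze Age": Period("Bronze Age", -3000, -1100, True),
    "Iron Age": Period("Iron Age", -1100, -700, True),
    "Minoan period": Period("Minoan period", -3000, -1100, False, "Bronze Age"),
    "Helladic period": Period("Helladic period", -3200, -1050, False, "Bronze Age"),
}
```

With this sample data, a record that the provider tagged with the Minoan period keeps that tag when the locality is known, and falls back to the Bronze Age when it is not. The two relative periods are allowed to overlap each other, since the contiguity rule is applied only to absolute siblings.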

Does your system include scope notes that explain the use of the terms in EKT vocabularies?

So far we have not added scope notes to our vocabularies. However, most of the terms in our “type” vocabulary have bilingual skos:definition fields.