Europeana Common Culture webinar: Semantic Enrichment Strategy at SearchCulture.gr
Aggregators deal with a mass of heterogeneous data coming from multiple cultural heritage institutions. In the process of cleaning, harmonising, enriching and organising data, they follow different strategies. How each institution handles this data plays a decisive role in the (multilingual) search and discovery functions it can offer to its users.
In this webinar Haris Georgiadis, SearchCulture.gr Technical Lead (Head of e-Services Department, National Documentation Center) and Agathi Papanoti, Data Ingestion Specialist (National Documentation Center) introduced the methodology followed by the Greek National Aggregator SearchCulture.gr for its data enrichment processes and shared the pros and cons as well as lessons learnt along the way.
Participants representing advanced aggregators with established enrichment workflows had the opportunity to compare and discuss the pros and cons of the different enrichment strategies. Representatives of less advanced aggregators or of other cultural institutions, on the other hand, had the chance to gather input to inform their future enrichment strategies.
Questions for participants
How do you treat the source metadata schema? Do you ask the content provider/curator to put the relevant keywords and information in dc:type? And then he/she operates the tool?
Most of our providers document their content in DC-based schemas (oai_dc, ese, edm) and deliver it via OAI-PMH. For those that don’t use a specific schema, we perform an initial metadata mapping to the EDM elements before the aggregation (our infrastructure supports crawling and scraping of web pages, which allows us to do that). The enrichment tool is operated by the EKT curator.
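The initial mapping step for providers without a specific schema can be pictured roughly as follows. This is a minimal sketch, not the SearchCulture.gr implementation: the source field names in `FIELD_MAP` and the `map_to_edm` helper are hypothetical, and only the target property names (`dc:title`, `dc:creator`, `dc:type`, `dc:date`, `edm:isShownAt`) come from EDM.

```python
# Hypothetical mapping from a provider's scraped field names to EDM properties.
FIELD_MAP = {
    "title": "dc:title",
    "author": "dc:creator",
    "kind": "dc:type",
    "date": "dc:date",
    "page_url": "edm:isShownAt",
}


def map_to_edm(record):
    """Map one scraped record to EDM properties.

    Returns (mapped, unmapped): a dict of EDM property -> list of values,
    and the list of source fields that have no entry in FIELD_MAP.
    """
    mapped = {}
    unmapped = []
    for field, value in record.items():
        target = FIELD_MAP.get(field)
        if target is None:
            unmapped.append(field)  # leave for the curator to review
            continue
        values = value if isinstance(value, list) else [value]
        # Trim whitespace and drop empty values.
        cleaned = [v.strip() for v in values if v and v.strip()]
        if cleaned:
            mapped.setdefault(target, []).extend(cleaned)
    return mapped, unmapped


scraped = {"title": " Amphora ", "kind": ["vase", ""], "notes": "x"}
mapped, unmapped = map_to_edm(scraped)
```

Returning the unmapped fields separately keeps the mapping lossless from the curator's point of view: nothing is silently discarded before enrichment.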
Do you want to enrich Place and Actors (persons or organisations)? If not, why not? If ‘yes’, how would you do it?
Both Place and Agent enrichments are planned for the near future. We aim to enrich spatial metadata with GeoNames or Wikidata. We will also conduct enrichments based on dc:creator, dc:contributor and dc:subject using an extensive file of agents (historical figures, actors, politicians etc).
Do you foresee a future where machine learning can be used to automate the manual mappings even further? If so, do you have projects or R&D planned in this area? Who else is working in this area?
We could use machine learning techniques in the future in combination with the rest of our methodology. However, the following factors will hinder machine learning, because it will be difficult to supply enough corpus data for each vocabulary subject:
- the large variety of subjects and material
- the large number of subjects in our two subject vocabularies (approx. 2,000)
- poor original documentation
Do you allow providers to review and possibly approve your enrichments?
So far this has not happened. The EKT enrichments are presented in distinct fields and the original metadata is not altered. However, we are open to providers’ suggestions.
When you link a historical period to specific dates, how do you deal with the issue that a period may apply to different regions with different chronologies? Examples: Bronze Age in the Aegean, Ottoman period in Crete.
Some periods have a strict local scope (e.g. the Minoan, Cycladic and Helladic periods) and as a result their year ranges tend to overlap. We call those periods relative. The rest of the periods cover the entirety of Hellenic territory and are less debatable with respect to their timespans. We call those absolute. In our vocabulary, absolute periods that share the same parent have neither overlaps nor gaps, and relative periods have at least one absolute ancestor. When the provider has set a “relative” time period for their content, we follow their lead. When we can’t be sure about the locality, we assign the corresponding “absolute” period.
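The two vocabulary rules described above (absolute siblings neither overlap nor leave gaps; every relative period has an absolute ancestor) can be checked mechanically. The following is a minimal sketch under stated assumptions, not EKT's actual tooling: the `Period` class and both functions are hypothetical, years are whole numbers with an inclusive end (so a sibling must start at `end + 1`), the period names and year ranges are placeholders, and the "corresponding" absolute period is assumed to be the nearest absolute ancestor.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Period:
    name: str
    start: int                    # first year, negative for BCE
    end: int                      # last year, inclusive
    absolute: bool                # True = absolute, False = relative
    parent: Optional[str] = None  # name of the parent period, if any


def sibling_problems(periods: List[Period]) -> List[Tuple[str, str, str]]:
    """Report overlaps and gaps between absolute periods sharing a parent."""
    by_parent: Dict[Optional[str], List[Period]] = {}
    for p in periods:
        if p.absolute:
            by_parent.setdefault(p.parent, []).append(p)
    problems = []
    for siblings in by_parent.values():
        siblings.sort(key=lambda p: p.start)
        # Compare each period with the next one in chronological order.
        for a, b in zip(siblings, siblings[1:]):
            if b.start <= a.end:
                problems.append(("overlap", a.name, b.name))
            elif b.start > a.end + 1:
                problems.append(("gap", a.name, b.name))
    return problems


def nearest_absolute(period: Period, index: Dict[str, Period]) -> Optional[Period]:
    """Walk up the hierarchy and return the first absolute ancestor, if any."""
    seen = set()  # guards against accidental cycles in the hierarchy
    current = index.get(period.parent)
    while current is not None and current.name not in seen:
        if current.absolute:
            return current
        seen.add(current.name)
        current = index.get(current.parent)
    return None


# Placeholder vocabulary: names and year ranges are invented for illustration.
PERIODS = [
    Period("Era", -3000, -1001, True),
    Period("Early", -3000, -2001, True, "Era"),
    Period("Late", -2000, -1001, True, "Era"),
    Period("Local X", -2800, -2300, False, "Early"),
    Period("Local Y", -2500, -2001, False, "Early"),  # may overlap Local X
]
INDEX = {p.name: p for p in PERIODS}

problems = sibling_problems(PERIODS)
fallback = nearest_absolute(INDEX["Local X"], INDEX)
```

With this structure, falling back from a relative period to an absolute one when the locality is uncertain is a single lookup, and relative periods are free to overlap because only absolute siblings are checked.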
Does your system include scope notes that explain the use of the terms in EKT vocabularies?
So far we have not added scope notes to our vocabularies. However, most of the terms in our “type” vocabulary have bilingual skos:definition fields.