Today's blog introduces you to a great European Commission-funded project called Preserving Linked Data, or 'PRELIDA' for short. The project launched in January this year and is now looking forward to its first major working group meeting in June. A great opportunity, we thought, to tell you what it's all about.
The project aims to build bridges between the digital preservation and linked data communities, with a view to:
- making the linked data community aware of existing outcomes of the digital preservation community; and
- working out the challenges of preserving linked data, posing new research questions for the preservation community, and developing a roadmap for addressing them.
The sheer amount of data offered and consumed on the internet, and the volume of data being digitally stored and exchanged, are growing exponentially. This creates the potential for many new types of products and services, and a whole new industry building services on top of large data streams. The impact of this emerging economic sector - the data economy - may soon outrank the current importance of the software industry.
Carlo Meghini, project coordinator and long-standing partner on Europeana projects, says, 'An important part of the data economy is the linked data movement, which is about using the web to connect related data that was previously not linked, or using the web to lower the barriers to linking data. With the increasing adoption of the linked data paradigm by governments and organisations, the requirements in terms of quality, usability and maturity increase. In order to continue to develop and increase uptake of linked data as a platform for publishing open data, we need to address the issues surrounding preserving linked data. For example, they are different to other data sets in that they are in RDF, they use URIs as identifiers, and they rely on shared vocabularies. Whilst these are all good features of linked data, they are likely to create preservation problems that other types of data do not have.'
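The features Carlo lists can be made concrete with a small sketch. The snippet below is illustrative only, not PRELIDA code: the book and person URIs are invented, while the predicates come from real shared vocabularies (Dublin Core Terms and FOAF). It represents linked data as triples and extracts the external vocabulary namespaces a dataset depends on.

```python
# Illustrative sketch: linked data expressed as subject-predicate-object
# triples, where identifiers are URIs and predicates come from shared
# vocabularies (here, Dublin Core Terms and FOAF).
triples = [
    ("http://example.org/book/1", "http://purl.org/dc/terms/title", "A Title"),
    ("http://example.org/book/1", "http://purl.org/dc/terms/creator",
     "http://example.org/person/7"),
    ("http://example.org/person/7", "http://xmlns.com/foaf/0.1/name", "Jane Doe"),
]

def vocabulary_namespaces(triples):
    """Collect the external vocabulary namespaces a dataset depends on.

    Each namespace is a resource that must stay resolvable (or be
    archived) for the data to remain interpretable - one of the
    preservation dependencies peculiar to linked data.
    """
    return sorted({pred.rsplit("/", 1)[0] + "/" for _s, pred, _o in triples})

print(vocabulary_namespaces(triples))
# ['http://purl.org/dc/terms/', 'http://xmlns.com/foaf/0.1/']
```

Each namespace in that list is an external dependency: if it stops resolving, or its definitions change, the dataset's meaning is at risk - which is exactly the kind of preservation problem ordinary self-contained datasets do not have.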
Carlo Meghini, project coordinator. Image taken from a video lecture by Carlo on data preservation.
The PRELIDA team is convinced that the preservation problem of linked data can only be solved satisfactorily if the digital preservation and linked data communities come together with their complementary skills and technologies. So an important task of PRELIDA is to raise awareness of existing preservation solutions and to facilitate their uptake.
The project will produce a report on the current state of linked data and its preservation needs, and will develop a roadmap focusing on the most promising research paths, and the resulting problems to be addressed. This research will drive the scientific and technological development of the field, as well as future research programmes that the European Commission may wish to fund.
The challenges of preserving linked data are expected to relate to its intrinsic features, including its structuring, interlinking, dynamicity and distribution. PRELIDA will pursue its aims through a coherent collection of activities: a working group, open consultations, three dedicated workshops, two summer schools, and a broad dissemination action addressing the scientific community, technology providers, key user groups and policy-makers.
The main partners for the project are CNR-ISTI (the Institute of Information Science and Technologies of the Italian National Research Council), APA (the Alliance for Permanent Access), the University of Huddersfield in the UK and the University of Innsbruck in Austria. As a leader in linked data thinking, Europeana is involved with PRELIDA in a sub-contracting capacity.
Members of PRELIDA are looking forward to the first workshop of their working group. The group is made up of around 20 world experts and has been formed to help PRELIDA achieve its goals. It will meet three times during the course of the project. The first meeting takes place in Tirrenia, Italy on 25-27 June. Then in September, the project will be represented at the European Semantic Web Conference (ESWC) Summer School in Kalamaki, Crete, running seminars on the topic of preserving linked data. We look forward to reporting on the discussions and developments that come out of these events.
By Ingrida Vosyliūtė, Coordinator of Hack4LT and Project Manager at Vilnius University Faculty of Communication.
Vilnius University Faculty of Communication and Vilnius University Library recently hosted the first cultural heritage and digital humanities hackathon in Lithuania - Hack4LT. The event was inspired by Lithuania's co-operation with both Europeana with its significant multilingual online collection of digitised cultural heritage, and DARIAH: Digital Research Infrastructure for Arts and Humanities, a major European digital humanities network.
The 2-day event took place at the recently opened National Open Access Scholarly Communication and Information Centre – the most modern library in the Baltic countries. It began symbolically on 4 April, the feast day of St. Isidore of Seville, the declared patron saint of the internet, computers and computer users.
Hacking is fun! Photo - Darius Verseckas.
Hack4LT aimed to foster collaboration between scholars of digital humanities and software developers. The event encouraged technology-driven experimentation with existing Europeana datasets. Open access to this resource stimulates a broad public interest in European culture and challenges cultural institutions to seek new ways of engaging people and developing innovative tools. Because of the richness of Europeana's collections and the nature of preserved digital content, it is a valuable data source for digital humanities researchers and can enhance digitally-enabled research.
The hackathon brought together 20 young software developers, who were encouraged to try out their ideas for creative re-use of Europeana content by building applications showcasing the social and scholarly value of open cultural data. Two 500 EUR prizes were available for the best prototypes meeting the needs of digital humanities researchers and the general public.
Dr E. Champion from DARIAH with the 'Manuscript' team, Digital Humanities category winners. Photo - Darius Verseckas.
The hackers formed small teams and worked on ideas they had discussed beforehand. Hacking ran till late in the evening with a few enthusiasts staying awake all night.
The 2 days of hacking resulted in 3 prototypes, which were presented on the second day and judged by a jury of 7 experts.
The best prototype in the digital humanities category was ‘Crowdhwr’, developed by team ‘Manuscript’ (A. Gimbutas, J. Sadzevičius & M. Zimnickas). They created a crowdsourcing manuscript transcription system, tested using examples from Europeana. The prototype allows users to mark words in a digitised manuscript and prepare it for analysis. The winners were happy with the results and are planning to continue developing the prototype. Their goal is a tool that converts images to text so that a manuscript’s content can be searched automatically.
The best prototype in the general public category was ‘Gamepad 2.0’, developed by team ‘CodeUnited’ (S. Mikalonis, K. Rutkauskas, M. Sorokin & M. Ūba). The team created a fun, educational quiz game which uses Europeana data to generate questions about various aspects of Lithuanian history, art and culture. The quiz encourages players to compete with each other by answering within a limited amount of time.
'CodeUnited' team, the winners in the General Public category. Photo - Darius Verseckas.
A consolation prize was also given to the third prototype developed by I. Bačius, M. Baranauskas, J. Jaronis & I. Pliavgo. They created a Europeana plugin, which can be set up in databases and web portals that use the Django framework. While using an existing search of digital objects, the plugin links the search with Europeana's data and shows similar results found on the Europeana portal.
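As a sketch of how such a plugin might pair a local search with results from the Europeana portal, the snippet below builds the companion request URL. This is hypothetical code, not the team's prototype: the endpoint and parameter names follow Europeana's public Search API as documented at the time, but should be checked against the current API documentation, and `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

# Assumed Europeana Search API endpoint - verify against current docs.
EUROPEANA_SEARCH = "https://www.europeana.eu/api/v2/search.json"

def europeana_query_url(terms, api_key="YOUR_API_KEY", rows=5):
    """Build a 'similar results' request URL from local search terms."""
    params = {"wskey": api_key, "query": " ".join(terms), "rows": rows}
    return EUROPEANA_SEARCH + "?" + urlencode(params)

# Re-use the terms the user typed into the local (e.g. Django-backed) search.
url = europeana_query_url(["Vilnius", "manuscript"])
print(url)
```

The plugin would then fetch this URL, parse the JSON response and render the Europeana hits alongside the local results.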
Hack4LT participants with the rector of Vilnius University, prof. habil. dr. J. Banys (centre). Photo - Darius Verseckas.
The rector of Vilnius University, prof. habil. dr. J. Banys, congratulated the participants of Hack4LT saying: ‘You are a revolutionary part of our society, having so many great and fresh ideas. I am glad that these ideas matter. Moreover I hope there will be more of them in the future.’
Last week, EuropeanaTech released two major new documents. Today's blog looks at the work and the people behind one of them, interviewing Maarten Brinkerink and Marlies Olensky.
First of all though - a quick look at the two new documents.
The first, titled 'Core Inventory of FLOSS in the Cultural Heritage Domain, second iteration', analyses the Free/Libre and Open-Source Software landscape and provides a baseline for the development of innovative applications in the Europeana Network.
The second, called 'Functional specifications for social semantic functions' is the first step in a process of building two prototypes that will articulate user-generated metadata with semantic functions in Europeana v2.0's R&D work package. It provides functional specifications and a description of prototypes. To find out what that's all about, we've spoken to two of the minds behind it.
Maarten Brinkerink and Marlies Olensky
What is your day-to-day role/where do you work?
Maarten: I'm a project manager for Research & Development at the Netherlands Institute for Sound and Vision, mainly working on projects that aim to provide meaningful access to digitised audiovisual heritage.
Marlies: I'm a researcher at the Berlin School of Library and Information Science (Humboldt-Universität zu Berlin) where I work on the Europeana v2.0 project. I've previously been involved in another Europeana project, Europeana Connect, where I worked on the semantic data layer (2009-2011). I'm also doing my PhD, which is not related to cultural heritage but is about data quality in bibliometric studies.
What is your involvement with Europeana?
Maarten: For the Europeana v2.0 project, Sound and Vision works on several tasks within the 'innovation' work package, including one on the developers' network and FLOSS inventory, and one on the development of innovative apps. The aim of the work package is to foster a research and development community around Europeana, stimulating innovation that benefits the projects, the Europeana Network and the broader cultural heritage domain.
Marlies: Like Sound and Vision, Humboldt University is a partner in the innovation work package, responsible for the Semantic Web & Linked Data and multilinguality tasks. The aim of the Semantic Web & Linked Data task is to make Europeana more 'semantics aware' and to integrate it into the emerging paradigm of Linked (Open) Data.
Tell us about the 'Functional specifications for social semantic functions' work. What problem are you trying to solve?
Marlies: The aim of the task was to demonstrate and try out some options for social semantic web functions that could be useful for Europeana. The semantic web is basically the idea of turning the web into a web of data that can be processed by machines, so it adds machine-readable metadata to human-readable web documents. The social part here means that we would like to employ the users to make this vision happen. In other words, we looked for ideas for what or how the user can contribute to existing content by tagging, correcting, or organising objects or their metadata. So, we tried to come up with innovative functionality that can be tested out using the prototype tools we developed.
Maarten: The functional specifications for social semantic functions and the prototype code build on earlier research done within the work packages on identifying open source tools for the cultural heritage domain (led by Sound and Vision) and the social semantic web (led by Humboldt). The document describes how two selected open source tools (the Waisda? Video Labelling Game and Crowdcrafting) can be further developed to support metadata enrichment via crowdsourcing. Waisda? is a crowdsourcing video annotation platform that has been released as an open source framework by Sound and Vision. Crowdcrafting/PyBossa is a platform for creating and running crowdsourcing applications that rely on online assistance for tasks requiring human cognition, knowledge or intelligence, such as image classification, transcription, geocoding and more.
Screenshot from Crowdcrafting/PyBossa
What was your involvement in this work?
Marlies: In the beginning, I researched what social semantic web solutions are already out there. Then we had several brainstorming sessions to discuss possible functionalities for the two selected tools. Maarten, his colleagues at Sound & Vision and I cooperated very closely, which in the end led to a very satisfying document on functional specifications. Sound & Vision was then responsible for developing the prototypes.
Maarten: Sound and Vision supported Humboldt University in writing the functional specification and further developed the tools in the form of prototypes that showcase the functionality that is described in the deliverable.
What challenges did you encounter?
Maarten: When setting up these open source tools for datasets on Europeana, we noticed that only a few data providers are linking out to their digital objects in their metadata. We set out to make tools that could actually present rich content - instead of only metadata - to the users during the crowdsourcing tasks. Another challenge was to find suitable controlled vocabularies to support users in describing the material. Ideally, these vocabularies should fit with the content, be available in SKOS, be licensed for free re-use and be multilingual.
Marlies: For me, the main challenge was the one Maarten already mentioned: we (along with data providers) employ controlled vocabularies to describe the information objects in a standardised way so as to make them retrievable, so it was important to find vocabularies that match the terms users would want to use to describe objects. Another challenge was narrowing down the functionality options we wanted to try out first and identifying those that seemed most feasible.
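The vocabulary-matching challenge Marlies describes can be sketched in a few lines. This is a toy illustration, not project code: the concept, its URI and its labels are invented, and real SKOS data would be parsed from RDF rather than hand-written as dicts. It shows why multilingual preferred and alternative labels matter when reconciling user tags with a controlled vocabulary.

```python
# Toy SKOS-like concept list: each concept has a URI plus preferred and
# alternative labels per language (hypothetical vocabulary and values).
concepts = [
    {
        "uri": "http://example.org/vocab/windmill",
        "prefLabel": {"en": "windmill", "nl": "windmolen"},
        "altLabel": {"en": ["wind mill"], "nl": []},
    },
]

def match_tag(tag, concepts, lang="en"):
    """Return URIs of concepts whose pref/alt label matches the user's tag."""
    tag = tag.strip().lower()
    hits = []
    for c in concepts:
        labels = [c["prefLabel"].get(lang, "")] + c["altLabel"].get(lang, [])
        if tag in (label.lower() for label in labels):
            hits.append(c["uri"])
    return hits

print(match_tag("Wind Mill", concepts))             # matched via an altLabel
print(match_tag("windmolen", concepts, lang="nl"))  # multilingual lookup
```

The alternative labels are what let a user's informal tag ("wind mill") land on the same concept as the canonical term, and the per-language labels are why multilinguality was on the wish list for suitable vocabularies.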
What have you achieved?
Marlies: Well, I think we've created an important showcase by demonstrating what kind of social semantic web functions could be leveraged to improve or augment Europeana content.
Maarten: By creating prototypes based on the functional specifications written by Humboldt, Sound and Vision was able to further develop two crowdsourcing tools, enhancing the opportunities to re-use them in the Europeana context. As a result of this work, it is now relatively easy for Europeana data providers and/or projects to set up their own instance of either or both tools. A Europeana data importer was built for both tools, using the Europeana API, so others will be able to set up their own crowdsourcing tasks.
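The importer idea might look roughly like this. A hedged sketch only: `edmIsShownBy` is a real EDM property pointing at the digital object, but the surrounding item shape and the task payload fields are illustrative assumptions, not the deliverable's actual schema.

```python
# Hypothetical sketch: map one (simplified) Europeana API item into the
# task payload a PyBossa-style crowdsourcing tool might expect.
def europeana_item_to_task(item, project_id):
    """Turn one Europeana item into a crowdsourcing task payload."""
    return {
        "project_id": project_id,
        "info": {
            "europeana_id": item["id"],
            "title": item.get("title", ["Untitled"])[0],
            # Direct link to the digital object - only present when the
            # data provider supplies it (the gap Maarten notes above).
            "media_url": item.get("edmIsShownBy", [None])[0],
        },
    }

item = {"id": "/2023601/xyz", "title": ["Sample newsreel"],
        "edmIsShownBy": ["http://provider.example/video.mp4"]}
task = europeana_item_to_task(item, project_id=42)
print(task["info"]["title"])  # Sample newsreel
```

In a real importer, each payload would then be POSTed to the tool's task API, turning a set of Europeana search results into a batch of crowdsourcing tasks.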
What are the next steps?
Maarten: Next we will further develop the prototypes to encompass all the functionality described in the report and make the code available to the EuropeanaTech community, allowing Europeana projects and data providers to set up their own instances of the tools to enrich their metadata via crowdsourcing. The European Film Gateway has already shown interest in re-using the Waisda? Video Labelling Game technology.
Marlies: Indeed, and Humboldt will support Sound & Vision in this second development phase where needed.
Screenshot from Waisda?
What response are you looking for from readers?
Maarten: We look forward to feedback on the specifications, suggestions on further developing the prototypes and pointers to controlled vocabularies that could be used for a next iteration.
How can people contact you with their feedback?
Maarten: They can send me a message through Twitter: @mbrinkerink or drop me a line at mbrinkerink [at] beeldengeluid [dot] nl.
Marlies: You can get in touch with me at marlies [dot] olensky [at] ibi [dot] hu-berlin [dot] de.
Finally, a little something beyond Europeana. What do you do when you're not working?
Maarten: When I'm not working I write and record music. I used to perform quite a bit as well, but I'm currently in between bands. I'm also a volunteer for Wikimedia Netherlands, helping them to join forces with cultural organisations.
Marlies: At the moment, I don't have much free time, as most of my time is dedicated to my PhD research. But I do need some balance which I find in travelling, practising yoga and spending time with my family and friends.
What is your favourite item from Europeana?
Maarten: 'Het melkmeisje' by Vermeer. I chose this item not only because it is a visually compelling and quite iconic piece of art, but also because, for me, it is a best-practice example of how cultural heritage can be made accessible through Europeana. The Rijksmuseum has really raised the bar for providing open access to art collections at an international level. Their metadata is available through an open API under a CC0 licence, allowing for aggregation by, among others, Europeana. They explicitly mark the works in their collection that are in the public domain as such, using the Public Domain Mark, and, last but not least, they provide links to beautiful high-resolution digital representations of the artworks.
The Milkmaid, Vermeer. Rijksmuseum, public domain
Marlies: My favourite item in Europeana is a radio recording that is a curiosity involving my home country, Austria, and my current country of residence, Germany. It's a recording from 1978, when Austria beat Germany at the football World Cup in Córdoba, Argentina (the first and only time Austria has beaten Germany at a World Cup). And believe me, everyone in Austria is still proud of that victory, and nobody in Germany even remembers the match any more! Listen to the recording.
In the latest in our series of blogs following the progress of aggregation schemes for Europeana, we hear how our team visited Slovenia, talked masterpieces, and found enlightenment! David Smith writes...
On 21 March, the Europeana Business Development and Ingestion teams ran a workshop in conjunction with the National Library of Slovenia in the city of Ljubljana.
The National Library has been aggregating material to Europeana for many years via its Dlib.si portal. In the past year, the National eContent Aggregator was initiated as a cross-domain national aggregator.
Before the workshop, we were lucky enough to be shown around the impressive National Library building, designed by Slovenia’s most famous architect, Jože Plečnik. Through the building's imposing brass front doors, the potential reader is led up a dark, marble-pillared staircase towards the light-flooded, wooden-panelled reading room. To access collections here, you literally have to walk up towards enlightenment.
Main Entrance Staircase of the National and University Library by Jože Plečnik, 2009. Slovenian National E-content Aggregator, CC-BY-NC.
The workshop itself was held at another site on the outskirts of the city. Around 20 participants engaged in a day of discussions about Europeana and aggregation in Slovenia. After an introduction, Annette Friberg from Europeana gave a presentation on Europeana's current and future plans. This included development of Europe’s network of aggregators and some current interesting projects. Zoran Krstulović then introduced the National eContent Aggregator project including technical details regarding how the system is working. Zoran’s presentation was in Slovenian and created quite a buzz of discussion with the attendees.
Following this, I presented Europeana’s key areas of development: audiovisual material, masterpieces and the API. A common trend everywhere I have given this presentation is the audience's reaction when shown a potential list of their country's masterpieces. Making digitised masterpieces available was a recommendation of the EC’s New Renaissance paper and something we have been working on for the past year. Mobilising national aggregators and other organisations to make these available is a priority for Europeana, but not without its pitfalls. The Slovenian audience was receptive to the idea, which was reassuring.
Francesca Morselli from the Europeana Ingestion team then presented and ran a workshop on metadata quality and rights statements. Rights statements are a key focus for Europeana in 2013 and the workshop gave us an opportunity to work through issues associated with providing accurate rights statements for content coming into the Europeana portal.
After lunch, the final workshop looked at simplifying the process of making material available via Europeana. During this workshop, we discussed the possible benefits of the work being done on the Europeana Inside project. The audience clearly saw the benefits of that project and how it could relate to their own work.
As with all workshops, the end of the day was marked with very quick goodbyes and next steps before the rush back to the airport. Throughout our time in Slovenia, we were assisted by the National Library of Slovenia, to whom we are very grateful for their hospitality and organisational support.
More blogs about aggregation:
Finland's National Digital Library Formula for Success
Working Towards a Bulgarian National Aggregator
We have published another case study in our series of real-life examples of the use of open data that were presented at Europeana’s Open Data Case Studies workshop in Paris earlier this year. This week we are looking at how the Netherlands Institute for Sound and Vision are measuring the impact of opening up their datasets via Open Images, an open media platform that provides online access to audiovisual archive material to stimulate creative re-use.
Maarten Brinkerink, Project Manager for R&D at the Netherlands Institute for Sound and Vision, presenting at Europeana’s Open Data Case Studies Workshop in Paris.
To learn a little bit more about Open Images, we caught up with Maarten Brinkerink, Project Manager for R&D at the Netherlands Institute for Sound and Vision.
What is the ‘big idea’ behind this project?
MB: Together with Kennisland, we set up Open Images in 2009 with the aim of facilitating re-use of the collections we hold, along with content from individuals and collections from other institutions. Access to the material on Open Images is provided under the Creative Commons licensing model or a Public Domain Mark. This enables the freedom to approach copyright in a more flexible manner and make work available in a way that encourages re-use.
How did the project create value for users of Open Images and the institutions involved?
MB: Open Images is accessible to anybody who wants to upload their own material and assign an open licence to it to encourage re-use. We’re not just about institutes and producers, but all ‘netizens’ who create material and want to enable its re-use via Open Images. We also provide an API, which enables developers to easily re-use material and create mash-ups. After launching Open Images, material was almost immediately re-used within several projects.
What benefits have resulted from Open Images?
MB: Aside from traffic and usage figures on Open Images increasing, we have also seen the external re-use of material increase as well. The Sound and Vision videos from Open Images are, for instance, also available on Wikimedia Commons and in Europeana. This is facilitated by the open infrastructure of the Open Images platform, which effectively distributes open content by combining open source software components, open media formats, open standards and an open API.
MB: In response to the growing need within the cultural heritage field for statistics on the impact of opening up cultural datasets, Sound and Vision will carry out impact analysis research together with Kennisland for Open Culture Data. To this end, the data providers from the Open Culture Network, along with international initiatives, have been asked to provide data on the impact and re-use of their datasets. The results of this impact analysis will be made public in the course of 2013.
You can learn more about Open Images and its results from the embedded case study below, or alternatively you can download it.
If you are interested in learning more about Open Images or have questions related to measuring the impact of cultural datasets, you can contact Maarten Brinkerink on Twitter (@mbrinkerink) or directly by email at email@example.com.
We have already published three other case studies in this series – from the Statens Museum for Kunst, the National Gallery of Denmark (SMK), the Polish Digital Libraries and Europeana – so check them out if you haven’t already done so. The final case study, from the BBC, will be published next week. Together with the others, it will then form a larger white paper on open data, due to be published later this spring.