How the Sandbox is evolving to meet the needs of the data space for cultural heritage

As our January Europeana Pro News focus has shown, the Metis Sandbox meets a real need by allowing aggregators and cultural heritage institutions to run a full ingestion workflow on their data. This means that they can test their data, review the resulting records and get immediate feedback on data quality, allowing them to resolve any issues before submitting it to Europeana.

Additional quality feedback comes in the form of a full report on tier calculations (for more on our data quality tiers see the Europeana Publishing Framework) and the signalling of field warnings (patterns in the data that might indicate issues to resolve). Recently, training resources and an extensive user guide have been developed for the Sandbox and the application is gaining traction as a diagnostic tool for data partners.

In this final article of the Metis Sandbox focus, we’ll look to the future. The Metis Sandbox has been a key component of the aggregation strategy since its inception, and is a core product of the Europeana Initiative.

We have lots of plans and wishes for the application that we hope to realise in the months and years to come. But, in software development as in other fields, we’ve got to make sure that our foundation is strong. Some attention will need to be devoted to consolidating the application after the recent period of functional growth.

How are we consolidating what we already have?

The main driver for our consolidation efforts is expected demand. As more and more people start using the Sandbox, we expect that, unless we take action, dataset queues will start to develop and people will have to wait longer for their datasets to be processed. A bit of queuing is of course unavoidable in an application like this; it comes and goes with variations in demand. But it should be contained to acceptable levels and not lead to errors or a bad user experience. To this end, we’re working on the capacity of the application to scale up, as well as on optimising its data processing component. Additionally, we’re considering ways in which we can signal more clearly to the user that even though there is currently a queue, their data is on its way.

Another thing we’re working on is the user experience in general. The Sandbox interface has grown organically along with the functionality and, even though it has received a lot of attention from our developers, it has never been designed holistically. Now that the functionality is becoming established, it is a good moment to look at the visual design. Our design team has already begun to bring the interface in line with the Europeana house style and to find ways to enhance the user experience. We’ll be conducting user feedback sessions to help us in this endeavour.

What new functionality will we be adding to the Sandbox?

We are working on, or planning, a number of things that we hope will make life easier for our data partners. For instance, people who use the Sandbox often comment that, because the Sandbox processes the first 1,000 records it finds in a dataset, the feedback it gives is not always representative of the whole set. We’re working to allow the user to set a sampling offset, which should enable them to get the Sandbox to process a more representative sample by selecting records more evenly across the dataset.
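
To illustrate the idea of sampling more evenly, here is a minimal sketch in Python. It assumes the setting works as a step between selected records, which is only one possible reading; the function name `sample_records` and its parameters `step` and `limit` are hypothetical and do not describe the Sandbox's actual implementation.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_records(records: Iterable[T], step: int = 1, limit: int = 1000) -> Iterator[T]:
    """Return every `step`-th record, stopping after `limit` records.

    Hypothetical illustration only: with step=1 this reproduces the
    "first 1,000 records" behaviour; a larger step spreads the sample
    across more of the dataset.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    # Inner islice picks records 0, step, 2*step, ...;
    # outer islice caps the sample at `limit` records.
    return islice(islice(records, 0, None, step), limit)


# Example: a dataset of 10,000 records, sampled with a step of 10.
dataset = range(10_000)
sample = list(sample_records(dataset, step=10))
print(len(sample), sample[0], sample[-1])
```

In this sketch, a step of 10 on a 10,000-record dataset draws the 1,000 sampled records from the whole set rather than from its first tenth. Because it works on any iterable, the records never need to be loaded into memory all at once.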

Another request we occasionally get is for extended tier calculation reports that also contain statistics covering the whole dataset sample instead of just the details for one record in isolation. This too is something we will be working on in the near future.

Feedback from aggregators tells us that the functionality that the Sandbox offers would be useful at earlier stages of aggregators’ workflows, specifically when preparing and mapping the data. We are working to address this by making some functionality available as an API, which means that people can use it at any point in their aggregation process.

Finally, we’re adding to our repertoire of field warnings (i.e. problem patterns). The current list consists of eight types of warnings, all to do with record titles and descriptions. New field warnings could cover more and different fields, and therefore give more comprehensive feedback to Sandbox users. The Data Quality Committee will work on selecting and prioritising these from a list of options, and we will then implement them in the Sandbox.

What about the medium and long term?

As we build the common European data space for cultural heritage, the Europeana Initiative is discussing and designing improvements and enhancements to the Europeana Data Model and the Europeana Publishing Framework. Developments there may have to be reflected in the Metis Sandbox; for example, we expect some improvements in the handling of 3D records, which has been identified as a priority for the data space. Future updates to the aggregation strategy will also be reflected in the Metis Sandbox and will hopefully improve the data publication journey for cultural heritage institutions and aggregators.

Find out more

This is the last news post of our January Europeana Pro News focus on the Metis Sandbox. We hope that this series has given you an idea of what the Sandbox can do now and what it will do in the future. We look forward to hearing from our Sandbox users: the application has a feedback function that we encourage you to use. Meanwhile, please keep reading Pro News for more on other fascinating topics!