T10/S04: The FAIR Reuse of Archive Data

Format: Paper presentations with discussion

Convenors:

Stephen Stead, Paveprime Ltd, UK, steadsds@outlook.com

Jane Jansen, Arkeologerna/Intrasis, Sweden, Jane.Jansen@arkeologerna.com

This session hopes to start a dialogue concerning the reuse of archival data. We are particularly interested in the use of oral history and traditions and how they may be incorporated with excavation material.

The archaeological research community was an early adopter of digital tools for data acquisition, organisation, analysis, and presentation of the research results of individual projects (Richards 2022). As several projects have shown, digital data can be shared, but how can those data be used? To address this question, principles and ontologies have been created and are ready to be applied.

One such concept is FAIR data. FAIR data is data which meets the principles of Findability, Accessibility, Interoperability, and Reusability (FAIR). The acronym and principles were defined in the journal Scientific Data in 2016.

Digital archive access projects will revolutionise archaeological research and are vital if we want to attain the R in FAIR. However, it is necessary to apply an ontology to the data; otherwise the time needed to understand the semantics of each dataset becomes prohibitive. CRMarchaeo, an extension of the CIDOC CRM, is one way to link a wide range of existing documentation from archaeological investigations. It was created to promote a shared formalisation of the knowledge extracted from archaeological observations. It provides a set of concepts and properties that allow clear explanation (and separation) of the observations and interpretations made, both in the field and in post-excavation.

Using FAIR principles is critical to the creation of wider pictures of regions or periods and can also be a stepping stone to generating Big Data for further analysis.

In this session we invite presentations from organisations or projects that are addressing these issues. We are particularly interested in applications of the CIDOC CRM and its extension CRMarchaeo.

Richards, J. 2022. Presentation at CHNT, Vienna.

Papers:

True Integration: Moving from Just Finding Archives to Interpreting Archaeological Documentation Utilising CRMarchaeo

Jane Jansen, Intrasis, Sweden
Stephen Stead, Paveprime Ltd, UK

This paper reports progress on the integration of Swedish archaeological excavation databases using the family of extensions of the CIDOC Conceptual Reference Model (CRM). The core of the project is to integrate the 2500 current Intrasis database instances and to enable all future Intrasis instances to be integrated without manual intervention. The requirement is to move beyond the traditional GIS-based gazetteers, which only allow the discovery of archives/databases that concern the correct type of site or lie in the right geographic area, and to provide full search access to the original site documentation. This will provide more opportunities for innovative intra- and inter-site research as the effort required to discover appropriate material is reduced.

The presentation showcases the work undertaken by Intrasis and Paveprime to make the original site archive data accessible without resource-hungry recasting and harmonisation. The most recent phase of the work has concentrated on making multiple interpretations accessible to researchers and on linking oral history and traditions to site archives. Technical innovation has concentrated on presenting the material using the enhanced functionality possible with RDFS*.

Digital Archiving and the Dissemination of Archaeological Records in South Asia 1900-2000

Mohsin Ali, Muhammad Nishat Hussain, Laiba Munir and Atif Azhar Ahmad, Institute of Global and Historical Studies, Government College University Lahore, Pakistan

The use of digital tools in archaeology is transforming the preservation and dissemination of historical records, ensuring accessibility for researchers and the wider public. The Archaeological Archive at Lahore Fort, a collaborative initiative of the Institute of Global and Historical Studies, Government College University Lahore, Pakistan, and the Walled City of Lahore Authority, in which the authors took part, documents a century of archaeological work from 1900 to 2000, spanning the British colonial period to post-independence Pakistan. This project digitally catalogues excavation reports, site maps, photographs, and documents, creating a structured and searchable database. By systematically cataloguing these materials, the archive facilitates comparative research on excavation methodologies, conservation practices, and shifting archaeological narratives over time (colonial to post-colonial). Integrating digital and AI tools enhances long-term preservation and allows for new interpretations of South Asian archaeology. The archaeological research community was an early adopter of digital tools (Richards 2022), yet the effective reuse of data requires structured frameworks. To address these challenges, FAIR principles and ontologies have been developed and are now being implemented, so that digital archives can become sustainable, accessible, and reusable resources for global heritage research.

Cloud Computing and Cultural Heritage IT: A Primer

Stephen Stead, Paveprime Ltd, UK

Cloud computing has become the common term used by many manufacturers to describe their products and services. Everything is now ‘Cloud’ or ‘Cloud ready’, but what exactly does this mean and what are the implications for cultural heritage computing? Many organisations are looking to Cloud computing to reduce their Information Technology costs. Is this a realistic goal? Certainly, the Sunday colour supplements are trumpeting this as the great benefit. This paper defines the key cloud computing concepts and examines the implications of cloud computing for the heritage sector. In particular, it outlines the organisational and policy changes that heritage organisations must consider.

We will cover the five Cloud tenets (Broad Network Access, Resource Pooling, Rapid Elasticity, Metered Service and On-Demand Self-Service), the three service models (Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS)), the four deployment models (Private Cloud, Public Cloud, Hybrid Cloud and Community Cloud), Governance, Risk and Compliance (GRC) and the generic Information Technology as a Service (ITaaS) concept. We will also look at the generic management structures, policies and chargeback and/or showback mechanisms that need to be implemented within organisations hoping to work with Cloud computing.

Dealing with Text: Machine Learning Techniques in Digital Humanities

Stephen Stead, Paveprime Ltd, UK

This paper will consider some Big Data and Machine Learning techniques for representing corpora of text in Digital Humanities. In particular, it will consider topic discovery in grey literature using Latent Dirichlet Allocation (LDA). It will also show how simple techniques such as Bag of Words, RegEx and Naive Bayes classifiers can be used to represent texts, and how Sentiment Analysis can be repurposed to quantify context descriptions.
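As a rough illustration of the simpler techniques named above, the following sketch combines a regular-expression tokeniser, a Bag of Words representation and a multinomial Naive Bayes classifier with add-one smoothing, using only the Python standard library. The training sentences, the labels "deposit" and "cut", and all function and class names are invented for the example and are not taken from the paper; LDA and Sentiment Analysis are not covered here.

```python
import math
import re
from collections import Counter, defaultdict

# Regex tokeniser: keep runs of lower-case letters only.
TOKEN_RE = re.compile(r"[a-z]+")


def bag_of_words(text):
    """Lower-case the text, extract alphabetic tokens and count them."""
    return Counter(TOKEN_RE.findall(text.lower()))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            bag = bag_of_words(text)
            self.word_counts[label].update(bag)
            self.doc_counts[label] += 1
            self.vocab.update(bag)
        return self

    def predict(self, text):
        bag = bag_of_words(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word, n in bag.items():
                score += n * math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical context descriptions, invented for illustration only.
TRAIN = [
    ("dark brown silty clay with charcoal flecks", "deposit"),
    ("loose grey sandy silt with frequent pebbles", "deposit"),
    ("compact yellow clay with occasional charcoal", "deposit"),
    ("steep sided linear cut with flat base", "cut"),
    ("circular cut with vertical sides and concave base", "cut"),
    ("shallow oval cut with irregular base", "cut"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict("brown sandy clay with pebbles"))
print(model.predict("oval cut with concave base"))
```

With so little training data the classifier is only a toy; the point is that the whole pipeline, from raw text to a label, fits in a few dozen lines.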

Big Data, Machine Learning and AI: What’s it all About and What’s in it for Cultural Heritage?

Jonathan Whitson-Cloud, Horniman Museum and Gardens, London, UK
Stephen Stead, Paveprime Ltd, UK

The paper offers an overview of the characteristics and definitions of Big Data and its processing. It includes a case study on using machine learning within museum documentation and looks to future application areas.

Big Data is frequently characterised by the ‘V-Words’: Volume, Velocity and Variety. However, these are not the only V-words, and their use is not always clear. We will clarify their definitions and explore the implications for data management practice, both in general and in museums.

The power of Big Data is in what you do with it. Key classes of techniques will be outlined together with their challenges. One class of technique is the use of machine learning to process structured and unstructured data. A trial that applied text mining techniques to narrative material from the documentation of the Horniman Museum provides a useful case study and test-bed.

We will then offer some thoughts on other application areas and lessons that can be learnt about data curation and integration at both the intra- and inter-museum levels.

T10/Workshop 01: CRMarchaeo: A Stepping Stone to FAIR Practice

Stephen Stead, Open University; Paveprime Ltd, UK
Jane Jansen, Arkeologerna; Intrasis, Sweden

In this workshop we will explore how to use CRMarchaeo, an extension of the CIDOC Conceptual Reference Model (CRM), to link a wide range of existing archaeological documentation. In particular, we will consider the use of oral history and traditions and how they may be incorporated with excavation material.

When working with data deposited in archives in different eras and by different organisations using ever-evolving recording methodologies, a recurrent problem is being able to systematically access elements of the record without immersing oneself in the recording milieu of the original deposits. This high intellectual cost must be paid by each scholar wishing to work on the records of a particular archaeological investigation and so effectively creates a barrier to extensive reuse of archived data. The FAIR data principles require “that all research objects should be Findable, Accessible, Interoperable and Reusable (FAIR) both for machines and for people” (Wilkinson et al. 2016). One approach to making data FAIRly accessible while reducing the effort to a single “intellectual act” is to map to a “lingua franca”, such as CRMarchaeo.
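The idea of a single mapping step can be sketched in a few lines of Python. The example below is illustrative only: the two source records, the `ex:` identifiers and the function names are invented, and the choice of the CRMarchaeo class A8 Stratigraphic Unit and the CIDOC CRM properties P1 is identified by, P2 has type and P3 has note is an assumption made for the sketch, not a mapping prescribed by the workshop. Values are kept as plain strings for brevity, whereas a full mapping would model them as CRM entities.

```python
from dataclasses import dataclass

# Target vocabulary (assumed for this sketch).
RDF_TYPE = "rdf:type"
STRAT_UNIT = "crmarchaeo:A8_Stratigraphic_Unit"
IDENTIFIED_BY = "crm:P1_is_identified_by"
HAS_TYPE = "crm:P2_has_type"
HAS_NOTE = "crm:P3_has_note"


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def map_day_book(entry):
    """Map a hypothetical 1950s-style day book entry to triples."""
    unit = f"ex:{entry['site']}/unit/{entry['layer'].replace(' ', '_')}"
    return [
        Triple(unit, RDF_TYPE, STRAT_UNIT),
        Triple(unit, IDENTIFIED_BY, entry["layer"]),
        Triple(unit, HAS_NOTE, entry["notes"]),
    ]


def map_context_sheet(sheet):
    """Map a hypothetical context recording sheet to triples."""
    unit = f"ex:{sheet['site_code']}/unit/{sheet['context_no']}"
    return [
        Triple(unit, RDF_TYPE, STRAT_UNIT),
        Triple(unit, IDENTIFIED_BY, str(sheet["context_no"])),
        Triple(unit, HAS_TYPE, sheet["category"]),
        Triple(unit, HAS_NOTE, sheet["description"]),
    ]


def describe_units(triples):
    """One query for all sources: every stratigraphic unit with its note."""
    units = [t.subject for t in triples
             if t.predicate == RDF_TYPE and t.obj == STRAT_UNIT]
    notes = {t.subject: t.obj for t in triples if t.predicate == HAS_NOTE}
    return {u: notes.get(u, "") for u in units}


# Invented records in two different recording styles.
DAY_BOOK_ENTRY = {"site": "SITE-A", "layer": "Layer 3",
                  "notes": "Dark earth below the turf"}
CONTEXT_SHEET = {"site_code": "SITE-B", "context_no": 1042,
                 "category": "deposit",
                 "description": "Dark brown silty clay"}

graph = map_day_book(DAY_BOOK_ENTRY) + map_context_sheet(CONTEXT_SHEET)
for unit, note in describe_units(graph).items():
    print(unit, "->", note)
```

Each mapping function is written once per recording style, after which `describe_units` works across both sources without the reader needing to know how either record was originally structured.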

The CRMarchaeo extension has been created to promote a shared understanding of how to formalise the knowledge extracted from the observations made by archaeologists. It provides a set of concepts and properties that allow clear explanation (and separation) of the observations and interpretations made, both in the field and in post-excavation. Attendees will work through a series of case studies that reflect different excavation documentation practices: from 1950s-style day books, through context recording sheets and database/CAD combinations, to modern integrated object-oriented database/GIS systems such as Intrasis.

The aim is to explore archetypical solutions and provide attendees with hands-on experience of mapping actual documentation practice to CRMarchaeo. This can then be applied to their own or archive documentation, both current and historical, in their own institutions or archives and lead to integrated reusable composites being available for both internal and external use.