The art of anonymisation: Preparing qualitative data for archiving
In this blog, the Following Young Fathers Further team of Linzi Ladlow, Anna Tarrant, Laura Way and Ben Handysides explore why and how to anonymise qualitative data for purposes of data preservation and archiving. They consider these insights in the context of rapid developments in datafication.
Source: Photo by Brittani Burns on Unsplash
Following Young Fathers Further (FYFF) is a qualitative longitudinal, comparative and participatory study, examining the parenting journeys and support needs of young fathers (aged 25 and under) and the practices of professionals who support them. It also uniquely builds on the Following Young Fathers baseline dataset (Neale et al. 2015), which is stored in the Timescapes Archive. Archiving the qualitative interview transcripts generated during the project is a major commitment of the study and a central tenet of the data management process.
Preserving participant identity
Moore (2012) observes that preserving participant identity via anonymity remains an ethical necessity in the context of sharing qualitative data. In a wider societal context of datafication, where many aspects of our daily lives are increasingly being rendered as data, the sharing of qualitative data, which is always saturated with personal accounts and details, is now considered a universal ‘good’ and is increasingly a requirement of funding councils. This tasks researchers who archive data with new ethical responsibilities: to preserve the confidentiality of those who are interviewed, which requires careful attention to processes of anonymisation, and to preserve data integrity for the purposes of re-use.
The requirement to anonymise is often underscored by a protectionist orientation towards data originators that may unintentionally obscure consideration of the ethics of naming (Moore, 2012), i.e. how we represent both the participants interviewed and those discussed in qualitative interviews in ways that are accessible to future researchers without making them identifiable. Rethinking this politics of naming necessitates careful methods of data anonymisation that facilitate the sharing and preservation of qualitative data in a way that documents the socio-historic record while also balancing data integrity and participant protection.
A stakeholder approach to ethics
As a team, our overall ethical approach is guided by a feminist ethics of care (e.g. Edwards and Mauthner 2012), in which we seek to balance participant ‘rights’ in the context of a broader community of stakeholders, including future researchers, who we expect will (re)use the data at a later date. In our commitment to archiving the data we generate with our participants, we advocate and adopt an ‘ethical temporal sensibility’ (Hughes and Tarrant, 2020), underscored by a stakeholder approach towards data (Neale and Bishop, 2012).
Stakeholder ethics move the emphasis away from questions of data ownership, where the qualitative researcher owns the data they generate with participants, towards questions of data stewardship, which acknowledge the long chains of relationships that individuals with different interests may have with the data over time. These individuals include curators, re-users, generators, or a combination of all of these, at different points in the lifetime of the data. This positionality requires us to be proactive in:
- how we secure consent from participants for archiving the data
- how we navigate the various needs of the communities of scholars in which we are embedded, both now and in the future.
Securing informed consent
Informed consent for archiving was negotiated on an ongoing basis using accessible information (see FYFF participant information video). Participants were asked to consent to the shared copyright of the data, assigning shared rights to either make use of or withdraw the data from the archive, whilst allowing the FYFF team and future re-users to use the data for research purposes. Commitments to data archiving are relatively new, so we developed a specific consent and information form to gain consent for archiving.
As our interviews largely focus on families and relationships, participants invariably disclose sensitive information that might make them more identifiable, including discussion of issues such as child protection, the localities in which they reside, and difficult family dynamics. It is essential that participants fully understand what they are consenting to when agreeing to their data being archived, and that researchers discuss this with them, including the level of restrictions that could be applied to an archived dataset, such as an embargo on accessing data during the participants’ lifetime or access for a time-limited period. Restrictions can also be put in place around those who have personal interests in, or relationships with, the participants. Participants may be cautious about being identified by their own family members, as was the case for one young dad who did not want to discuss a recent disagreement he had with his ex-partner:
“Cause this is being recorded I wouldn’t want to go into detail really, cause I think it’s something I could be identified as it being on me if she [ex-partner] were, not that she would, but if she were sort of to read it and stuff.”
In this case, we offered the participant the opportunity to read his transcript and omit any information that he did not want us to use or archive. This provided an opportunity to seek data confirmation (Patton, 2002) from the participant and to extend their participation in the research.
Preserving context for future scholars: use of codebooks
A key tension when archiving data is navigating the balance between protecting identities and maintaining the integrity of the data (Moore 2012; Neale 2021). Stronger anonymisation risks erasing the essence of the person, yet we are required to protect the identities of participants and the people and places discussed. This is a complex and interpretive process, requiring a combination of analytical and ethical decision-making: a process of analysis in and of itself. FYFF established a codebook with two functions: first, to link real identities to those we assign in the transcripts, preserving the intellectual coherence and integrity of the narrative; and second, to document the generalised descriptions that we use across the transcripts, increasing their usability for researchers. For example:
- Participants’ partners: names replaced with @@partner## or @@wife##.
- Places and localities: spatial relationships and sense of location preserved through generalised labels, e.g. ‘I live in @@Northern city 1## but I catch the train to @@Northern city 2## to work’.
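To make the marker convention concrete, the replacement step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling the FYFF team used: the names `CODEBOOK`, `tag`, `anonymise` and `find_tags` are hypothetical, the people and places in the codebook are invented, and a real anonymisation pass would still need the interpretive, case-by-case judgement described above rather than automated matching alone.

```python
import re

# Hypothetical codebook: real identifier -> generalised description.
# Every name and place below is invented for illustration only.
CODEBOOK = {
    "Sarah": "partner",
    "Northtown": "Northern city 1",
    "Hillford": "Northern city 2",
}


def tag(description: str) -> str:
    """Wrap a generalised description in the @@...## markers."""
    return f"@@{description}##"


def anonymise(text: str, codebook: dict[str, str]) -> str:
    """Replace each whole-word codebook identifier with its tagged description."""
    if not codebook:
        return text
    # Longest identifiers first, so multi-word entries win over shorter ones.
    terms = sorted(codebook, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda m: tag(codebook[m.group(1)]), text)


def find_tags(text: str) -> list[str]:
    """List the generalised descriptions that appear in an anonymised text."""
    return re.findall(r"@@(.+?)##", text)


raw = "I live in Northtown but I catch the train to Hillford to work."
clean = anonymise(raw, CODEBOOK)
print(clean)
print(find_tags(clean))
```

Numbering the cities, rather than replacing both with a single generic label, is what keeps the spatial relationship in the sentence readable after anonymisation. The `find_tags` helper simply shows how a re-user could list the markers present in a transcript.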
The codebook supports primary researchers in consulting the raw data in instances where sensitive details may be needed for context, such as being able to name people and places that participants have previously discussed. In depositing data into the archive, researchers can also request granular access mechanisms that facilitate special access to ‘open data’ for data re-users with light-touch anonymisation. This may require special permissions to be sought from data originators.
Whilst effective and ethical archiving and the preservation of qualitative data are possible, the anonymisation process can obscure some nuance. The challenge is to provide future researchers with enough information and context while also sufficiently obscuring individual identifying features. Researchers have a responsibility to anonymise data carefully to uphold both participants’ rights and the needs of future researchers.
References
Edwards, R. and Mauthner, M. 2012. Ethics and feminist research: theory and practice. In: Miller, T. (eds) Ethics in Qualitative Research. London: SAGE.
Gabb, J. 2010. Home truths: ethical issues in family research. Qualitative Research. 10(4), pp.461-478.
Hughes, K. and Tarrant, A. 2020. The ethics of qualitative secondary analysis. In: Hughes, K. and Tarrant, A. (eds) Qualitative Secondary Analysis. SAGE Publications Ltd.
Moore, N. 2012. The politics and ethics of naming: questioning anonymisation in (archival) research. International Journal of Social Research Methodology. 15(4), pp.331-340.
Neale, B. and Bishop, L. 2012. The Timescapes Archive: a stakeholder approach to archiving qualitative longitudinal data. Qualitative Research. 12(1), pp.53-65.