The Generation Study: How is participant data stored and who can access it?
During the Generation Study, Genomics England securely stores participants’ samples and data, including the pregnant woman’s antenatal data, the baby’s sample and DNA sequence, and updates from their healthcare record.
Overview
When parents give consent to the Generation Study, they agree to:
- their baby’s blood sample being screened for over 200 rare genetic conditions, with results returned; and
- their baby’s sample and genomic and health data being stored long term and made available for wider research.
Genomics England, as the study sponsor, has a legal and ethical duty to take care of participants’ samples and data, as it has for other studies, including the 100,000 Genomes Project. The same duty applies to the National Genomic Research Library (NGRL), which patients from the NHS Genomic Medicine Service can choose to join (see below for more information about the NGRL).
Participants can change their mind about taking part in the study and either unsubscribe or withdraw, which changes how their data is accessed and used.
What data is stored?
Genomics England keeps the following data:
- the baby’s DNA, which is stored as a digital file of their genome sequence;
- the pregnant woman’s antenatal record (this is maternity data that includes details about the pregnancy, labour and birth);
- regular updates from the baby’s healthcare record, including information such as test results and diagnoses from the NHS and other medical organisations; and
- contact details of the pregnant woman and their baby.
Where is the data stored?
The pregnant woman’s antenatal data and the baby’s genomic and healthcare data is stored in the NGRL, a secure central database from which approved researchers can access data from thousands of people in order to better understand diseases and develop new treatments. Alongside Generation Study participants, the NGRL holds data from a large number of NHS patients and their family members who have had their genomes sequenced as part of their own healthcare. The NGRL itself is held in secure UK data centres.
Importantly, all identifiable participant data is removed from the NGRL so that researchers cannot see participants’ identities. This identifiable data is kept separate, outside of the NGRL, with additional security protections.
Who can access the data?
Genomics England manages the NGRL and approves access for researchers from around the world. Security measures ensure that only those approved can access it.
Approved researchers come from hospitals, universities, charities and companies across the healthcare sector (for example, pharmaceutical companies). Some of the ways in which they use the data are listed below.
- Supporting research into rare conditions to improve how they are diagnosed and managed.
- Distinguishing between harmful and harmless genetic variants. This is important for accurately diagnosing and developing treatments for genetic conditions.
- Aiding in cancer research. By looking at data with and without certain genetic variants, researchers can find new cancer-related genes and develop therapies that benefit many cancer patients.
- Improving the accuracy of genomic screening tests. This will help identify at-risk individuals sooner, meaning that treatments can begin earlier, which may lead to better outcomes for those with genetic conditions.
Select individuals at Genomics England can access identifiable participant data, including contact details. The purpose of this is:
- to enable the sharing of result information with NHS specialist teams and GPs; and
- to update participants about the study and future research opportunities.
How is participant data kept safe?
Genomics England follows the Five Safes framework as a best practice model for research data access and use.
- Safe data: Participant data is de-identified before researchers can access it, with identifying details replaced by a unique reference number.
- Safe settings: Genomics England operates a secure, controlled research environment in which researchers access the NGRL through a virtual desktop. Data is physically stored in secure data centres in the UK.
- Safe people: To be eligible to access the NGRL, all researchers and their institutions must be approved by Genomics England. Every researcher signs a code of good practice and completes data protection training.
- Safe projects: An independent Access Review Committee is responsible for making decisions about applications to access data. This committee includes clinical experts, scientists and participants whose data is in the NGRL. Research projects must adhere to a set of acceptable uses relating to the study of genomics and health.
- Safe outputs: Researchers can only work on data within the research environment, and can only remove summary data, such as the results of analyses they have undertaken. To do this, they need to submit a request to Genomics England’s Airlock team, which includes people with expertise in data security, clinical research and ethics. The team checks that each request aligns with the research project that was initially approved.
Genomic data is granular, and Generation Study participants may have specific characteristics such as a very rare genetic variant. This means that it is possible for a researcher to re-identify a participant in the dataset, for example by using their own knowledge and clinical experience about patients with rare conditions. There are strict penalties for anyone who tries to identify or misuse the data.
As mentioned above, only specific individuals at Genomics England can re-identify participants, and only for a small number of reasons.
What happens to study samples?
Any leftover samples are stored in a secure biobank in the UK. Each sample is identified with a unique code to protect participants’ identities.
These samples may be used again for approved healthcare research. This research would also need to be approved by the Access Review Committee before going ahead.
Key messages
- Genomics England securely stores participants’ samples and data, including the pregnant woman’s antenatal data, the baby’s sample and DNA sequence, and updates from their healthcare record.
- Genomics England follows the Five Safes framework to keep participants’ data safe.
- Only select individuals at Genomics England can access identifiable participant data for the purposes of sharing result information and updating participants about the study.
- Data and samples will be used to enable researchers to learn more about genomics and health.
- Only approved researchers can access data, and they cannot see participants’ identities.
- Data is stored in the National Genomic Research Library (NGRL), a secure database managed by Genomics England. Any leftover samples are stored in a secure biobank in the UK.
Resources
For clinicians
- Genomics Education Programme: CPI: Generation Study: Recruit, enrol and sample
- Genomics Education Programme: CPI: Generation Study: Return results and further care
- Genomics Education Programme: CPI: Generation Study: Sample, sequence and interpret
For study teams at recruitment sites, there are numerous education and training resources available.
Please note that some of these resources are hosted on the Generation Study workspace on the NHS Futures platform. If you have not already had an invitation to join, please contact the Genomics England service desk: generationstudy@genomicsengland.co.uk.
Content may evolve over time. Should you have any issues accessing the content, please contact the service desk.
For patients
- Genomics England: How your data is used
- Genomics England: The Generation Study
- Genomics England: Generation Study Participant Information Sheet
- Genomics England: Generation Study translated participant information
- UK Data Service: What is the Five Safes framework?