UK Biobank (United Kingdom)

Field Names
Records
Coordinating Country
United Kingdom
Region

United Kingdom

Brief Database Description

** NOTE: In contrast to our usual policy, this profile has not been reviewed by the database manager. We are continuing to seek further information on the fields with missing data. 

UK Biobank is a large, population-based prospective study that recruited 502,642 people aged 40-69 years in 2006-2010 from across the country. The participants have undergone measures; provided blood, urine and saliva samples for future analysis; given detailed information about themselves; and agreed to have their health followed. The purpose of this biobank is to conduct detailed investigations of the genetic and non-genetic determinants of the diseases of middle and old age such as cancer, heart diseases, stroke, diabetes, arthritis, osteoporosis, eye disorders, depression and forms of dementia. Ideally, the UK Biobank will have up to 20 years of longitudinal follow-up on the participants. Follow-up is conducted chiefly through linkages to routinely available national datasets and includes ascertainment of deaths, prevalent and incident cancers, and hospital admissions among other outcomes.

The participants were assessed in 22 assessment centers throughout the UK, covering a variety of different settings to provide socioeconomic and ethnic heterogeneity and urban–rural mix. This ensured a broad distribution across all exposures to allow the reliable detection of generalizable associations between baseline characteristics and health outcomes.  The assessment visit comprised electronic signed consent; a self-completed touch-screen questionnaire; brief computer-assisted interview; physical and functional measures; and collection of blood, urine, and saliva. Multiple aliquots of different sample fractions are stored in UK Biobank’s automated laboratory, allowing for a wide range of future assays.

 
UK Biobank has undertaken repeat measures on 20,000 participants and is currently undertaking an MRI body and DEXA bone scanning enhancement. 100,000 UK Biobank participants have worn a 24-hour activity monitor for a week. A web-based program is being rolled out that comprises detailed questionnaires on participants' diet, cognitive function and work history. UK Biobank has embarked on a major study to scan (image) 100,000 participants (brain, heart, abdomen, bones and carotid artery). Blood biochemistry (such as hormones and cholesterol) is being analyzed. UK Biobank is linking to a wide range of electronic health records (cancer, death, hospital episodes, general practice) and is developing algorithms to accurately identify diseases and their sub-sets. All data, including genetic, biochemistry and imaging data, are being made available for research as they become ready.

Genome-wide genotyping data are available for all 500,000 participants in the UK Biobank cohort. Genotyping was performed using the Affymetrix UK BiLEVE Axiom array on an initial 50,000 participants; the remaining 450,000 participants were genotyped using the Affymetrix UK Biobank Axiom® array, which genotyped ~850,000 variants. Exome sequencing and whole genome sequencing projects are also underway.

The UK Biobank was established by the Wellcome Trust medical charity, Medical Research Council, Department of Health, Scottish Government and the Northwest Regional Development Agency. It has also had funding from the Welsh Assembly Government, British Heart Foundation and Diabetes UK. UK Biobank is hosted by the University of Manchester and supported by the National Health Service (NHS).

Database Type
Longitudinal Population Database
- Drug and Diagnosis Data
Large Clinical Trial Database
Tissue/Blood and Genomic/Pharmacogenetic Database
- Biobank
- - Population-based
- Genetic
- - GWAS

 
UK Biobank is a large, population-based prospective study that recruited 502,642 people aged 40-69 years from across the country in 2006-2010. The participants have undergone measures; provided biological samples (i.e., blood, urine and saliva) for future analysis; given detailed information about themselves; and agreed to have their health followed.

With the consent of each participant, these data are being linked to their health-related records (in such a way that the participant’s confidentiality is preserved) so that the baseline information can be used in conjunction with the information about health conditions that develop.

Genome-wide genotyping data are available for all 500,000 participants in the UK Biobank cohort. Genotyping was performed using the Affymetrix UK BiLEVE Axiom array on an initial 50,000 participants; the remaining 450,000 participants were genotyped using the Affymetrix UK Biobank Axiom® array, which genotyped ~850,000 variants. Exome sequencing and whole genome sequencing projects are also underway.

Database Source
Case Report Forms

Survey Data
Other

[The assessment visit comprised:
- Electronic signed consent;
- Self-completed touch-screen questionnaire;
- Brief computer-assisted interview;
- Physical and functional measures; and
- Collection of blood, urine, and saliva (for biochemistry and genetic analyses).
Additional data come from wearable activity monitors, web-based questionnaires, and imaging data.]

Frequency of Data Collection
Other

(Repeat assessment data were collected on a subset of 20,000 participants between August 2012 and June 2013, and the repeat assessments are scheduled every 2-3 years during follow up)

Frequency of Data Update
Ongoing

[The UK Biobank resource was launched with the data collected at baseline made available in March 2012. All data, including genetic, biochemistry and imaging data, are being made available for research as they become ready. Repeat assessments are scheduled every 2-3 years during follow up.

The first batch of genetic data, which included genotyping and imputed data (on approximately 150,000 participants), was made publicly available in May 2015. This included the 50,000 participants genotyped using the UK BiLEVE array and 100,000 participants genotyped on the UK Biobank array. Genetic data for the full cohort were released in July 2017.

A timeline of additional data made available to researchers is available at: https://biobank.ctsu.ox.ac.uk/crystal/exinfo.cgi?src=timelines

A provisional timeline for future data availability is available at: https://biobank.ctsu.ox.ac.uk/crystal/exinfo.cgi?src=future_timelines]

Years Covered
2006 - Present

(The recruitment period was from 2006 through 2010)

Population Type
General Population

(The participants were assessed in 22 assessment centers throughout the UK, covering a variety of different settings to provide socioeconomic and ethnic heterogeneity and urban-rural mix. This ensured a broad distribution across all exposures to allow the reliable detection of generalizable associations between baseline characteristics and health outcomes.)

Patient Type
Outpatient/Non-Institutionalized
Date of Last Update
Ongoing

(UK Biobank is updated on an ongoing basis;
This profile was last updated for the B.R.I.D.G.E. TO DATA site on May 19, 2020.)

Field Names
Records
Database Population Size
0.5 - 1 Million

[502,642 participants. The rationale for recruiting a cohort of this size was to allow reliable quantification of the relevance of a large number of risk factors (e.g., lifestyle, environment and genes), both separately and in combination, to a wide range of diseases developing during follow-up.]

Active Population Size
<0.5 Million
Annual Change in Population
N/A

(Not available; as of April 2020, over 25,000 deaths have been linked to the study population.)

Sample Weights - Extrapolation Factors
No
Final Population Size
502,642

(A total of 502,642 participants were recruited; follow-up data collection and analyses are still ongoing.)

Field Names
Records
Age of Patients at Data Collection
Yes

(Age at recruitment, age at first assessment, and date of birth; other age data include participants aged 100+ years and age in relation to certain medical events, e.g., age at start of contraceptive pill, age when heart attack diagnosed, age at first sexual intercourse, age when stopped smoking, etc.)

Approximate Percentage of Participants <18 years and those >65 years

<18 years = 0%
>65 years = 18.4%

(At the time of recruitment, 18.4%, i.e., 50,498 participants were aged 65 to 70 years old; median age 58 years.)

Gender Data
Yes
Percentage of Males/Females

Males = 45.6%
Females = 54.4%

(229,176 males and 273,466 females)
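
The percentages above can be reproduced from the reported counts. The following is a minimal sketch in Python; the variable names are illustrative and are not UK Biobank field names.

```python
# Counts as reported in this profile; names are illustrative only.
males = 229_176
females = 273_466
total = males + females  # should equal the 502,642 recruited participants

# Percentages rounded to one decimal place, as shown in the profile.
pct_males = round(100 * males / total, 1)
pct_females = round(100 * females / total, 1)

print(f"Total = {total:,}")
print(f"Males = {pct_males}%, Females = {pct_females}%")
```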

Ethnicity / Race Data
Yes

Race data include:
- White,
- Mixed,
- Asian/Asian British,
- Black/Black British, or
- Chinese.
Ethnicity data include:
- British,
- Irish,
- Caribbean,
- African,
- Indian,
- Pakistani,
- Bangladeshi,
- Chinese, or
- Other ethnic group.
There is also a genetic ethnic grouping field that uses genetic data to probabilistically score participants as Caucasian.

Geographic Location

Yes

Information includes:
- Country of birth
- Home location at assessment (and length of time at the listed address)
- Primary care trust where GP was registered
- Primary care trust responsible for participant data
- Delivery places that participant has had data recorded

Date of Birth Recorded
Yes

YYYY-MM-DD

Death Recorded
Yes

Death data are obtained from multiple sources, including linked databases such as death registries, cancer registries, hospital discharge information, and employment history.

Death registries are primarily used to obtain the participant's age and date of death as well as the cause of death (ICD-10 and description).

Family history details include the mother's and father's ages at death and non-accidental deaths in close genetic family.

Availability of death certificate / autopsy information
No
Other Demographic Data
Yes

There are 110 data fields on socioeconomic and demographic characteristics captured in the UK Biobank. These include (but are not limited to) the following:
- Baseline characteristics (e.g., Sex, YOB);
- Primary demographics (e.g., Date of attending assessment center, Townsend deprivation index at recruitment);
- Geographic & Location (e.g., Home area population density - Urban/Rural; Home location at assessment);
- Education (e.g., Age completed full time education);
- Employment (e.g., Job code, Length of working week for main job);
- Employment history (e.g., Time employed at current job, job involves night shift);
- Work environment;
- Ethnicity;
- Household (e.g., Household income, Heating at home, Number in household, Own or rent);
- Early Life (e.g., Adopted as a child, Country of Birth, Handedness); and
- Indices of Multiple Deprivation (e.g., Community safety score, Crime score).

Field Names
Records
Physician ID
N/A

(Not applicable)

Physician Specialty
N/A

(Not applicable)

Pharmacy ID
N/A

(Not applicable)

Field Names
Records
Diagnosis Data
Yes

Diagnosis data are captured in the main and secondary diagnosis data fields as well as data fields for cancer, death, family history, medical history, psychosocial, cognitive, physical measures, and sex-specific factors.

The questionnaires also included specific questions related to:
- Breathing;
- Cancer screening;
- Chest pain;
- Claudication and peripheral artery disease;
- Cognitive function;
- Eyesight;
- Family history;
- General health;
- Hearing;
- Imaging measures;
- Linked health outcomes;
- Medical conditions (self-reported, e.g., cardiovascular disease, diabetes, allergies, lung disease, fractures, and other serious medical conditions or disabilities);
- Mental health;
- Mouth;
- Pain;
- Physical measures; and
- Sex-specific factors (e.g., male - number of children fathered, age of voice change; female - age of menopause, number of stillbirths, etc.).

Diagnoses Coded
ICD-10

ICD-9
[Diagnosis (main or secondary) and Type of cancer are coded with ICD-9 and ICD-10. Cause of death data are coded with ICD-10.]

Diagnoses: Date Parameters
2006 - Present
Diagnoses: Maximum Number of Codes Allowed
2

There are 2 data fields dedicated to coded diagnoses (main and secondary, both coded with ICD-9 and ICD-10). Other diagnosis data are captured in other data fields (e.g., Cause of death) or are self-reported.

Physical Examination Findings
Yes

Physical examination measures include:
- Acceleration averages,
- Arterial stiffness,
- Blood pressure,
- Body size measures,
- Bone-densitometry of heel,
- Hand grip strength,
- Impedance measures, and
- Spirometry.

Birth Defect Data
Yes

Birth defect data may be captured as:
- Diagnosis data (main or secondary),
- Primary cause of death,
- Cancer diagnosis data, or
- Operative procedures to correct birth defect.

Cancer Data
Yes

- Type of cancer
- Self-reported cancer
- Histology
- Age and date at cancer diagnosis
- Tumor behavior
- Cancer diagnosed by doctor
- Cancer screening (colon screening, PSA test)
- Family history of cancer
- Cancer-related operations, etc.

Infectious Disease Data
Yes

Infectious disease data include:
- ICD-9 and ICD-10 diagnoses,
- Current eye infection,
- Infection as a cause of death,
- Contraindication for spirometry,
- Infection as a part of self-reported illnesses,
- Operative procedures related to infection,
- Specialty of clinical consultant,
- Reason for skipping ECG, and
- Reason for skipping physical examination measurements (e.g., grip strength)

Environmental Exposures
Yes

Environmental exposure data include:
- External causes of morbidity and mortality,
- Occupational exposures,
- Household information and home location,
- Sun exposure,
- Secondhand smoke exposures (at home, outside home), and
- Frequency of loud music exposure.

Behavioral Data Elements
Yes

Behavioral data elements include information on:
- Smoking (~30 data fields, e.g., Pack years of smoking, smoking status, Ever tried to stop smoking, Number of unsuccessful stop-smoking attempts, Maternal smoking around birth, Smoking compared to 10 years previous)
- Alcohol (~42 data fields, e.g., Alcohol drinker status, Alcohol consumed, Intake frequency, Alcohol intake versus 10 years previously, Alcohol usually taken with meals, Reason for reducing amount of alcohol drunk, Average monthly intake of beer, champagne, red wine, etc., Former alcohol drinker)
- Diet (36+ data fields, e.g., Type of special diet followed, Variation in diet, diet sweets intake, Typical diet yesterday, Low calorie drink intake, Bread consumed, Coffee/Tea consumed, Ice cream intake, Type of milk consumed, Ingredients in homemade soup, Liquid used to make porridge)
- Physical activity (50+ data fields, e.g., type of physical activity in last 4 weeks, Duration of light/moderate/vigorous physical activity, Chest pain felt during physical activity, Doctor restricts physical activity due to heart condition, Maximum heart rate during fitness test, Leisure/social activities, Able to walk or cycle unaided for 10 minutes)
- Sexual factors (8 data fields, e.g., Age first had sexual intercourse, Lifetime number of sexual partners, Ever had same-sex intercourse)
- Sleep (7 data fields, e.g., Sleep duration, Getting up in morning, Morning/evening person, Nap during day, Sleeplessness/insomnia, Snoring, Daytime dozing)
- Electronic device use (11 data fields, e.g., Frequency and length of mobile phone use, Regular use of hands-free device/speakerphone, Usual side of head for mobile phone use, Plays computer games, Internet user)
- Driving (~13 data fields, e.g., Time spent driving, Reason for glasses/contact lenses, Motor vehicle accident as a cause of death).

Field Names
Records
Procedure Data
Yes

Data on operations and laboratory testing are captured.

Procedures Coded
OPCS
Number of Procedures Coded
2

There are 2 data fields dedicated to operative procedures (main and secondary coded using OPCS). Other procedural data are captured in other data fields (e.g., Cause of death, Treatment).

Procedure Date Parameters
2006 - Present
Laboratory Information
Yes

Physical measures included:
- Blood pressure;
- Arterial stiffness;
- Eye measures (visual acuity, refractometry, intraocular pressure, optical coherence tomography);
- Body composition measures (including impedance);
- Hand-grip strength;
- Ultrasound bone densitometry;
- Spirometry; and
- Exercise/fitness test with ECG.

Cognitive function testing was conducted during an initial and a repeat visit, and included testing for:
- Prospective memory;
- Pairs matching;
- Fluid intelligence;
- Reaction time;
- Numeric memory;
- Lights pattern memory;
- Words;
- Trail making; and
- Symbol digit substitution.

Imaging data included:
- Abdominal MRI;
- Brain MRI;
- Heart MRI; and
- Bone size/mineral/density and body composition.

Samples of blood, urine and saliva were also collected. The biochemical and genetic markers are listed in the Biomarkers field of this profile.

Field Names
Records
Drug Data
Yes

Drug data include:
- Treatments/medications (common prescription and OTC)
- Medication for the treatment of specific conditions (e.g., cholesterol, BP, diabetes, smoking cessation, constipation, heartburn, allergies, etc.),
- Number of medications,
- Taking other prescription medications, and
- Recent medications for specific conditions (asthma, cystic fibrosis, hay fever, etc.).
Information is also captured on:
- Illicit drug use,
- Supplements (dietary, vitamins, energy, mineral, protein),
- Herbal teas, and
- Flu vaccines.

Drug Date Parameters
2006 - Present
Drug Regimen & Route
No

However, information on the formulation is sometimes provided as part of the drug name, e.g., gelusil tablet, arobon 80% powder, isopto epinal 1% eye drops, benzagel 5% gel.

Drug Manufacturer
No
Drug Dosage
No
Drug Days Supply
No
Drug Coding System: Maximum Number
48

(Median value is 2 drugs)

Drug Coding System: Primary
Other

READ

Drug Coding System: Other
N/A

(Not applicable)

Drug Generic Name
Yes

For example, "Medication for pain relief, constipation, heartburn" include touchscreen selections such as "Ibuprofen (e.g., Nurofen)" and "Ranitidine (e.g., Zantac)." Selections also include "None of the above", "Do not know", and "Prefer not to answer".

Drug Additional Information
No
Field Names
Records
Biobank Type
Population-based

 
UK Biobank is a large, population-based prospective study that recruited 502,642 people aged 40-69 years during the period 2006-2010 from across the country. The participants were assessed in 22 assessment centers throughout the UK, covering a variety of different settings to provide socioeconomic and ethnic heterogeneity and urban–rural mix. This ensured a broad distribution across all exposures to allow the reliable detection of generalizable associations between baseline characteristics and health outcomes.

Ideally, the UK Biobank will have up to 20 years of longitudinal follow-up on the participants. Follow-up is conducted chiefly through linkages to routinely available national datasets and includes ascertainment of deaths, prevalent and incident cancers, and hospital admissions among other outcomes.

The assessment visit comprised:
- electronic signed consent;
- a self-completed touch-screen questionnaire;
- brief computer-assisted interview;
- physical and functional measures; and
- collection of blood, urine, and saliva.

Human Specimen
Blood: Buffy cells, plasma, RBC, serum
Saliva
Urine

 
Participants have undergone measures; provided blood, urine and saliva samples for future analysis; given detailed information about themselves; and agreed to have their health followed. The purpose of this biobank is to conduct detailed investigations of the genetic and non-genetic determinants of the diseases of middle and old age such as cancer, heart diseases, stroke, diabetes, arthritis, osteoporosis, eye disorders, depression and forms of dementia.

Blood Type
Yes
Biomarkers
Yes

The biomarkers selected for assay have been chosen because they are established risk factors for disease (e.g., lipids for vascular disease, sex hormones for cancer), diagnostic measures (e.g., HbA1c for diabetes and rheumatoid factor for arthritis) or characterize phenotypes not otherwise well assessed (e.g., biomarkers for renal and liver function). Biomarkers from Serum/Red blood cell/Urine include the following:
Cardiovascular:
- Cholesterol
- Direct Low Density Lipoprotein
- HDL-Cholesterol
- Triglyceride
- Apolipoprotein A
- Apolipoprotein B
- C-reactive Protein
- Lipoprotein (a)
Bone and joint:
- Vitamin D
- Rheumatoid factor
- Alkaline Phosphatase
- Calcium
Cancer:
- SHBG
- Testosterone
- Oestradiol
- IGF-1
Diabetes:
- HbA1c
- Glucose
Renal:
- Cystatin C
- Creatinine
- Total protein
- Urea
- Phosphate
- Urate
- Creatinine (enzymatic)
- Sodium
- Microalbumin
- Potassium
Liver:
- Albumin
- Direct Bilirubin
- Total Bilirubin
- Gamma Glutamyltransferase
- Alanine aminotransferase
- Aspartate aminotransferase

Several physical biomarkers are captured in all participants, including hair/balding pattern, blood pressure, heart rate, bone density, and body measurements (e.g., height, weight, BMI, hip-to-waist ratios).
Immunochemical biomarkers can be evaluated from specimens on a study-by-study basis, e.g., beta actin, CRP, cortisol, nitrite, etc.
Additionally, genetic and tumor markers are captured in the genotyping data and include:
- Alzheimer’s disease
- ApoE
- Autoimmune/inflammatory
- Blood phenotypes
- Cancer common variants
- Cardiometabolic
- eQTL
- Fingerprint
- HLA
- KIR
- Lung function phenotypes
- Common mitochondrial DNA variants
- Neurological disorders
- NHGRI GWAS catalog
- Pharmacogenetics/ADME
- Tags for Neanderthal ancestry
- Y chromosome markers
- Rare variants in cancer predisposition genes
- Rare variants in cardiac disease predisposition genes
- Rare, possibly disease causing, mutations
- CNV regions for developmental delay, neuropsychiatric disorders, and lung function
- Rare coding variants
- Protein truncating variants
- Other rare coding variants
- Genome-wide coverage
- Genome-wide coverage for common variants
- Genome-wide coverage for low-frequency variants
- Total number of markers on array

Patient ID
Barcode

 
Each sample vial carries a unique barcode that is scanned into the assessment centre IT system. The barcode links each vacutainer with the unique participant identifier number. It is important that samples are linked to participant data via barcode from the start of sample collection, and that this linkage is carried through into the central Laboratory Information Management System (LIMS).
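
The barcode-to-participant linkage described above can be pictured as a simple lookup table. The sketch below is illustrative only: the barcode and participant identifier formats, the function names, and the duplicate check are all assumptions, and it does not represent the actual LIMS design.

```python
# Hypothetical identifiers; the real barcode and participant ID formats
# are not given in this profile.
barcode_to_participant: dict[str, str] = {}

def register_vial(barcode: str, participant_id: str) -> None:
    """Record that a scanned vial belongs to a participant."""
    if barcode in barcode_to_participant:
        # Each barcode is described as unique, so a repeat scan is rejected here.
        raise ValueError(f"barcode {barcode} is already registered")
    barcode_to_participant[barcode] = participant_id

def vials_for(participant_id: str) -> list[str]:
    """Return all barcodes linked to one participant, in sorted order."""
    return sorted(b for b, p in barcode_to_participant.items() if p == participant_id)

# Made-up example: two vials from one participant, one from another.
register_vial("VIAL-0001", "P-1000001")
register_vial("VIAL-0002", "P-1000001")
register_vial("VIAL-0003", "P-1000002")
print(vials_for("P-1000001"))
```

Keying the table on the barcode rather than on the participant reflects the one-vial-to-one-participant direction of the link described in the text.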

Number of Samples
Yes

The participants have undergone measures; provided blood, urine and saliva samples for future analysis; given detailed information about themselves; and agreed to have their health followed. Multiple aliquots of different sample fractions are stored in UK Biobank’s automated laboratory, allowing for a wide range of future assays.

Frequency of Sample Collection
The samples were collected in April 2007, and additional blood and saliva samples were collected in August of 2009
Pre-diagnostic Sample Collection
Yes

Samples were obtained regardless of diagnoses. Health conditions were self-reported.

Post-treatment Sample Collection
Yes

Samples were obtained regardless of diagnoses. Health conditions were self-reported.

Method of Sample Collection
Blood draw, urine in cup and saliva collection
Age at Sample Collection
Yes

Age (reported or imputed) and DOB

Date of Sample Collection
Yes

YYYY-MM-DD
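
Because both the date of birth and the date of sample collection are recorded as YYYY-MM-DD, age at sample collection can be derived from the two dates. The following is a minimal sketch; the function name and the example dates are made up for illustration and are not taken from UK Biobank data.

```python
from datetime import date

def age_on(date_of_birth: date, event_date: date) -> int:
    """Return completed years of age on event_date."""
    years = event_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the event year.
    if (event_date.month, event_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

# Made-up example values in the YYYY-MM-DD format.
dob = date.fromisoformat("1950-06-15")
collected = date.fromisoformat("2008-03-01")
print(age_on(dob, collected))
```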

Reason for Sample Collection
Clinical trial
Public health survey
Other

(The purpose of this biobank is to conduct detailed investigations of the genetic and non-genetic determinants of the diseases of middle and old age such as cancer, heart diseases, stroke, diabetes, arthritis, osteoporosis, eye disorders, depression and forms of dementia.)

Method of Sample Storage
All vacutainers are to be maintained at 4°C (with the exception of the acid citrate dextrose tube which is to be maintained at 18°C) until ready for packing and dispatch to the coordinating centre laboratory in temperature-controlled shipping boxes.

The boxes will be collected by a commercial courier and transported overnight to the central laboratory where they will be processed and transferred to ultra-low temperature archives.

Liquid nitrogen serves as a back-up archive (-196°C). Ultra-low temperature archives involve the following:
- Plasma;
- Buffy coat;
- Red cells in EDTA (9ml) x2 vacutainers stored in -80°C freezer and liquid nitrogen tanks;
- Plasma in EDTA (PST) vacutainers stored in -80°C freezer and liquid nitrogen tanks;
- Serum in Clot activator (SST) vacutainers stored in -80°C freezer and liquid nitrogen tanks;
- DMSO blood in ACD vacutainers stored in liquid nitrogen tanks;
- Hematology in EDTA (4ml) vacutainers are for immediate use; and
- Urine in urine vacutainers stored in -80°C freezer and liquid nitrogen tanks.

Length of Sample Storage
At least 20 years (or until depleted)
Pathology
Unknown
DNA Isolation
Yes
RNA Isolation
Yes
Cell Culture
No
Genetic Testing
Yes

Genome-wide genotyping data are available for all 500,000 participants in the UK Biobank cohort. Genotyping was performed using the Affymetrix UK BiLEVE Axiom array on an initial 50,000 participants; the remaining 450,000 participants were genotyped using the Affymetrix UK Biobank Axiom® array, which genotyped ~850,000 variants. The two arrays are extremely similar (with over 95% common content).

The first batch of genetic data, which included genotyping and imputed data (on approximately 150,000 participants), was made publicly available in May 2015. This included the 50,000 participants genotyped using the UK BiLEVE array and 100,000 participants genotyped on the UK Biobank array. Genetic data for the full cohort were released in July 2017.

The genetic fields include:
- Genetic ethnic grouping
- Genetic principal components
- Genetic relatedness pairing
- Genetic relatedness exclusions
- Genetic relatedness factor
- Genetic relatedness IBS0
- Genetic sex
- Heterozygosity
- Heterozygosity, PCA corrected
- Missingness
- Recommended genomic analysis exclusions
- UKBiLEVE unrelatedness indicator, and
- Multi-allelic genetic markers (Affymetrix array): Affymetrix SNP ID, Chromosome, Position within chromosome, Allele A, Allele B.

Access for Research: Specimens
Yes, for clinical and/or epidemiologic research

(Requires permission via data access application)

Access for Research: Genetic Data
Yes

However, this requires permission via data access application

Access for Research: Epidemiologic Data
Yes

However, this requires permission via data access application

Quality Assurance Procedures
Yes

About 20,000 aliquots were produced in 1.4ml bar-coded tubes each day. This high-throughput repetitive work, coupled with the requirement for high quality and secure tracking of samples, has led to the development of highly automated platforms for UK Biobank that are fully integrated with the Laboratory Information Management System (LIMS) software. Some of the liquid handling tasks (e.g., urine) are managed using customized integrated robotic workstations available from commercial suppliers. The more complex fractionation and liquid handling tasks are performed on custom-built multi-function automated platforms. Importantly, these platforms do not rely on any “leading edge” technology to function; rather, they represent a new configuration of existing robust technologies (which reduces the risk of failure). To streamline processing, improve cost-effectiveness and minimize quality control issues, only those assays that cannot be done later on frozen samples (i.e., hematology) were performed as the samples arrived at the central laboratory.

Family History
Yes

Family history data include:
- Age of parents,
- Living/deceased status of parents,
- Number of siblings,
- Non-accidental death in close genetic family, and
- Illnesses in parents.
Family history is obtained for adoptive parents if subject is an adopted child.

Medical History
Yes

History of diseases and medication use are captured, including:
- History of medical conditions,
- Cancer,
- Psychiatric/mental health,
- Memory,
- Sexual health,
- MRIs,
- Operations, and
- Early life factors.

Biobank Linkage
Clinical database
Vitals registry
Civil registry
Other

[Linkages to health outcomes data sources include cancer and death registers, hospital discharge diagnosis data, general practitioner data, and other medical (e.g., prescriptions, pathology reports, imaging reports, screenings) and health-related data (e.g., employment, benefits, socio-economic records).]

Field Names
Records
Type of Genetic Database
GWAS database

Genome-wide genotyping data are available for all 500,000 participants in the UK Biobank cohort. Genotyping was performed using the Affymetrix UK BiLEVE Axiom array on an initial 50,000 participants; the remaining 450,000 participants were genotyped using the Affymetrix UK Biobank Axiom® array that genotyped ~850,000 variants. The two arrays are extremely similar (with >95% common content).

The first batch of genetic data that included genotyping and imputed data (on ~150,000 participants) was made publicly available in May 2015. This included the 50,000 participants genotyped using the UK BiLEVE array and 100,000 participants genotyped on the UK Biobank array. Genetic data for the full cohort were released in July 2017.

Quality control and imputation [to >90 million single nucleotide polymorphisms (SNPs), indels and large structural variants] were performed by a collaborative group headed by the Wellcome Trust Centre for Human Genetics.

The following data are available:
- A clean set of quality-controlled (QC) genotype calls;
- Confidence values that a genotype call is correct;
- Intensity data to generate cluster plots;
- Extensive QC information regarding SNPs and samples including SNP metrics, batch effects, population structure and relatedness; and
- Imputed data.

The following data are also available upon request:
- Un-QC’ed genotype calls and confidences;
- CEL (image) files; and
- Spectrophotometric measurements taken during DNA isolation.

Other genetic sequencing projects are underway.
Exome sequencing: In 2017 GSK and Regeneron made an application for an exome sequence assay on 50,000 UK Biobank participants. Regeneron has entered into a further collaboration with AbbVie, Alnylam Pharmaceuticals, AstraZeneca, Biogen, Pfizer, Takeda and Bristol-Myers Squibb to undertake an exome sequence assay on the remaining 450,000 participants. They aim to complete this work over 3 years, with the data being made available through the Data Showcase by the end of 2020.

Whole genome sequencing: In spring 2018 the Medical Research Council (MRC) awarded UK Biobank a £30M grant to sequence the whole genome of 50,000 UK Biobank participants. This sequencing has been undertaken by the Wellcome Sanger Institute, Cambridge, starting in 2019, and will continue through 2020.

Source of Genetic Data
Clinical trial data
Specimen Genotyped
Yes

DNA was extracted from buffy coat samples

Tissue Form
Peripheral whole blood

During the initial recruitment of participants (2007-2010), samples that could yield DNA were collected. DNA extraction was deferred until such a time that a project required DNA, and it was anticipated that the cost of extraction would be reduced. In early 2013, DNA extraction began on buffy coat samples and genotyping was performed on a custom Affymetrix Axiom array.

Genetic Template
DNA

RNA
(Genomic DNA and RNA from blood)

Gene-Drug Response
Yes

Pharmacogenetics and absorption, distribution, metabolism, and excretion (ADME) content consists of markers for genetic variants of related genes listed in the Pharmacogenomics Knowledgebase (PGKB), such as those with known relevance to drug metabolism, and some markers from Affymetrix’s DMET™ Plus platform. The set of ADME markers from the Axiom Biobank Genotyping Array was selected for the UK Biobank Axiom Array.

Gene-Disease Relationship
Yes

UK Biobank Axiom® Array is designed using imputation-aware SNP selection. This array provides optimized content modules for genome-wide association studies (GWAS) of common and low-frequency variants, biological function, and human disease in populations of European and British ancestry. The comprehensive coverage also includes rare coding variants, pharmacogenomics markers, copy number regions, HLA, inflammation, and eQTL variants.

The UK Biobank’s prospective genotyping study of 500,000 individuals is aimed at uncovering the complex interactions between genes, lifestyle, and environment (marker/phenotype interactions). The resulting data will be available to all bona fide researchers, providing a large reference data set (500,000 samples) that can be useful in assessing potential case associations of rare variants identified in studies of other populations.

Number of genetic markers of disease included on the UK Biobank Axiom Array:
Total number of markers on array (820,967)
Genome-wide coverage
- Genome-wide coverage for common variants (348,569)
- Genome-wide coverage for low-frequency variants (280,838)
Rare coding variants
- Protein truncating variants (30,581)
- Other rare coding variants (80,581)
Markers of special interest
- Alzheimer’s disease (803)
- ApoE (1,147)
- Autoimmune/inflammatory (258)
- Blood phenotypes (2,545)
- Cancer common variants (343)
- Cardiometabolic (377)
- eQTL (17,115)
- Fingerprint (262)
- HLA (7,348)
- KIR (1,546)
- Lung function phenotypes (8,645)
- Common mitochondrial DNA variants (180)
- Neurological disorders (19,791)
- NHGRI GWAS catalog (8,136)
- Pharmacogenetics/ADME (2,037)
- Tags for Neanderthal ancestry (11,507)
- Y chromosome markers (807)
- Rare variants in cancer predisposition genes (6,543)
- Rare variants in cardiac disease predisposition genes (1,710)
- Rare, possibly disease causing, mutations (13,729)
- CNV regions for developmental delay, neuropsychiatric disorders, and lung function (2,369)

Gene-Health Outcome Relationship
Yes

(Limited)
Although some phenotypes are readily available, others (particularly some health outcomes) may not be well-ascertained or may not be appropriately validated (at this time). By way of illustration, self-reported outcomes collected during the participant baseline visit are readily available. However, other phenotypes such as validated outcomes for incident and prevalent disease depend on the availability of the health record linkage data (over which UK Biobank inevitably has less direct control).

Gene-Environment Response
Yes

The UK Biobank’s prospective genotyping study of 500,000 individuals is aimed at uncovering the complex interactions between genes, lifestyle, and environment (marker/phenotype interactions). The resulting data will be available to all bona fide researchers, providing a large reference data set (500,000 samples) that can be useful in assessing potential case associations of rare variants identified in studies of other populations.

Method of Imputing Genetic Data
Yes

Genotypes were imputed into the dataset using computationally efficient methods combined with the Haplotype Reference Consortium (HRC) and UK10K haplotype resource. This increased the number of testable variants >100-fold to ~96 million variants, stored in the compressed and indexed BGENv1.2 format. The imputed genotypes are aligned to the + strand of the reference, and the positions are in GRCh37 coordinates. Additional details are available online at: http://www.ukbiobank.ac.uk/wp-content/uploads/2014/04/imputation_documentation_May2015.pdf

Genetic Variant Identification
dbSNP accession / reference SNP cluster (rsID)

Genomic locus / coordinates
Other

(Affymetrix ID is also recorded.
Genomic locus includes Chromosome number, Genomic position start and end.)

Genetic Data Level
Individual
Aggregate - by study populations

(Individual genetic data for GWAS analysis can be obtained by all bona fide researchers with appropriate approvals)

Genotyping Method
Yes

Genome-wide genotyping data are available for all 500,000 participants in the UK Biobank cohort. Genotyping was performed using the Affymetrix UK BiLEVE Axiom array on an initial 50,000 participants; the remaining 450,000 participants were genotyped using the Affymetrix UK Biobank Axiom® array that genotyped ~850,000 variants. The two arrays are extremely similar (with >95% common content).

Protocols for each of the genotyping arrays are publicly available. Each record indicates which genotypic array was used.

Method of Genetic Variant Filtering
Yes

UK Biobank undertook QC in several stages. First they used several SNP-based metrics to flag SNPs with less reliable genotyping results, to be set to missing in the batches where they failed their filters. Then they identified poor quality samples using only high quality SNPs (defined as SNPs that passed QC filters in all 33 batches in this interim release). They also performed other sample-based inference such as principal component analysis and relatedness inference. Properties of UK Biobank (such as its large cohort size) mean that some quality control metrics commonly used in genome-wide association studies (GWAS) are not sufficient in this context. They used a variety of approaches in their QC procedures to account for the effects of population structure and batch-based genotyping, described in the following document: http://www.ukbiobank.ac.uk/wp-content/uploads/2014/04/UKBiobank_genotyping_QC_documentation-web.pdf
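The staged, batch-aware logic described above can be illustrated with a minimal sketch. It is not UK Biobank's pipeline: the function names, the dictionary layout, and the idea of representing a failed call as `None` are all assumptions made for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical layout: sample ID -> SNP ID -> genotype call (None = missing).
Calls = Dict[str, Dict[str, Optional[str]]]


def mask_failed_calls(
    calls: Calls,
    batch_of_sample: Dict[str, str],
    failed_by_batch: Dict[str, Set[str]],
) -> Calls:
    """Set a call to missing when its SNP failed QC in the sample's batch."""
    masked: Calls = {}
    for sample, genotypes in calls.items():
        failed = failed_by_batch.get(batch_of_sample[sample], set())
        masked[sample] = {
            snp: (None if snp in failed else call)
            for snp, call in genotypes.items()
        }
    return masked


def high_quality_snps(
    all_snps: Set[str], failed_by_batch: Dict[str, Set[str]]
) -> Set[str]:
    """Return the SNPs that passed QC in every batch."""
    failed_anywhere = set().union(*failed_by_batch.values())
    return all_snps - failed_anywhere


if __name__ == "__main__":
    failed = {"batch1": {"rs2"}, "batch2": set()}
    calls = {"s1": {"rs1": "AA", "rs2": "AB"}, "s2": {"rs1": "AB", "rs2": "BB"}}
    batches = {"s1": "batch1", "s2": "batch2"}
    print(mask_failed_calls(calls, batches, failed))
    print(high_quality_snps({"rs1", "rs2"}, failed))
```

Only the SNPs returned by `high_quality_snps` would then feed sample-level checks, mirroring the order of the stages in the text.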

In addition, several filters were applied during the imputation process. In the pre-phasing step of imputation, the SNP QC filters were applied along with the removal of sample outliers, multi-allelic SNPs, and SNPs with a minor allele frequency (MAF) <1%. Whole-genome imputation involved filtering on a MAF of 0.001%. Additional details can be obtained at: http://www.ukbiobank.ac.uk/wp-content/uploads/2014/04/imputation_documentation_May2015.pdf
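As a rough illustration of the pre-phasing variant filters listed above, the sketch below keeps a variant only if it passed SNP QC, is biallelic, and has a MAF of at least 1%. The `Variant` record, its field names, and the inclusive treatment of the 1% boundary are assumptions, not details taken from the UK Biobank documentation; sample-outlier removal is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical per-variant summary used only for this sketch."""
    variant_id: str
    alleles: Tuple[str, ...]
    maf: float            # minor allele frequency as a fraction (0.01 = 1%)
    passed_snp_qc: bool


def keep_for_prephasing(v: Variant, min_maf: float = 0.01) -> bool:
    """Apply the three variant-level filters named in the text."""
    is_biallelic = len(v.alleles) == 2
    return v.passed_snp_qc and is_biallelic and v.maf >= min_maf


def filter_variants(variants: List[Variant], min_maf: float = 0.01) -> List[Variant]:
    """Return the variants that survive all filters, in input order."""
    return [v for v in variants if keep_for_prephasing(v, min_maf)]


if __name__ == "__main__":
    variants = [
        Variant("rs1", ("A", "G"), 0.20, True),        # kept
        Variant("rs2", ("A", "G", "T"), 0.20, True),   # multi-allelic
        Variant("rs3", ("C", "T"), 0.005, True),       # MAF below 1%
        Variant("rs4", ("C", "T"), 0.30, False),       # failed SNP QC
    ]
    print([v.variant_id for v in filter_variants(variants)])
```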

Haplotypes
Yes
Haplogroups
No

However, researchers have independently conducted haplogroup analyses using the UK Biobank data.

Variable Number of Tandem Repeats (VNTR)
Yes
Single Nucleotide Polymorphisms (SNPs)
Yes

The lists of SNPs in the imputed datasets can be downloaded from the Field's Resources tabs on a per-chromosome basis or as combined tar files in Resource 1965 and Resource 1671.

Variant Type
SNPs
Multiple base pair changes (insertions or deletions)
Other

(Large structural variants)

Variant Class
No
Mutation Indicated
Yes

Data on mutation are included as a description under the field Type (e.g., True SNP: exactly one nucleotide on the flanking sequence is replaced with exactly one nucleotide on the subject sequence). Additionally, the mutation (and reference) length and sequence are provided.

Position
Yes

Genomic locus information is recorded

Amino Acid Change
No
Genotype / Polymorphism
Yes

The exact nucleic acid substitution and location is indicated

Allele Frequency
Yes

The information scores and minor allele frequency data for the imputed genotypes (computed with QCTOOL) can also be downloaded in Resource 1967.

Linkage Disequilibrium (r²)
Yes

For filtering purposes, a pairwise r² threshold of ≤0.1 is applied to exclude SNPs in high linkage disequilibrium. (The r² coefficient was computed using PLINK and its ‘indep-pairwise’ function with a moving window of 1000 bp.)
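The idea behind pairwise r² pruning can be sketched as follows. This is a simplified greedy illustration, not PLINK's actual algorithm: r² is taken here as the squared Pearson correlation of 0/1/2 genotype dosages, a SNP is dropped when it exceeds the threshold against any already-kept SNP within the window, and the function names and input layout are assumptions.

```python
from statistics import mean
from typing import List, Sequence, Tuple

# Hypothetical record: (SNP ID, base-pair position, dosages coded 0/1/2).
Snp = Tuple[str, int, Sequence[int]]


def r_squared(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation between two dosage vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return (cov * cov) / (var_x * var_y)


def prune(snps: List[Snp], window_bp: int = 1000, max_r2: float = 0.1) -> List[str]:
    """Greedily keep SNPs (sorted by position, one chromosome) that are not
    in LD above max_r2 with any already-kept SNP within window_bp."""
    kept: List[Snp] = []
    for snp_id, pos, dosages in snps:
        independent = all(
            pos - kept_pos > window_bp or r_squared(dosages, kept_dos) <= max_r2
            for _, kept_pos, kept_dos in kept
        )
        if independent:
            kept.append((snp_id, pos, dosages))
    return [snp_id for snp_id, _, _ in kept]


if __name__ == "__main__":
    snps = [
        ("A", 100, [0, 1, 2, 0, 1, 2]),
        ("B", 300, [0, 1, 2, 0, 1, 2]),   # identical to A, inside the window
        ("C", 600, [1, 0, 1, 1, 2, 1]),   # uncorrelated with A
        ("D", 5000, [0, 1, 2, 0, 1, 2]),  # identical to A, outside the window
    ]
    print(prune(snps))
```

In this toy input, B is removed because it duplicates A within 1000 bp, while D survives because it lies outside the window.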

Many studies using UK Biobank data have calculated and reported linkage disequilibrium scores. Some linkage disequilibrium data from UK Biobank are available in secondary online repositories, such as LD Hub (http://ldsc.broadinstitute.org).

Noncarriers Indicated
Yes

Genetic data indicate homozygous (AA, BB) and heterozygous (AB) variants
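A small sketch of how AA/AB/BB calls might be summarized is given below. It assumes, purely for illustration, that B is the allele of interest, so AA individuals are counted as noncarriers; the function name and the handling of missing calls are likewise assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Number of B alleles carried under each genotype call.
B_ALLELE_COUNT = {"AA": 0, "AB": 1, "BB": 2}


def summarize_calls(calls: Iterable[Optional[str]]) -> Dict[str, Optional[float]]:
    """Count noncarriers, heterozygotes, and B homozygotes, and estimate the
    B allele frequency; missing or unrecognized calls are ignored."""
    counts = Counter(c for c in calls if c in B_ALLELE_COUNT)
    n_called = sum(counts.values())
    b_alleles = sum(B_ALLELE_COUNT[g] * k for g, k in counts.items())
    return {
        "noncarriers": counts["AA"],
        "heterozygous": counts["AB"],
        "homozygous_b": counts["BB"],
        "b_allele_freq": b_alleles / (2 * n_called) if n_called else None,
    }


if __name__ == "__main__":
    print(summarize_calls(["AA", "AB", "BB", "AA", None, "AB"]))
```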

Association Statistics
No
Genetic Relatedness Pairing
Yes

Extensive QC information regarding SNPs and samples is provided, including SNP metrics, batch effects, population structure, and relatedness (including familial relationships).

Data Sharing: Genetic Data
Yes

UK Biobank intends to make all (or at least the great majority) of the GWAS results available in due course through the European Genome-phenome Archive.

UK Biobank's Showcase is the publicly available data stored online (https://www.ukbiobank.ac.uk/data-showcase/); Showcase aims to present the data available for health-related research in a comprehensive and concise way, and to provide technical information for researchers considering applying to use the resource.

Access for Research
Yes, access to genetic data for laboratory and/or epidemiologic research is available

However, access to the biological samples that are limited and depletable will be carefully controlled and coordinated. The quantity of sample that is required will be judged against the potential benefits of the research project, with advice from appropriate experts as required.

Genetic Data Linkage
Yes

Linkages to health outcomes data sources include cancer and death registers, hospital discharge diagnosis data, general practitioner data, and other medical (e.g., prescriptions, pathology reports, imaging reports, screenings) and health-related data (e.g., employment, benefits, socio-economic records).

Description of Genetic Data Linkage
No charge to register or to view aggregate data in the data fields in the showcase. To access data, applications need to be submitted and data requests will result in a fee.

The Resource is available to all bona fide researchers for all types of health-related research that is in the public interest, without preferential or exclusive access for any person. All researchers, whether in universities, charities, government agencies or commercial companies, and whether based in the UK or abroad, will be subject to the same application process and approval criteria.

Applications to use the Resource will be checked to ensure that research proposals are consistent with these Access Procedures, the Ethics & Governance Framework, and the consent that was provided by the participants (including having relevant scientific and ethics approval). Data licenses are valid for 1 year.

Access to the biological samples that are limited and depletable will be carefully controlled and coordinated. The quantity of sample that is required will be judged against the potential benefits of the research project, with advice from appropriate experts as required.

Access to the Resource is on a cost-recovery basis, and charges are the same regardless of the type of institution (i.e., whether academic or commercial). Costs depend on whether the application requires data, bulk items (i.e., data that require in-situ access), or samples. Following a review of the charging procedure, there is now an initial fee at preliminary application submission, followed by a flat fee for data extraction (for non-bulk data). The UK Biobank charging policy is as follows:
- £250 + VAT (where applicable) payable upon submission of a preliminary application.
- £1,500 + VAT (where applicable) per application that requires access to data only.
- An additional £500 + VAT (where applicable) for access to any bulk data files (including MRI/DXA/carotid ultrasound data available from October 2015, OCT and fundus images, ECG raw data, HES data, genetic data, built environment data, and accelerometer data). The genetic data include the genotyping data and the imputed data. These costs are subject to change and may increase as more imaging data are acquired; the page will be updated once the costs have been finalized.
- Bespoke quote for applications that request access to biological samples.
- Bespoke quote for re-contact requests.
- Bespoke quote for particularly time-consuming customization of data sets.

Cost Data
No
Cost Denomination
N/A

(Not applicable)

Type of Cost Data
N/A

(Not applicable)

Description of Surrogate Link
N/A

(Not applicable)

Data Validation Against Original Source
Yes

Validation for questionnaires: Pre-coded lists of diseases, drugs, and occupations are built into the CAPI system, along with structured search facilities, to help this information to be recorded (and automatically coded) both rapidly and completely. Other innovations to improve data quality and efficiency of collection include the use of inbuilt cross-checks between relevant questionnaire responses, and check messages when extreme values are entered or when no value is provided.

Validation for data linkage: On receipt of the data file from the external data provider, the contents are inspected to understand exactly what information is contained. In particular, the format and values of individual data fields are scrutinized to understand whether they conform to those indicated in the data dictionary, and whether the information is useful for research purposes. Any coding ambiguities identified at this stage are clarified by UK Biobank’s data analysts or with the data provider, if necessary.

Checks are performed for the following: mismatches, data formatting, and conformity to the definitive list of coded values.

Values which fail validation are flagged for attention and investigated further until a decision is made about whether to exclude the record from import (i.e. the record does not belong to a UK Biobank participant), to modify the list of definitive values (i.e. the data dictionary) to incorporate a new valid code, or to modify the value into a valid code.
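The validation checks described above might look roughly like the following sketch, which flags linked records whose participant identifier is unknown or whose coded fields fall outside the data dictionary. The record layout, field names, and function name are hypothetical; the decision on what to do with a flagged record remains a manual step, as in the text.

```python
from typing import Dict, List, Set, Tuple


def validate_records(
    records: List[dict],
    participant_ids: Set[str],
    valid_codes: Dict[str, Set[str]],
) -> Tuple[List[dict], List[Tuple[dict, List[str]]]]:
    """Split incoming records into accepted ones and flagged ones.

    A record is flagged when its participant ID is unknown (a mismatch) or
    when any coded field holds a value missing from the data dictionary.
    """
    accepted: List[dict] = []
    flagged: List[Tuple[dict, List[str]]] = []
    for record in records:
        problems: List[str] = []
        if record.get("participant_id") not in participant_ids:
            problems.append("mismatch: unknown participant")
        for field, allowed in valid_codes.items():
            value = record.get(field)
            if value not in allowed:
                problems.append(f"invalid code in {field}: {value!r}")
        if problems:
            flagged.append((record, problems))
        else:
            accepted.append(record)
    return accepted, flagged


if __name__ == "__main__":
    cohort = {"P1", "P2"}
    dictionary = {"diagnosis_code": {"C50", "I21"}}  # illustrative codes only
    incoming = [
        {"participant_id": "P1", "diagnosis_code": "C50"},
        {"participant_id": "P9", "diagnosis_code": "I21"},
        {"participant_id": "P2", "diagnosis_code": "ZZZ"},
    ]
    ok, needs_review = validate_records(incoming, cohort, dictionary)
    print(len(ok), "accepted;", len(needs_review), "flagged")
```

Each flagged record carries its list of problems, so a reviewer can choose between excluding it, extending the data dictionary, or correcting the value.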

Access to Medical Records
No
Linkage to Other Databases
Yes

Identifiable data (such as name, date of birth, NHS number) are collected for each participant at recruitment and are also contained in the data file supplied to UK Biobank. The mismatch rate is estimated to be <0.1%, largely because a very high proportion of the cohort has an NHS number (or CHI number in Scotland), which acts as a unique identifier for linkage purposes.

Brief Description of Linkage Capabilities

Linkages to Hospital inpatient databases, GP databases, Death registers, and Cancer registers are performed.

Death data: 
- England & Wales: Health & Social Care Information Centre (HSCIC) [2006 onwards]
- Scotland: Information Services Department (ISD) [2006 onwards]

Cancer data:
- England & Wales: Information Centre (NHS IC) [1979 onwards] 
- Scotland: National Records of Scotland, NHS Central Register [1957 onwards]

Hospital Admissions (Inpatient) data:
- England: Hospital Episode Statistics (HES) [1996 onwards]
- Scotland: Scottish Morbidity Record (SMR) [1981 onwards]
- Wales: Patient Episode Database for Wales (PEDW) [1999 onwards]

Database Contact Data

Professor Sir Rory Collins
Principal Investigator at UK Biobank
Head of Nuffield Department of Population Health and BHF Professor of Medicine and Epidemiology
UNITED KINGDOM
Phone: +44 (0)1865 743743 
Fax: +44 (0)1865 743985
Email: rory.collins@ndph.ox.ac.uk

Alternate Contact

Dr. Naomi Allen
Senior Epidemiologist at UK Biobank
Associate Professor in Epidemiology, UK Biobank, Nuffield Department of Population Health
UNITED KINGDOM
Phone: +44 (0)1865 743805
E-mail: naomi.allen@ndph.ox.ac.uk

Source of Database Funding
Government
Private

 
(The UK Biobank was established by the Wellcome Trust medical charity, Medical Research Council, Department of Health, Scottish Government and the Northwest Regional Development Agency. It has also had funding from the Welsh Assembly Government, British Heart Foundation and Diabetes UK. UK Biobank is hosted by the University of Manchester and supported by the National Health Service (NHS). UK Biobank is open to bona fide researchers anywhere in the world, including those funded by academia and industry.)

Sponsoring Government Agency
Department of Health
Scottish Government
Welsh Assembly Government
National Health Service (NHS)
Sponsoring Pharmaceutical Manufacturer

N/A

(Not applicable)

Database Usage Restrictions
Public Access

The Resource is available to all bona fide researchers for all types of health-related research that is in the public interest, without preferential or exclusive access for any person. All researchers, whether in universities, charities, government agencies or commercial companies, and whether based in the UK or abroad, will be subject to the same application process and approval criteria.

Applications to use the Resource will be checked to ensure that research proposals are consistent with these Access Procedures, the Ethics & Governance Framework, and the consent that was provided by the participants (including having relevant scientific and ethics approval). Data licenses are valid for 1 year.

Access to the biological samples that are limited and depletable will be carefully controlled and coordinated. The quantity of sample that is required will be judged against the potential benefits of the research project, with advice from appropriate experts as required.

Charge for Database Usage
Yes

No charge to register or to view data in the data fields in the showcase. To access data, applications need to be submitted and data requests will result in a fee.

Access to the Resource is on a cost-recovery basis, and charges are the same regardless of the type of institution (i.e., whether academic or commercial). Costs depend on whether the application requires data, bulk items (i.e., data that require in-situ access), or samples. Following a review of the charging procedure, there is now an initial fee at preliminary application submission, followed by a flat fee for data extraction (for non-bulk data). The UK Biobank charging policy is as follows:
- £250 + VAT (where applicable) payable upon submission of a preliminary application.
- £1,500 + VAT (where applicable) per application that requires access to data only.
- An additional £500 + VAT (where applicable) for access to any bulk data files (including MRI/DXA/carotid ultrasound data available from October 2015, OCT and fundus images, ECG raw data, HES data, genetic data, built environment data, and accelerometer data). The genetic data include the genotyping data and the imputed data. These costs are subject to change and may increase as more imaging data are acquired; the page will be updated once the costs have been finalized.
- Bespoke quote for applications that request access to biological samples.
- Bespoke quote for re-contact requests.
- Bespoke quote for particularly time-consuming customization of data sets.

The two relevant circumstances which may qualify for the reduced-fee regime are:
1. Applications from bona fide students for the purpose of producing their thesis
2. Applications from applicants who are resident in developing countries

The reduced fee is £500 in aggregate (plus VAT), compared with the normal fee of £2,000 in aggregate (plus VAT); it is payable as £250 on submission of the preliminary application and £250 on approval of the main application.

Data Media Format
Other

[Online

Some data items that are particularly large and/or complex are available to download as separate data files (e.g. imaging files (MRI, OCT scans), ECG data, accelerometer data, and hospital in-patient data). For some of these files (e.g., OCT data), in-situ access is also available.]

Number of Publications Using Database
>1100

UK Biobank has a searchable publications listing available at: https://www.ukbiobank.ac.uk/published-papers/

References of Studies Using/Describing Database

1. Dekkers IA, Jansen PR, Lamb HJ. Obesity, Brain Volume, and White Matter Microstructure at MRI: A Cross-sectional UK Biobank Study. Radiology. 2019 Apr 23:181012.

2. Hwang LD, Lin C, Gharahkhani P, Cuellar-Partida G, Ong JS, An Gordon SD, Zhu G, MacGregor S, Lawlor DA, Breslin PAS, Wright MJ, Martin NG, Reed DR. New insight into human sweet taste: a genome-wide association study of the perception and intake of sweet substances. Am J Clin Nutr. 2019 Apr 21. pii: nqz043.

3. Machado-Fragua MD, Struijk EA, Ballesteros JM, Ortolá R, Rodriguez Artalejo F, Lopez-Garcia E. Habitual coffee consumption and risk of falls in 2 European cohorts of older adults. Am J Clin Nutr. 2019 Apr 21. pii: nqy369.

4. Hajna S, White T, Panter J, Brage S, Wijndaele K, Woodcock J, Ogilvie D, Imamura F, Griffin SJ. Driving status, travel modes and accelerometer assessed physical activity in younger, middle-aged and older adults: a prospective study of 90 810 UK Biobank participants. Int J Epidemiol. 2019 Apr 19. pii: dyz065.

5. Lotta LA, Mokrosiński J, Mendes de Oliveira E, Li C, Sharp SJ, Luan J, Brouwers B, Ayinampudi V, Bowker N, Kerrison N, Kaimakis V, Hoult D, Stewart ID,  Wheeler E, Day FR, Perry JRB, Langenberg C, Wareham NJ, Farooqi IS. Human Gain-of-Function MC4R Variants Show Signaling Bias and Protect against Obesity. Cell. 2019 Apr 18;177(3):597-607.e9.

6. Khera AV, Chaffin M, Wade KH, Zahid S, Brancale J, Xia R, Distefano M, Senol-Cosar O, Haas ME, Bick A, Aragam KG, Lander ES, Smith GD, Mason-Suares H, Fornage M, Lebo M, Timpson NJ, Kaplan LM, Kathiresan S. Polygenic Prediction of Weight and Obesity Trajectories from Birth to Adulthood. Cell. 2019 Apr 18;177(3):587-596.e9.

7. Johnson EC, St Pierre CL, Meyers J, Aliev F, McCutcheon VV, Lai D, Dick DM, Goate AM, Kramer J, Kuperman S, Nurnberger JI Jr, Schuckit MA, Porjesz B, Edenberg HJ, Bucholz KK, Agrawal A. The genetic relationship between alcohol consumption and aspects of problem drinking in an ascertained sample. Alcohol Clin Exp Res. 2019;43(6):1113‐1125.

8. Kendall KM, Rees E, Bracher-Smith M, Legge S, Riglin L, Zammit S,  O'Donovan MC, Owen MJ, Jones I, Kirov G, Walters JTR. Association of Rare Copy Number Variants With Risk of Depression. JAMA Psychiatry. 2019 Apr 17. [Epub ahead of print] 

9. Bradbury KE, Murphy N, Key TJ. Diet and colorectal cancer in UK Biobank: a prospective study. Int J Epidemiol. 2019 Apr 17. pii: dyz064. 

10. de Kovel CGF, Francks C. The molecular genetics of hand preference revisited. Sci Rep. 2019 Apr 12;9(1):5986.
 
