Clinical Practice Research Datalink (CPRD) (formerly General Practice Research Database - GPRD) (United Kingdom)

Field Names
Records
Coordinating Country
United Kingdom
Region

United Kingdom for primary care data; England for other linked datasets

Brief Database Description

Clinical Practice Research Datalink (CPRD) is a governmental, not-for-profit, observational and interventional research service. CPRD is jointly funded by the NHS National Institute for Health Research (NIHR) and the Medicines and Healthcare products Regulatory Agency (MHRA). CPRD [formerly the General Practice Research Database (GPRD)] has been collecting primary care data since 1987, providing over 25 years of longitudinal data for public health research. CPRD collects anonymised patient medical records from general practitioner (GP) practices throughout the UK that use two of the three major GP software systems (Vision and EMIS). As of March 2017, CPRD holds data on >24 million patients from >800 GP practices. (NOTE: In the UK health care system, GPs play a gatekeeper role as they are responsible for primary health care and specialist referrals.) CPRD data include not only Electronic Healthcare Records (EHR) from GPs, but also links to a number of secondary care datasets, including Hospital Episode Statistics (HES), Cancer Registry, Death Registration data and more. CPRD also enables additional information to be requested from GPs and patients, and can support a range of interventional studies.

 
Primary care data are available online via CPRD GOLD for licence holders. Disease and drug coding dictionaries are provided, along with a query tool that DEFINES patient cohorts. An EXTRACT tool then enables cuts of the data to be taken for a list of identified patients.

The CPRD Research Team extracts datasets for researchers against an approved protocol and data specification. The specification is agreed upon with the researcher prior to generation of the data set.

CPRD services will develop incrementally over time. In addition to increasing the population coverage of primary care data and the number of linked datasets, CPRD will link to data from other domains such as Social Care.

Database Type
Longitudinal Population Database
- Drug and Diagnosis Data
- - - Outpatient and inpatient
- - Electronic Medical Records

[Full primary care electronic health records are available linked with hospital and other data including HES, Accident and Emergency data, Cancer Registry, and Office for National Statistics (ONS) death registration data.

CPRD data contains coded patient registration information and all care events that a general practitioner (GP) has chosen to record as part of their usual medical practice. Information held includes records of clinical events (medical diagnoses), referrals to specialists and secondary care settings, prescriptions issued in primary care, records of immunisations/vaccinations, diagnostic testing, lifestyle information (e.g., smoking and alcohol status), and all other types of care administered as part of routine GP practice.

CPRD has full access to HES data. HES data are made available as separate modules of hospitalised care, outpatient visits (visiting a consultant), maternity care and augmented/critical care. For the HES APC (Admitted Patient Care) data source, each patient has a line of data for each "consultant" episode of care; this is best understood as a line of data for each ward in which the patient is treated. Diagnosis data recorded in HES APC are based on ICD-10 clinical coding and OPCS-4 procedural coding. Healthcare Resource Group (HRG) coding, the currency used in secondary care to support standardised health care commissioning, can also be used to support hospital resource utilisation analyses.

The CPRD data team uses data quality metrics. CPRD also has access to the additional text-based information from Primary Care practices, which can be accessed once anonymised.

Primary care data have been enhanced by the addition of central mortality data (date and causes of death) as well as certain key data from HES (HES- hospitalised patients). There will be an incremental approach to increasing the population coverage of primary care data to come from all four main GP Electronic Health Record (EHR)-IT systems.

Access to all CPRD primary care data will be available for tracking new products under a Risk Management protocol. NOTE: All access and use of data via the CPRD are carefully controlled under United Kingdom and European law and the rules and regulations operating in the NHS.]
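NOTE: As an illustration of the HES APC layout described above (one line of data per consultant episode), the following minimal Python sketch collapses episode rows into one record per hospital admission. The field names (patid, spno, epistart, epiend, icd_primary) and the sample rows are assumptions made for this example only; they are not taken from the CPRD or HES data specification.

```python
from collections import defaultdict

# Hypothetical HES APC-style rows: one row per consultant episode.
# All field names and values are illustrative assumptions.
episodes = [
    {"patid": 1, "spno": 100, "epistart": "2015-03-01", "epiend": "2015-03-03", "icd_primary": "I21"},
    {"patid": 1, "spno": 100, "epistart": "2015-03-03", "epiend": "2015-03-09", "icd_primary": "I50"},
    {"patid": 2, "spno": 200, "epistart": "2016-07-10", "epiend": "2016-07-11", "icd_primary": "J18"},
]


def collapse_to_admissions(rows):
    """Group episode rows into one record per (patient, hospital spell)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["patid"], row["spno"])].append(row)

    admissions = []
    for (patid, spno), eps in sorted(grouped.items()):
        # ISO-formatted date strings sort chronologically.
        eps.sort(key=lambda e: e["epistart"])
        admissions.append({
            "patid": patid,
            "spno": spno,
            "admitted": eps[0]["epistart"],
            "discharged": max(e["epiend"] for e in eps),
            "n_episodes": len(eps),
            "diagnoses": [e["icd_primary"] for e in eps],
        })
    return admissions


admissions = collapse_to_admissions(episodes)
```

Counting rows in episode-level data would overstate the number of hospital stays, which is why a grouping step of this kind is usually needed before analysing admissions.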

Database Source
EHR/EMR

[Electronic Medical Records (primary care)
Death registration (from death certificate)
Other: Medical Records (from hospital, via audit dataset); Cancer records (from registry)
NOTE: Further expansion is ongoing to include other linked data sources.]

Frequency of Data Collection
Ongoing

(Daily or monthly)

Frequency of Data Update
Ongoing

(Monthly)

Years Covered
1987 - Present
Population Type
General Population

(Not restricted to a specific group, with linkage available to other population types, e.g., outpatient / inpatient / disease registries)

Patient Type
Inpatient and Outpatient

Other

[Primary care / GP Outpatients; can be linked to HES data for inpatient and outpatient information, and Emergency Room (Accident and Emergency) information].

Date of Last Update

(CPRD is updated on a monthly basis for primary care data, but varies for other linked data sets;
The CPRD profile was last updated for the B.R.I.D.G.E. TO DATA site on June 23, 2017.)

Database Population Size
20 - 50 Million

(24,310,960 - As of April 2017 >15 million patients from Vision practices and >7 million patients from EMIS practices.
NOTE: Vision and EMIS are software systems that allow data collection from GP practices.)

Active Population Size
5 - 20 Million

(5,958,580 are active patients in the primary care database, as of April 2017)

Annual Change in Population
Varies

(Recruitment of new GP practices is ongoing. The primary care database is dynamic, meaning that patients can join and leave at any time.)

Sample Weights - Extrapolation Factors
No

CPRD GOLD is generalisable to the United Kingdom population, and the subset of patients eligible for linkage has been shown to be broadly representative of the rest of the database.

See references:
1. Gallagher AM, Puri S, van Staa TP: Linkage of the General Practice Research Database (GPRD) with other data sources. Pharmacoepidemiol Drug Saf 2011, 20:S230.

2. Herrett E, Gallagher AM, Bhaskaran K, Forbes H, Mathur R, van Staa T, et al. Data Resource Profile: Clinical Practice Research Datalink (CPRD). Int J Epidemiol [Internet]. 2015 Jun 6 [cited 2015 Jun 7];44(3):827–36. Available from: http://www.ncbi.nlm.nih.gov/pubmed/26050254

Final Population Size
N/A

(Not applicable, as data collection is ongoing)

Age of Patients at Data Collection
Yes

YOB
(Year and Month of Birth for children)

Approximate Percentage of Participants <18 years and those >65 years

<18 yrs = 19.9%
>65 yrs = 17.7%

(Based on March 2011 UK census data)

Gender Data
Yes
Percentage of Males/Females

Females = 50.5%
Males = 49.5%

(Based on March 2011 UK census data)

Ethnicity / Race Data
Yes

Now recorded by GPs and available for all hospitalisations and other linked data (but not yet 100% from GPs); however, there is a variation in the standard of recording across practices.

Geographic Location

United Kingdom (primary care)

England for linked data sets.
CPRD collaborates with equivalent services in Scotland and Wales.

Date of Birth Recorded
Yes

DOB is recorded as year of birth for adults; and year / month of birth can be provided for a study related to children. CPRD can also estimate date of birth for newborns through the Mother-Baby linkage.
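Because only year of birth (or year and month for children) is provided, analyses that need an age usually have to impute a full date of birth. The sketch below is a minimal Python illustration under stated assumptions: it places adults at 1 July of their birth year and children at the 15th of their birth month. These midpoint conventions and the function names are choices made for this example, not a rule documented by CPRD.

```python
from datetime import date


def imputed_dob(yob, mob=None):
    """Return an assumed date of birth from year (and optional month) of birth."""
    if mob is None:
        # Assumption: mid-year when only the year is known.
        return date(yob, 7, 1)
    # Assumption: mid-month when year and month are known.
    return date(yob, mob, 15)


def age_on(index_date, yob, mob=None):
    """Age in completed years on index_date, using the imputed date of birth."""
    dob = imputed_dob(yob, mob)
    years = index_date.year - dob.year
    # Subtract one year if the birthday has not yet occurred in the index year.
    if (index_date.month, index_date.day) < (dob.month, dob.day):
        years -= 1
    return years


adult_age = age_on(date(2017, 6, 30), 1980)
child_age = age_on(date(2017, 3, 15), 2010, 3)
```

With year-only data the imputed age can be wrong by up to about six months, which matters most for studies of young children; that is the case where requesting month of birth is worthwhile.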

Death Recorded
Yes

However, date of death and coded causes of death are available only via linkage to the ONS death registration data. This linkage is available for patients registered at English practices.

Availability of death certificate / autopsy information
Yes

Coded causes of death are derived from the death registration process available via ONS death registration linkage data

Other Demographic Data
Yes

Townsend and Index of Multiple Deprivation scores (area level deprivation indices) are available

Physician ID
Yes

(Pseudonymised/Privatised)

Physician Specialty
Yes

All are GPs; however, referrals to specialists are noted. Physician speciality is available in HES data.

Pharmacy ID
N/A

(Not applicable, as pharmacy data are not collected)

Diagnosis Data
Yes

Information includes records of symptoms, diagnoses, immunisations/vaccinations, diagnostic testing, lifestyle information (e.g., smoking and alcohol status), and all other types of care administered as part of routine GP practice.

Diagnoses Coded
READ

ICD-9
ICD-10
SNOMED
Other

[ICD-10 in linked data sets (e.g., death and hospitalisation)
Death registry = ICD-10 (ICD-9 prior to 2001)
EMIS = Read CV3 clinical codes and SNOMED clinical codes
HES APC = ICD-10
HES A&E = Internal coding system
HES Outpatient = Diagnostic data incomplete
Vision = Read v2 clinical codes]

Diagnoses: Date Parameters
1987 - Present

NOTE: Nearly 30% of patients have had their 1987 paper record culled into electronic format

Diagnoses: Maximum Number of Codes Allowed
Unlimited

GPs are not limited in the number of symptom or diagnostic codes that can be recorded for primary care (outpatients).
For Hospital data (Hospital Episode Statistics):
Inpatient is limited to 20 diagnoses per episode (14 before April 2007, and 7 before April 2002);
Outpatient is limited to 12 diagnoses (2 before April 2007), but they are rarely populated.
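The date-dependent limits above can be encoded as a small lookup, which is useful when checking whether an apparent absence of secondary diagnoses reflects a field limit. This is a minimal Python sketch; the function name is hypothetical, and it assumes that "before April 2007" and "before April 2002" mean before the first day of that month.

```python
from datetime import date


def max_hes_diagnoses(setting, episode_date):
    """Maximum number of diagnosis codes per HES record, per the limits stated above.

    Assumption: each cut-over takes effect on 1 April of the stated year.
    """
    if setting == "inpatient":
        if episode_date < date(2002, 4, 1):
            return 7
        if episode_date < date(2007, 4, 1):
            return 14
        return 20
    if setting == "outpatient":
        if episode_date < date(2007, 4, 1):
            return 2
        return 12
    raise ValueError(f"unknown setting: {setting!r}")


limit_2005 = max_hes_diagnoses("inpatient", date(2005, 6, 1))
```

Primary care records are not covered by this function because GPs are not limited in the number of codes they can record.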

Physical Examination Findings
Yes

Physical examination data are captured in a structured form which can be analysed using an entity code to reflect the attribute being recorded and one or more values to reflect the attribute values. Examples include blood pressure and BMI - body mass index.
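To illustrate the entity-code structure described above, the following minimal Python sketch decodes a structured record by looking up which attribute an entity code represents and what each value slot means. The lookup table, the code numbers and the slot names are hypothetical and chosen for this example; the actual entity definitions should be taken from the CPRD data dictionaries.

```python
# Hypothetical lookup: entity code -> (attribute name, names of the value slots).
# Illustrative only; not CPRD's actual entity table.
ENTITY_LAYOUT = {
    1: ("blood_pressure", ("diastolic", "systolic")),
    2: ("body_mass_index", ("weight_kg", "bmi")),
}


def decode(record):
    """Turn {'entity': code, 'values': [...]} into a labelled dictionary."""
    name, slots = ENTITY_LAYOUT[record["entity"]]  # KeyError for unknown codes
    labelled = dict(zip(slots, record["values"]))
    return {"attribute": name, **labelled}


bp = decode({"entity": 1, "values": [80, 120]})
```

Keeping the layout in a lookup table rather than in code means that new entity types can be supported by adding a row, which mirrors how an entity code plus generic value fields can represent many different measurements.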

Birth Defect Data
Yes
Cancer Data
Yes

Diagnostic codes for cancer are available in the primary care record. For more detailed information, cancer data are available as a linked data set to the Cancer Registry, which contains Primary Site (ICD-10), Morphology, Laterality, Grade, Stage, Tumour size, Growth behaviour, TNM Staging, boolean indication of types of treatment.

Infectious Disease Data
Yes

Data are available for laboratory tests ordered by GPs.

Environmental Exposures
Yes

Please be aware that this information is generally incomplete, and cannot be reliably used for research.

Behavioral Data Elements
Yes

Lifestyle information such as details on smoking and alcohol consumption is included

Procedure Data
Yes

Procedure data are included if performed at the primary care practice or hospital. Hospital data are from the HES (via OPCS procedure codes).

Procedures Coded
ICD-10
READ
Other

SNOMED

[Death Registry = ICD-10 (ICD-9 prior to 2001)
EMIS = Read CV3 clinical codes and SNOMED clinical codes
HES APC = ICD-10
HES A&E = Internal coding system
HES Outpatient = Diagnostic data incomplete
Vision = Read v2 clinical codes]

Number of Procedures Coded
Unlimited

[Primary Care: Unlimited

Hospital data (Hospital Episode Statistics):
- Inpatient - There are 24 fields (12 before April 2007 and 4 prior to April 2002);
- Outpatient - There are 24 fields (12 before April 2007).]

Procedure Date Parameters
1987 - Present

[1987 - Present (Primary Care)
1997 - Within one calendar year (Hospital Inpatient)
2003 - Within one calendar year (Hospital Outpatient)
NOTE: Capture of procedure codes is variable in the linked outpatient data.]

Laboratory Information
Yes

Test results from primary care that include qualitative and quantitative values are recorded. CPRD can also arrange to collect bio-samples from patients that can be linked with CPRD observational and specifically collected data.
Examples of types of laboratory values captured in CPRD include biochemistry, hematology, serology, immunology, pathology, and respiratory function data.

Drug Data
Yes: Prescription only

CPRD contains details of all drugs prescribed by the GP – generics and/or branded products issued in primary care. Information on formulation, strength and dosing instructions is also available.

Drug Date Parameters
1987 - Present
Drug Regimen & Route
Yes
Drug Manufacturer
Yes

Only some information is available on drug manufacturers as drugs are predominantly prescribed generically in UK healthcare

Drug Dosage
Yes

Information on formulation, strength and dosing instructions is available

Drug Days Supply
Yes

This can be calculated using the daily dose and duration fields. However, there are limited data on this.
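As a rough illustration of such a calculation, the minimal Python sketch below prefers a recorded duration and otherwise falls back to prescribed quantity divided by daily dose. The quantity input and the fallback rule are assumptions introduced for this example (the text above names only the daily dose and duration fields), and the function returns None when the inputs are missing, reflecting the limited data.

```python
def days_supply(duration_days=None, quantity=None, daily_dose=None):
    """Estimate days supplied by one prescription, or None if not derivable.

    Assumptions: a positive recorded duration is trusted as-is; otherwise
    quantity / daily_dose is used when both are positive.
    """
    if duration_days is not None and duration_days > 0:
        return float(duration_days)
    if quantity is not None and daily_dose is not None and quantity > 0 and daily_dose > 0:
        return quantity / daily_dose
    return None


from_duration = days_supply(duration_days=28)
from_quantity = days_supply(quantity=56, daily_dose=2)
```

In practice a study protocol would also need a rule for the None case, for example imputing a typical duration for the product, and that choice should be documented as a sensitivity assumption.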

Drug Coding System: Maximum Number
Unlimited
Drug Coding System: Primary
Other

(Gemscript)

Drug Coding System: Other
ATC

(Mapping to ATC is available)

Drug Generic Name
Yes
Drug Additional Information
Yes

Adverse drug reaction details are recorded, including certainty and severity assessments. There are 4 potential sources of drug cost data that can be incorporated in CPRD data - See Type of Cost Data under ECONOMIC DATA for more details.

Cost Data
Yes

However, this information is not directly available in the patient health record. Surrogate sources of cost data are available nationally. The provision of health care in the United Kingdom National Health Service (NHS) is funded through a direct taxation system. Under this arrangement, all consumers have access to varying levels of health care at zero price at the point of delivery. As a result of this form of financing, unit costs of hospital and medical care provision are not captured at the individual level by the NHS.

Information on hospital costs and other services can also be obtained from volumes of "Unit Costs of Health & Social Care" (available at: http://www.pssru.ac.uk/project-pages/unit-costs/). This provides details of unit cost, average length of stay, and activity levels of a wide range of hospital services and describes how and on what NHS expenditure is used. However, such information has only become available in more recent years.

Cost Denomination
Sterling Pounds
Type of Cost Data
Yes

However, this information is not directly available in the patient health record. Surrogate sources of cost data are available nationally. The provision of health care in the United Kingdom National Health Service (NHS) is funded through a direct taxation system. Under this arrangement, all consumers have access to varying levels of health care at zero price at the point of delivery. As a result of this form of financing, unit costs of hospital and medical care provision are not captured at the individual level in the United Kingdom NHS.

Description of Surrogate Link
Yes

Sources of cost data for care provided in the community and/or hospital setting (as mentioned above) are publicly available and free of charge. Prescription cost data from the PSU are also publicly available. Access to data from the Drug tariff and dm+d is subject to subscription.

The main source of cost data for services provided in the Community can be obtained from publication of the ‘Unit Costs of Health & Social Care’, prepared by the Personal Social Services Research Unit (PSSRU). The volumes present unit costs for a range of activities undertaken in the Community and more recently for some hospital activities. Volumes are published on an annual basis.

The main source of cost data for services provided in secondary care can be obtained from publication of the ‘NHS Reference cost schedule’, prepared by the Department of Health. The concept of the Reference Costs was introduced by the United Kingdom government in 1998 (NHSE 1998b) and was intended to summarise the efficiency of the hospital sector and, when aggregated over all the providers used by a Health Authority, to assess the efficiency of the Health Authority itself. Reference costs are based on costed Healthcare Resource Groups (HRGs). In their most basic form HRGs are groups of ICD-10 diagnoses and OPCS procedures that have similar resource implications. HRGs are currently used as a means of determining fair and equitable reimbursement for care services delivered by providers.

Data Validation Against Original Source
Yes

CPRD data are too extensive to be systematically validated. Validation services are an additional service provided by CPRD, but would be targeted to selected records of most interest. Validity studies in CPRD data include:

1. Dregan A, Moller H, Murray-Thomas T, Gulliford MC. Validity of cancer diagnosis in a primary care database compared with linked cancer registrations in England. Population-based cohort study. Cancer Epidemiol. Elsevier Ltd; 2012 Oct;36(5):425–9.
2. Khan NF, Harrison SE, Rose PW. Validity of diagnostic coding within the General Practice Research Database: a systematic review. Br J Gen Pract. 2010 Mar; 60(572):e128-36.
3. Herrett E, Thomas SL, Schoonen WM, Smeeth L, Hall AJ. Validation and validity of diagnoses in the General Practice Research Database: a systematic review. Br J Clin Pharmacol. 2010 Jan;69(1):4–14.

Access to Medical Records
No

Medical record access is not available for end-users of the database. However, the vendor can access records on end-users' behalf for outpatients and, where applicable, hospitalisation records in some cases.

Linkage to Other Databases
Yes

Direct linkage is based primarily on the unique NHS patient identifier (NHS Number)

Brief Description of Linkage Capabilities

CPRD currently has established linkages to several data sources:
- Hospital Episode Statistics (HES) data, including:
-- HES Inpatient Admitted Patient Care (APC) data
-- HES Outpatient (HES OP) data
-- HES Accident and Emergency (HES A&E) data
-- HES Diagnostic Imaging Dataset (HES DID)
-- HES Patient Reported Outcomes Measure Data (PROMs)
- Death Registration data from the Office for National Statistics (ONS)
- Cancer Registration data from Public Health England (PHE)
- Socio-economic classifications, such as measures of relative deprivation at Lower Layer Super Output Area (LSOA) level, based on patient and practice postcode.

CPRD is working on extending this list, primarily in the area of cancer treatment and disease registries.

Database Contact Data

CPRD Enquiries 
The Clinical Practice Research Datalink 
The Medicines and Healthcare products Regulatory Agency 
5th Floor
151 Buckingham Palace Road 
Victoria London SW1W 9SZ
ENGLAND 
Telephone: +44 (0)20 3080 6383
Email: enquiries@cprd.com

Alternate Contact

N/A

(Not applicable)

Source of Database Funding
Government
Private

(Mixture of Government funding and (Private) funding through income from customer fees)

Sponsoring Government Agency
NHS National Institute for Health Research (NIHR) and the Medicines and Healthcare products Regulatory Agency (MHRA)
Sponsoring Pharmaceutical Manufacturer

N/A

(Not applicable)

Database Usage Restrictions
Private Access

Access to CPRD data is granted for public health research, provided organisations have research staff capable of analysing CPRD data. Access is controlled through contractual obligations. All studies have to be approved by an Independent Scientific Advisory Committee (ISAC).

Charge for Database Usage
Yes

Charges can be per individual study or on an annual licence fee basis. The licence fee is based on cost recovery of staff time, use of specific IT systems, and data charges. Please contact CPRD for more information (enquiries@cprd.com).

Data Media Format
Other

(Access to GOLD data is via a secure online system. Datasets are encrypted and delivered using SFTP.)

Number of Publications Using Database
>1700 peer-reviewed articles
References of Studies Using/Describing Database

1. Franklin M, Davis S, Horspool M, Kua WS, Julious S. Economic Evaluations Alongside Efficient Study Designs Using Large Observational Datasets: the PLEASANT Trial Case Study. Pharmacoeconomics. 2017 May;35(5):561-573.

2. Charlton RA, McGrogan A, Snowball J, Yates LM, Wood A, Clayton-Smith J, Smithson WH, Richardson JL, McHugh N, Thomas SH, Baker GA, Bromley R. Sensitivity of the UK Clinical Practice Research Datalink to Detect Neurodevelopmental Effects of Medicine Exposure in Utero: Comparative Analysis of an Antiepileptic Drug-Exposed Cohort. Drug Saf. 2017 May;40(5):387-397.

3. Iwagami M, Tomlinson LA, Mansfield KE, Casula A, Caskey FJ, Aitken G, Fraser SDS, Roderick PJ, Nitsch D. Validity of estimated prevalence of decreased kidney function and renal replacement therapy from primary care electronic health records compared with national survey and registry data in the United Kingdom. Nephrol Dial Transplant. 2017 Apr 1;32(suppl_2):ii142-ii150.

4. Leite A, Andrews NJ, Thomas SL. Assessing recording delays in general practice records to inform near real-time vaccine safety surveillance using the Clinical Practice Research Datalink (CPRD). Pharmacoepidemiol Drug Saf. 2017 Apr;26(4):437-445.

5. Thompson A, Wright AK, Ashcroft DM, van Staa TP, Pirmohamed M. Epidemiology of alcohol dependence in UK primary care: Results from a large observational study using the Clinical Practice Research Datalink. PLoS One. 2017 Mar 31;12(3):e0174818.

6. Little I, Vinogradova Y, Orton E, Kai J, Qureshi N. Venous thromboembolism in adults screened for sickle cell trait: a population-based cohort study with nested case-control analysis. BMJ Open. 2017 Mar 29;7(3):e012665.

7. Oyinlola JO, Campbell J, Kousoulis AA. Is real world evidence influencing practice? A systematic review of CPRD research in NICE guidances. BMC Health Serv Res. 2016 Jul 26;16(1):299.

8. Gallagher AM, Williams T, Leufkens HGM, de Vries F. The Impact of the Choice of Data Source in Record Linkage Studies Estimating Mortality in Venous Thromboembolism. PLoS One. 2016 Feb 10;11(2):e0148349.

9. Herrett E, Gallagher AM, Bhaskaran K, Forbes H, Mathur R, van Staa T, Smeeth L. Data Resource Profile: Clinical Practice Research Datalink (CPRD). Int J Epidemiol. 2015 Jun 6;44(3):827–36.

10. Herrett E, Thomas SL, Schoonen WM, Smeeth L, Hall AJ. Validation and validity of diagnoses in the General Practice Research Database: a systematic review. Br J Clin Pharmacol. 2010 Jan;69(1):4–14.
