Data Contents Overview

The study has conducted seven waves of data collection, spanning from 1998 through 2024.  This page provides a brief overview of data available through the Public Use and Restricted Use Contract Data processes.

You can also use the menu to the left to navigate to pages including more detailed information on Public Data Documentation and the Restricted Use Contract Data.

Public Data

The Baseline wave of data collection took place from 1998 to 2000 and includes mother and father core interviews at the birth of the study's "focal child."   These interviews were conducted primarily in the hospital shortly following the focal child's birth.  

At Baseline and the subsequent six waves, the core interviews collected data on parental relationships, parenting, health and health behaviors, family and social support, demographics, housing, use of social programs, and education and employment.  

The Year 1 follow-up wave of data collection took place from 1999 to 2001 and includes mother and father core interviews around the time of the focal child's first birthday.   

The Year 3 (2001-2003) and Year 5 (2003-2006) follow-up waves included mother and father core interviews, as well as primary caregiver interviews and home visits around the time of the focal child's third and fifth birthdays. The primary caregiver interviews included questions on home life and routines, health and health care, and parenting. During the home visits, assessments such as the Peabody Picture Vocabulary Test (PPVT) were administered and direct height and weight measurements were taken. Interviewers also observed the home environment (surrounding neighborhood, interior and exterior of house/apartment) and recorded additional information about the parent's and child's affect during the home visit.

The Year 9 (2007-2010) follow-up wave included mother and father core interviews, as well as primary caregiver interviews, home visits, and interviewer observations, similar to the previous two waves. We conducted a short interview with each focal child around their ninth birthday, collecting information on their relationships with parents and siblings, school connectedness, task completion, self-concept, and home routines. During these in-home assessments, saliva samples were collected from focal children and their biological mothers to create Biomarker data (Telomere Length, DNA Methylation Clocks) and Polygenic Scores (PGS).

The Year 15 (2014-2017) follow-up wave included primary caregiver and teen interviews (mostly conducted by phone); home visit activities and interviewer observations were conducted with a subset of ~1,000 teens. In addition to collecting data again on the topics covered in the previous five waves, the Year 15 interviews included new measures on focal children’s education and school experiences, risky behaviors such as sexual activity and substance use, peer interactions, and pro-social behaviors. The home visits included height, weight, and waist circumference measurements, and interviewer observations of the home environment. Teens who participated in the In-Home Study were also invited to participate in a Sleep Study, in which they were asked to wear an accelerometer on their non-dominant wrist for seven consecutive days to track their sleep and to complete daily diary entries. A sample of teens participating in the Year 15 In-Home Study additionally completed the mDiary Study of Adolescent Relationships (mDiary), a year-long study examining the romantic and sexual relationships of teens.

The Year 22 (2020-2024) follow-up wave included a survey with the “focal child” as a young adult as well as with the person who was the Primary Caregiver (PCG) at Age 15. The young adult survey covered a wide range of topics, including housing, education, employment, income, assistance, finances, relationships, family formation, systems involvement, identity, health and behavior, and substance use. PCGs were asked questions about their own housing, education, employment, income, assistance, finances, relationships, identity, health and behavior, and substance use, and about the young adult’s housing, education, systems involvement, health, relationships, and employment. Public data at Year 22 also include survey variables from a collaborative COVID-19 sub-study conducted by FFCWS partners at the University of Michigan. In this sub-study, a subset of FFCWS primary caregiver and young adult participants completed an online survey evaluating whether COVID-19 impacted their mental health and social support systems. More specifically, these data assess COVID-19-related stressors, moderators of these stressors, and the impact of these stressors on participants’ threat and reward construct.

Data files containing measures from the interviews, home visits, and observations are available for download through the OPR Data Archive.

Restricted-Use Contract Data

Additional files are available through our Restricted-Use Contract data application process. These are described below, and more information can be found here.

Residential context files, including:

A Geographic Identifiers file, with the focal child's birth city, mother's and father's state of residence at each interview, and the sampling stratum and primary sampling unit (PSU).

A Census Tract Measures file, with pseudo-tract identifiers and data on demographic, housing, and income characteristics from the U.S. Census for the census tracts of mothers’ and fathers’ residence.

A Labor Market and Macroeconomic file, with data on local employment and national consumer confidence.

An Opportunity Insights file, with county-level measures of intergenerational mobility and characteristics correlated with intergenerational mobility from Chetty and Hendren's Opportunity Insights.

A Gun Violence Archive file, with incident-level data on time and location of gun violence in 2014 to 2017 from the Gun Violence Archive.

A Uniform Crime Reports file, with county-level crime rates (counts/county population) for all crimes, violent crimes, and property crimes from the FBI Uniform Crime Reports database.

School context files, including:

NCES - School-level characteristics, including school type, pupil-to-teacher ratio, school racial composition, Title I funding, percent of students receiving free and reduced-price lunch, and more, from the National Center for Education Statistics Common Core of Data.

SEDA - School district-level measures from 2009 to 2013 of educational inequality and characteristics correlated with educational inequality from the Stanford Education Data Archive.

CRDC - School- and school district-level measures from 2009, 2011, and 2013 of school resources, disciplinary outcomes, and other characteristics related to school environments from the Civil Rights Data Collection.

Biological and Health files, including:

Medical Records data for mothers and children from the birth hospitalization record.

An FFCWS Candidate Genes appendage based on the saliva samples collected from mothers and children at the Year 9 wave.

A Genotype Array file, with genotype data on focal children. This genetic information can be linked to a limited subset of phenotype variables. To download this file, please request access through the dbGaP data archive here. You do not need to complete a restricted data application with CRCFW. (Note: This file cannot be merged to other restricted use contract data or to the public FFCWS data.)

FF Challenge Files

The FF Challenge data files are associated with the predictive modeling stage of the FF Challenge competition, held in Summer 2017. These files are now being provided so that other data users may replicate and extend what participants did in the Challenge.

The Challenge files include:
-    readme.txt - a text file with descriptions of the remaining files
-    background.csv - birth-Year 9 data, as a .csv
-    background.dta - birth-Year 9 data, as a Stata .dta file
-    codebook_FFChallenge.txt - merged codebook for all waves
-    prediction.csv - an example submission that predicts the mean of the training data for all outcomes
-    train.csv - outcomes for training observations (half the sample)
-    test.csv - outcomes for test observations
-    leaderboard.csv - outcomes for observations in the leaderboard set, with missing outcomes imputed (as provided via Codalab)
-    leaderboardUnfilled.csv - outcomes for observations in the leaderboard set (not imputed)
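
As a rough illustration of how a file like prediction.csv (the mean of the training data for every outcome) could be reproduced from train.csv, here is a minimal Python sketch using only the standard library. The ID column name (challengeID), the outcome names (gpa, grit), and the use of "NA" for missing values are assumptions made for this example and should be checked against readme.txt and codebook_FFChallenge.txt; the tiny in-memory table stands in for the real file.

```python
import csv
import io
from statistics import mean


def read_rows(handle):
    """Read a CSV file handle into a list of dicts (one per row)."""
    return list(csv.DictReader(handle))


def column_means(rows, id_column):
    """Mean of every non-ID column, skipping blank or 'NA' cells."""
    means = {}
    for name in rows[0]:
        if name == id_column:
            continue
        values = [float(r[name]) for r in rows if r[name] not in ("", "NA")]
        means[name] = mean(values) if values else None
    return means


def mean_baseline(train_rows, ids, id_column):
    """One prediction row per ID, each outcome set to its training mean."""
    means = column_means(train_rows, id_column)
    return [{id_column: i, **means} for i in ids]


# Hypothetical stand-in for train.csv; real column names may differ.
sample = io.StringIO("challengeID,gpa,grit\n1,3.0,NA\n2,2.0,4.0\n3,NA,3.0\n")
train_rows = read_rows(sample)

# Predict for every ID, including one (4) with no training outcomes.
predictions = mean_baseline(
    train_rows, ids=["1", "2", "3", "4"], id_column="challengeID"
)
print(predictions[0])
```

To run this against the downloaded files, replace the in-memory sample with `open("train.csv", newline="")` and take the list of IDs from background.csv, again assuming both files share the same ID column.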

Data files from the FF Challenge project are available for download through the OPR Data Archive. For more detail on what's available, please visit the Fragile Families Challenge blog.