Frequently Asked Questions

General Questions

Study Design Questions

Data Structure Questions

Technical Questions

Weights Questions

Metadata Explorer Questions

Publishing Your FFCWS Research


General Questions

Why did you change the name of FFCWS?

In January 2023, The Fragile Families and Child Wellbeing Study became The Future of Families and Child Wellbeing Study. We still use the same acronym, FFCWS.

Since beginning in 1998, the focus of FFCWS has evolved and expanded in exciting ways. The decision to change our name honors the significance of this history as well as an ongoing commitment to the wellbeing of children, youth, and families. 

Data users should note that documentation and collaborative projects completed prior to January 2023 contain the study’s former name. Any further reference to FFCWS should kindly observe this name change.

What data are available to the public?

Currently, Baseline, Year 1, Year 3, Year 5, Year 9, and Year 15 survey data are available to the public through the Office of Population Research data archive. Year 3, Year 5, Year 9, and Year 15 in-home data are also included in these files, as well as child care provider (Year 3) and teacher surveys (Years 5 and 9), for a subset of core respondents. Biomarker and Polygenic Score data are also included in the public data based on saliva collected from the focal child and mother at Year 9 and/or Year 15. Additional files including Geographic identifiers, Census tract measures, Labor market and macroeconomic, Opportunity Insights, Gun Violence, Uniform Crime Reports, NCES, SEDA, CRDC, Medical records, and Genotype array are available to the public via a Restricted Use Contract.

When will the Year 22 data and questionnaires be available to the public?

The data collection for Year 22 began in the Fall of 2020 and will continue through late 2022. Data from the Year 22 survey will be made available to researchers after the completion of fieldwork, with a launch date in 2024 to be announced. If you would like to be informed when new data are released, please sign up for the FFCWS newsletter. The Year 22 survey documents are not publicly available at this time but may be accessed upon request. Any researcher seeking access to the Year 22 surveys should contact [email protected].

How can I access FFCWS data for my analysis?

There is a two-step process to access the data: (1) register as a user of Princeton University's Office of Population Research data archive, and (2) sign up for access to the FFCWS within the data archive. Registering as a user of the archive is immediate and automated. Signing up for the FFCWS data submits a request which is usually reviewed for approval within 1 business day.

Please note that the OPR website occasionally times out after login, in which case an FFCWS data request that the user submitted is not actually processed. Completing the data request within 10 minutes of logging in to the archive usually results in a successful submission. If you don't hear back about your approval status within 1 business day, email us at [email protected].

For detailed instructions on how to apply for the FFCWS public data, see our OPR archive tutorial.

Can I get access to city identifiers?

Geographic identifiers are only available through a restricted use contractual agreement. See the FFCWS Restricted Use Contract Data for more information.

What is the best way to view variable frequencies?

If you want to review frequencies before downloading the data, please use the Metadata Explorer or review the codebooks available by wave on the Public Data Documentation page.

Where do I send questions about the data, procedures, problems, etc.?

Please email all questions about the data and documentation to [email protected].

Can I distribute the data from the FFCWS Public Use Files to my colleagues, even though they have not personally registered on the public use web site?

We ask that all users personally register in order to access the data files.

Why am I required to give contact information to register?

The FFCWS receives funding from a number of different sources. We ask each data user to register separately because we want to be able to provide our funders with information about data usage, such as the number of data users and what the data are being used for. Your contact information will not be used unless you ask to receive mailings about the data, study, etc.

How is the FFCWS sponsored and funded?

The FFCWS has been supported by a number of foundations and agencies. Click here to view the list of those who funded the core study.

I am interested in using FFCWS data for a specific research topic (e.g., child support, incarceration, health care access). How do I figure out if relevant data are available?

You can use our Metadata Explorer to browse or search our database of the FFCWS variables by a variety of topics. You can use the Advanced Variable Search option to complete a text search using your own terms. It may also be helpful to look at the questionnaires directly, so you can see which other questions were asked on the topics relevant to your research. Questionnaires are available by wave on our Public Data Documentation page. You can also see what other researchers have published on a variety of topics with FFCWS data in our Publication Archive.

How can I merge additional data to the FFCWS data?

You can merge your own data with FFCWS by sample city (Baseline only) or state of residence (all waves) if two conditions are met: (1) you are approved for access to the Geographic Identifiers file, which is available through our Restricted Use Contract Data process, and (2) CRCW staff have approved the data you plan to merge to FFCWS as part of your Contract Data Agreement. For information regarding requests for adding new contextual data to the Restricted Use Contract Data, please see the bottom of our What is Available page.

What training opportunities are available for FFCWS data users?

Most data users learn about the data by using the documentation available on our website. You may also use our Metadata Explorer to browse or search the database of the FFCWS Public Data variables.

The Getting Started page on the FFCWS website hosts a variety of helpful resources for new FFCWS data users. From this page, users can view the New Data User Tutorial, which provides an overview of how to download the FFCWS data as well as the FFCWS file structure, variable naming convention, missing data codes, and more. Users may additionally utilize the Introduction to the FFCWS Data video, a previously recorded training video from the FFCWS data team.

Early-career scholars interested in attending a live FFCWS training can apply for the annual FFCWS Summer Data Workshop at Columbia University. Learn more at

When you have specific questions, you’re welcome to email us at [email protected].

How can I find out about updates and news regarding FFCWS?

Follow our Twitter (@FFCWS) and sign up for our newsletter for regular updates about the data. We also post data alerts to the website when we release new information. Visit our homepage for information regarding upcoming conferences and other events. For more specific questions, email us at [email protected].

Study Design Questions

How did you choose your cities and hospitals?

A detailed description of our sample design is contained in Reichman et al. (2001), "The Fragile Families and Child Wellbeing Study: Sample and Design," Children and Youth Services Review, Vol. 23, No. 4/5. A brief summary and additional details on data collection and hospital protocols are included in the User Guide for each wave of data, which can be found on our Public Data Documentation page.

How did you decide which mothers to interview when you were in the hospital?

Sampling Mothers: Mothers of new babies were sampled at each hospital from maternity ward lists. Once sampled, mothers were asked to complete a screening instrument to determine marital status and eligibility for participation in the study. Quotas were set at each hospital for the number of unmarried and married births, based on sample cities’ 1996/1997 unmarried birth rates. If a mother was determined to be above the set quota for a given marital status, the case was coded “over quota” and the mother was not interviewed. Mothers’ eligibility was determined based on the analytic goals, logistical constraints, and design of the study, including the need to interview both the mother and father of a child who would be residing with at least one of those parents. Thus, for instance, mothers whose babies would be adopted were considered “ineligible” and were not interviewed.

Sampling Fathers: Once a mother had been determined to be eligible and had given her signed consent for participation, the baby’s father was also asked to participate in the study.

See the Sample and Design Paper.

Are the FFCWS data nationally representative?

National weights are available to make the data of 16 of the 20 cities representative of births in the 77 U.S. cities with populations over 200,000. See the weights documentation and Sample Design paper for extensive discussions of the weights and samples.

What are the response rates for each follow-up?

See the User Guide for each wave, located on the Public Data Documentation page.

Who is the primary caregiver (PCG)?

The mother was considered the primary caregiver if she lived with the child at least half of the time, which applies to the majority of families. If she did not, however, the PCG interview would have been conducted with the father or other adult who lived with the child at least half of the time. See the User Guides for information regarding constructed variables which indicate the specific relationship of the PCG to the child for each family/interview.

Data Structure Questions

How are the data files structured?

You may download all of the public data in one data file or you may download separate files for each wave of data collection. The data are structured as one record per child/family. There are records for all 4,898 families at each wave, regardless of whether they were interviewed. Data from each wave (if downloaded separately) can be merged using the idnum variable. Flag variables (e.g. cf1fint, cm2mint, cf2fint) indicate whether or not a mother/father was interviewed at a given wave (all mothers were interviewed at baseline so there is no cm1mint variable). Cases not interviewed are coded as -9 "Not in wave" on all other variables.

How can I tell if a question was asked in a given wave?

Questionnaires are available by wave on the Public Data Documentation page.  You can also look up questions by topic, text search, or variable name in the Metadata Explorer. When you click on a specific variable in the Metadata Explorer, a list of similar variables in other waves and surveys will be provided.

Where are the interview date variables?

In the baseline file there are two variables you should use to find out when the respondent was interviewed. m1intmon / f1intmon represent the month of interview, and m1intyr / f1intyr represent the year of interview. There is also a constructed variable (cm1tdiff) that can be used to check the time gap between parent interviews. There are corresponding variables at all waves.
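
As a rough illustration of working with the month and year variables named above, the Python sketch below turns them into a month count and takes the difference between the father and mother interviews. The data values are invented, and the coding is an assumption (months 1-12, four-digit years, negative values as missing codes); check the User Guide before relying on it. It is not meant to reproduce how cm1tdiff is constructed.

```python
import pandas as pd

# Toy baseline records (values invented). Variable names come from the FAQ.
# Assumptions: months coded 1-12, four-digit years, negatives are missing codes.
df = pd.DataFrame({
    "idnum": ["0001", "0002", "0003"],
    "m1intmon": [3, 11, 6],
    "m1intyr": [1998, 1999, 2000],
    "f1intmon": [4, -9, 6],
    "f1intyr": [1998, -9, 2000],
})

def month_index(year: pd.Series, month: pd.Series) -> pd.Series:
    """Months counted from year 0; missing when either part is not a valid date part."""
    valid = (year > 0) & (month >= 1) & (month <= 12)
    return (year * 12 + (month - 1)).where(valid)

mother = month_index(df["m1intyr"], df["m1intmon"])
father = month_index(df["f1intyr"], df["f1intmon"])

# Gap in months between the father and mother interviews
# (missing whenever either interview date is missing).
df["gap_months"] = father - mother
print(df[["idnum", "gap_months"]])
```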

What are the identifiers on the file?

The main identifier on the file for merging and sorting is idnum, a 4-character string variable. idnum can be found on all of the public data files and all Restricted Use Contract files, with the exception of the Psych Array file (which uses a different id number).

How do I know if a case was interviewed in a given wave?

Flag variables (e.g. cf1fint, cm2mint, cf2fint) indicate whether or not a given respondent was interviewed at a given wave (all mothers were interviewed at baseline so there is no cm1mint variable). Cases not interviewed are coded as -9 "not in wave" on all other variables. Flag variables (e.g. cm1fint, cm2fint and cf2mint) on the mothers' and fathers' records indicate whether the corresponding mother or father was interviewed at the time of the follow-up.  Additionally, variables with the "samp" root in the name (e.g. cm2samp and cf2samp) provide information about the status of the case at each follow-up. Information such as mother/father/child death between waves, nonresponse, and changes in eligibility are coded in these variables.
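
A minimal Python sketch of using an interview flag to subset cases is shown here. The flag name cm2mint comes from the text above; the 1/0 coding of the flag, the column "some_item", and all data values are assumptions made for illustration.

```python
import pandas as pd

# Toy data (values invented). cm2mint is the mother-interviewed flag named above;
# the substantive variable "some_item" is hypothetical.
df = pd.DataFrame({
    "idnum": ["0001", "0002", "0003", "0004"],
    "cm2mint": [1, 0, 1, 1],
    "some_item": [2, -9, 1, 3],
})

# Assumption: the flag is coded 1 = interviewed, 0 = not interviewed.
interviewed = df[df["cm2mint"] == 1]

# Sanity check: non-interviewed cases should carry -9 on substantive variables.
not_interviewed = df[df["cm2mint"] == 0]
all_minus_nine = (not_interviewed["some_item"] == -9).all()
print(len(interviewed), all_minus_nine)
```

Checking the flag against the -9 code in this way can catch a mismatch between the flag you chose and the respondent whose variables you are analyzing.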

What do -5 and -6 mean?

"-5" in the data file means the person was not asked a given question because that question was not on the version of the questionnaire used at the time of the interview. "-6" means the respondent was skipped for a question that wasn't appropriate for them to answer. For more help with skipped questions, see “How do I figure out why a participant skipped a particular survey question?” For more information on negative codes, see the User Guides.

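One possible way to handle the negative codes before analysis is sketched below in Python: tabulate them first, then set them to missing. Only the -5, -6, and -9 meanings are taken from this FAQ. Treating every negative value as a missing-data code is an assumption, and a blanket rule like this would be wrong for any variable that can legitimately be negative.

```python
import pandas as pd

# Codes described in this FAQ; the User Guides list the remaining negative codes.
FAQ_CODES = {
    -5: "not asked (question not on this questionnaire version)",
    -6: "skipped (question not applicable to respondent)",
    -9: "not in wave",
}

def tabulate_and_mask(series: pd.Series) -> tuple[pd.Series, pd.Series]:
    """Return (counts of each negative code, series with negatives set to missing)."""
    negative = series[series < 0]
    counts = negative.value_counts().sort_index()
    # Assumption: every negative value is a missing-data code, not a real answer.
    cleaned = series.where(series >= 0)
    return counts, cleaned

# Toy responses (values invented).
item = pd.Series([1, 2, -6, -9, 3, -5, -6])
counts, cleaned = tabulate_and_mask(item)
for code, n in counts.items():
    print(code, FAQ_CODES.get(code, "see User Guide"), n)
```

Tabulating before masking keeps the distinction between "not in wave" and "skipped" visible, which matters when deciding who belongs in an analytic sample.
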
How do I figure out why a participant skipped a particular survey question?

There are many skip patterns in the data and they can be complex. To understand a skip pattern, the best approach is to (1) go to the applicable questionnaire, (2) find the variable you are examining, and (3) work backwards from the variable of interest until you find the skip instruction(s) that initiate the skip pattern. Please note that some of the more complex interview segments may contain more than one skip pattern in a particular section, so if the first one you see does not account for all of the cases that skip the variable of interest, you may need to look back further for an additional skip command. Questionnaires are available on our Public Data Documentation page.

What is the difference between the FF Challenge data file and traditional FFCWS data files?

The FF Challenge data files are associated with the predictive modeling stage of the FF Challenge competition, held in Summer 2017 (see “Measuring the predictability of life outcomes with a scientific mass collaboration”). These files are now being provided through Princeton’s OPR Data Archive so that other data users will be able to replicate and extend what participants did in the Challenge. If you are trying to identify which file you have downloaded, the Challenge files are most easily distinguished from the traditional FFCWS data files by their unique IDs (integer values between 1 and 4,242) and number of observations (4,242); the traditional FFCWS data files have an idnum with four digits and 4,898 observations.

The Challenge files include:

  • readme.txt - a text file with descriptions
  • background.csv - birth-Year 9 data, as a .csv
  • background.dta - birth-Year 9 data, as a Stata .dta file
  • codebook_FFChallenge.txt - merged codebook for all waves
  • prediction.csv - an example submission that predicts the mean of the training data for all outcomes
  • train.csv - outcomes for training observations (half the sample)
  • test.csv - outcomes for test observations
  • leaderboard.csv - outcomes for observations in the leaderboard set, with missing outcomes imputed (as provided via Codalab)
  • leaderboardUnfilled.csv - outcomes for observations in the leaderboard set (not imputed)
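
The identification rule described above (ID range and number of observations) can be expressed as a small check. This Python sketch is only an illustration: the function name is hypothetical, and it is meant for the full background file, since outcome files such as train.csv cover only part of the sample.

```python
def guess_file_type(ids) -> str:
    """Guess whether a list of record IDs comes from a Challenge or a traditional file."""
    ids = [str(i) for i in ids]
    n = len(ids)
    # Challenge files: 4,242 observations with integer IDs between 1 and 4,242.
    if n == 4242 and all(i.isdigit() and 1 <= int(i) <= 4242 for i in ids):
        return "FF Challenge"
    # Traditional FFCWS files: 4,898 observations with a four-character idnum.
    if n == 4898 and all(len(i) == 4 for i in ids):
        return "traditional FFCWS"
    return "unknown"

# Toy ID lists (generated here, not real IDs).
print(guess_file_type(range(1, 4243)))
print(guess_file_type([f"{i:04d}" for i in range(4898)]))
```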

Where is the mother/father Year 15 interview?

There are no mother or father interviews at Year 15; however, the Year 15 primary caregiver (PCG) interview incorporates many of the questions and topics included in the mother and father interviews of previous waves, as well as some of the questions and topics included in the PCG interviews of previous waves.

Where is the scales documentation?

The Scales and Concepts Documentation page shows a table with the scales and concepts included in each wave of FFCWS data. Each “x” in the table links to more detailed documentation within the User Guide for that particular wave, including source information, the full variable list for the scale or concept, modifications, and scoring instructions (if applicable). You can also filter variables by scale in the Metadata Explorer.

What variables are available for the gender and sex of the Focal Child as well as the Primary Caregiver (PCG)?

In Wave 1 (baseline), the variable cm1sex identifies the sex of the focal child at birth. In Wave 5 (year 9), the variable n5d1 is the self-reported gender of the PCG if that person is a non-parental caregiver. Either of these variables may be linked with data from subsequent waves using the idnum key. In other waves, the variable c**pcgrel reports the relationship of the PCG to the child (e.g. bio mother, grandfather, etc.) and may be used to infer the gender of the respondent. To determine the variable naming conventions at each wave, please see the “Variable Structure” section on page 18 of the Year 15 User Guide.

Technical Questions

I am having trouble opening zip files with WinZip.

If you are having trouble downloading the files simply by clicking on them (please select "Save" and not "Open"), try right-clicking on the file and selecting “Save Target As.” We recommend using the WinZip Classic interface to open the zip files you downloaded. You may also want to check with your IT department to make sure you have an up-to-date copy of WinZip.

In what formats are the data available (e.g., SAS, SPSS, Stata)?

The data are available in SAS, SPSS, and Stata (for Windows) format. If users need data in other formats, we suggest using a file conversion program such as Stat/Transfer or DBMS/Copy. R users may use the .dta (Stata) files as well.

I get an error when I try to open SAS files in Windows.

The formats are permanently attached to the variables in each data set, so SAS reports an error if it cannot find them. Please use the SAS code included in the zip files to read the formats. Alternatively, users can set the NOFMTERR option when reading in the data.

How do I merge the public data files?

Data files can be merged using the idnum variable. 
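
For users working in Python, a merge on idnum might look like the sketch below, which uses pandas. The two toy data frames stand in for separately downloaded wave files; their values are invented, and only idnum and the flag-variable names come from this FAQ.

```python
import pandas as pd

# Toy stand-ins for two separately downloaded wave files (values invented).
# idnum is kept as a string because the FAQ describes it as a 4-character string.
baseline = pd.DataFrame({
    "idnum": ["0001", "0002", "0003"],
    "cf1fint": [1, 0, 1],
})
year1 = pd.DataFrame({
    "idnum": ["0001", "0002", "0003"],
    "cm2mint": [1, 1, 0],
})

# Each file has one record per family, so a one-to-one merge is expected.
# validate="1:1" raises an error if idnum is duplicated in either file.
merged = baseline.merge(year1, on="idnum", how="outer", validate="1:1")
print(merged)
```

If a file is read from CSV rather than from a Stata file, passing dtype={"idnum": str} to pandas.read_csv keeps idnum as a string so that the keys match across files.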

Weights Questions

How are the weights constructed?

The weights were constructed to adjust for sample design (probability of selection), non-response at baseline, and attrition on observed characteristics over the waves. For a brief introduction to using the weights, please read Fragile Families & Child Wellbeing Study: A Brief Guide to Using the Weights for Waves 1-6. For a detailed account of how the weights were constructed, please read the Constructing the Weights documents available on our Public Data Documentation page.

Why do the national sample flags and city sample flags have different sample sizes than the weights variables?

There are valid weights for 1) interviewed cases and 2) cases in which we determined that the parent or child had died, that the child had been adopted, or that the child was living with neither parent. Because the adoption and living-with-neither-parent cases have little or no interview data, they are coded as "no" in the national sample flags (and interview flags). Data users can, however, estimate the proportion of children/parents who died, etc. by applying the weights to the interview sample flags.
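
To make the idea of applying weights to a flag concrete, here is a minimal Python sketch of a weighted proportion. The column names "died_flag" and "wt", the data values, and the convention that a non-positive weight means "no valid weight" are all assumptions for illustration. It produces a point estimate only; variance estimation is not addressed here, and the weights documentation is the place to check for that.

```python
import pandas as pd

def weighted_proportion(indicator: pd.Series, weight: pd.Series) -> float:
    """Weighted share of cases with indicator == 1, among cases with a positive weight."""
    valid = weight > 0
    w = weight[valid]
    x = indicator[valid]
    return float((w * (x == 1)).sum() / w.sum())

# Toy data (values invented). "died_flag" and "wt" are hypothetical column names.
df = pd.DataFrame({
    "died_flag": [0, 1, 0, 0],
    "wt": [10.0, 30.0, 60.0, -9.0],  # assumption: -9 marks a case with no valid weight
})
share = weighted_proportion(df["died_flag"], df["wt"])
print(round(share, 3))
```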

Metadata Explorer Questions

How do I download the metadata to work with it directly?

You can download the metadata as a CSV by going to the Metadata Explorer and clicking “Download full metadata” in the menu bar.
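
Once downloaded, the CSV can be searched with ordinary data tools. The Python sketch below shows a case-insensitive keyword search over variable labels. The column names "name" and "label" and the three rows are hypothetical stand-ins; check the header of the real file for the actual column names.

```python
import io
import pandas as pd

# Stand-in for the downloaded metadata CSV. The column names "name" and "label"
# and the rows are hypothetical; check the header of the real file.
csv_text = """name,label
var_a,Mother receives child support
var_b,Father ever incarcerated
var_c,Child has health insurance
"""
metadata = pd.read_csv(io.StringIO(csv_text))

def search_labels(meta: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Rows whose label contains the keyword, ignoring case."""
    mask = meta["label"].str.contains(keyword, case=False, regex=False, na=False)
    return meta[mask]

hits = search_labels(metadata, "child")
print(hits)
```

With the real file, io.StringIO(csv_text) would be replaced by the path to the downloaded CSV.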

Can I use the metadata in my R or Python code?

R and Python packages are available. You can access them by going to the Metadata Explorer homepage and scrolling down to “Other API resources to help you use the metadata.”

Can I search the variables using my own keywords?

Using the Advanced Variable Search option, you can type your own search terms into the “Search results” box. This type of search can be done on its own or in combination with other parameters that you specify in the advanced search query builder. If you choose this second option, first specify your query rules and click “search”, and then enter your text search in the box below.

How can I tell if a question was asked in a given wave or was answered by a different respondent?

You can look up questions by topic, text search, or variable name on the Metadata Explorer. When you click on a specific variable in the Metadata Explorer, a list of similar variables in other waves and surveys will be provided. Questionnaires are also available by wave on the Public Data Documentation page.

Can I get a list of all the variables used in a particular scale (e.g., CBCL, Conflict Tactics)?

You can filter variables by scale in the Browse or Advanced Variable Search features of the Metadata Explorer. Please also visit the Scales and Concepts Documentation page to view specific documentation for that scale and wave within the User Guide including source information, variable list, modifications, and scoring (if applicable). 

Publishing Your FFCWS Research

How should I cite the FFCWS?

We request that users cite the substantial funding from the Eunice Kennedy Shriver National Institute of Child Health & Human Development in their publications with the following statement: “Research reported in this publication was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) of the National Institutes of Health under award numbers R01HD036916, R01HD039135, and R01HD040421, as well as a consortium of private foundations. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.”

Additionally, any reference to FFCWS data should kindly observe our name change from The Fragile Families and Child Wellbeing Study to The Future of Families and Child Wellbeing Study.

How do I add my working paper to the FF Publications archive?

Send an email to [email protected] with the authors and title of your working paper. Please attach a Word or PDF document including either the full text of your paper or an abstract, whichever you would like to be posted.

How do I add my publication to the FF Publications archive?

Send an email to [email protected] with the title and a link to your publication online.

Do I need to submit my publication to PubMed Central?

FFCWS data users are encouraged, but not required, to submit their publications to PubMed Central. If you have attended one of our Columbia University Summer Data Workshops, you are required to submit your publications to PubMed Central. Attendees of the 2012 Workshop should cite R25HD072818. Attendees of the 2013-2018 Workshops should cite R25HD074544.