Representativeness of Findings

How representative are the IDEAL DEI Survey findings?

We are very thankful for the sustained community interest in the 2021 IDEAL DEI Survey and gratified that people are engaging with the findings and want to know more about the survey and its methods. The community has raised a number of questions about the survey's response rate and the representativeness of its findings, which we address below.

Census vs sample surveys and statistical significance

The survey was designed as a census survey and sent to all faculty, staff, students, and postdocs employed by or enrolled at Stanford University in May of 2021. In other words, the survey was sent to the entire Stanford population (as the federal government does with its decennial census) and not to a random sample (as is often done with opinion polling) or a subset of the Stanford population (a strategy often employed by large-scale survey efforts, but not in this case). The overall response rate to the survey was 36% of the entire Stanford population, not 36% of a random sample. This means that, compared to many surveys, the number of respondents to the IDEAL survey was high, almost 15,000 people, and findings from the survey should be considered to reliably represent the experience of many people at Stanford. It also means that, in contrast to most surveys done at Stanford, the IDEAL survey findings comparing subgroups of respondents are comparisons between relatively large numbers of people, even for populations that are traditionally underrepresented in smaller-scale surveys (e.g. American Indian/Alaska Native respondents, over 200, or trans-identifying respondents, over 100).

Statistical significance tests are often used in presenting the findings of survey research to help audiences understand how much confidence they should have that results from a random sample of a population reflect the full population. Put differently, these tests are often used to express how likely it is that a finding from a survey could be due to random chance in who happened to be sampled, rather than reflecting the real opinions and experiences of the broader population.

In the findings presented we have not shown the results of statistical significance tests when comparing respondent groups (e.g. by race/ethnicity) for two main reasons: 1) inferential statistics are most often considered appropriate for generalizing from a random sample to a larger population, and the IDEAL survey was a census survey of a full population that did not employ a random sample, and 2) there were so many respondents to the IDEAL survey that most traditional significance tests would show statistically significant differences even for small differences between groups. In other words, nearly all group differences would be statistically significant simply because these tests are highly sensitive to the size of the survey sample.
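The second reason can be illustrated with a small sketch. The example below is not part of the IDEAL analysis: the group sizes and the 72% vs 70% agreement rates are hypothetical, and the two-proportion z-test is just one common significance test chosen for illustration. It shows the same two-point difference being non-significant with a few hundred respondents per group and significant with several thousand.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    # Pooled proportion under the null hypothesis of no difference
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided tail probability of a standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: the same 2-point gap at two different group sizes
small_z, small_p = two_proportion_z_test(0.72, 200, 0.70, 200)
large_z, large_p = two_proportion_z_test(0.72, 7000, 0.70, 7000)

print(f"n=200 per group:  z={small_z:.2f}, p={small_p:.3f}")
print(f"n=7000 per group: z={large_z:.2f}, p={large_p:.3f}")
```

At the conventional 0.05 threshold, only the larger comparison would be flagged as significant, even though the size of the difference is identical in both cases.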

The principles of statistical inference suggest that the larger the sample of a population, the lower the probability that you could find a different group of respondents whose opinions and experiences differ systematically enough to lead you to drastically different conclusions from the data collected. Ultimately, given the large sample size of the IDEAL survey, it is likely that the survey findings are highly reliable in the sense that, from a statistical perspective, if we had conducted the survey again during the same timeframe and in the same context, there is a high likelihood that it would have produced very similar findings.

However, it is important to note that a high likelihood of reliability, or consistency of findings, does not necessarily imply a high likelihood of validity, or the generalizability of the findings from the 36% of the population that took the survey to the rest of the Stanford community who did not. For example, there may have been sources of response bias that made those who did not take the survey systematically different from those who did. The people who responded could have had a greater interest in diversity, equity and inclusion issues than the people who did not participate, or the people who did not participate could have opted out because they were more worried than participants about retaliation for their survey responses by colleagues or university officials. In other words, although large numbers of respondents suggest a high likelihood of reliability, it is possible that some results would differ if 100% of the university population had taken the survey.

Response bias

There were several sources of known response bias when comparing the population who took the IDEAL survey to the population who were invited but did not participate. While response rates were very similar across most race/ethnicities (see chart below), response rates did differ by role at the university (e.g. staff vs students) as well as by biological sex.

IDEAL DEI Survey Response Rates by Federal Race/Ethnicity Category

Bar chart of survey response rates, by Federal race/ethnicity

For example, faculty and staff were more likely to respond to the survey, and females were more likely to respond than males. Despite these differences, it is important to note that, in aggregate, overall demographics like biological sex and the race/ethnicity of survey respondents were very similar to the overall demographics of the full Stanford population (e.g. as represented in the IDEAL dashboards).

For demographics like university role and sex, where data is available in university systems for most members of the Stanford population, we can use statistical techniques (e.g. survey weighting models) to account for response rate differences and “correct” for them (see below for more information). After analyzing the results of a variety of weighting models, we found that these known sources of response bias did not appear to meaningfully skew the survey findings. For example, if we weight the survey responses of the men who took the survey to make their representation in the survey data equal to their representation in the overall Stanford population, the response distributions for most survey questions presented in our reporting shift by less than two percentage points (see below for examples). This is another indication that the survey findings are highly reliable in their representation of the experiences of the Stanford community members who were most likely to participate in the survey.
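The weighting idea described above can be sketched in a few lines. This is a minimal illustration of post-stratification on a single variable, not the survey team's actual model: the respondent counts, population shares, and agreement rates are all hypothetical, and the function names are our own.

```python
def poststratification_weights(respondent_counts, population_shares):
    """Weight each group so its share of respondents matches its population share."""
    total = sum(respondent_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in respondent_counts.items()
    }


def agreement_rate(agree_rates, respondent_counts, weights=None):
    """Overall share agreeing, optionally weighted by group."""
    if weights is None:
        weights = {group: 1.0 for group in respondent_counts}
    numerator = sum(
        agree_rates[g] * respondent_counts[g] * weights[g] for g in respondent_counts
    )
    denominator = sum(respondent_counts[g] * weights[g] for g in respondent_counts)
    return numerator / denominator


# Hypothetical inputs: women over-represented among respondents
respondents = {"female": 8400, "male": 6600}   # 56% / 44% of respondents
population = {"female": 0.50, "male": 0.50}    # assumed population shares
agree = {"female": 0.74, "male": 0.76}         # assumed share agreeing with an item

weights = poststratification_weights(respondents, population)
unweighted = agreement_rate(agree, respondents)
weighted = agreement_rate(agree, respondents, weights)

print(f"unweighted: {unweighted:.4f}, weighted: {weighted:.4f}")
print(f"shift: {abs(weighted - unweighted) * 100:.2f} percentage points")
```

When the over-represented and under-represented groups answer similarly, as in these made-up numbers, re-weighting moves the overall result very little, which is the pattern the paragraph above describes.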

However, as stated above, we cannot accurately assess the precise degree to which the findings from the 15,000 people who took the survey can be generalized to the rest of the Stanford population who did not participate. This is because: 1) we do not have population data for all of the demographic and identity variables that we would need to evaluate every potential demographic driver of response bias (e.g. while we have biological sex at the population level in university records, we do not have gender identity and cannot know whether the number of nonbinary or trans-identifying community members who responded to the survey was proportional to the number in the overall population), and 2) there may be unexplained, unknown forms of response bias that make nonresponders systematically different from responders - for example, systematic lack of experience with or interest in DEI-related issues, or worries about privacy of responses and/or retaliation among certain groups.

Substantive significance of the survey findings

While it is important to acknowledge these survey limitations, it is also critical to note that, although we may not know precisely how well the survey findings apply to the entire Stanford population, the findings do represent the lived experiences of nearly 15,000 Stanford community members. Survey respondents include thousands of Stanford faculty, staff, students, and postdocs who have had extraordinarily negative and harmful experiences at the University. To be more specific, the findings include thousands of incidents of verbal or online harassment, thousands of incidents of discriminatory behavior, and over 500 incidents of physical harassment, including nonconsensual sexual contact. In addition, Stanford community members who had these experiences were represented in nearly every university department, major, and administrative unit. Regardless of how closely the respondents' experiences reflect the prevalence of these experiences among those who did not participate in the survey, it is important for all of us to acknowledge that the IDEAL DEI survey findings describe the lived experiences of thousands of Stanford community members.

More information about response weighting

The IDEAL DEI Survey team has employed a variety of approaches to weighting the survey data. The results of these analyses have not suggested that correcting for response bias based on known population demographics would result in large differences in the overall findings.

For example, when we use raking models to simultaneously adjust for response bias across sex (female/male), role (student, staff, etc.), and race, and look at the broadest question on the survey across the full population, we end up with the following:

weighted vs unweighted results in bar chart

The weighted data, represented in the second chart, show almost no difference when compared to the unweighted data. This is in part because response rates by race were very similar, and because women and men, the groups with the biggest response rate difference, responded very similarly to this question. (The chart below shows unweighted responses to the same question by gender identity.)

I feel valued as an individual at Stanford, by gender chart

However, we can only weight the data based on biological sex, since that is the only such data we have for the full population (in university records). The biggest differences in responses are among nonbinary groups, and for these groups we have no way of knowing whether there are systematic response biases, since this is the first time we have collected gender identity data from the full university population. For undergraduates, where we do have systematic data, there does not appear to be much bias at all.
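For readers unfamiliar with raking, mentioned above, the sketch below shows the basic idea (also known as iterative proportional fitting): respondent counts are repeatedly rescaled so that each margin matches a known population total. It is a simplified illustration rather than the survey team's implementation. It rakes over only two variables (sex and role) instead of three, and all counts and targets are hypothetical.

```python
def rake(cell_counts, row_targets, col_targets, max_iterations=100, tol=1e-9):
    """Return a weight per (row, column) cell so weighted margins match the targets."""
    weighted = {cell: float(n) for cell, n in cell_counts.items()}

    def margin(axis, level):
        # Sum of weighted counts for one level of one variable
        return sum(v for cell, v in weighted.items() if cell[axis] == level)

    for _ in range(max_iterations):
        # Alternate between matching the row margins and the column margins
        for axis, targets in ((0, row_targets), (1, col_targets)):
            for level, target in targets.items():
                factor = target / margin(axis, level)
                for cell in weighted:
                    if cell[axis] == level:
                        weighted[cell] *= factor
        # Columns are exact after the last step, so check the rows for convergence
        worst_gap = max(abs(margin(0, lvl) - t) for lvl, t in row_targets.items())
        if worst_gap < tol:
            break

    return {cell: weighted[cell] / cell_counts[cell] for cell in cell_counts}


# Hypothetical respondent counts by (sex, role)
respondents = {
    ("female", "student"): 200,
    ("female", "staff"): 250,
    ("male", "student"): 150,
    ("male", "staff"): 100,
}
# Hypothetical population totals, scaled to the number of respondents (700)
sex_targets = {"female": 350, "male": 350}
role_targets = {"student": 400, "staff": 300}

weights = rake(respondents, sex_targets, role_targets)
for cell, weight in weights.items():
    print(cell, round(weight, 3))
```

Each respondent in a cell would then be counted with that cell's weight when computing response distributions. Raking can only adjust for variables whose population totals are known, which is why gender identity cannot be included.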