@Columns

Behind-the-scenes of the CEP survey

In the midst of an election cycle it is worth pausing to understand what is going on. In the latest episode of Bicameral, the Azerta podcast, our external consultant Javier Sajuria did just that, plunging into the depths of polling, and in particular the CEP survey, widely regarded as an oracle when it comes to taking the temperature of citizens’ political preferences. He spoke with Paulina Valenzuela, the person in charge of ‘translating’ the most relevant data from that survey.

28 May 2021

Paulina Valenzuela is Managing Partner at Datavoz, one of the oldest market research and public opinion firms in Chile. The current vice-president of the Asociación de Investigadores de Mercado y Opinión Pública de Chile (AIM), and known for her reluctance to give interviews, she talked with our external consultant Javier Sajuria, who in addition to being one of the hosts of Bicameral, Azerta’s podcast, is an academic at Queen Mary University of London, where he works on issues related to surveys and polling.

In the interview (available as the latest episode of the podcast, from which these excerpts are taken), Valenzuela, who holds a Master’s degree in statistics from the Pontificia Universidad Católica (PUC), tells us about her role in the CEP survey, and how from that perch she and her team go about ‘translating’ the most pertinent data from the survey that charts Chile’s political rhythms.

 

What role has Datavoz had over the years in the CEP survey?

  • We have had a very long-running participation with the survey, starting in ’94 when Adimark left this alliance. Then we were out of it for a time, and some years ago we returned.
  • The CEP is a probabilistic survey across all stages, and our role is to advise and supervise. We also have a role in controlling the design on one hand and quality control on the other, as well as supervising the work that must be carried out to meet the survey’s technical requirements.

 

So, to understand the process a little bit: the Center for Public Studies (CEP) designs a questionnaire, a series of questions they want to ask in each of their surveys, and they say to you, ‘Datavoz, we want you to create a sample and tell us where we need to go ask our questions.’ Then there is another company that provides the interviewers who go to households to collect the data from that questionnaire. The data is then returned to Datavoz, which reviews it and applies the weightings; in other words, the results are adjusted or weighted based on the Census. Finally, the data is returned to the CEP. Is that it, or am I skipping an important step?

  • No, you’re not skipping any step; that is indeed the structure of the work behind the CEP survey. These days the process is a little different, because it is a telephone survey.

 

Great. When discussing a survey one says that it is representative of the population, in this case people over 18 living in Chile, and the survey should represent that segment; but for some reason the data is sometimes unavailable, or parts of the population are over- or under-represented. This is the case with people living in the extreme areas of the country, which the pollsters are probably never going to reach…

  • In the case of in-person interviews, the sampling frame normally used for surveys of this type, which is fully probabilistic and face-to-face, is the one derived from the 2017 Census: everything that has to do with census mapping, in other words blocks and streets, which is basically what allows you to find your way to a dwelling.
  • It is good to remember that the National Institute of Statistics (INE) cannot publicly provide information at the household level, it only provides information at the block level, which covers sets of households. This is the data used for the selection process.
  • However, you need to be clear that the most powerful sampling frame to conduct surveys in Chile is at the household level, and even this has certain problems due to the fact that there are areas or sectors that are left out, for reasons of accessibility or difficulty.
  • The degree to which the sampling frame matches the target population is called population coverage, and it is a key element of sampling design. What is desirable is that the frame from which the sample is drawn have the widest possible coverage, to prevent potential biases or representation problems in the survey itself.
  • What is expected is that the make-up of the sample, in proportional terms relative to some parameters of interest such as sex, age, educational level, geographic area, will be similar to what is reported by the official information, i.e. the Census.
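The weighting step described above can be sketched in a few lines: each group’s weight is the Census share of that group divided by its share of the sample, so under-represented groups count for more. The group labels and all the figures below are invented for illustration; the actual CEP weighting scheme uses its own cells and official Census parameters.

```python
# Hypothetical sketch of census-based post-stratification weighting.
# All category labels, shares, and counts are invented for illustration.

census_shares = {"male_18_39": 0.24, "male_40plus": 0.26,
                 "female_18_39": 0.24, "female_40plus": 0.26}

# Observed respondent counts per cell (hypothetical sample of 1,200)
sample_counts = {"male_18_39": 180, "male_40plus": 300,
                 "female_18_39": 220, "female_40plus": 500}

n = sum(sample_counts.values())

# weight = census share / sample share, per cell
weights = {cell: census_shares[cell] / (count / n)
           for cell, count in sample_counts.items()}

# A weighted estimate multiplies each respondent's answer by the weight
# of their cell, restoring the Census proportions.
for cell, w in sorted(weights.items()):
    print(f"{cell}: weight {w:.2f}")
```

After weighting, each cell’s weighted size matches its Census share of the sample, which is exactly the adjustment described in the question above about the data being ‘weighted based on the Census.’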

 

Telephone sample

 

To anticipate response rates — because there is an important issue related to selection bias — the last CEP survey used Random Digit Dialing (RDD), which is a method for calling telephone numbers generated at random. How does it work in practice?

  • In Chile we do not have databases of available numbers. Unlike before, with the White Pages or the Yellow Pages, where you had access to landline information you could use to generate samples, today no one can go and pull those kinds of numbers from somewhere. That is simply no longer the case.
  • What we have estimated, based on the 2017 Casen survey, is that approximately 96% of people report having access to a landline or cell phone. The landline network has been shrinking and probably now reaches around 40% of households, while cellular telephony has skyrocketed: more than 95% of people have access to a cell phone.
  • Currently, the only accessible public data are the telephone companies’ codes: you know each company’s initial number series, but not the rest. So if a number has nine digits, you have no clue about the last six.
  • The exercise is to simulate a number and then validate it using some kind of automated process. The validated numbers then go into a machine that dials them so the interviewers can administer the questionnaire.
  • So what I’m saying is that to reach a sample of 1,200 to 1,500 cases you need approximately 15,000 to 20,000 pre-validated numbers. Still, the system is not perfect, and it will give you numbers that do not exist.
  • It is a simple random sample, and a significant volume of the numbers are not eligible. But among those that are eligible, the CEP’s response rate is 12.7%.
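The generate-then-validate exercise described above can be sketched as follows. The carrier prefixes and the validation rule here are invented placeholders: in practice the known company codes are real and the screening is done by an automated dialing system, not a function.

```python
import random

# Hypothetical sketch of random digit dialing (RDD): a known carrier
# prefix plus randomly drawn remaining digits, screened for existence.
KNOWN_PREFIXES = ["912", "934", "956", "978"]  # invented carrier codes

def generate_candidate(rng):
    """Simulate a 9-digit number: known 3-digit prefix + 6 random digits."""
    suffix = "".join(str(rng.randrange(10)) for _ in range(6))
    return rng.choice(KNOWN_PREFIXES) + suffix

def is_active(number):
    # Stand-in for the automated validation step (e.g. a dialer probe);
    # here we pretend only numbers ending in 0 exist, roughly 10%.
    return number.endswith("0")

def build_sample(target, seed=42):
    """Generate candidates until `target` validated numbers are found."""
    rng = random.Random(seed)
    valid, tried = [], 0
    while len(valid) < target:
        tried += 1
        candidate = generate_candidate(rng)
        if is_active(candidate):
            valid.append(candidate)
    return valid, tried

sample, attempts = build_sample(target=100)
print(f"{attempts} candidates generated to obtain {len(sample)} valid numbers")
```

With a roughly 10% hit rate, obtaining 1,200 to 1,500 usable numbers requires on the order of 15,000 candidates, which matches the ratio Valenzuela describes.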

 

But if there are more cell phones than people in Chile, one might think that the sampling frame does not match the population. How do you make that adjustment when making the calls, compared to a sampling frame constructed from the Census?

  • The only thing we know about our sample universe is that the numbers we generate meet two conditions: they are either in that universe or they are not. And what we are interested in is validating that the number does belong to the universe.
  • Which is why, when reporting response rates, in theory we take into account not only the numbers that do respond, but also categories like “picked up the phone,” “said hello,” “refused to answer,” or “said ‘I can’t.’”
  • This calculation takes into account the response rates from those numbers that do not answer but do exist, as well as the ones that send you to voicemail. It’s a matter of considering all possible categories.
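The disposition categories just listed feed directly into the response-rate calculation. Below is a minimal sketch, with invented counts, of a simplified AAPOR-style rate: completed interviews divided by all numbers known to belong to the eligible universe, whether they answered, refused, or never picked up.

```python
# Hypothetical call-disposition counts; the figures are invented for
# illustration and chosen so the simplified rate lands near the CEP's 12.7%.
dispositions = {
    "complete": 1270,       # finished the questionnaire
    "refusal": 2100,        # picked up but declined or said "I can't"
    "no_answer": 5500,      # number exists but never picked up
    "voicemail": 1130,      # exists, went straight to voicemail
    "not_eligible": 10000,  # non-existent or out-of-scope numbers
}

# Every category except the ineligible numbers counts toward the denominator.
eligible = sum(v for k, v in dispositions.items() if k != "not_eligible")
response_rate = dispositions["complete"] / eligible
print(f"Response rate among eligible numbers: {response_rate:.1%}")
```

The point of the calculation is the denominator: numbers that exist but never answer still count as nonresponse, which is why the rate is so much lower than the share of people who refuse outright.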

 

Why is the question of response rate important?

  • I think it is entirely appropriate for this indicator to be part of the conversation, because the information behind it is key. I’ll give you examples of what response rates are telling us. Probabilistic face-to-face surveys, the purest kind so to speak, today have response rates in Chile on the order of 50% to 60%.
  • However, there are sectors of the population that cannot be reached in person, and this is increasingly the case: for example, the classic condominiums in Chicureo, Colina and Calera de Tango. It is also very difficult to reach young people who live alone in an apartment in downtown Santiago.
  • What happens is that when you start to look at the response rates, it tells you not only about people’s interest in answering surveys, but also about the possibility of reaching those people.
  • And here we run across a very curious analysis, which is that with telephone surveys we do reach these populations. And those types of people do end up answering the survey because we don’t have to request access to them through whoever is at the door of the condo, rather we connect directly with them by phone. The same is true for the young people living alone.
  • So, even though in terms of response rate your survey’s reach is lower, it seems you are getting a better representation of the general population because you are reaching those that you couldn’t reach with a face-to-face survey.
  • But face-to-face surveys do have better response rates. And so one is left to ask, ‘so, what matters then?’ The response rate gives us information: we can see the level of people’s reluctance or interest in answering those questions. It is the kind of data that, when analyzed, especially by researchers or political analysts who are constantly trying to predict behavior, helps them understand ‘what’s behind it.’
  • So, while it is true that 12% of those contacted by the CEP survey did not respond, it is also true that there is a very high percentage of people we could not reach because they didn’t answer the phone. And with that in mind, one can begin to build hypotheses. The ones who don’t answer the phone: “Who are they?” The ones who can’t be reached: “Who are they?” Of course, in the face-to-face setting, one can identify them more clearly, which is more difficult to do by telephone.
  • Another key element is that response rates are associated with the type of survey. This item, related to the use of telephone, face-to-face and online survey modalities, is one for which there is still little Chilean experience and data.
  • Now, with regard to more qualitative elements: when people say they are going to vote, we have to take their word, but ‘maybe today he says yes and in five days he changes his mind,’ or ‘he said yes because he did not dare to say otherwise.’ It is more difficult to correct surveys for these types of variables, unless you have time series that allow you to keep looking and assessing.
  • Which is why I think it is so significant and key that the technical data sheets of the surveys published in Chile include this data, because it gives you very good information about who the respondents are.

 

Distrust and bias

 

The problem is when there is a systematic bias, or a group of people being excluded from the survey. I do a lot of work with elite polling, and one of the things we do is poll candidates running in different elections in the UK; but there is one segment that is becoming increasingly difficult to reach, which is the Conservative Party candidates. For various reasons they distrust academic research, they don’t like to be investigated, and there is a kind of tacit agreement not to answer these types of surveys. This has led to an increasing non-response rate, though we can identify them somehow and predict certain behaviors. In the case of opinion polls in Chile, you do not know how these people think and you have no way of anticipating whether they lean more to the left or to the right. There is a leap, I don’t want to call it a leap of faith, but there is a point when one has to say: this is what I can gauge, and this is as far as I can go…

  • That’s the big issue. I was asked if I could make a projection of turnout in the weekend’s elections. No projections, but I can give you different scenarios. And when you analyze all the possible scenarios, you end up saying: but I’m not concluding anything from this. That’s the problem.
  • In the example that you’re giving me, what is going to happen is that the sample sizes are going to get smaller and smaller, and so when a person does respond it is likely that someone else will say: ‘is the person responding really a faithful representative of the group I’m trying to poll?’
  • In Chile, a similar thing happens with polls, particularly with electoral polls. People respond because they are interested in communicating their opinion. And, indeed, to assume that those who do not express an opinion think the same as those who are expressing an opinion is questionable, but the whole analysis has to be based on certain assumptions.
  • What matters in the end are the trends. Rather than getting dogmatic about the method being used, what I am interested in looking at are the survey trends over time, to see how certain indicators are performing. That way one can say, in probabilistic terms, that “X happened here.”
  • I don’t know if you remember when there was talk about the hidden vote on the right? These events do exist. I was talking to a researcher in Peru, and she told me: ‘Look, we talk to people, and they don’t answer us, they don’t tell us that they are going to vote for Keiko, because it is not well regarded in Peru today to say that one will vote for Keiko. But maybe this hidden vote will totally change the predictions that we are making day by day and that we will have out a few days before the election.’
  • I agree that it is very difficult, which is why it’s important to take all possible methodological and design precautions, at least to be as close as possible to estimating what the survey is telling you.
  • I was recently on a panel with international researchers about how to detect these problems, in other words, how do we research those who don’t respond as they should and how they do respond instead. It is a very difficult question to answer because it is indeed complicated to approach those who say they do not want to answer surveys.
  • There are times when people express their opinion because they feel more confident, but there are other times when they keep it to themselves because they prefer not to be judged.
  • This also is very strongly linked with the survey modality. The same sensitive question asked in person by a face-to-face interviewer, where you see the other person talking to you, versus a self-administered online survey, can yield very different answers.
  • In fact, we did this exercise with the CEP in 2018, and the interesting thing about the experiment was that neither the subjects nor the samples differed: we asked the same people the same questions in person and then by telephone or online, and on certain groups of questions they changed their answers.

So it is not enough just to look at the response rates or the survey period; you also have to pay close attention to the modality, whether online, face-to-face, or by telephone. That also makes a difference.