If you are in market research, there’s a high probability that you have grappled with data quality issues. Trying to outsmart fraud is an ever-changing game of cat and mouse that many suppliers are forced to play. And while there are some agreed-upon best practices, we have yet to see industry-wide adoption. Below I review the history of data quality measures in online sample and suggest what needs to be done moving forward.
Where We’ve Been – Reactive Data Quality Measures
Market research had a crisis moment in the early aughts. There was tremendous risk in the data being collected, and KPIs fluctuated widely. Device fingerprinting did not yet exist, so there was no effective de-duplication, and nobody was discussing the need for it, or for attention-check questions to validate respondent engagement. The result was multimillion-dollar business decisions made incorrectly on bad data.
In or around 2006, Joan Lewis at Procter & Gamble stood up and questioned the viability of online sample. This moment can be seen as the genesis of the concept of online market research data quality and fraud detection. But what was born was reactive. A decentralized industry began with different players and no standard way of doing things, which, in many regards, is still true today.
One solution was TrueSample, a company put together by the CEO of MarketTools, Amal Johnson, who tasked the group with figuring out a solution to data quality. Its creation was the advent of a third-party offering that would validate respondents. TrueSample initially tackled data quality through traditional identity confirmation: first name, last name, home postal address, and so on, checked against two separate databases, one based on credit information and the other on eCommerce.
TrueSample morphed and transitioned over the years, changing ownership three times in a decade. I, too, grew and learned with the business. Starting as a lowly program manager, I was eventually promoted to CEO, a role I held for the last four years before the business sold in 2017. PureSpectrum was a customer of TrueSample, and I had known Michael McCrary for many years. Joining the team here was exciting because I suddenly had access to data from a quality perspective that was new to me.
A significant difference I have experienced working at PureSpectrum is direct interaction with the respondents. Many third-party tools rely purely on API calls and information shared about the user's device and browser; nothing about the user's behavior is observed firsthand. At PureSpectrum, we have that first-party interaction with the user. We ask them questions directly and observe how they behave (both at a point in time and longitudinally), how they interact with our system, and how they interact with the surveys themselves.
Where We Are – Accepting Imperfection in Data Quality
There's some commonality now in how the industry approaches data quality: we are trying to address the same concepts around data quality and fraud, but there is no commonality in execution. That leaves room for proprietary offerings. At PureSpectrum, we license standard third-party tools, and we have proprietary offerings that wrap it all together with our PureScore™ algorithms. This blended approach ensures we have the best of both worlds represented in our commitment to quality.
As the next evolution of my data quality journey, I am excited to interact directly with the respondent and understand the device-level characteristics of the computer, mobile phone, or tablet a respondent might be using. How do you connect those attributes to actual behavioral characteristics, and how do you use each to better discriminate signal from noise? Ultimately we are assessing probabilities: good versus bad, more risk versus less risk, with plenty of noise along the way. In more traditional statistical terms, that noise is Type I and Type II error.
Type I errors are false positives and Type II errors are false negatives, and you want to be wrong as little as possible. Recognizing that you will always be wrong some of the time is a critical component of how data quality and fraud detection are handled within market research. It requires a shared understanding that you will be wrong on both the false positive and false negative sides some percentage of the time. The challenge, then, is to be wrong as infrequently as possible.
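The tradeoff between the two error types can be sketched with a toy example. This is not PureSpectrum's actual algorithm; the fraud scores, labels, and threshold values below are all invented for illustration. The point is simply that a single classification threshold trades Type I errors (good respondents wrongly flagged) against Type II errors (fraudsters wrongly admitted): tightening one loosens the other.

```python
def error_rates(scores, is_fraud, threshold):
    """Flag anyone scoring at or above `threshold` as fraud, then count
    how often that call is wrong in each direction."""
    false_pos = sum(1 for s, f in zip(scores, is_fraud) if s >= threshold and not f)
    false_neg = sum(1 for s, f in zip(scores, is_fraud) if s < threshold and f)
    genuine = sum(1 for f in is_fraud if not f)
    fraud = sum(1 for f in is_fraud if f)
    return false_pos / genuine, false_neg / fraud

# Toy data: higher score = more suspicious; True = actually fraudulent.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
is_fraud = [False, False, False, True, False, True, True, True]

for t in (0.3, 0.5, 0.75):
    fp, fn = error_rates(scores, is_fraud, t)
    print(f"threshold={t}: Type I rate={fp:.2f}, Type II rate={fn:.2f}")
```

Raising the threshold drives the Type I rate down and the Type II rate up; no threshold makes both zero, which is precisely why "wrong as infrequently as possible" is the realistic goal.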
If you believe perfection is attainable, or required, the conversation about data quality becomes a challenging one, and a dangerous one, because the expectations in play are potentially unrealistic. The desire for perfection can signal a lack of understanding of how a respondent moves through the industry. From being offered a survey, to being pre-qualified for it, to taking it, respondents do different things at each step, and each step dictates the type of data quality checks that can happen there. Everybody along the supply chain shares responsibility for data quality; at PureSpectrum, we often think of these responsibilities as concentric circles.
Buyers and suppliers don't traditionally connect directly, so somebody in the middle needs to be the common point of collaboration around data quality. That is something we embrace at PureSpectrum with PureScore™. We are the ones who have to coordinate on behalf of publishers, suppliers, and buyers. We have a big job in the middle, so we take quality seriously.
PureScore™ is a dynamically evolving organism. In the past three years, it has undergone drastic innovations. This is a result of our QEP processes and constant interviews with our buyers, suppliers, field leadership, and account managers. We take feedback and proactively adjust algorithms and add additional quality measures.
A great example is DQ4All, our trigger-based respondent data quality screener. Created to assess respondents new to our platform, it has morphed into an ongoing respondent-level quality check. Based on triggers such as being new to the platform, frequency of interaction, and specific surveys, respondents are exposed to DQ4All and evaluated on their screener performance. The screener provides constant feedback and offers a rehabilitation opportunity for low-PureScore™ respondents.
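Trigger-based routing of the kind described above might look something like the following sketch. To be clear, the actual DQ4All logic is proprietary; every trigger name, field, and threshold here is a hypothetical stand-in, shown only to illustrate the general shape of "expose a respondent to a quality screener when any trigger fires."

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    sessions_completed: int     # lifetime interactions with the platform
    sessions_since_screen: int  # interactions since the last quality screen
    quality_score: float        # a PureScore(TM)-style rating, 0.0 to 1.0


def should_screen(r: Respondent, survey_is_sensitive: bool = False) -> bool:
    """Route the respondent through the quality screener if any trigger fires.
    All thresholds are invented for illustration."""
    if r.sessions_completed == 0:       # trigger: new to the platform
        return True
    if r.sessions_since_screen >= 25:   # trigger: interaction frequency
        return True
    if survey_is_sensitive:             # trigger: specific surveys
        return True
    if r.quality_score < 0.4:           # trigger: low score, rehab opportunity
        return True
    return False
```

The low-score trigger is what gives struggling respondents a rehabilitation path: rather than being dropped outright, they are re-screened and can earn their way back up.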
The Future of Data Quality – Never Done
Data quality is like solving a math equation that gets harder while you're solving it: a moving target whose complexity is continually rising. That is the challenge of data quality and fraud detection, because fraudsters become increasingly, if not exponentially, more sophisticated every month, quarter, and year.
As the ecosystem evolves its techniques for detecting and blocking fraudsters, they, in turn, refine their techniques for circumventing those preventative measures. Constant vigilance and rapid innovation are paramount to staying ahead in that game. You are trying to predict and react at the same time.
But then there is non-malicious bad data: real people who, for whatever reason, have a bad day or become fatigued by the survey experience. The industry has battled this behavior for decades; at the end of the day, it is really about how we offer consumers an enjoyable, compelling, and engaging experience with market research. At PureSpectrum, we believe this comes in the form of shorter surveys, fewer words per question, and more mobile optimization. None of these ideas is new, yet we haven't seen industry-wide adoption.
What will it take to finally make some of these changes as an industry? When is enough truly enough, and we finally make the required changes? What are we doing to help make that world become a reality? Of course, I can only speak for PureSpectrum, but we’ve got big things coming in 2023, and we can’t wait to share them with you.
Ready to experience PureSpectrum's data quality firsthand? Reach out to our team below: