May 3 ’12

z/Data Perspectives: Can Data Quality Be Forced?

by Craig S. Mullins in z/Journal

How many times have you surfed the Web only to encounter a form that requests a slew of personal information before you can get the information you need? You know what I’m talking about. A company markets a white paper or poll results or something else that intrigues you, so you click on the link, and bang, there you are. You don’t have the information you wanted yet, but if you just fill out this form, then you will be redirected to the information.

Makes you want to scream, doesn’t it? Some folks just close their browser or move on to something else. Some folks enter partially accurate information to see how little they need to provide without getting rejected. And some folks just provide bogus information.

Now sometimes completely bogus information won’t work. Maybe the form requires an email address to which the information will be sent. But hey, that’s what Gmail and Yahoo! Mail were made for, right? Just create a new address, fill in the form using it, collect the information, then shut down or ignore that email account for the rest of your life.

Of course, if you’re trying to sign up for a webinar, this might not work because many companies reject generic email addresses. Who can blame them? They’re conducting a webinar to drum up business and gather leads. If you provide a generic email address, the organizer will assume that you aren’t a good lead, or maybe even that you’re a competitor trying to gather intelligence.

Then there’s the phone number. I almost never supply an accurate phone number. If the form allows, I type in “do not call me” as my phone number. If not, then I use the information number, 555-1212 (with my area code). I get more than enough cold calls for things I don’t need already, thank you.

The point I’m trying to make is that these marketing tactics are responsible for the creation of a lot of poor-quality data. But at least some of the data must be useful or the marketers wouldn’t use these tactics. And who can fault marketers for actually trying to target prospective customers? After all, that’s their job. And the information was evidently interesting enough to get you to click to it, right?

So what’s my point? Well, I have a couple of them. The first is that these Web forms need to be developed with more stringent validation. For example, you should never be able to type letters into a phone number field. I’m talking about basic edit checks that every programmer should have been taught to do in Coding 101.
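As a rough illustration of that kind of basic edit check, here is a minimal Python sketch. It assumes 10-digit North American phone numbers, and the function name normalize_phone is made up for this example rather than taken from any real form library.

```python
import re


def normalize_phone(raw):
    """Return a 10-digit string, or None if the input fails the edit check.

    Assumption for this sketch: North American numbers only.
    """
    # Accept only digits and common separators; letters fail immediately.
    if not re.fullmatch(r"[0-9\s().+-]+", raw):
        return None
    digits = re.sub(r"\D", "", raw)
    # Drop a leading country code of 1, if one was typed.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None
```

With a check like this, typing “do not call me” into the phone field is rejected outright, while a number typed with parentheses, spaces, or dashes is reduced to its digits.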

You also can check for and reject commonly submitted bogus items. For example, Mickey Mouse will never be your customer. And an address of 1313 Mockingbird Lane may be good for the Munsters, but not your customers. And while you’re at it, any phone number with a 555 prefix can be summarily rejected, too.
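A check like this can be sketched in a few lines of Python. The deny lists below hold only the examples mentioned above, and the names BOGUS_NAMES, BOGUS_ADDRESSES, and bogus_reasons are hypothetical; a real form would need far longer lists and probably fuzzier matching.

```python
# Illustrative deny lists only; these contain just the column's examples.
BOGUS_NAMES = {"mickey mouse"}
BOGUS_ADDRESSES = {"1313 mockingbird lane"}


def _norm(text):
    # Lowercase and collapse runs of whitespace before comparing.
    return " ".join(text.lower().split())


def bogus_reasons(name, address, phone_digits):
    """Return the list of fields that look bogus (empty if none do)."""
    reasons = []
    if _norm(name) in BOGUS_NAMES:
        reasons.append("name")
    if _norm(address) in BOGUS_ADDRESSES:
        reasons.append("address")
    # For a 10-digit number, digits 4 through 6 are the prefix.
    if len(phone_digits) == 10 and phone_digits[3:6] == "555":
        reasons.append("phone")
    return reasons
```

Returning the reasons, rather than a simple yes or no, lets the form tell the visitor which field to fix.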

If you’re really interested in accurate data, take the time to do some more robust edit checking. Do the area code and ZIP code entered actually exist? Do they match the city and state that were entered? For example, if someone enters the 512 area code (Austin, TX) but enters Pittsburgh, PA for the city and state, you know the data is bogus. Or at least suspect … after all, people do move and take their mobile phone number with them. I have a friend who moved from Chicago to Florida to New York to Texas and he still has a mobile phone with the Chicago area’s 630 area code.
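Because a mismatch is only suspicious and not proof of bad data, a cross-check along these lines is better off flagging a record than rejecting it. The sketch below assumes a tiny hand-built lookup table; AREA_CODE_STATE and check_area_code are hypothetical names, and a real application would load a complete, current area code reference instead.

```python
# Tiny illustrative table; a real one would cover every assigned area code.
AREA_CODE_STATE = {"512": "TX", "412": "PA", "630": "IL"}


def check_area_code(phone_digits, state):
    """Classify a 10-digit phone number against the state that was entered.

    Returns "ok", "suspect" (mismatch, but people do move), or
    "unknown" (area code not in the table).
    """
    expected = AREA_CODE_STATE.get(phone_digits[:3])
    if expected is None:
        return "unknown"
    return "ok" if expected == state.strip().upper() else "suspect"
```

A “suspect” result might lower a lead’s score or prompt a second look, which leaves room for people like the friend with the well-traveled mobile number.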

And if you want to go even further, you can match up company names to known addresses for that company to verify that an actual, accurate company name is being provided. Of course, there are exceptions here, too. Maybe you work from a home office and you’ve provided a legitimate address.

The bottom line is that organizations can do better at verifying data in their customer-facing Web applications. But even then, you just can’t force data quality. There will still be people “out there” (like me) who find ways to enter data that is just good enough to get through, yet keeps someone from emailing or calling them to sell them something all the time. And the data quality fight continues …