We didn't set out to create a data quality product.

A project at one of our steady clients required migrating data from the old Vantive CRM system to the new Siebel CRM system, cleansing the data in the process. We often recommend "buy" over "build" (even when we're likely to be the builders), and we were part of the client's team that evaluated the high-end commercial products. Our conclusion: the cleansing engines were pretty good, reflecting many years of experience. But the profiling and review tools were, to put it mildly, not very impressive. In fact, even the cleansing engines had some surprising shortcomings.

In our view, data quality is a process of iterative refinement. For many data problems, there is no single right answer. Therefore, the actual cleansing is less important than complete visibility into the cleansing process. That's what we provide with DQ Now.

It turns out that much of our past experience was a perfect fit. For at least a dozen years each, we've been crunching text in various ways. Beyond just parsing addresses and removing duplicates, we've written compilers for "little languages", converted between various file formats, extracted content from Web pages, and more. We've even learned a few lessons from image processing, some of which will appear more directly in future releases.

The path that led here is a familiar one. DQ Now's parent company, PreFab Software, started the same way: in the course of a development contract (workflow automation for professional publishing), we identified a major gap in the market and built a product to fill the need. PreFab has several thousand happy customers around the world, with "runtime" software deployed to well over ten thousand individual users.

We look forward to bringing the same level of customer satisfaction to the data quality market.

Next: review a few resources we assembled.