Wednesday, May 17, 2006

Data Quality and the risks of "running with what we have"

There is a great temptation for education (and other) organizations to just "get something up" and call it a data warehouse as part of a strategy of retaining the support of senior leaders. One of the problems data warehouse designers have in organizations with little history of decision support is that the clients (program area staff) literally cannot identify needs that extend past their current experience with data. One common solution is to take the current operational data and its definitions (such as they are) and simply load them into a warehouse. One can then take existing reports as the design documents for data marts.

The good thing about this approach is that it provides a wonderful teaching environment for bringing program staff into the discussion, using data and representations that they know and can make sense of. The risk is that they will see this and want to run with it. It is guaranteed that these data (and definitions) will contain serious quality problems that were not exposed or stressed under the older, more constrained reporting system. While getting something up quickly might seem like an early success, going forward with this system can be very risky. Program staff are experts in their programs, so when data problems emerge they are likely to blame the system rather than the data or the collection processes. They are also likely to see the problems as someone else's and not be receptive to requests that they "clean" the data. They may instead come back with requests for IT to fix the transactional system.
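One way to surface these problems before program staff run with the system is to profile the operational data as it is loaded. The following is only a minimal sketch in Python, not a description of any particular warehouse tool. The function name `profile_records`, the field names `student_id` and `grade_level`, and the list of allowed grade codes are all hypothetical and chosen purely for illustration.

```python
from collections import Counter


def profile_records(records, key_field, allowed_values):
    """Count basic quality problems in a list of dict records.

    key_field      -- hypothetical name of the field that should be unique
    allowed_values -- dict mapping a field name to its set of valid codes
    """
    issues = Counter()
    seen_keys = set()
    for row in records:
        # Check the identifying field for blanks and repeats.
        key = row.get(key_field)
        if key in (None, ""):
            issues["missing_key"] += 1
        elif key in seen_keys:
            issues["duplicate_key"] += 1
        else:
            seen_keys.add(key)
        # Check each coded field against its definition.
        for field, allowed in allowed_values.items():
            value = row.get(field)
            if value in (None, ""):
                issues["missing_" + field] += 1
            elif value not in allowed:
                issues["invalid_" + field] += 1
    return dict(issues)


# Illustrative rows only; these are made-up values, not real data.
sample = [
    {"student_id": "1001", "grade_level": "09"},
    {"student_id": "1001", "grade_level": "9"},   # repeated id, nonstandard code
    {"student_id": "", "grade_level": "10"},      # blank id
    {"student_id": "1003", "grade_level": ""},    # blank grade
]

report = profile_records(
    sample,
    key_field="student_id",
    allowed_values={"grade_level": {"09", "10", "11", "12"}},
)
print(report)
```

A count like this, shown alongside the familiar reports, gives program staff something concrete to look at, and may make it easier to frame the problems as belonging to the data and its definitions rather than to the warehouse.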

This is a cautionary tale for states working on getting their warehouses up as rapidly as possible.

