Tuesday, February 07, 2006

Data overloading and model development

Having worked with several large districts (and, more recently, several state programs), I have found that data overloading (storing additional, non-standard values in an existing field) is a common problem. DM Review describes the problem this way:
If a database has not been defined with all knowledge workers' information requirements, and that database is not easily extendable, knowledge workers will often use an existing field for multiple purposes.
A common example is the Free/Reduced Lunch program participation field. A program administrator at the district or state level needs to report the total number of children in each category. Legitimate values for the field may be "F" and "R". However, one office in the district is charged with adjudicating cases in which a family is close to qualifying or mistakenly enrolls when it is not eligible. That office is responsible for tracking those denials, and its staff enter the letter "D" in the Free/Reduced Lunch field so they can run reports at the end of the year. This use of the field was never anticipated in the new student management system, so the "D" values are purged when the data is loaded to the student data warehouse - erasing data vital to the group who entered them.
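One way to avoid silently losing the "D" values is to set non-standard codes aside during the load instead of purging them. The sketch below is only an illustration of that idea, not how any particular student management system or warehouse works. The field name lunch_status, the record layout, and both function names are hypothetical, and treating a blank value as a clean record is an assumption.

```python
from collections import Counter

# Legitimate codes taken from the example above; a real system may define more.
VALID_LUNCH_CODES = {"F", "R"}


def split_lunch_codes(records, field="lunch_status"):
    """Separate records with standard codes from records with overloaded ones.

    The field name and record layout are hypothetical. Blank values are
    treated as clean here, which is an assumption.
    """
    clean, quarantined = [], []
    for record in records:
        value = (record.get(field) or "").strip().upper()
        if value == "" or value in VALID_LUNCH_CODES:
            clean.append(record)
        else:
            # Keep the record for review instead of discarding the value.
            quarantined.append(record)
    return clean, quarantined


def overloaded_value_counts(quarantined, field="lunch_status"):
    """Count each non-standard code so its owners can be tracked down."""
    return Counter(r[field].strip().upper() for r in quarantined)


students = [
    {"student_id": 1, "lunch_status": "F"},
    {"student_id": 2, "lunch_status": "R"},
    {"student_id": 3, "lunch_status": "D"},  # denial code entered by one office
    {"student_id": 4, "lunch_status": ""},
]

clean, quarantined = split_lunch_codes(students)
counts = overloaded_value_counts(quarantined)
```

The clean records can go on to the warehouse, while the quarantined ones and their counts give the data team a starting point for asking who uses each extra code and whether it deserves a field of its own.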

Data overloading is one of several data quality issues that will have to be confronted as districts and states move from reporting annual and aggregate data to longitudinal analysis of individual-level data.

Chris
