In the article Metadata Normalization As an Indicator of Quality?, the author Mark E. Phillips analyzes the Digital Public Library of America’s metadata subject fields in order to provide insights into improving metadata records at their source. In particular, he uses the subject fields as an indicator of the quality of a set of records, working with OpenRefine, a browser-based, open-source data cleaning tool formerly supported by Google.
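To make the idea concrete, here is a minimal sketch of how subject-field normalization might be turned into a number. It is an illustration only, not necessarily the procedure Phillips followed. The function names `fingerprint` and `normalization_ratio` and the sample subject strings are hypothetical, and the key is loosely modelled on the fingerprint keying that OpenRefine documents for clustering (trim, lowercase, strip accents and punctuation, then sort the unique tokens).

```python
import string
import unicodedata


def fingerprint(value: str) -> str:
    """Hypothetical normalization key for a subject string.

    Assumed steps: trim, lowercase, strip accents, remove ASCII
    punctuation, then sort and de-duplicate the remaining tokens.
    """
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop combining marks so that "café" and "cafe" share a key.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def normalization_ratio(subjects) -> float:
    """Share of unique subject strings that collapse after normalization.

    0.0 means no two distinct strings share a key; values closer to 1.0
    mean many distinct strings are variants of the same heading.
    """
    raw = set(subjects)
    if not raw:
        return 0.0
    normalized = {fingerprint(s) for s in raw}
    return 1 - len(normalized) / len(raw)


# Made-up sample data, for illustration only.
subjects = ["Texas -- History", "History -- Texas.", "texas -- history", "Maps"]
print(normalization_ratio(subjects))  # 0.5
```

In this made-up sample, four distinct strings reduce to two keys, so half of the unique values are variants of something already present. Note that the key treats "Texas -- History" and "History -- Texas." as the same heading, which may or may not be what a cataloger intended.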
Aside from his arguments regarding normalization, the article itself is a lesson in the dangers of relying too heavily upon one platform or software resource: a system can become obsolete or simply disappear altogether. Google’s support for OpenRefine ended in 2012, and the project now relies upon volunteers to manage and refine the current system. In addition, the News page on the project’s website hasn’t been updated since the spring of 2016. Combined with the fact that the system relies heavily upon Java, this does not bode well for its longevity as a useful tool. (Security issues have increasingly plagued Java, and many developers, especially in areas of database management, are moving away from the platform.)
But back to his core argument: normalization as an indicator of quality. As someone who works with databases daily, I can say that yes, normalization is extremely important in being able to correctly access the information you require, and as such it could be seen as a measure of quality, insofar as it is a measure of the amount of time and money spent on a system. But people are error-prone and, by default, so are our systems.
Algorithms cannot catch every mistake, nor should we rely solely upon them to do so. In allowing ourselves to become dependent upon strict normalization, we lose ambiance, character, and individualization, as well as contextual clues that are not so easily categorized and placed into tidy, neat little boxes of definition. Normalization can facilitate discovery, but it should never be used as a crutch at the expense of human cross-referencing skills.
Normalization can also be used to hide or misdirect information. Therefore, while a system could be structurally “perfect,” its real usability still depends more upon the quality of the information contained within it than upon the framework in which that information resides. We can be misled by the metadata contained within data repositories. Quality should therefore be judged more by the usefulness of a system than by the particular web in which it resides.