Abstract
Artificial Intelligence (AI) applications critically depend on data. Poor-quality data produces inaccurate and ineffective AI models that may lead to incorrect or unsafe use. Evaluating data readiness is a crucial step in improving the quality and appropriateness of data usage for AI. R&D efforts have been spent on improving data quality; however, standardized metrics for evaluating data readiness for use in AI training are still evolving. In this study, we perform a comprehensive survey of metrics used to verify data readiness for AI training. This survey examines more than 140 papers published by ACM Digital Library, IEEE Xplore, and journals such as Nature, Springer, and Science Direct, as well as online articles published by prominent AI experts. The survey aims to propose a taxonomy of data readiness for AI (DRAI) metrics for structured and unstructured datasets. We anticipate that this taxonomy will lead to new standards for DRAI metrics that will be used for enhancing the quality, accuracy, and fairness of AI training and inference.