ISSN: 2287-9099
While open government data (OGD) is increasingly recognized as a critical resource for economic growth and data-driven innovation, methods for proactively evaluating the potential utilization of these datasets remain underdeveloped. This study addresses this gap by investigating two methodological questions: first, whether automated machine learning (AutoML) is an appropriate tool for measuring and evaluating OGD utilization, and second, how the composition of training data affects the performance of models designed to predict such utilization. Specifically, the study compares two training data strategies: models trained on integrated datasets spanning multiple domains and models trained on domain-specific datasets. Using metadata from the South Korean government’s OGD portal, the study employs AutoML to systematically build and evaluate predictive models under each training condition. The findings show that the training data strategy is a critical determinant of predictive accuracy, with models trained on integrated multi-domain data frequently outperforming domain-specific models. This research provides empirical evidence on how data integration strategies affect the prediction of OGD utilization and establishes a methodological framework for prospectively assessing OGD value, offering a more robust alternative to traditional retrospective evaluation metrics.