The purpose of this study is to examine, across multiple dimensions, the conditions under which large language model (LLM)-based content analysis can be applied, by task type, to the analysis of research methods in library and information science. To this end, 100 survey and interview studies published between 2020 and 2024 in four major Korean journals in library and information science were selected through stratified random sampling. The coding results produced by one human coder and four large language models (Claude-3.5-Haiku, GPT-4o-Mini, Gemini-2.0-Flash, and Grok-4-Latest) were compared across twelve dimensions of sampling methodology. Agreement was relatively high on dimensions that could be classified against explicit criteria, but consistently lower on dimensions requiring inferential or evaluative judgment. These findings suggest that the performance of LLM-based automated coding depends more on the decision structure of the task and the explicitness of the available information than on model capability itself. The scope of LLM application should therefore be assessed with closer attention to task type and the nature of the judgment required, and human-AI hybrid validation strategies need to be designed systematically.
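As an illustration of the human-versus-LLM comparison the abstract describes, the sketch below computes percent agreement and Cohen's kappa between one human coder and one model on a single coding dimension. The abstract does not name the agreement statistic actually used; kappa is assumed here as a common chance-corrected choice, and the dimension, category labels, and data are invented for illustration only.

    from collections import Counter

    def percent_agreement(a, b):
        """Share of items that receive the same code from both coders."""
        return sum(x == y for x, y in zip(a, b)) / len(a)

    def cohens_kappa(a, b):
        """Chance-corrected agreement: kappa = (p_o - p_e) / (1 - p_e)."""
        n = len(a)
        p_o = percent_agreement(a, b)
        ca, cb = Counter(a), Counter(b)
        # Expected chance agreement from each coder's marginal label frequencies.
        p_e = sum((ca[c] / n) * (cb[c] / n) for c in set(a) | set(b))
        return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

    # Invented labels for one explicit-criterion dimension, coded on ten studies.
    human = ["probability", "nonprob", "probability", "nonprob", "nonprob",
             "probability", "nonprob", "probability", "nonprob", "probability"]
    llm   = ["probability", "nonprob", "probability", "probability", "nonprob",
             "probability", "nonprob", "probability", "nonprob", "nonprob"]

    print(f"agreement = {percent_agreement(human, llm):.2f}, "
          f"kappa = {cohens_kappa(human, llm):.2f}")  # 0.80, 0.60

In a study like the one summarized here, such a computation would be repeated for each of the twelve dimensions and each human-model pair, which is what allows explicit-criterion dimensions to be contrasted with inferential or evaluative ones.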