Applicability Evaluation and Feature Optimization of the OpenAlex Global Author Disambiguation Model for Korean Scholarly Data

Jeong Hyeong-Sang (정형상); Kwak Seung-Jin (곽승진)

doi:10.14699//kbiblia.2026.37.1.387

P-ISSN 1229-2435
E-ISSN 2799-4767
KCI

Vol.37 No.1

OpenAlex 글로벌 저자 식별 모델의 한국 학술 데이터 적용성 평가 및 특성 최적화 연구

한국비블리아학회지 / Journal Of Korean Biblia Society for Library and Information Science, (P)1229-2435; (E)2799-4767

2026, v.37 no.1, pp.387-410

https://doi.org/10.14699//kbiblia.2026.37.1.387

Hyeong-Sang Jeong (정형상), Department of Library and Information Science, Chungnam National University
Seung-Jin Kwak (곽승진), Chungnam National University

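Jeong, H.-S., & Kwak, S.-J. (2026). Applicability Evaluation and Feature Optimization of the OpenAlex Global Author Disambiguation Model for Korean Scholarly Data. 한국비블리아학회지, 37(1), 387-410. https://doi.org/10.14699//kbiblia.2026.37.1.387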

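Abstract (translated from the Korean)

Author name disambiguation is a core task for scholarly information systems, yet the applicability of the English-centric OpenAlex model to the Korean scholarly ecosystem has not been sufficiently verified. This study uses 54,049 papers published in 2023-2024 from the KISTI OCEAN database to evaluate how well the OpenAlex model applies to Korean data, and optimizes seven features to suit the characteristics of Korean. In stepwise experiments, the F1 score improved from 0.852 (v1-1) to 0.860 (v2-2), and after correction of the ground-truth set the model reached an accuracy of 0.930 and an F1 score of 0.931. ORCID-based cross-validation yielded an F1 score of 0.892, confirming the model's reliability. To manage large-scale data efficiently, the study introduces incremental processing and proposes an optimization process that combines it with manual verification. Finally, it builds a pipeline that groups 183,105 Korean authors into 109,205 identifiers, verifying the approach's practical feasibility.

keywords: Author Name Disambiguation, OpenAlex, Korean Author Names, Feature Optimization, OCEAN, Scholarly Database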

Abstract

Author name disambiguation (AND) is a critical task in scholarly information systems; however, the applicability of the English-centric OpenAlex model to the Korean academic ecosystem has yet to be fully validated. This study evaluates the OpenAlex model on 54,049 papers (2023-2024) from KISTI’s OCEAN database and optimizes seven features to fit Korean linguistic characteristics. In stepwise experiments, the F1-score improved from 0.852 (v1-1) to 0.860 (v2-2), and after ground-truth refinement the model achieved an accuracy of 0.930 and an F1-score of 0.931. Cross-validation against ORCID yielded an F1-score of 0.892, confirming the model’s reliability. In particular, we propose an optimization process that combines incremental processing with manual verification to manage large-scale data efficiently. Finally, the study builds a pipeline that clusters 183,105 author records into 109,205 unique identifiers, verifying its practical feasibility for Korean scholarly metadata.

keywords: Author Name Disambiguation, OpenAlex, Korean Author Names, Feature Optimization, OCEAN, Scholarly Database
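
For readers unfamiliar with how a disambiguation result is scored, the following is a minimal sketch of a pairwise precision, recall, and F1 computation over author clusters. It is an illustration only: the abstract does not state which F1 variant the authors used, so the pairwise definition is an assumption, and the function names and toy labels are hypothetical. The last lines reproduce the simple ratio implied by the reported counts (183,105 records grouped into 109,205 identifiers).

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of record-index pairs that share a cluster label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_prf(true_labels, pred_labels):
    """Pairwise precision, recall and F1 between two clusterings.

    Assumption: pairwise scoring is one common way to evaluate author
    name disambiguation; the paper's exact metric is not given here.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    true_pairs = same_cluster_pairs(true_labels)
    pred_pairs = same_cluster_pairs(pred_labels)
    tp = len(true_pairs & pred_pairs)  # pairs correctly placed together
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: five author records, two real authors (A and B).
true_ids = ["A", "A", "A", "B", "B"]
# The predicted clustering wrongly splits one record off from author A.
pred_ids = ["x", "x", "y", "z", "z"]

precision, recall, f1 = pairwise_prf(true_ids, pred_ids)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# prints: precision=1.000 recall=0.500 f1=0.667

# Ratio implied by the counts reported in the abstract (about 1.68).
records_per_identifier = 183_105 / 109_205
print(f"records per identifier: {records_per_identifier:.2f}")
```

In the toy case, splitting a true author into two clusters leaves precision at 1.0 but halves recall, which is the typical signature of over-splitting in name disambiguation.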
