  • P-ISSN1738-6764
  • E-ISSN2093-7504
  • KCI

SiCExtractor: A Retrieval-Augmented Generation Pipeline for PVT-Based SiC Crystal Growth Research

INTERNATIONAL JOURNAL OF CONTENTS, (P)1738-6764; (E)2093-7504
2025, v.21 no.4, pp.107-115
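Fatemeh Pishkool (Chungbuk National University)
오효석 (Chungbuk National University)
류가애 (Korea Institute of Ceramic Engineering and Technology (KICET))
박진화 (Korea Institute of Ceramic Engineering and Technology (KICET))
류관희 (Chungbuk National University)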

Abstract

In this study, we present a Retrieval-Augmented Generation (RAG) pipeline for extracting key values from scientific literature on silicon carbide (SiC) crystal growth by the Physical Vapor Transport (PVT) method. To improve the relevance and completeness of the retrieved context, we implement a hybrid retrieval strategy that combines dense retrieval via FAISS with sparse retrieval using BM25. We use two prompting approaches for key-value extraction. The first handles interactive user queries by using the retrieved context to generate informed responses. The second, intended for bulk extraction, is a two-step process: a binary classification prompt first checks whether the retrieved context contains information relevant to the query; if it does, a second prompt extracts the value under strict constraints, requiring exact phrasing with no guessing or explanation. This binary pre-check significantly improves the identification of true negative cases, thereby reducing irrelevant or missing data. For the generative component, we evaluated three large language models (LLMs), Llama 8B, Gemma 7B, and Mistral 7B, all running in a local multi-GPU environment with FP16 precision. The results reveal differences in how efficiently these models perform within our customized RAG system, particularly when extracting over 156 targeted technical key-value pairs from 13 benchmark papers.
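
The two mechanisms described in the abstract, hybrid retrieval and the two-step bulk extraction, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how dense and sparse results are combined, so the min-max normalization and weighted sum used here are assumptions, as are the prompt wording and the names `min_max`, `hybrid_rank`, `extract_value`, `alpha`, and `top_k`. The dense and sparse scores are assumed to be precomputed (for example by a FAISS index and a BM25 scorer) with higher values meaning more relevant, and `llm` stands for any function that maps a prompt string to a response string.

```python
from typing import Callable, Dict, List, Optional


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: treat every chunk as equally relevant.
        return {chunk_id: 1.0 for chunk_id in scores}
    return {chunk_id: (s - lo) / (hi - lo) for chunk_id, s in scores.items()}


def hybrid_rank(dense: Dict[str, float], sparse: Dict[str, float],
                alpha: float = 0.5, top_k: int = 3) -> List[str]:
    """Fuse dense and sparse scores by weighted sum (an assumed fusion rule).

    A chunk missing from one retriever contributes 0 for that retriever.
    """
    d, s = min_max(dense), min_max(sparse)
    fused = {
        chunk_id: alpha * d.get(chunk_id, 0.0) + (1 - alpha) * s.get(chunk_id, 0.0)
        for chunk_id in set(d) | set(s)
    }
    # Sort by fused score (descending), breaking ties by chunk id.
    return sorted(fused, key=lambda c: (-fused[c], c))[:top_k]


def extract_value(llm: Callable[[str], str], key: str, context: str) -> Optional[str]:
    """Two-step extraction: binary presence check, then constrained extraction."""
    # Step 1: ask only whether the information is present (hypothetical prompt).
    check = llm(
        f"Context:\n{context}\n\n"
        f"Does the context state the {key}? Answer YES or NO."
    )
    if not check.strip().upper().startswith("YES"):
        return None  # treated as a true negative; no extraction is attempted

    # Step 2: extract under strict constraints (hypothetical prompt).
    answer = llm(
        f"Context:\n{context}\n\n"
        f"Copy the exact phrase that gives the {key}. Do not guess or explain."
    )
    return answer.strip()
```

Returning `None` when the pre-check fails keeps absent values distinct from extracted ones, which is the role the abstract assigns to the binary classification step.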

Keywords
Large Language Model, Retrieval Augmented Generation, Key-value Extraction, Prompt, SiC Crystal Growth Research
