- P-ISSN 2233-4203
- E-ISSN 2093-8950
The recent emergence of large language models (LLMs) has transformed the process of machine learning (ML) model development, markedly reducing the need for advanced coding expertise and enabling domain scientists to directly construct computational workflows through natural language prompts. In this study, we demonstrate the application of Google Gemini Pro 2.5 for developing classification models of erectile dysfunction (ED) drug analogues using tandem mass spectrometric (MS/MS) data. The dataset consisted of 149 compounds, including sildenafil, vardenafil, tadalafil analogues, and structurally unrelated compounds, represented as binary barcode spectra (m/z 50–800) derived from fragment ion intensities. Through stepwise prompting, the LLM generated executable Python code for data preprocessing, model construction, hyperparameter optimization, and ensemble learning using random forest, artificial neural networks (ANN), and support vector machines (SVM). The resulting models achieved high classification performance comparable to that of a manually programmed ANN reported in our previous work, while requiring markedly less programming effort. Beyond reproducing classification accuracy, this study highlights the efficiency, accessibility, and reproducibility of LLM-assisted ML workflows, underscoring their potential to democratize computational methods in mass spectrometry and analytical chemistry.
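To make the data representation and the ensemble step concrete, the following is a minimal sketch and not the code generated in this study. It assumes unit-width m/z bins, a 1% relative-intensity cutoff, and a simple hard majority vote across the three classifiers; the function names `to_barcode` and `majority_vote`, the `rel_threshold` parameter, and the example peak list are all hypothetical. Only the m/z 50–800 range and the binary encoding are taken from the description above.

```python
from collections import Counter

# m/z range stated in the abstract; unit-width bins are an assumption.
MZ_MIN, MZ_MAX = 50, 800


def to_barcode(peaks, rel_threshold=0.01):
    """Convert (m/z, intensity) pairs into a binary barcode vector.

    A bin is set to 1 when a fragment ion rounded to that nominal m/z has
    an intensity of at least rel_threshold times the most intense peak.
    The threshold value is an illustrative assumption.
    """
    bits = [0] * (MZ_MAX - MZ_MIN + 1)
    if not peaks:
        return bits
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        return bits
    for mz, intensity in peaks:
        idx = round(mz) - MZ_MIN
        # Ignore ions outside the m/z window and ions below the cutoff.
        if 0 <= idx < len(bits) and intensity / base >= rel_threshold:
            bits[idx] = 1
    return bits


def majority_vote(labels):
    """Return the most frequent label among the individual model predictions.

    On a tie, the label that appears first in the input is returned.
    """
    return Counter(labels).most_common(1)[0][0]


# Hypothetical peak list: two retained ions, one below the cutoff,
# and one outside the m/z 50-800 window.
example_peaks = [(283.1, 100.0), (311.2, 45.0), (99.9, 0.5), (900.0, 80.0)]
barcode = to_barcode(example_peaks)

# Hypothetical predictions from a random forest, an ANN, and an SVM.
ensemble_label = majority_vote(["sildenafil", "sildenafil", "tadalafil"])
```

In a full workflow, each barcode would form one row of the feature matrix passed to the classifiers, and the vote would be taken over their per-compound predictions.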