Application of large language models for detecting software code vulnerabilities

Authors

  • O.V. Yatskyi, Zaporizhzhia National University, Zaporizhzhia
  • S.Y. Boriu, Zaporizhzhia National University, Zaporizhzhia

DOI:

https://doi.org/10.33216/1998-7927-2025-298-12-38-46

Keywords:

large language models, code vulnerabilities, static analysis, cybersecurity, GPT-4, artificial intelligence, refactoring

Abstract

The article examines the application of large language models (LLMs), in particular GPT-4, for automated detection of vulnerabilities in software code. Modern static code analysis methods have limitations in detecting complex vulnerabilities and contextual security threats, which motivates the development of new approaches based on artificial intelligence. To assess the ability of LLMs to detect vulnerabilities in program code, a series of experiments was conducted with several generations of GPT models: Ada (350 million parameters), Curie (6.7 billion), Davinci (175 billion), and the latest GPT-4 model (≈1.7 trillion parameters). A comparative analysis with the traditional static analyzer Snyk demonstrates that GPT-4 detects a greater number of vulnerabilities, covers a wider spectrum of threats, and proposes relevant fixes. The study used 64 code fragments in eight programming languages, covering 33 vulnerability categories according to the Common Weakness Enumeration (CWE) classification. The experiments compared the results of code analysis by GPT-4 and Snyk on identical test data sets. GPT-4 demonstrated high efficiency in automatic code refactoring, achieving a 94% reduction in critical vulnerabilities compared to the initial state of the code. Threat detection accuracy, the rates of false positive and false negative results, and the model's ability to provide detailed explanations of identified security issues were analyzed separately. A particular advantage of GPT-4 was its automatic refactoring mode, which was additionally evaluated under variations of the system context. The results indicate that LLMs are a feasible supplementary tool for ensuring software code security.
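The analysis setup described above can be sketched in a few lines: a code fragment is wrapped in a review prompt asking the model for CWE-classified findings and fixes. The prompt wording, the `build_vuln_prompt` helper, and the sample fragment are illustrative assumptions, not the authors' exact protocol.

```python
# Sketch of the study's setup: ask an LLM to review a code fragment and
# report CWE-classified vulnerabilities. Prompt text and helper names are
# illustrative assumptions, not the paper's exact protocol.

def build_vuln_prompt(code: str, language: str) -> str:
    """Compose a review prompt asking for CWE-classified findings and fixes."""
    return (
        f"You are a security auditor. Review the following {language} code.\n"
        "List every vulnerability you find, give its CWE identifier, "
        "explain the issue, and propose a fix.\n\n"
        f"```{language}\n{code}\n```"
    )

# Hypothetical vulnerable fragment (string-built SQL query, CWE-89).
fragment = 'query = "SELECT * FROM users WHERE name = \'" + user_input + "\'"'
prompt = build_vuln_prompt(fragment, "python")

# With an API client, the prompt would then be sent to the model, e.g.:
# from openai import OpenAI
# reply = OpenAI().chat.completions.create(
#     model="gpt-4", messages=[{"role": "user", "content": prompt}]
# )
print(prompt.splitlines()[0])
```

The same prompt template can be reused across the eight languages in the test set by varying only the `language` argument.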
Possibilities for integrating large language models into modern continuous integration and deployment (CI/CD) pipelines are proposed, and prospects for hybrid use alongside classical automated software security testing tools are outlined.
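One way such CI/CD integration could work is a gate step that parses the model's findings and fails the pipeline when vulnerabilities above a severity threshold remain. The finding format and threshold policy below are illustrative assumptions, not part of the article's experiments.

```python
# Sketch of a CI/CD gate over LLM findings: fail the build when any finding
# meets or exceeds a severity threshold. The finding schema and the default
# "critical" policy are illustrative assumptions.

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def should_fail_build(findings: list, threshold: str = "critical") -> bool:
    """Return True when any finding's severity reaches the threshold."""
    limit = SEVERITY_RANK[threshold]
    return any(SEVERITY_RANK[f["severity"]] >= limit for f in findings)

# Hypothetical parsed output of an LLM code review:
findings = [
    {"cwe": "CWE-89", "severity": "critical"},  # SQL injection
    {"cwe": "CWE-79", "severity": "medium"},    # cross-site scripting
]
print(should_fail_build(findings))  # True: the critical finding blocks the merge
```

In a hybrid pipeline, the same gate could merge findings from both the LLM and a classical analyzer such as Snyk before deciding whether to block the deployment.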


Published

2026-01-29