SAC-based reinforcement learning agent for automated network penetration testing

Authors

  • V.V. Vikulov, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Kyiv

DOI:

https://doi.org/10.33216/1998-7927-2025-293-7-5-11

Keywords:

reinforcement learning, cybersecurity, NASim, SAC

Abstract

Cybersecurity threats and the damage caused by cybercrime continue to escalate annually. Projections indicate that total damages will reach $10.5 trillion by the end of 2025, a 3.5-fold increase over the damages recorded in 2015. As malicious actors increasingly adopt AI, there is a growing need for defensive tools that leverage AI capabilities as well.

This paper continues the research trend of applying reinforcement learning to penetration testing for identifying security vulnerabilities in computer networks. To this end, the Network Attack Simulator (NASim) is employed as the testing environment. NASim is a simulator, built on the Gymnasium framework, designed for evaluating automated penetration testing agents trained with reinforcement learning.
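
As a point of reference, the scenario environments that NASim registers can be instantiated through the standard Gymnasium API. The snippet below is a minimal sketch assuming a current NASim installation; the seed and the randomly sampled action are purely illustrative.

    import gymnasium as gym

    # "nasim:Small-v0" is the fully observable benchmark scenario used in this paper
    env = gym.make("nasim:Small-v0")
    obs, info = env.reset(seed=0)

    # NASim exposes a flat, discrete action space (scan/exploit/privilege-escalation
    # actions per host); here we just sample one action at random
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    env.close()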

This paper presents a hybrid reinforcement learning algorithm that combines the Soft Actor-Critic (SAC) architecture with a discrete action space, enabling SAC to operate effectively within the NASim environment.
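
The paper's exact network architecture and hyperparameters are not reproduced here; the sketch below only illustrates the general discrete-action SAC adaptation, in which a categorical policy and twin Q-networks that output one value per action let the soft value be computed as an exact expectation rather than estimated by sampling. All layer sizes, names, and coefficients are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def mlp(in_dim, out_dim, hidden=256):
        # Simple two-hidden-layer network; sizes are assumptions
        return nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    class DiscreteSAC(nn.Module):
        def __init__(self, obs_dim, n_actions, alpha=0.2, gamma=0.99):
            super().__init__()
            self.actor = mlp(obs_dim, n_actions)   # logits over the discrete actions
            self.q1, self.q2 = mlp(obs_dim, n_actions), mlp(obs_dim, n_actions)
            self.q1_targ, self.q2_targ = mlp(obs_dim, n_actions), mlp(obs_dim, n_actions)
            self.q1_targ.load_state_dict(self.q1.state_dict())
            self.q2_targ.load_state_dict(self.q2.state_dict())
            self.alpha, self.gamma = alpha, gamma

        def policy(self, obs):
            logits = self.actor(obs)
            return F.softmax(logits, dim=-1), F.log_softmax(logits, dim=-1)

        def critic_loss(self, obs, act, rew, next_obs, done):
            # act: LongTensor of action indices, shape (batch,)
            with torch.no_grad():
                probs, log_probs = self.policy(next_obs)
                # Soft state value: exact expectation over the discrete actions
                q_next = torch.min(self.q1_targ(next_obs), self.q2_targ(next_obs))
                v_next = (probs * (q_next - self.alpha * log_probs)).sum(-1)
                target = rew + self.gamma * (1.0 - done) * v_next
            q1 = self.q1(obs).gather(1, act.unsqueeze(1)).squeeze(1)
            q2 = self.q2(obs).gather(1, act.unsqueeze(1)).squeeze(1)
            return F.mse_loss(q1, target) + F.mse_loss(q2, target)

        def actor_loss(self, obs):
            probs, log_probs = self.policy(obs)
            q = torch.min(self.q1(obs), self.q2(obs)).detach()
            # Minimize E_pi[alpha * log pi - Q], i.e. maximize the soft Q-value
            return (probs * (self.alpha * log_probs - q)).sum(-1).mean()

During training, the target networks would be Polyak-averaged toward the online critics, as in standard SAC.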

The algorithm was evaluated on the nasim:Small-v0 scenario. Experimental results show that the proposed method achieves several noteworthy outcomes. First, the algorithm exhibits stable convergence throughout training, indicating robust learning dynamics. Second, it compromises systems efficiently, requiring an average of 8.63 steps during late training episodes to fully compromise target systems. Third, it maintains a 100% success rate during the evaluation phase, demonstrating reliable and consistent performance.
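
For concreteness, metrics such as these could be gathered with an evaluation loop of the following shape. This is a sketch under assumptions: the trained agent is represented by a hypothetical object exposing a greedy act(obs) method, and the episode count and step cap are illustrative.

    import gymnasium as gym

    def evaluate(agent, episodes=100, max_steps=1000):
        env = gym.make("nasim:Small-v0")
        steps_to_goal, successes = [], 0
        for ep in range(episodes):
            obs, info = env.reset(seed=ep)
            for step in range(1, max_steps + 1):
                obs, reward, terminated, truncated, info = env.step(agent.act(obs))
                if terminated:                 # goal reached: target hosts compromised
                    successes += 1
                    steps_to_goal.append(step)
                    break
                if truncated:                  # step limit hit without full compromise
                    break
        env.close()
        mean_steps = sum(steps_to_goal) / max(len(steps_to_goal), 1)
        return mean_steps, successes / episodes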

Additionally, the algorithm achieves an average reward of 184.61 in the later stages of training, indicating strong potential for cybersecurity applications. However, these results require extensive training time, and more complex scenarios would likely require even longer training. This creates a trade-off between computational cost and performance that must be considered for practical implementation.

References

Cybersecurity Ventures. Cyberwarfare 2021 Report. 2021. URL: https://cybersecurityventures.com/wp-content/uploads/2021/01/Cyberwarfare-2021-Report.pdf (accessed: 13.08.2025).

Qu, S., Du, W., Chen, C., Li, B., & Qiu, M. A survey on reinforcement learning applications in cybersecurity. arXiv preprint arXiv:1905.05965, 2019. URL: https://arxiv.org/abs/1905.05965 (accessed: 13.08.2025).

Becker, N., Reti, D., Ntagiou, E. V., Wallum, M., & Schotten, H. D. Evaluation of Reinforcement Learning for Autonomous Penetration Testing using A3C, Q-learning and DQN. arXiv, 2024. doi: 10.48550/arXiv.2407.15656.

Li, Z., Zhang, Q., & Yang, G. EPPTA: Efficient partially observable reinforcement learning agent for penetration testing applications. Engineering Reports, 2025, 7(1): e12818. doi: 10.1002/eng2.12818.

Tran, K., Standen, M., Kim, J., Bowman, D., Richer, T., Akella, A., & Lin, C. T. Cascaded reinforcement learning agents for large action spaces in autonomous penetration testing. Applied Sciences, 2022, 12(21), 11265. doi: 10.3390/app122111265.

Janisch, J., Pevný, T., & Lisý, V. NASimEmu: Network attack simulator & emulator for training agents generalizing to novel scenarios. arXiv preprint arXiv:2305.17246, 2023. URL: https://arxiv.org/abs/2305.17246 (accessed: 13.08.2025).

Microsoft. CyberBattleSim: An experimentation research platform to investigate automated agents operating in simulated enterprise environments. 2021. URL: https://github.com/microsoft/CyberBattleSim (accessed: 13.08.2025).

Wang, Y., Li, Y., Xiong, X., Zhang, J., Yao, Q., & Shen, C. DQfD-AIPT: An intelligent penetration testing framework incorporating expert demonstration data. Security and Communication Networks, 2023, 5834434. doi: 10.1155/2023/5834434.

Schwartz, J. Network Attack Simulator — small.yaml scenario. GitHub, 2023. URL: https://github.com/Jjschwartz/NetworkAttackSimulator/blob/4f26de37cfdc3e4553ed8b7484c4db8e2924bdea/nasim/scenarios/benchmark/small.yaml (accessed: 13.08.2025).

Published

2025-09-17