Application of Reinforcement Learning in Code Repair

Authors

  • Young Kim, Korea International School, Pangyo, Korea

DOI:

https://doi.org/10.26821/IJSHRE.10.4.2022.100204

Keywords:

Code repair, Long Short-Term Memory (LSTM), Recurrent Neural Networks (RNN), Tokenization

Abstract

Novice programmers frequently encounter compile errors, and learning to write syntactically correct code or to debug from the given error messages can be challenging. In this study, I created a machine learning model that collects compile error messages from code written by novice students, learns from them using a Long Short-Term Memory (LSTM) recurrent neural network, and repairs the errors. Training data were collected from an online judge system: working code was deliberately and systematically modified to introduce errors. After a tokenization preprocessing step, the LSTM was used to repair the erroneous parts of the given code. The model repaired 43% of the errors generated by novice programmers. Relatively simple errors, such as missing semicolons and unmatched brackets, were fixed with high accuracies of 78% and 73%, respectively. These results indicate that simple syntax errors are easy to fix with artificial intelligence, whereas errors that depend more on context and user intention are harder to repair.
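The data-preparation step described in the abstract (tokenize working code, then systematically break it to obtain training pairs) can be illustrated with a minimal sketch. This is not the author's implementation: the names `tokenize`, `inject_error`, and `make_pair` are hypothetical, the regex tokenizer is a deliberate simplification that splits multi-character operators and string literals, and only the two error types named in the abstract (missing semicolons and unmatched brackets) are simulated.

```python
import random
import re

# Assumed, simplified tokenizer for C-like code: identifiers, integers,
# and any other single non-whitespace character as its own token.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

# Tokens whose removal produces a missing-semicolon or unmatched-bracket error.
TARGETS = {";", "(", ")", "{", "}"}


def tokenize(source):
    """Split source text into a flat list of tokens."""
    return TOKEN_RE.findall(source)


def inject_error(tokens, rng):
    """Return a copy of tokens with one semicolon or bracket removed.

    Returns None when the token list contains no removable target.
    """
    candidates = [i for i, tok in enumerate(tokens) if tok in TARGETS]
    if not candidates:
        return None
    drop = rng.choice(candidates)
    return tokens[:drop] + tokens[drop + 1:]


def make_pair(source, seed=0):
    """Build one (erroneous, correct) token-sequence pair from working code."""
    rng = random.Random(seed)
    correct = tokenize(source)
    broken = inject_error(correct, rng)
    return broken, correct


if __name__ == "__main__":
    broken, correct = make_pair("int main() { return 0; }")
    print("input :", " ".join(broken))
    print("target:", " ".join(correct))
```

Pairs of this form (broken sequence as input, original sequence as target) are the kind of data a sequence model such as an LSTM could be trained on; the model and training procedure themselves are outside the scope of this sketch.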

Published

2022-04-30 — Updated on 2022-05-24

How to Cite

Young Kim. (2022). Application of Reinforcement Learning in Code Repair. iJournals: International Journal of Software & Hardware Research in Engineering, ISSN: 2347-4890, 10(4). https://doi.org/10.26821/IJSHRE.10.4.2022.100204 (Original work published April 30, 2022)