Application of Reinforcement Learning in Code Repair
DOI:
https://doi.org/10.26821/IJSHRE.10.4.2022.100204
Keywords:
Code repair, Long Short-Term Memory (LSTM), Recurrent Neural Networks (RNN), Tokenization
Abstract
Novice programmers frequently encounter compile errors, and learning to write code free of syntax errors, or to debug from the error messages alone, can be challenging. In this study, I built a machine learning model that collects the compile error messages produced by novice students' code, learns from them using a Long Short-Term Memory (LSTM) recurrent neural network, and repairs the errors. Training data were collected from an online judge system; working programs were deliberately and systematically modified to introduce errors. After a tokenization preprocessing step, I used the LSTM to repair the erroneous parts of the given code. The model repaired 43% of the errors generated by novice programmers. Relatively simple errors, such as missing semicolons and unmatched brackets, were fixed with accuracies of 78% and 73%, respectively. These results indicate that simple syntax errors are easy to fix with artificial intelligence, whereas errors that depend more on context and user intention are harder to repair.
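The data preparation described in the abstract (tokenize working code, then systematically break it to obtain training pairs) might look roughly like the following minimal sketch. It is an illustration under assumptions, not the study's pipeline: the regex tokenizer, the function names `tokenize`, `inject_error`, and `make_pairs`, and the choice to delete exactly one semicolon or bracket per sample are all hypothetical, and the LSTM itself is omitted.

```python
import random
import re

# Hypothetical tokenizer for C-like code: identifiers, integers, string
# literals, and any other single non-space character. This is an assumption
# for illustration, not the tokenizer used in the study.
TOKEN_RE = re.compile(r'[A-Za-z_]\w*|\d+|"[^"\n]*"|\S')

def tokenize(source):
    """Split source text into a flat list of token strings."""
    return TOKEN_RE.findall(source)

def inject_error(tokens, kind, rng):
    """Return a copy of tokens with one token of the given kind removed,
    or None if the program contains no such token."""
    targets = {"semicolon": {";"}, "bracket": set("(){}[]")}[kind]
    positions = [i for i, tok in enumerate(tokens) if tok in targets]
    if not positions:
        return None
    drop = rng.choice(positions)
    return tokens[:drop] + tokens[drop + 1:]

def make_pairs(source, seed=0):
    """Build (error kind, broken tokens, fixed tokens) training triples."""
    rng = random.Random(seed)
    fixed = tokenize(source)
    pairs = []
    for kind in ("semicolon", "bracket"):
        broken = inject_error(fixed, kind, rng)
        if broken is not None:
            pairs.append((kind, broken, fixed))
    return pairs

if __name__ == "__main__":
    program = 'int main() { printf("hi"); return 0; }'
    for kind, broken, fixed in make_pairs(program):
        print(kind, " ".join(broken))
```

Each triple pairs a broken token sequence with its correct counterpart, which is the form of input and target a sequence model such as an LSTM would be trained on.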
Published
Versions
- 2022-05-24 (2)
- 2022-04-30 (1)