code-vecs

Code for the methods and algorithms described in the paper "Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task"

MIT License

Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task

Code for the program classification algorithms described in the paper "Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task" [1].

Getting Started

  1. Install Docker CE and GNU make.
  2. Clone the repository, then clone the submodules using git submodule update --init --recursive
  3. Download the dataset [2] from Zenodo and extract the task-*.csv files into src/data.
  4. Classification targets can contain digits, which the name check in code2vec rejects because it only accepts letters and the | character. Open external/code2vec/common.py and apply the patch:
     @staticmethod
     def legal_method_names_checker(special_words, name):
-        return name != special_words.OOV and re.match(r'^[a-zA-Z|]+$', name)
+        return name != special_words.OOV
  5. Run make notebook from the repository root, then run the notebooks.
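
To see why the patch in step 4 is needed, the following minimal sketch reproduces only the regular expression from the removed line and applies it to two example names. The helper passes_original_check and the sample names are hypothetical and chosen for illustration; they are not taken from code2vec or from the dataset.

```python
import re

# Pattern copied from the line removed by the patch in step 4.
LEGAL_NAME = re.compile(r'^[a-zA-Z|]+$')


def passes_original_check(name):
    """Return True if `name` would be accepted by the unpatched regex test.

    Hypothetical helper: it mirrors only the regex part of the original
    check and ignores the comparison against the OOV special word.
    """
    return LEGAL_NAME.match(name) is not None


# Hypothetical target names, used only to illustrate the behaviour.
letters_only = "get|name"
with_digits = "task|12"

print(letters_only, passes_original_check(letters_only))
print(with_digits, passes_original_check(with_digits))
```

A name made of letters and | is accepted, while any name containing a digit is rejected, so targets with digits would be treated as illegal unless the regex test is removed.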

References

  1. Gorchakov, A.V.; Demidova, L.A.; Sovietov, P.N. Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task. Future Internet 2023, 15, 314.
  2. Demidova, L.A.; Andrianova, E.G.; Sovietov, P.N.; Gorchakov, A.V. Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant. Data 2023, 8 (6), p. 109.

Citation

If you use the code available in this repository in your research work, please consider citing our paper [1] published in Future Internet:

Gorchakov, A.V.; Demidova, L.A.; Sovietov, P.N. Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task. Future Internet 2023, 15, 314. https://doi.org/10.3390/fi15090314