Abstract
The need for scalable and personalized content in programming education is driving interest in automating exercise generation. This requires a clear understanding of existing exercises. Our research addresses this by classifying existing exercises by topic and difficulty level. We combine a lexicon-based analysis with machine learning and advanced natural language processing techniques, providing a foundation for AI-assisted content generation. Specifically, we use BERTopic for topic modeling and five machine learning models to predict difficulty levels in programming exercises. Our dataset includes 106 programming exercise descriptions from three introductory courses, plus performance data from up to 189 learners. The results demonstrate that lexicon-based approaches significantly improve topic modeling accuracy and coherence compared to the baseline, with reduced variance and more consistent cluster stability. Although difficulty prediction remains challenging due to the complexity of defining ground truth, lexicon integration leads to modest yet consistent performance gains. This work lays essential groundwork for scalable and resource-efficient solutions for the classification and generation of personalized programming exercises.
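As a rough illustration of what a lexicon-based signal over exercise descriptions might look like, the sketch below counts matches against a small topic lexicon. It is not the authors' pipeline: the lexicon contents, the function names (`tokenize`, `lexicon_scores`, `top_topic`), and the scoring rule are all assumptions made for this example, and the BERTopic and machine learning stages described in the abstract are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical lexicon mapping a topic to indicative terms.
# These entries are illustrative assumptions, not taken from the paper.
LEXICON = {
    "loops": {"loop", "for", "while", "iterate", "repeat"},
    "strings": {"string", "character", "substring", "uppercase"},
    "recursion": {"recursion", "recursive"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_scores(description, lexicon=LEXICON):
    """Count, per topic, how many tokens of the description match the lexicon."""
    counts = Counter(tokenize(description))
    return {
        topic: sum(counts[term] for term in terms)
        for topic, terms in lexicon.items()
    }


def top_topic(description, lexicon=LEXICON):
    """Return the highest-scoring topic, or None if no lexicon term matches."""
    scores = lexicon_scores(description, lexicon)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    exercise = "Write a while loop that runs until the string is empty."
    print(lexicon_scores(exercise))
    print(top_topic(exercise))
```

Scores of this kind could, under the same assumptions, be appended as extra features to a text representation before clustering or difficulty prediction; whether the paper integrates its lexicon in this way is not stated in the abstract.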
| Original language | English |
|---|---|
| Title of host publication | SPLASH-E'25 |
| Subtitle of host publication | Proceedings of the 2025 ACM SIGPLAN International Symposium on SPLASH-E |
| Editors | Martin Henz, Felienne Hermans, Daniel Patterson |
| Place of Publication | New York |
| Publisher | Association for Computing Machinery, Inc. |
| Pages | 90-101 |
| Number of pages | 12 |
| ISBN (Electronic) | 979-8-4007-2142-7 |
| DOIs | |
| Publication status | Published - 9 Oct 2025 |
| Event | 2025 ACM SIGPLAN International Symposium on SPLASH-E, SPLASH-E 2025 - Singapore, Singapore. Duration: 12 Oct 2025 → 18 Oct 2025 |
Conference
| Conference | 2025 ACM SIGPLAN International Symposium on SPLASH-E, SPLASH-E 2025 |
|---|---|
| Abbreviated title | SPLASH-E 2025 |
| Country/Territory | Singapore |
| City | Singapore |
| Period | 12/10/25 → 18/10/25 |
Keywords
- difficulty level prediction
- lexicon-based approach
- natural language processing
- programming education
- topic modeling
Fingerprint
Dive into the research topics of 'Personalization of Programming Education: An NLP-Based Bi-dimensional Classification of Programming Exercises'. Together they form a unique fingerprint.