Personalization of Programming Education: An NLP-Based Bi-dimensional Classification of Programming Exercises

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

The need for scalable and personalized content in programming education is driving interest in automating exercise generation. This requires a clear understanding of existing exercises. Our research addresses this by classifying existing exercises by topic and difficulty level. We combine a lexicon-based analysis with machine learning and advanced natural language processing techniques, providing a foundation for AI-assisted content generation. Specifically, we utilize BERTopic for topic modeling and five machine learning models to predict difficulty levels in programming exercises. Our dataset includes 106 programming exercise descriptions from three introductory courses, plus performance data from up to 189 learners. The results demonstrate that lexicon-based approaches significantly improve topic modeling accuracy and coherence compared to the baseline, with reduced variance and more consistent cluster stability. Although difficulty prediction remains challenging due to the complexity of defining ground truth, lexicon integration leads to modest yet consistent performance gains. This work lays essential groundwork for scalable and resource-efficient solutions for the classification and generation of personalized programming exercises.
Original language: English
Title of host publication: SPLASH-E'25
Subtitle of host publication: Proceedings of the 2025 ACM SIGPLAN International Symposium on SPLASH-E
Editors: Martin Henz, Felienne Hermans, Daniel Patterson
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc.
Pages: 90-101
Number of pages: 12
ISBN (Electronic): 979-8-4007-2142-7
Publication status: Published - 9 Oct 2025
Event: 2025 ACM SIGPLAN International Symposium on SPLASH-E, SPLASH-E 2025 - Singapore, Singapore
Duration: 12 Oct 2025 → 18 Oct 2025

Conference

Conference: 2025 ACM SIGPLAN International Symposium on SPLASH-E, SPLASH-E 2025
Abbreviated title: SPLASH-E 2025
Country/Territory: Singapore
City: Singapore
Period: 12/10/25 → 18/10/25

Keywords

  • difficulty level prediction
  • lexicon-based approach
  • natural language processing
  • programming education
  • topic modeling
