TU Grade

AI-Powered Automatic Short Answer Grading with Explanatory Feedback

Moodle Plugin · Research Project

Overview

The Problem

Manual grading of short answer questions is time-consuming, subjective, and often lacks timely, detailed feedback for students. Educators struggle to provide individualized responses at scale, while students receive delayed or minimal explanations that hinder their learning progress. Additionally, training educational AI models faces significant data scarcity challenges.

The Solution

TU Grade is an AI-powered grading assistant that automatically evaluates student responses and provides detailed, individualized feedback. Built on an instruction-tuned large language model (Vicuna 7B) and integrated into Moodle, it serves as a second opinion for educators while giving students immediate, constructive feedback. The system is based on Leon Camus's Master's thesis and was deployed experimentally at TU Darmstadt from 2024 to 2025.

Key Features

Automatic Scoring

0-9 point scale with AI-generated justifications for each grade, providing transparent and consistent evaluation.
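To make the "score plus justification" idea concrete, here is a minimal sketch of how a client might split a model response into an integer grade and its explanation. The output layout (a line `Score: N` followed by free-text reasoning) and the names `Grade` and `parse_grade` are hypothetical; the source does not specify TU Grade's actual response format.

```python
import re
from dataclasses import dataclass


@dataclass
class Grade:
    score: int          # integer grade on the 0-9 scale
    justification: str  # model-written explanation for the grade


# Hypothetical layout: a "Score: <n>" line, followed by the reasoning text.
_SCORE_RE = re.compile(r"^\s*Score:\s*(\d+)\s*$", re.MULTILINE)


def parse_grade(output: str) -> Grade:
    match = _SCORE_RE.search(output)
    if match is None:
        raise ValueError("no 'Score:' line found in model output")
    score = int(match.group(1))
    if not 0 <= score <= 9:
        raise ValueError(f"score {score} is outside the 0-9 scale")
    # Everything after the score line is treated as the justification.
    justification = output[match.end():].strip()
    return Grade(score=score, justification=justification)


print(parse_grade("Score: 7\nThe answer names both causes but omits the example."))
```

Rejecting out-of-range values keeps a malformed generation from silently turning into a grade.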

Individual Feedback

Personalized, explanatory responses for every student submission, helping learners understand their mistakes.

RAG-Powered References

Retrieval-Augmented Generation dynamically retrieves relevant reference answers for context-aware grading.
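The retrieval step can be sketched as ranking stored reference answers by similarity to the student answer and placing the top matches in the prompt. This is an illustration only: a bag-of-words cosine similarity stands in for whatever embedding model TU Grade actually uses, and `retrieve_references`, `build_prompt`, and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in "embedding": lowercase bag of words. A production RAG system
    # would more likely use a neural sentence encoder.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_references(answer: str, references: list[str], k: int = 2) -> list[str]:
    """Return the k reference answers most similar to the student answer."""
    query = embed(answer)
    ranked = sorted(references, key=lambda ref: cosine(query, embed(ref)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, answer: str, references: list[str]) -> str:
    # Hypothetical prompt template combining question, retrieved context, and answer.
    context = "\n".join(f"- {r}" for r in retrieve_references(answer, references))
    return (f"Question: {question}\nReference answers:\n{context}\n"
            f"Student answer: {answer}\n"
            f"Grade the answer on a 0-9 scale and explain why.")
```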

Configurable Rubrics

Customizable grading parameters per exercise, allowing educators to define specific evaluation criteria.

High-Performance Backend

Rust API with vLLM inference engine delivers low-latency responses for real-time grading feedback.

Moodle Integration

Custom question type with seamless LMS integration, working within your existing Moodle infrastructure.

How It Works

TU Grade uses a modern, high-performance architecture designed for efficient AI inference and seamless Moodle integration:

Backend System

Rust-based API for high performance and reliability
vLLM for optimized model inference
PostgreSQL database for data persistence
RAG (Retrieval-Augmented Generation) for reference answers
Token-based authentication per exercise
Optimized for low-latency grading responses
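Token-based authentication per exercise means each exercise carries its own secret that the Moodle plugin presents with every grading request. The backend is written in Rust; the Python sketch below only illustrates one common way to implement such a scheme. Storing hashed tokens, the `ExerciseTokens` class, and the in-memory dictionary (in place of PostgreSQL) are all assumptions.

```python
import hashlib
import hmac
import secrets


class ExerciseTokens:
    """Hypothetical per-exercise token store (in memory instead of PostgreSQL)."""

    def __init__(self) -> None:
        self._hashes: dict[str, bytes] = {}  # exercise id -> SHA-256 of its token

    def issue(self, exercise_id: str) -> str:
        token = secrets.token_urlsafe(32)
        # Keep only a hash, so a leaked table does not expose usable tokens.
        self._hashes[exercise_id] = hashlib.sha256(token.encode()).digest()
        return token

    def verify(self, exercise_id: str, token: str) -> bool:
        expected = self._hashes.get(exercise_id)
        if expected is None:
            return False
        presented = hashlib.sha256(token.encode()).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(expected, presented)
```

Scoping tokens to a single exercise limits the damage if one leaks: it grants access to that exercise only.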

Moodle Plugin

PHP-based question type plugin
Derived from short answer question type
Configuration interface for educators
Student answer input (text field or area)
Review interface with AI feedback display
Seamless LMS integration

User Interface

Exercise Configuration

The configuration interface allows educators to set up questions, define reference answers, and configure grading parameters:

Question text editor with rich text formatting options for the student-facing question.

[Screenshot: TU Grade exercise configuration, raw question text and API settings]

Fields for the raw question text passed to the model, a context field, the TU Grade API URL, and the exercise's authentication token.

[Screenshot: TU Grade exercise configuration, reference answer and grading criteria]

Reference answer configuration with the grading scale, the reason for the grade, and an autofill reference button.
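The settings above (raw question text, context, reference answer with its grade and reason, API URL, and token) are what the plugin needs to send to the backend together with a student answer. The sketch below builds such a request with the standard library but does not send it. The JSON field names, the `/grade` path, and the Bearer header are hypothetical; the real TU Grade API schema is not documented here.

```python
import json
import urllib.request


def build_grading_request(base_url: str, token: str, *, raw_question: str, context: str,
                          reference_answer: str, reference_score: int,
                          reference_reason: str, student_answer: str) -> urllib.request.Request:
    # All field names and the "/grade" endpoint are hypothetical placeholders.
    payload = {
        "question": raw_question,
        "context": context,
        "reference": {
            "answer": reference_answer,
            "score": reference_score,    # grade of the reference on the 0-9 scale
            "reason": reference_reason,  # why the reference earns that grade
        },
        "answer": student_answer,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/grade",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would send it; building it separately keeps the payload easy to inspect and test.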

Student Input

Students can submit their answers using either a compact text field or an expanded text area:

[Screenshot: TU Grade student input interface with text area]

Evaluation & Feedback

After submission, students receive immediate feedback with AI-generated explanations and grades:

[Screenshot: TU Grade evaluation interface showing student answer, AI feedback, and grade]

Technical Details

Model

Vicuna 7B v1.5 (Llama 2 based)
Instruction-tuned for grading tasks
GPTQ 4-bit quantization
About 3 GB of VRAM for the model weights
Greedy decoding (deterministic output)
KV-caching for fast inference
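A quick back-of-the-envelope check shows why 4-bit quantization matters here. Assuming roughly 6.74 billion parameters (the size of a Llama 2 7B model), 4-bit weights need about 3.1 GiB, while fp16 weights would need about 12.6 GiB, more than a single 11 GB RTX 2080 Ti holds. The estimate ignores GPTQ's per-group scales and zero points and any layers left unquantized, so the real footprint is somewhat larger.

```python
def weight_bytes(n_params: float, bits_per_param: float) -> float:
    """Bytes needed for the weights alone (no activations, no KV cache)."""
    return n_params * bits_per_param / 8


N_PARAMS = 6.74e9  # approximate parameter count of a Llama 2 7B model (assumption)
GIB = 1024 ** 3

for label, bits in [("fp16", 16), ("GPTQ 4-bit", 4)]:
    print(f"{label:>10}: {weight_bytes(N_PARAMS, bits) / GIB:5.2f} GiB")
```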

Fine-tuning

LoRA adaptation (r=16)
350 training steps
SAF dataset (Filighera et al., 2022)
Discrete regression (0-9 scale)
English and German support
Zero-shot capable
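LoRA freezes each adapted weight matrix W and learns a low-rank update B·A, where A is r × d_in and B is d_out × r, so it adds r·(d_in + d_out) trainable parameters per matrix. The count below assumes, hypothetically, that only the attention query and value projections were adapted (a common default; the source does not say which modules were targeted), with Llama 2 7B's hidden size of 4096 and 32 decoder layers. Under that assumption, r=16 trains about 8.4 million parameters, roughly 0.12% of the base model.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)  # A: r x d_in, B: d_out x r


HIDDEN = 4096                   # Llama 2 7B hidden size
LAYERS = 32                     # Llama 2 7B decoder layers
RANK = 16                       # r=16, as listed above
TARGETS = ["q_proj", "v_proj"]  # assumption: adapted modules are not stated

per_layer = len(TARGETS) * lora_params(HIDDEN, HIDDEN, RANK)
total = LAYERS * per_layer
print(f"trainable LoRA parameters: {total:,} ({total / 6.74e9:.3%} of ~6.74B)")
```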

Infrastructure

vLLM for optimized inference throughput
Rust API for high-performance, reliable serving
PostgreSQL for persistent storage
RAG for context-aware reference answers
Token-based authentication
2x NVIDIA GeForce RTX 2080 Ti GPUs (11 GB VRAM each, 22 GB total)
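Most of the GPU memory left after loading the weights goes to the KV cache, which is what keeps repeated decoding steps fast. For a Llama 2 7B model storing keys and values in fp16, each cached token costs 2 × 32 layers × 4096 × 2 bytes = 0.5 MiB. The sketch estimates how many tokens fit on two 11 GB RTX 2080 Ti cards. The 0.9 utilization mirrors vLLM's default `gpu_memory_utilization`; the 1 GiB per-GPU overhead for activations and the CUDA context is a hypothetical placeholder, so treat the result as an order of magnitude only.

```python
GIB = 1024 ** 3


def kv_bytes_per_token(layers: int, hidden: int, bytes_per_value: int = 2) -> int:
    """Keys and values (factor 2) for every layer; fp16 by default."""
    return 2 * layers * hidden * bytes_per_value


def cacheable_tokens(n_gpus: int, gib_per_gpu: float, utilization: float,
                     weight_gib: float, overhead_gib_per_gpu: float,
                     per_token_bytes: int) -> int:
    # Memory the engine may use, minus assumed overhead, minus the model weights.
    usable_gib = n_gpus * (gib_per_gpu * utilization - overhead_gib_per_gpu) - weight_gib
    return int(usable_gib * GIB // per_token_bytes)


per_token = kv_bytes_per_token(layers=32, hidden=4096)
print(f"KV cache per token: {per_token / 1024 ** 2:.2f} MiB")
print("tokens that fit:", cacheable_tokens(
    n_gpus=2, gib_per_gpu=11, utilization=0.9,
    weight_gib=3.14,           # ~3 GB of 4-bit weights, see Model
    overhead_gib_per_gpu=1.0,  # hypothetical allowance
    per_token_bytes=per_token))
```

Even under these rough assumptions, the cache holds tens of thousands of tokens, far more than a single short-answer grading prompt needs, which leaves room for batching concurrent submissions.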

Research Background

This project is based on Leon Camus's Master's thesis "TU Grade - To Grade using large language models" and related research on Transformers for automatic short answer grading. The system addresses data scarcity through zero-shot capabilities and introduces an explanatory grading approach that goes beyond simple scores.

Key Research Findings

Llama 2 significantly outperforms Llama 1 in grading tasks
Instruction-tuned models excel at zero-shot grading without fine-tuning
Model-generated explanations are semantically closer to gold standards than human responses
Fine-tuning has marginal impact on semantic quality but improves syntactic alignment
Discrete regression (0-9 scale) shows limitations for capturing nuanced response quality
KV-caching enables efficient real-time inference

Current Status

TU Grade was deployed at TU Darmstadt as an experimental research system to evaluate AI-powered grading in real-world educational settings. The system processed student submissions and provided automated feedback during the evaluation period.

Evaluation Period (2024-2025)

Experimental deployment at TU Darmstadt
Institutional use for research purposes
Evaluated AI grading effectiveness in practice
Moodle plugin integrated with test courses

Project Status

Research project completed (2025)
Tool phased out in late 2025
Replaced by newer solutions
Contact for research collaboration

For inquiries about collaboration, deployment, or research opportunities, contact Leon Camus.