Project Report: Discuss Ready

The Problem

International and non-traditional students consistently struggle to participate in high-stakes classroom discussions — not because they lack ideas, but because existing tools leave them without structured support. Most preparation is passive: read the material, show up, hope for the best.

Core Insight

Research shows that students often have the knowledge but lack the cognitive scaffolding to transform it into a discussion-ready contribution. Without structure, most students stay in the "Passive" or "Active" modes of the ICAP framework — far from the "Constructive" and "Interactive" engagement that meaningful discussion demands (Chi & Wylie, 2014).

Designing from Research, Not Intuition

Every design decision in Discuss Ready traces back to an empirical source. Rather than guessing what students need, I built the four-stage structure directly from the literature:

Stage 1 & 2: Self-Explanation — grounded in Bielaczyc et al. (1995)

Bielaczyc et al. show that effective self-explanation is more than a bare "explain this" instruction: it requires structured prompts that move from abstract conceptual understanding to concrete connections with prior knowledge. Stage 1 guides students to summarize the material; Stage 2 asks them to connect it to personal experience. This sequence deliberately mirrors the scaffolding progression that Bielaczyc et al. found to improve the acquisition of higher-level cognitive skills.

Stage 3 & 4: Opinion & Counterargument — grounded in Nussbaum et al. (2007)

Stages 3 and 4 push students into ICAP's "Constructive" mode: forming a clearly evidenced opinion, then stress-testing it by articulating the strongest opposing view. This follows the framework of Nussbaum et al. for building argumentation competence: not just having a position, but being able to defend and refine it.

Feedback & Hints — grounded in Aleven & Koedinger (2002)

Aleven and Koedinger's research on cognitive tutors shows that hints should never give the answer; they should direct students toward the right resources and reasoning. Every feedback response in Discuss Ready is designed to reveal gaps in understanding rather than fill them, and every hint points students back to the material rather than forward to the solution.

Systematic AI Evaluation: Choosing the Right Model

Building the right instructional experience required more than a good prompt: it required selecting the right AI model. I systematically evaluated five LLMs against five pedagogically grounded criteria over three rounds of iterative design testing.

Why This Mattered

Different models have fundamentally different personalities. A model that scores perfectly on accuracy can still undermine learning if its tone makes students feel judged. I needed a model that was both pedagogically accurate and psychologically safe for non-native English speakers in a high-stakes context.

5 LLMs evaluated (GPT-4o, Gemini, Llama 4, Llama 3.3, Mistral)

5 Evaluation criteria (Scoring Accuracy, Text Grounding, Logic & Flow, Formatting, Tone)

3 Rounds of prompt iteration and refinement

10/10 Final score for Mistral Large 3 — the selected model

Mistral Large 3 scored 10/10 — the only model to pass all five criteria. But the reason I chose it went beyond the rubric: Mistral's tone was distinctly encouraging and conversational, written at a CEFR B1 level that made non-native speakers feel supported rather than evaluated. GPT-4o was strict but cold. Llama 4 was accurate but blunt. Mistral guided students toward thinking deeper by asking questions, rather than simply flagging errors.
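As a rough illustration of how a rubric like this could be tallied, the sketch below assumes each of the five criteria is scored from 0 to 2 (for a maximum of 10) and that a model "passes" only with full marks on every criterion. The 0 to 2 scale, the pass rule, the function names, and the placeholder scores for `model_a` and `model_b` are all assumptions made for illustration; they are not the recorded evaluation data.

```python
# The five criteria named in the report.
CRITERIA = ["Scoring Accuracy", "Text Grounding", "Logic & Flow", "Formatting", "Tone"]

# Assumption: each criterion is worth 0-2 points, so 5 x 2 = 10.
MAX_PER_CRITERION = 2


def total_score(scores: dict[str, int]) -> int:
    """Sum the per-criterion scores after checking they are complete and in range."""
    for criterion in CRITERIA:
        if criterion not in scores:
            raise ValueError(f"missing criterion: {criterion}")
        if not 0 <= scores[criterion] <= MAX_PER_CRITERION:
            raise ValueError(f"score out of range for {criterion}")
    return sum(scores[criterion] for criterion in CRITERIA)


def passes_all(scores: dict[str, int]) -> bool:
    """Assumed pass rule: full marks on every criterion."""
    return all(scores[criterion] == MAX_PER_CRITERION for criterion in CRITERIA)


# Placeholder scores for two hypothetical models (not real results).
example_scores = {
    "model_a": {c: 2 for c in CRITERIA},
    "model_b": {**{c: 2 for c in CRITERIA}, "Tone": 1},
}

for name, scores in example_scores.items():
    print(name, total_score(scores), passes_all(scores))
```

Keeping the per-criterion scores separate, rather than storing only the total, makes it possible to see that a model with a high overall score still fell short on a single criterion such as tone.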

Designing an AI Tutor Through Iterative Testing

Designing the instructional logic of Discuss Ready was not a one-shot process — it required three rounds of structured design, failure analysis, and targeted refinement. Each round began with a testable prototype and ended with a clearly identified failure mode that shaped the next iteration.

Round 1 — Establishing the Cognitive Scaffold

The first design established the four-stage learning sequence and basic feedback structure. Testing revealed a critical flaw: without explicit gating logic, the AI advanced students regardless of response quality. Students who gave superficial answers were praised and moved forward — directly contradicting the mastery-based progression that cognitive tutor research requires. The failure mode was not a technical bug; it was a missing instructional constraint.

Round 2 — Adding Mastery Gating

Round 2 introduced strict mastery-based progression: students could not advance until they achieved the maximum score at each stage. Testing confirmed this fixed the advancement problem — but introduced a new one. Students were caught in retry loops on early stages, cycling through repeated attempts without the targeted feedback they needed to improve. Maximum strictness without differentiated hinting created frustration rather than learning.

Round 3 — Calibrating Thresholds and Scaffolded Hints

The final design differentiated scoring thresholds by cognitive demand (Stages 1 & 2 scored out of 2; Stages 3 & 4 out of 3), added text-grounding rules requiring the AI to cite specific chapters or sections rather than giving generic feedback, constrained tone to CEFR B1 language, and capped responses at 60 words. Each change was verified against calibrated test inputs across all score levels before moving to live student testing.

Final Instructional Architecture

The final version of Discuss Ready is not simply a chatbot with instructions — it is a structured instructional system with distinct pedagogical logic at every layer. Each design element maps directly to a learning science principle: what students do, in what order, under what conditions, and with what kind of support.

Instructional design decisions encoded in the system

The four-stage learning sequence encodes a deliberate cognitive progression from comprehension to argumentation. Mastery-based gating ensures students cannot advance by giving surface-level answers. Differentiated scoring rubrics (2-point for recall stages, 3-point for argumentation stages) calibrate challenge to cognitive demand. Text-grounding rules prevent the AI from providing answers — every hint directs students back to the source material, keeping the cognitive work with the learner. CEFR B1 tone constraints and a 60-word response limit are accessibility decisions: they ensure non-native English speakers feel supported, not evaluated.
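The gating rule described above can be sketched as a small state machine. In Discuss Ready this logic lives in the system prompt and is carried out by the model, so the code below is only an illustrative restatement: the `Session` class and `record_score` function are hypothetical names, and the attempt counter is an added convenience not mentioned in the report.

```python
from dataclasses import dataclass

# Maximum score per stage, as stated in the report:
# Stages 1 and 2 are scored out of 2, Stages 3 and 4 out of 3.
STAGE_MAX_SCORE = {1: 2, 2: 2, 3: 3, 4: 3}
FINAL_STAGE = 4


@dataclass(frozen=True)
class Session:
    stage: int = 1          # current stage (1-4)
    attempts: int = 0       # failed attempts on the current stage (illustrative)
    complete: bool = False  # True once Stage 4 is passed


def record_score(session: Session, score: int) -> Session:
    """Return the next session state under strict mastery gating."""
    if session.complete:
        raise ValueError("session is already complete")
    max_score = STAGE_MAX_SCORE[session.stage]
    if not 0 <= score <= max_score:
        raise ValueError(f"score must be between 0 and {max_score}")
    if score < max_score:
        # Below the maximum: stay on the same stage and wait for a retry.
        return Session(stage=session.stage, attempts=session.attempts + 1)
    if session.stage == FINAL_STAGE:
        # Full marks on the last stage: ready for the Rehearsal Summary.
        return Session(stage=session.stage, complete=True)
    # Full marks on an earlier stage: advance and reset the attempt counter.
    return Session(stage=session.stage + 1)
```

For example, `record_score(Session(), 1)` leaves the student on Stage 1 with one recorded attempt, while `record_score(Session(), 2)` moves them to Stage 2.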

Stage 1: Self-Explanation

Students summarize the reading in their own words. Scored out of 2; gaps trigger a pointer to a specific chapter or section, not a correction.

Stage 2: Prior Knowledge Connection

Students link the material to personal experience or earlier learning. Vague links score 1/2 and prompt elaboration before advancing.

Stage 3: Opinion Formation

Students state a clear, evidence-backed position. Scored out of 3; a vague reason prompts the AI to request a direct quote from the text.

Stage 4: Counterargument

Students articulate the strongest opposing view to their own Stage 3 position. Full marks require a direct challenge with specific evidence.

System Prompt — Pre-Class Prep Stage

# ROLE
You are a warm, supportive AI tutor helping an international student prepare for an English-medium academic discussion.

# CONTEXT & GROUNDING
The student has uploaded preparation material. You must ground all evaluations strictly in this material. Never hallucinate facts outside the material.
*CRITICAL FEEDBACK RULES:*
1. If a student misses points in Stage 1, you MUST advise them to recheck the material and explicitly specify WHICH chapter, section, or heading they need to look at.
2. Whenever a student needs to strengthen their reasoning (especially in Stages 3 and 4), explicitly encourage them to use direct quotes or specific references from the material.

# SCORING & BRANCHING LOGIC (VISIBLE EVALUATION)
For EVERY response from the student, evaluate it and assign a Score.
- Stages 1 and 2 are scored out of 2 (0, 1, or 2).
- Stages 3 and 4 are scored out of 3 (0, 1, 2, or 3).
CRITICAL: You MUST display the score at the very top of your response exactly like this: **[Score: X/2]** or **[Score: X/3]**. Then, execute the corresponding action.

# PROGRESSION RULE (STRICT MASTERY)
The student CANNOT advance to the next stage until they achieve the MAXIMUM score (2/2 for Stages 1 and 2, and 3/3 for Stages 3 and 4).
If they receive a lower score, you must provide feedback and wait for them to try again. Do not move to the next stage until they get a perfect score.

# EXAMPLES FOR CALIBRATION (Dummy topic: "Banning smartphones")
- Score 1 (Missing major part): S3 "I agree with banning phones." (No reason). S4 "Parents need to call kids." (Valid, but doesn't challenge the specific S3 reason).
- Score 2 (Vague/Surface level): S3 "I agree because they are bad for studying." S4 "It's good to use phones for emergencies, which breaks the focus."
- Score 3 (Strong Evidence/Logic): S3 "I agree because the article shows phones reduce test scores by 10% due to distraction."

--- Stage 1: Conceptual Self-Explanation ---
Goal: Summarise the main ideas of the material accurately.
- Score 2 (Accurate & Complete): Display **[Score: 2/2]**. Say "✓ Stage 1 complete!". Move to Stage 2.
- Score 1 (Partial/Missing details): Display **[Score: 1/2]**. Advise them to recheck a specific chapter/section. Wait for their next attempt.
- Score 0 (Incorrect/Missing): Display **[Score: 0/2]**. Point to the exact chapter/section. Give 1 hint. Wait for their next attempt.

--- Stage 2: Connecting to Prior Knowledge ---
Goal: Connect material to personal experience or prior learning.
- Score 2 (Clear link & explanation): Display **[Score: 2/2]**. Say "✓ Stage 2 complete!". Move to Stage 3.
- Score 1 (Vague link only): Display **[Score: 1/2]**. Ask them to elaborate. Wait for their next attempt.
- Score 0 (No connection): Display **[Score: 0/2]**. Explain the need for a link. Give 1 hint. Wait for their next attempt.

--- Stage 3: Opinion Formation ---
Goal: State a clear opinion supported by SPECIFIC evidence or strong reasoning.
- Score 3 (Opinion + Specific Evidence): Display **[Score: 3/3]**. Say "✓ Stage 3 complete!". Move to Stage 4.
- Score 2 (Opinion + Vague Reason): Display **[Score: 2/3]**. Suggest they use a direct quote from the material. Wait for their next attempt.
- Score 1 (Opinion only): Display **[Score: 1/3]**. Ask "Why do you think so?". Wait for their next attempt.
- Score 0 (No position): Display **[Score: 0/3]**. Encourage taking a side. Wait for their next attempt.

--- Stage 4: Counterargument Consideration ---
Goal: Identify an opposing view that directly challenges their Stage 3 reason with specific evidence.
- Score 3 (Direct challenge + Specific Evidence): Display **[Score: 3/3]**. Say "✓ Stage 4 complete! 🎉". Generate Rehearsal Summary.
- Score 2 (Direct challenge but weak logic): Display **[Score: 2/3]**. Ask for stronger evidence or a quote for the opposing side. Wait for their next attempt.
- Score 1 (General counter): Display **[Score: 1/3]**. Ask how an opponent would respond to their specific Stage 3 reason. Wait for their next attempt.
- Score 0 (No counter): Display **[Score: 0/3]**. Explain what a counterargument is. Wait for their next attempt.

# REHEARSAL SUMMARY
After Stage 4: "You have completed all 4 stages. Here is your Rehearsal Summary — sentences you can use when speaking in class today:"
Labels: My Summary:, My Connection:, My Opinion:, Counterargument to Consider: (1-2 sentences each).

# FORMAT AND TONE
- Warm, conversational prose (CEFR B1).
- Max 3 sentences per response (excluding Score).
- Total word count under 60 words.
- No bullet points or lists in conversation.

# HOW TO START
Respond EXACTLY with:
"Welcome! Let's prepare for your class together. We'll work through 4 stages. First you share your idea, then I'll give feedback based on the material to help you improve. Let's start with Stage 1 — Self-Explanation. What are the main points of the preparation material? Please summarise them in your own words."
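The format rules in the prompt above (score header first, at most 3 sentences, under 60 words, no lists) lend themselves to a mechanical check during prompt testing. The following is a minimal sketch of such a check, not part of the deployed tool. The function name `check_reply` is hypothetical, the sentence splitter is deliberately naive, and excluding the score header from the word count is an assumption, since the prompt states that exclusion only for the sentence limit.

```python
import re

# Matches the header format required by the prompt, e.g. **[Score: 1/2]**
SCORE_HEADER = re.compile(r"^\*\*\[Score: (\d)/([23])\]\*\*")
MAX_WORDS = 60      # prompt: total word count under 60 words
MAX_SENTENCES = 3   # prompt: max 3 sentences per response (excluding Score)


def check_reply(reply: str) -> dict:
    """Check one tutor reply against the prompt's format rules."""
    text = reply.strip()
    match = SCORE_HEADER.match(text)
    if match is None:
        return {"ok": False, "problems": ["missing score header"]}

    score, out_of = int(match.group(1)), int(match.group(2))
    body = text[match.end():].strip()

    word_count = len(body.split())
    # Naive split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", body) if s]

    problems = []
    if score > out_of:
        problems.append("score exceeds maximum")
    if word_count >= MAX_WORDS:
        problems.append("too many words")
    if len(sentences) > MAX_SENTENCES:
        problems.append("too many sentences")
    if any(line.lstrip().startswith(("- ", "* ", "• ")) for line in body.splitlines()):
        problems.append("contains a list")

    return {
        "ok": not problems,
        "problems": problems,
        "score": score,
        "out_of": out_of,
        "word_count": word_count,
        "sentence_count": len(sentences),
    }
```

Running a check like this over a batch of saved replies gives a quick signal about whether a prompt change has caused the model to drift from the length or format constraints, before any judgment is made about the quality of the feedback itself.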

The Output: A Rehearsal Summary

After completing all four stages, students receive a personalized Rehearsal Summary — four short, speakable sentences drawn directly from their own responses across the conversation. This isn't AI-generated advice; it's their own thinking, organized into something they can actually say in class.

This design decision reflects a core principle from the SToG framework (Kollar, Wecker, & Fischer, 2018): scaffolds should support the structure of participation without prescribing the content. The tool gives students confidence in how to participate — what they say remains entirely their own.
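To make the idea of a summary "drawn directly from their own responses" concrete, here is a minimal sketch that assembles the four labelled lines from a student's final answer at each stage. In Discuss Ready the model generates the summary, so this deterministic version is an assumption for illustration only; the function name `build_rehearsal_summary` is hypothetical, and the sketch does not shorten answers to the 1-2 sentences the prompt asks for.

```python
# Labels taken from the REHEARSAL SUMMARY section of the system prompt.
LABELS = {
    1: "My Summary:",
    2: "My Connection:",
    3: "My Opinion:",
    4: "Counterargument to Consider:",
}


def build_rehearsal_summary(final_responses: dict[int, str]) -> str:
    """Join the student's own passing response from each stage under its label."""
    missing = [stage for stage in LABELS if stage not in final_responses]
    if missing:
        raise ValueError(f"missing responses for stages: {missing}")
    lines = [
        f"{LABELS[stage]} {final_responses[stage].strip()}"
        for stage in sorted(LABELS)
    ]
    return "\n".join(lines)
```

Because the function only relabels and joins existing text, the wording stays the student's own, which is the property the scaffold is meant to preserve.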
