Slide Editing Robustness Fixes
One-Line Summary: Implementation plan to fix slide editing failures including deck loss, LLM response validation, add vs edit intent detection, and JavaScript/canvas corruption issues.
1. Problem Summary
During slide editing operations, several failure modes have been identified that cause data loss and corruption:
| ID | Root Cause | Impact | Priority | Status |
|---|---|---|---|---|
| RC3 | Deck replaced when parsing fails | Complete deck destruction | P0 - Critical | ✅ Fixed |
| RC1 | LLM returns text instead of HTML | Triggers deck loss | P0 - Critical | ✅ Fixed |
| RC6 | Cache not restored after backend restart | Complete deck loss on edit | P0 - Critical | ✅ Fixed |
| RC7 | Scripts lost when loading from database | Charts disappear after restart | P0 - Critical | ✅ Fixed |
| RC2 | "Add slide" treated as "replace" | Slides disappear | P1 - High | ✅ Fixed |
| RC4 | Canvas ID collisions | Chart conflicts | P2 - Medium | ✅ Fixed |
| RC5 | Script split/merge fragile | JS syntax errors | P2 - Medium | ✅ Fixed |
| RC8 | "Edit slide 8" without selection wipes deck | Complete deck destruction | P0 - Critical | ✅ Fixed |
| RC9 | "Add after slide 3" goes to end | Wrong position | P1 - High | ✅ Fixed |
| RC10 | Ambiguous requests proceed without confirmation | Unpredictable behavior | P1 - High | ✅ Fixed |
| RC11 | Selection vs text reference conflict | Confusing behavior | P2 - Medium | ✅ Fixed |
| RC12 | "Create slides" with existing deck replaces silently | Accidental data loss | P0 - Critical | ✅ Fixed |
| RC13 | "Edit slide 7" without selection - LLM has no context | LLM asks for slide content | P1 - High | ✅ Fixed |
| RC14 | "Duplicate slide 4" returns empty HTML instead of guidance | Confusing UX | P2 - Medium | ✅ Fixed |
| RC15 | Optimize loses chart scripts due to RC4 canvas ID mismatch | Charts disappear after optimize | P1 - Critical | ✅ Fixed |
2. Architecture Context
Components Involved
| Component | File | Responsibility |
|---|---|---|
| Agent | src/services/agent.py | LLM invocation, response parsing |
| Chat Service | src/api/services/chat_service.py | Deck cache, replacement logic |
| HTML Utils | src/utils/html_utils.py | Script splitting, canvas ID extraction |
| Slide Deck | src/domain/slide_deck.py | Deck parsing, knitting |
| Defaults | src/core/defaults.py | System prompt, editing instructions |
Data Flow (Current - Problematic)
User Request → Agent → LLM Response → Parse Slides → Apply Replacements → Update Deck
↓
[If parse fails, deck can be destroyed]
Data Flow (Target - Safe)
User Request → Detect Intent → Agent → LLM Response → Validate Response
↓
[If invalid: retry once OR preserve deck]
↓
[If valid: Apply Replacements → Update Deck]
3. Implementation Plan
Phase 1: Deck Preservation Guard (RC3)
Goal: Never destroy the deck when editing fails.
Changes:
src/api/services/chat_service.py- Add guard insend_message_streaming:
# BEFORE (dangerous):
if slide_context and replacement_info:
slide_deck_dict = self._apply_slide_replacements(...)
elif html_output and html_output.strip():
# This branch can destroy the deck!
current_deck = SlideDeck.from_html_string(html_output)
# AFTER (safe):
if slide_context and replacement_info:
slide_deck_dict = self._apply_slide_replacements(...)
elif slide_context and not replacement_info:
# GUARD: slide_context was provided but parsing failed
# Preserve existing deck, return error
logger.error("Slide replacement parsing failed, preserving existing deck")
raise ValueError("Failed to parse LLM response as slide replacements")
elif html_output and html_output.strip():
# Only create new deck if NOT in editing mode
current_deck = SlideDeck.from_html_string(html_output)
- Same change in
send_message(non-streaming path)
Test Cases:
| Test ID | Scenario | Expected Outcome |
|---|---|---|
| RC3-T1 | LLM returns text with slides selected | Existing deck preserved, error returned |
| RC3-T2 | LLM returns empty string with slides selected | Existing deck preserved, error returned |
| RC3-T3 | LLM returns malformed HTML with slides selected | Existing deck preserved, error returned |
| RC3-T4 | Normal edit (valid HTML) | Deck updated correctly |
| RC3-T5 | New generation (no slides selected) | New deck created |
Phase 2: LLM Response Validation & Retry (RC1)
Goal: Detect invalid LLM responses early and retry once before failing.
Changes:
src/services/agent.py- Add validation method:
def _validate_editing_response(self, llm_response: str) -> tuple[bool, str]:
"""Validate that LLM response contains valid slide HTML.
Returns:
(is_valid, error_message)
"""
if not llm_response or not llm_response.strip():
return False, "Empty response"
# Check for conversational text patterns (LLM confusion)
confusion_patterns = [
"I understand",
"I cannot",
"I'm sorry",
"I don't",
"There are no slides",
"slides have been deleted",
"no slides to display",
]
lower_response = llm_response.lower()
for pattern in confusion_patterns:
if pattern.lower() in lower_response and '<div class="slide"' not in llm_response:
return False, f"LLM returned conversational text instead of HTML: {pattern}"
# Check for at least one slide div
soup = BeautifulSoup(llm_response, "html.parser")
slide_divs = soup.find_all("div", class_="slide")
if not slide_divs:
return False, "No <div class='slide'> elements found in response"
return True, ""
src/services/agent.py- Add retry logic ingenerate_slides_streaming:
# After getting LLM response, validate before parsing
if editing_mode:
is_valid, error_msg = self._validate_editing_response(html_output)
if not is_valid:
logger.warning(f"Invalid editing response, retrying: {error_msg}")
# Retry with stronger prompt
retry_prompt = (
f"{full_question}\n\n"
"IMPORTANT: You MUST respond with valid HTML slide divs. "
"Do NOT respond with conversational text. "
"Return ONLY <div class='slide'>...</div> elements."
)
retry_result = agent_executor.invoke({
"input": retry_prompt,
"chat_history": chat_history.messages,
})
html_output = retry_result["output"]
# Validate retry
is_valid, error_msg = self._validate_editing_response(html_output)
if not is_valid:
raise AgentError(f"LLM failed to return valid slide HTML after retry: {error_msg}")
# Now safe to parse
replacement_info = self._parse_slide_replacements(...)
Test Cases:
| Test ID | Scenario | Expected Outcome |
|---|---|---|
| RC1-T1 | LLM returns "I understand you want to delete..." | Retry triggered, if retry fails → error |
| RC1-T2 | LLM returns "I cannot modify these slides" | Retry triggered |
| RC1-T3 | LLM returns valid HTML on first try | No retry, success |
| RC1-T4 | LLM returns valid HTML on retry | Success after retry |
| RC1-T5 | LLM returns text on both attempts | Error raised, deck preserved |
| RC1-T6 | LLM returns HTML without slide divs | Retry triggered |
Phase 3: Add vs Edit Intent Detection (RC2)
Goal: Detect when user wants to ADD a slide vs EDIT existing slides.
Changes:
src/services/agent.py- Add intent detection:
def _detect_add_intent(self, message: str) -> bool:
"""Detect if user wants to add a new slide rather than edit existing ones.
Returns:
True if message indicates adding/inserting a new slide
"""
add_patterns = [
r'\badd\b.*\bslide\b',
r'\binsert\b.*\bslide\b',
r'\bappend\b.*\bslide\b',
r'\bnew\s+slide\b',
r'\bcreate\b.*\bslide\b',
r'\badd\b.*\bat\s+the\s+(bottom|end|top|beginning)\b',
r'\bslide\b.*\bat\s+the\s+(bottom|end|top|beginning)\b',
]
lower_message = message.lower()
for pattern in add_patterns:
if re.search(pattern, lower_message):
return True
return False
src/services/agent.py- Modify prompt for add operations:
def _format_slide_context(self, slide_context: dict[str, Any], is_add_operation: bool = False) -> str:
"""Format slide context for injection into the user message."""
context_parts = ["<slide-context>"]
for html in slide_context.get("slide_htmls", []):
context_parts.append(html)
context_parts.append("</slide-context>")
if is_add_operation:
context_parts.append(
"\n\nIMPORTANT: The user wants to ADD a new slide. "
"You MUST return ALL the slides shown above PLUS the new slide. "
"Do NOT replace the existing slides - include them in your response along with the new slide."
)
return "\n\n".join(context_parts)
src/services/agent.py- Use ingenerate_slides_streaming:
if slide_context:
is_add = self._detect_add_intent(question)
context_str = self._format_slide_context(slide_context, is_add_operation=is_add)
full_question = f"{context_str}\n\n{question}"
logger.info(
"Slide editing mode",
extra={
"is_add_operation": is_add,
"selected_indices": slide_context.get("indices", []),
},
)
src/services/agent.py- Passis_add_operationflag to replacement info:
# After _parse_slide_replacements, add flag:
if replacement_info:
replacement_info["is_add_operation"] = is_add_operation
src/api/services/chat_service.py- Backend guard: Append instead of replace for add operations:
def _apply_slide_replacements(self, replacement_info: Dict[str, Any], session_id: str) -> Dict[str, Any]:
# ... get current_deck ...
is_add_operation = replacement_info.get("is_add_operation", False)
# RC2: For add operations, append to end of deck instead of replacing
if is_add_operation:
insert_position = len(current_deck.slides)
logger.info(
"Add operation detected - appending slides to end of deck",
extra={
"current_slide_count": len(current_deck.slides),
"new_slides_count": len(replacement_slides),
},
)
# Insert new slides at end of deck (don't remove any originals)
for idx, slide in enumerate(replacement_slides):
slide.slide_id = f"slide_{insert_position + idx}"
current_deck.insert_slide(insert_position + idx, slide)
return current_deck.to_dict()
# ... standard replacement logic for non-add operations ...
Why This Backend Guard Is Critical:
- The LLM often ignores instructions to "return all slides plus new slide"
- The LLM's system prompt says to return "only slides that need changing"
- This creates a conflict that the LLM resolves by returning only the new slide
- The backend guard ensures add operations NEVER destroy existing slides
src/core/defaults.py- LLM Instructions Aligned (Operation Types):
# In slide_editing_instructions:
IMPORTANT - Operation Types:
- EDIT (user wants to modify existing slides): Return the modified version of each provided slide. Keep the same number of slides.
- ADD (user wants to add/insert/create a NEW slide): Return ONLY the new slide(s). The system will automatically append them to the deck.
- EXPAND (user wants to split/expand slides into more): You may return more slides than provided - this replaces the originals.
# In rules section:
- For EDIT operations: return the same number of slides as provided
- For ADD operations: return only the new slide(s) to be added
- For EXPAND operations: you may return more slides than provided
src/services/agent.py- Consistent ADD Instruction:
# In _format_slide_context() for add operations:
if is_add_operation:
context_parts.append(
"\n\nIMPORTANT: The user wants to ADD a new slide. "
"Return ONLY the new slide(s) to be added - the system will automatically append them to the deck. "
"Do NOT return the existing slides shown above - just the new slide content."
)
All Instructions Now Aligned:
defaults.py: "Return ONLY the new slide(s)"agent.py: "Return ONLY the new slide(s) to be added"- Backend: Appends new slides, never touches existing
src/api/services/chat_service.py- Critical: Handle ADD without slide_context:
The original bug: When user says "add a summary slide" WITHOUT selecting any slides:
- No
slide_contextis sent by frontend - Backend treats it as GENERATION mode (not EDIT mode)
- LLM returns just 1 slide → REPLACES entire deck!
Fix: Added _detect_add_intent() method and check in both send_message and send_message_streaming:
# In the "elif html_output and html_output.strip():" branch:
is_add_intent = self._detect_add_intent(message)
existing_deck = self._get_or_load_deck(session_id)
if is_add_intent and existing_deck and len(existing_deck.slides) > 0:
# APPEND new slides to existing deck instead of replacing
insert_position = len(existing_deck.slides)
for idx, slide in enumerate(new_deck.slides):
slide.slide_id = f"slide_{insert_position + idx}"
existing_deck.insert_slide(insert_position + idx, slide)
current_deck = existing_deck
else:
# Standard behavior: new generation
current_deck = new_deck
This ensures:
- "add a slide" with NO selection → appends to existing deck
- "add a slide" WITH selection → original RC2 fix handles it
- "create slides about X" (new topic) → replaces deck as expected
Test Cases:
| Test ID | Scenario | Expected Outcome |
|---|---|---|
| RC2-T1 | "add a slide at the bottom for summary" | Add intent detected, slides appended to deck |
| RC2-T2 | "insert a new slide after this one" | Add intent detected |
| RC2-T3 | "change the color to red" | Edit intent (not add), standard replacement |
| RC2-T4 | "make this slide blue" | Edit intent (not add), standard replacement |
| RC2-T5 | "create a new summary slide" | Add intent detected, slide appended |
| RC2-T6 | "add slide" with LLM returning 1 slide | Slide appended, originals preserved |
| RC2-T7 | 5 slides exist + "add slide" → 6 slides | New slide appended at end |
Phase 4: Canvas ID Uniqueness (RC4)
Goal: Prevent canvas ID collisions when editing slides with charts.
Changes:
src/services/agent.py- Add canvas ID deduplication:
import uuid
def _deduplicate_canvas_ids(self, html_content: str, scripts: str) -> tuple[str, str]:
"""Generate unique canvas IDs to prevent collisions.
Appends a short unique suffix to all canvas IDs in HTML and scripts.
Returns:
(updated_html, updated_scripts)
"""
soup = BeautifulSoup(html_content, "html.parser")
canvases = soup.find_all("canvas")
if not canvases:
return html_content, scripts
suffix = uuid.uuid4().hex[:6]
id_mapping = {}
# Update canvas IDs in HTML
for canvas in canvases:
old_id = canvas.get("id")
if old_id:
new_id = f"{old_id}_{suffix}"
id_mapping[old_id] = new_id
canvas["id"] = new_id
updated_html = str(soup)
updated_scripts = scripts
# Update references in scripts
for old_id, new_id in id_mapping.items():
# Update getElementById calls
updated_scripts = re.sub(
rf"getElementById\s*\(\s*['\"]({re.escape(old_id)})['\"]\s*\)",
f"getElementById('{new_id}')",
updated_scripts
)
# Update Canvas comments
updated_scripts = re.sub(
rf"//\s*Canvas:\s*{re.escape(old_id)}\b",
f"// Canvas: {new_id}",
updated_scripts,
flags=re.IGNORECASE
)
return updated_html, updated_scripts
src/services/agent.py- Apply in_parse_slide_replacements:
# After parsing slides, deduplicate canvas IDs
for slide in replacement_slides:
if '<canvas' in slide.html:
slide.html, slide.scripts = self._deduplicate_canvas_ids(slide.html, slide.scripts)
Test Cases:
| Test ID | Scenario | Expected Outcome |
|---|---|---|
| RC4-T1 | Edit slide with canvas id="chart1" | ID becomes "chart1_abc123" |
| RC4-T2 | Multiple canvases in one slide | All IDs get same suffix |
| RC4-T3 | Scripts reference old IDs | Scripts updated to new IDs |
| RC4-T4 | Slide without canvas | No changes made |
| RC4-T5 | Two consecutive edits | Each gets unique suffix |
Phase 5: JavaScript Syntax Validation (RC5)
Goal: Validate JavaScript syntax before applying to prevent corruption.
Changes:
- Add dependency:
esprimaorpy_mini_racerfor JS parsing
pip install esprima
src/utils/js_validator.py- New file:
"""JavaScript syntax validation utilities."""
import logging
from typing import Tuple
logger = logging.getLogger(__name__)
def validate_javascript(script: str) -> Tuple[bool, str]:
"""Validate JavaScript syntax using esprima.
Returns:
(is_valid, error_message)
"""
if not script or not script.strip():
return True, "" # Empty script is valid
try:
import esprima
esprima.parseScript(script, tolerant=True)
return True, ""
except esprima.Error as e:
return False, f"JavaScript syntax error: {e}"
except Exception as e:
logger.warning(f"JS validation failed with unexpected error: {e}")
# Be permissive on unexpected errors - don't block
return True, ""
def try_fix_common_js_errors(script: str) -> str:
"""Attempt to fix common JavaScript syntax errors.
Returns:
Fixed script (or original if no fixes applied)
"""
if not script:
return script
fixed = script
# Fix unclosed braces (simple heuristic)
open_braces = fixed.count('{')
close_braces = fixed.count('}')
if open_braces > close_braces:
fixed += '\n}' * (open_braces - close_braces)
# Fix unclosed parentheses
open_parens = fixed.count('(')
close_parens = fixed.count(')')
if open_parens > close_parens:
fixed += ')' * (open_parens - close_parens)
# Fix unclosed brackets
open_brackets = fixed.count('[')
close_brackets = fixed.count(']')
if open_brackets > close_brackets:
fixed += ']' * (open_brackets - close_brackets)
return fixed
def validate_and_fix_javascript(script: str) -> Tuple[str, bool, str]:
"""Validate JavaScript and attempt to fix if invalid.
Returns:
(fixed_script, was_fixed, error_message)
"""
if not script or not script.strip():
return script, False, ""
is_valid, error = validate_javascript(script)
if is_valid:
return script, False, ""
fixed_script = try_fix_common_js_errors(script)
is_valid_after_fix, _ = validate_javascript(fixed_script)
if is_valid_after_fix:
return fixed_script, True, ""
return script, False, error
src/services/agent.py- Use in_parse_slide_replacements:
from src.utils.js_validator import validate_javascript, try_fix_common_js_errors, validate_and_fix_javascript
# After extracting scripts for each slide
for slide in replacement_slides:
if slide.scripts:
is_valid, error = validate_javascript(slide.scripts)
if not is_valid:
logger.warning(f"Invalid JS in slide, attempting fix: {error}")
fixed_scripts = try_fix_common_js_errors(slide.scripts)
is_valid, error = validate_javascript(fixed_scripts)
if is_valid:
slide.scripts = fixed_scripts
logger.info("JS syntax fixed successfully")
else:
logger.error(f"Could not fix JS syntax: {error}")
# Option: clear invalid scripts to prevent browser errors
# slide.scripts = ""
Test Cases:
| Test ID | Scenario | Expected Outcome |
|---|---|---|
| RC5-T1 | Valid JavaScript | Passes validation |
| RC5-T2 | Missing closing brace } | Fixed automatically |
| RC5-T3 | Missing closing paren ) | Fixed automatically |
| RC5-T4 | Completely malformed JS | Warning logged, handled gracefully |
| RC5-T5 | Empty script | Passes validation |
| RC5-T6 | Script with try without catch | Detected as invalid |
Phase 6: Cache Restoration from Database (RC6)
Goal: Ensure deck is restored from database if in-memory cache is empty (e.g., after backend restart).
Problem Identified:
- Backend uses
--reloadflag in development, which restarts on file changes - In-memory
_deck_cacheis wiped on every restart - Code directly accessed cache with
_deck_cache.get(session_id)without database fallback - Users editing slides after a restart would lose all their work
Root Cause:
# BEFORE (buggy) - lines 234 and 469 in chat_service.py:
with self._cache_lock:
current_deck = self._deck_cache.get(session_id) # Returns None if cache empty!
Solution:
Use existing _get_or_load_deck() method which properly restores from database:
# AFTER (fixed):
current_deck = self._get_or_load_deck(session_id) # Checks cache, falls back to DB
Changes:
src/api/services/chat_service.py- Line 232-234 (sync method):
# BEFORE:
# Get cached deck for this session (thread-safe)
with self._cache_lock:
current_deck = self._deck_cache.get(session_id)
# AFTER:
# Get deck from cache or restore from database (RC6: survive backend restarts)
current_deck = self._get_or_load_deck(session_id)
src/api/services/chat_service.py- Line 466-468 (streaming method):
# Same change as above
current_deck = self._get_or_load_deck(session_id)
The _get_or_load_deck() method (already existed at line 680-700):
def _get_or_load_deck(self, session_id: str) -> Optional[SlideDeck]:
# Check cache first (with lock)
with self._cache_lock:
if session_id in self._deck_cache:
return self._deck_cache[session_id]
# Try to load from database (outside lock to avoid blocking)
session_manager = get_session_manager()
deck_data = session_manager.get_slide_deck(session_id)
if deck_data and deck_data.get("html_content"):
try:
deck = SlideDeck.from_html_string(deck_data["html_content"])
# Store in cache (with lock)
with self._cache_lock:
self._deck_cache[session_id] = deck
return deck
except Exception as e:
logger.warning(f"Failed to load deck from database: {e}")
return None
Test Cases:
| Test ID | Scenario | Expected Outcome |
|---|---|---|
| RC6-T1 | Edit with deck in cache | Deck returned from cache |
| RC6-T2 | Edit with empty cache, deck in DB | Deck restored from DB |
| RC6-T3 | Edit with empty cache, no deck in DB | Returns None gracefully |
| RC6-T4 | Backend restart mid-session | Deck restored, editing continues |
| RC6-T5 | Multiple restarts during editing | All edits preserved |
Production Impact:
- ✅ Development with
--reload: Safe - ✅ Production deployments: No data loss
- ✅ Backend crashes: Deck survives
- ✅ Memory-based restarts: Deck survives
Phase 7: Script Persistence on Database Restore (RC7)
Goal: Preserve individual slide scripts (charts) when loading deck from database.
Problem Identified:
- When deck is saved,
knit()aggregates all slide scripts into IIFE-wrapped blocks - When deck is loaded via
from_html_string(), the IIFE parsing fails to split scripts correctly - Individual slide scripts are lost, causing charts to disappear after backend restart
Root Cause:
# BEFORE (buggy) in _get_or_load_deck():
if deck_data and deck_data.get("html_content"):
deck = SlideDeck.from_html_string(deck_data["html_content"])
# ❌ from_html_string can't parse IIFE-wrapped scripts
# ❌ Individual slide.scripts lost, charts disappear
Solution:
Use the slides array from deck_dict (which preserves per-slide scripts) instead of parsing from raw HTML:
# AFTER (fixed):
def _get_or_load_deck(self, session_id: str) -> Optional[SlideDeck]:
deck_data = session_manager.get_slide_deck(session_id)
# Prefer reconstructing from slides array (preserves individual scripts)
if deck_data.get("slides"):
deck = self._reconstruct_deck_from_dict(deck_data)
elif deck_data.get("html_content"):
# Fallback: parse from raw HTML (may lose scripts due to IIFE parsing)
deck = SlideDeck.from_html_string(deck_data["html_content"])
def _reconstruct_deck_from_dict(self, deck_data: Dict[str, Any]) -> SlideDeck:
"""Reconstruct SlideDeck from stored dict (preserves individual slide scripts)."""
slides = []
for slide_data in deck_data.get("slides", []):
slide = Slide(
html=slide_data.get("html", ""),
slide_id=slide_data.get("slide_id", f"slide_{len(slides)}"),
scripts=slide_data.get("scripts", ""), # ✅ Individual scripts preserved
)
slides.append(slide)
deck = SlideDeck(
slides=slides,
css=deck_data.get("css", ""),
external_scripts=deck_data.get("external_scripts", []),
title=deck_data.get("title"),
)
return deck
Changes:
-
src/api/services/chat_service.py- Updated_get_or_load_deck():- Check for
slidesarray first - Use new
_reconstruct_deck_from_dict()to preserve scripts - Fallback to
from_html_string()only for legacy data
- Check for
-
src/api/services/chat_service.py- Added_reconstruct_deck_from_dict():- Reconstructs
SlideDeckfrom stored JSON dict - Preserves individual slide
scriptsproperty
- Reconstructs
Test Cases:
| Test ID | Scenario | Expected Outcome |
|---|---|---|
| RC7-T1 | Load deck with slides array | Scripts preserved on each slide |
| RC7-T2 | Load legacy deck (HTML only) | Falls back to HTML parsing |
| RC7-T3 | Backend restart with chart slides | Charts still render after restore |
| RC7-T4 | Add slide after restart | Existing charts preserved, new slide added |
Production Impact:
- ✅ Charts survive backend restarts
- ✅ Slide scripts properly associated with individual slides
- ✅ Backward compatible with legacy data (HTML fallback)
- ✅ No data loss on append/edit operations after cache miss
Phase 8: Clarification-First Approach (RC8, RC9, RC10, RC11)
Goal: Allow users to reference slides naturally, with clarification for ambiguous requests. Never fail silently.
Problem Identified:
- User says "edit slide 8 background to orange" without selecting
- No
slide_contextprovided → treated as new generation - LLM returns 1 slide → entire deck replaced → DATA LOSS
Solution: Clarification-First with Guards
Core Principle: Either source works (text reference OR panel selection). When ambiguous, always ask for clarification.
| Scenario | Has Selection? | Has Slide Ref in Text? | Action |
|---|---|---|---|
| "replace slide 3 chart with pie" | No | Yes ("slide 3") | ✅ Proceed - parse "slide 3" |
| "replace the chart with pie" | Yes (slide 3) | No | ✅ Proceed - use selection |
| "replace slide 3 chart with pie" | Yes (slide 2) | Yes ("slide 3") | ✅ Proceed - use selection (explicit action wins) |
| "replace the chart with pie" | No | No | ❓ Ask clarification |
| "change the background to blue" | No | No | ❓ Ask clarification |
Clarification Message:
"I'd like to help edit your slides. Could you please specify which slide? You can either:
- Say the slide number (e.g., 'change slide 3 background to blue')
- Or select the slide from the panel on the left"
User message
│
▼
┌─────────────────────────────┐
│ 1. CHECK SELECTION │
│ slide_context provided? │
│ YES → Use selection ✅ │
└─────────────┬───────────────┘
│ NO
▼
┌─────────────────────────────┐
│ 2. INTENT CLASSIFICATION │
│ - _detect_generation_intent│
│ - _detect_edit_intent │
│ - _detect_add_intent │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 3. PARSE SLIDE REFERENCES │
│ - "slide 8" → index 7 │
│ - "slides 2-4" → [1,2,3] │
│ - "after slide 3" → pos=4 │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 4. ROUTE DECISION │
│ │
│ generation? → New deck ✅ │
│ edit + ref? → Synthetic ctx ✅│
│ add + ref? → Insert at pos ✅│
│ add + no ref? → End of deck ✅│
│ edit + no ref? → ASK USER ⚠️│
│ ambiguous? → Preserve deck ⚠️│
└─────────────────────────────┘
RC11: Selection Wins Over Text Reference
When user selects slide 2 but writes "edit slide 3", the explicit action (selection) takes precedence:
- Selection is a deliberate UI action
- Text reference may be a typo or outdated
- Prevents confusion from conflicting instructions
Changes:
src/api/services/chat_service.py- Add_detect_generation_intent():
def _detect_generation_intent(self, message: str) -> bool:
"""Detect if user wants to generate NEW slides (replace deck)."""
generation_patterns = [
r"\bgenerate\b.*\bslides?\b",
r"\bcreate\b.*\b(presentation|slides?|deck)\b",
r"\bmake\s+me\b.*\bslides?\b",
r"\b\d+\s+slides?\s+(about|on|for)\b", # "5 slides about X"
r"\bnew\s+(presentation|deck|slides?)\b",
]
# Only these should replace entire deck
src/api/services/chat_service.py- Add_detect_edit_intent():
def _detect_edit_intent(self, message: str) -> bool:
"""Detect if user wants to edit existing slides."""
edit_patterns = [
r"\b(change|edit|modify|update|fix)\b.*\bslide\b",
r"\bslide\b.*\b(change|edit|modify|update|fix)\b",
r"\b(change|update)\b.*(color|background|title|text|chart)",
]
src/api/services/chat_service.py- Add_parse_slide_references():
def _parse_slide_references(self, message: str) -> tuple[list[int], Optional[str]]:
"""Parse slide numbers from message.
Returns:
(indices, position) - indices are 0-based, position is 'before'/'after' or None
Examples:
"slide 8" → ([7], None)
"slides 2-4" → ([1, 2, 3], None)
"after slide 3" → ([2], "after")
"before slide 5" → ([4], "before")
"""
patterns = [
(r"\bslide\s*#?(\d+)\b", None), # "slide 8"
(r"\b(\d+)(?:st|nd|rd|th)\s+slide", None), # "8th slide"
(r"\bafter\s+slide\s*#?(\d+)\b", "after"), # "after slide 3"
(r"\bbefore\s+slide\s*#?(\d+)\b", "before"),
(r"\bslides?\s*(\d+)\s*[-–to]+\s*(\d+)\b", None), # "slides 2-4"
]
src/api/services/chat_service.py- Update message routing:
# In send_message_streaming, BEFORE processing:
if not slide_context:
# No selection - classify intent
is_generation = self._detect_generation_intent(message)
is_edit = self._detect_edit_intent(message)
is_add = self._detect_add_intent(message)
slide_refs, position = self._parse_slide_references(message)
if is_generation:
# Allow deck replacement
pass
elif is_edit:
if slide_refs:
# Create synthetic slide_context
slide_context = self._create_synthetic_context(session_id, slide_refs)
else:
# GUARD: Ask for clarification
return self._return_clarification_needed(
"Which slide would you like to edit? Please specify (e.g., 'slide 3') or select it."
)
elif is_add and slide_refs:
# Use parsed position for insertion
# e.g., "add after slide 3" → insert at position 4
pass
Test Cases:
| Test ID | Scenario | Expected Outcome |
|---|---|---|
| RC8-T1 | "Edit slide 8 color" (no select) | Edit slide 8, deck preserved |
| RC8-T2 | "Change slide 3 title" (no select) | Edit slide 3, deck preserved |
| RC9-T1 | "Add after slide 5" (no select) | Insert at position 6 |
| RC9-T2 | "Add before slide 2" (no select) | Insert at position 1 |
| RC10-T1 | "Change the background" (no select, no ref) | Return clarification message |
| RC10-T2 | "Edit the chart" (no select, no ref) | Return clarification message |
| RC10-T3 | "Generate 5 slides about X" | New deck (allowed) |
| RC10-T4 | "Create 3 slides" (no existing deck) | New deck (generation intent) |
| RC11-T1 | Select slide 2 + "edit slide 3" | Edit slide 2 (selection wins) + note |
| RC11-T2 | Select slide 5 + "add after slide 2" | Add after slide 5 (selection wins) + note |
| RC12-T1 | "Create 5 slides about X" (existing deck) | Ask: Add or Replace? |
| RC12-T2 | "Generate slides about X" (existing deck) | Ask: Add or Replace? |
| RC12-T3 | "Add 3 slides about X" (existing deck) | Add slides (no clarification) |
| RC12-T4 | "Replace with new slides about X" | Replace deck (explicit intent) |
| RC12-T5 | "Start fresh with slides about X" | Replace deck (explicit intent) |
Guard Principles:
- Never replace a deck unless explicitly generating new slides. For any edit/modify operation without a clear target, ask for clarification.
- Never fail silently. Either proceed with confidence OR ask for clarification.
- Selection wins. When text reference and panel selection conflict, use the explicit action (selection).
Production Impact:
- ✅ Users can reference slides by number naturally
- ✅ No data loss from ambiguous requests
- ✅ Clear feedback when clarification needed
- ✅ Only explicit "generate" commands replace deck
- ✅ Selection always takes precedence over text reference
Phase 9: Selection vs Text Conflict Note (RC11)
Goal: Inform users when their selection differs from the slide number they mentioned.
Problem: User selects slide 2 but writes "edit slide 3" → which one gets edited? Confusion.
Solution: Selection wins (explicit action), but show a note explaining what happened.
Implementation:
# In send_message_streaming, before calling agent:
if slide_context:
text_refs, _ = self._parse_slide_references(message)
if text_refs:
selected_indices = slide_context.get("indices", [])
if set(text_refs) != set(selected_indices):
conflict_note = (
f"📝 Applied changes to **slide {selected_display}** (your selection). "
f"Note: you mentioned slide {text_display} in your message."
)
# After agent completes, yield the note before COMPLETE event
if conflict_note:
yield StreamEvent(type=StreamEventType.ASSISTANT, content=conflict_note)
User Experience:
- Changes are applied to selected slide (explicit action)
- User sees a brief note explaining what happened
- They can immediately redo if they meant the other slide
Phase 10: Generation Clarification (RC12)
Goal: Prevent accidental deck replacement when user says "create/generate slides" with existing deck.
Problem: User has 5 slides, says "create 3 slides about X" → Entire deck replaced! Data loss.
Solution: Ask for clarification: Add or Replace?
Implementation:
# In send_message_streaming early checks:
if is_generation and not is_add and not is_explicit_replace:
if existing_deck and len(existing_deck.slides) > 0:
clarification_msg = (
f"You have **{len(existing_deck.slides)} slides** in this session. "
"Would you like to:\n"
"- **Add** new slides to the existing deck?\n"
"- **Replace** the entire deck with a new presentation?"
)
yield StreamEvent(type=StreamEventType.ASSISTANT, content=clarification_msg)
return # Stop, wait for user response
# Explicit replace patterns that bypass clarification:
replace_patterns = [
r"\breplace\b.*\b(deck|slides?|presentation)\b",
r"\bstart\s+fresh\b",
r"\bstart\s+over\b",
r"\bnew\s+deck\b",
r"\bfrom\s+scratch\b",
]
User Experience:
- "Create 5 slides about X" (existing deck) → "You have 5 slides. Add or Replace?"
- "Add 3 slides about X" → Adds slides (no clarification needed)
- "Replace with slides about X" → Replaces (explicit intent, no clarification)
- "Start fresh with slides about X" → Replaces (explicit intent)
Known Performance Issue: Session History Loading (Pre-existing)
Problem: list_sessions() and get_session() use len(s.messages) which triggers N+1 queries.
Location: src/api/services/session_manager.py lines 119, 158
Impact: Slow session list loading (noticeable with many sessions/messages)
Fix (separate task):
# Use SQL COUNT instead of loading all messages
from sqlalchemy import func
message_count = db.query(func.count(SessionMessage.id)).filter(
SessionMessage.session_id == session.id
).scalar()
Note: This is a pre-existing issue, not introduced by our fixes.
4. Test Implementation
Test File: tests/unit/test_slide_editing_robustness.py
The test file imports only domain models and validator functions (not SlideGeneratorAgent or ChatService directly, except where needed for specific integration tests like RC6 cache restoration). The validate_and_fix_javascript combined function is also tested.
Full test file: tests/unit/test_slide_editing_robustness.py
Imports:
from src.domain.slide_deck import SlideDeck
from src.domain.slide import Slide
from src.utils.js_validator import (
validate_javascript,
try_fix_common_js_errors,
validate_and_fix_javascript,
)
Test classes and key tests:
TestDeckPreservation(RC3) - UsesSlideDeckandSlidedirectly to verify deck structure preservation, valid replacement updates, and new generation from HTML.TestLLMResponseValidation(RC1) - Creates aSlideGeneratorAgentvia mocked settings to test_validate_editing_response(). Covers conversational text detection, valid HTML acceptance, empty/whitespace responses, multiple slides, and conversational text with slide divs.TestAddIntentDetection(RC2) - Creates aSlideGeneratorAgentto test_detect_add_intent(). Covers add/insert/append/create patterns and verifies edit-like messages are not detected as add intent.TestCanvasIdDeduplication(RC4) - Creates aSlideGeneratorAgentto test_deduplicate_canvas_ids(). Covers single/multiple canvas deduplication, script reference updates (includingquerySelector), no-canvas passthrough, and consecutive edit uniqueness.TestJavaScriptValidation(RC5) - Testsvalidate_javascript,try_fix_common_js_errors, andvalidate_and_fix_javascriptdirectly. Covers valid JS, missing braces/parens/brackets, empty/whitespace scripts, and the combined validate-and-fix flow (with esprima fallback handling).TestSlideEditingIntegration- Tests deck creation from HTML, slide manipulation (add/remove), slides with scripts,knit()output, and_format_slide_contextwith add operation flag.TestCacheRestoration(RC6) - TestsChatService._get_or_load_deck()for cache hits, database fallback, and empty database handling.TestEdgeCases- Tests special characters, unicode, empty decks, slide cloning, and deeply nested HTML.
5. File Changes Summary
| File | Change Type | Description |
|---|---|---|
src/services/agent.py | Modify | RC1: validation & retry, RC2: add intent detection + is_add_operation flag, RC4: canvas deduplication, RC5: JS validation integration |
src/api/services/chat_service.py | Modify | RC3: deck preservation guard, RC6: cache restoration, RC8-RC13: intent detection & guards, RC13: auto-create slide_context from text reference |
src/utils/js_validator.py | New | RC5: JavaScript syntax validation utilities (validate_javascript, try_fix_common_js_errors, validate_and_fix_javascript) |
src/core/defaults.py | Modify | RC2: Clear EDIT/ADD/EXPAND operation instructions aligned with backend |
tests/unit/test_slide_editing_robustness.py | New | Comprehensive test suite (45 tests) |
requirements.txt | Modify | Add esprima dependency |
pyproject.toml | Modify | Add esprima>=4.0.0 dependency |
6. Rollout Plan
Step 1: Implement RC3 (Deck Preservation)
- Add guard in
chat_service.py - Run RC3-T1 through RC3-T5 tests
- Verify existing functionality not broken
Step 2: Implement RC1 (Response Validation)
- Add validation method in
agent.py - Add retry logic
- Run RC1-T1 through RC1-T6 tests
Step 3: Implement RC2 (Add Intent Detection)
- Add intent detection method
- Modify
_format_slide_context - Run RC2-T1 through RC2-T7 tests
Step 4: Implement RC4 (Canvas Deduplication)
- Add deduplication method
- Apply in
_parse_slide_replacements - Run RC4-T1 through RC4-T5 tests
Step 5: Implement RC5 (JS Validation)
- Create
js_validator.py - Add
esprimadependency - Integrate in agent
- Run RC5-T1 through RC5-T6 tests
Step 6: Integration Testing
- Run full test suite
- Manual testing of all scenarios
- Edge case verification
7. Success Criteria
| Criterion | Measurement |
|---|---|
| No deck loss on invalid LLM response | RC3 tests pass, manual verification |
| Invalid responses trigger retry | RC1 tests pass |
| Add intent properly detected | RC2 tests pass |
| No canvas ID collisions | RC4 tests pass |
| No JS syntax errors in browser | RC5 tests pass, manual verification |
| All existing tests still pass | pytest tests/ passes |
8. Code Quality Improvements
During the final review, several code quality issues were identified and fixed:
| Issue | Problem | Fix |
|---|---|---|
| Double Intent Detection | Regex patterns running twice per request (early + late) | Store detection results in variables (_is_edit, _is_generation, _is_add, _slide_refs) at start and reuse throughout |
| Dead Code | _create_synthetic_context method defined but never called | Removed (37 lines of dead code) |
| Overly Broad Pattern | r"\b\d+\s+slides?\b" caused false positives (e.g., "edit slide 5 slides look broken" matched as generation) | Removed pattern; more specific patterns are sufficient |
| Duplicate Imports | import re inside multiple methods | Moved to top of file |
| Misleading Comment | Comment said "insert at end" but code inserts at calculated position | Updated comment to match behavior |
| Unused Import | BeautifulSoup imported but not used | Removed |
Intent Detection Flow (Optimized)
# BEFORE: Detection called twice
def send_message_streaming(...):
if not slide_context:
is_edit = self._detect_edit_intent(message) # First call
...
# After LLM returns
is_edit = self._detect_edit_intent(message) # Second call (duplicate!)
# AFTER: Detection called once, results stored and reused
def send_message_streaming(...):
# Detect ONCE at start
_is_edit = self._detect_edit_intent(message)
_is_generation = self._detect_generation_intent(message)
_is_add = self._detect_add_intent(message)
_slide_refs, _ref_position = self._parse_slide_references(message)
if not slide_context:
if _is_edit and not _slide_refs: # Reuse stored result
# Ask clarification
# After LLM returns
if _is_edit and _slide_refs: # Reuse stored result
# Apply edit to referenced slides
RC13: Auto-Create Slide Context from Text Reference
Problem: When user says "edit slide 7" without selecting in the panel, the system detected the slide reference but didn't pass the slide's HTML to the LLM. The LLM would then ask "Can you provide the slide content?"
Solution: Before calling the LLM, if we detect an edit intent with a slide reference but no frontend selection, look up the slide from the deck and create slide_context automatically.
# RC13: Auto-create slide_context from text reference
if _is_edit and _slide_refs and not slide_context:
existing_deck = self._get_or_load_deck(session_id)
if existing_deck and len(existing_deck.slides) > 0:
valid_refs = [i for i in _slide_refs if 0 <= i < len(existing_deck.slides)]
if valid_refs:
# Look up actual slide HTML (already stored per-slide via RC7)
slide_htmls = [existing_deck.slides[i].html for i in valid_refs]
# Create context in same format as frontend selection
slide_context = {
"indices": valid_refs,
"slide_htmls": slide_htmls
}
Test Cases:
| Test ID | Scenario | Expected Outcome |
|---|---|---|
| RC13-T1 | "Change slide 7 background to grey" (no selection) | Slide 7 edited, LLM receives slide HTML |
| RC13-T2 | "Edit slides 2-4" (no selection) | All 3 slides edited, LLM receives all HTML |
| RC13-T3 | "Change slide 99 color" (out of range) | Falls through to other handling |
RC15: Optimize Script Preservation Fix
Problem: When user clicks "Optimize" on a slide with a chart:
- RC4 deduplication adds suffix to canvas ID:
mdpChart→mdpChart_abc0cb - Original script references
mdpChart - Script preservation looks for
mdpChart_abc0cb→ no match - If matched via base ID, script still references OLD canvas ID
- Multiple optimizes compound the problem:
mdpChart_abc0cb_730ba9_b4ca7a
Solution: Two-part fix in _apply_slide_replacements:
- Smart matching: Strip RC4 suffix to find matching scripts
- Update references: When preserving, update
getElementByIdcalls to use new canvas ID
# 1. Try exact match, then fallback to base ID
base_id = re.sub(r'_[a-f0-9]{6}$', '', canvas_id)
if base_id in canvas_id_to_script:
script_to_preserve = canvas_id_to_script[base_id]
old_canvas_id = base_id
# 2. Update canvas ID references in preserved script
if old_canvas_id != canvas_id:
script_to_preserve = re.sub(
rf"getElementById\s*\(\s*['\"]({re.escape(old_canvas_id)})['\"]\s*\)",
f"getElementById('{canvas_id}')",
script_to_preserve,
)
Safety: No behavior change for existing scenarios - only fixes optimize case.
RC14: Unsupported Operations Guidance
Problem: When user asks to delete, reorder, or duplicate slides via chat:
- Delete/reorder: LLM naturally gives conversational response ✅
- Duplicate: LLM tries to return HTML (empty slide) ❌
Solution: Added section 6 to slide_editing_instructions in defaults.py:
6. UNSUPPORTED OPERATIONS (respond conversationally, do NOT return HTML):
- DELETE/REMOVE: "Use the trash icon in the slide panel on the right"
- REORDER/MOVE: "Drag and drop in the slide panel on the right"
- DUPLICATE/COPY/CLONE: "Select the slide and ask 'create an exact copy'"
Design Decision: Keep duplicate simple - user selects slide, asks for exact copy. No special duplicate logic needed.
Safety: Even if LLM ignores these instructions, RC10 guard preserves the deck.
9. Cross-References
- Backend Overview - Agent and chat service architecture
- Slide Parser and Script Management - HTML parsing details
- Frontend Overview - UI handling of slides
- Save Points / Versioning - Complete deck state snapshots with preview and restore
10. Appendix: Test Commands
# Run all robustness tests
pytest tests/unit/test_slide_editing_robustness.py -v
# Run specific test class
pytest tests/unit/test_slide_editing_robustness.py::TestDeckPreservation -v
# Run with coverage
pytest tests/unit/test_slide_editing_robustness.py --cov=src --cov-report=html