Hook: Stop wasting weeks wiring recommendations — build micro-app recommenders that respect preferences, constraints and personalization
If you’re a developer or IT lead in 2026, you’ve felt the friction: long integration cycles, brittle conversational flows, and recommendation results that ignore a user’s hard constraints. Micro-apps powered by modern LLMs (ChatGPT, Claude and others) let teams ship internal recommender tools fast — but only if your prompt engineering and I/O schemas are solid.
Why this cookbook matters in 2026
Two trends accelerated in late 2025 and early 2026 that make this guide timely:
- Vibe-coding and micro-apps: Non-developers are shipping internal micro-apps for narrow tasks — from Where2Eat-style dining helpers to HR training recommenders — cutting decision latency and engineering overhead.
- LLM tooling advances: Anthropic’s Cowork and Claude Code expanded safe local/autonomous workflows, while ChatGPT and major LLMs improved structured-output controls and schema enforcement. That makes reliable recommendation micro-apps feasible in production.
“Once vibe-coding apps emerged, I started hearing about people with no tech backgrounds successfully building their own apps.” — on the micro-app trend
What you’ll get in this cookbook
- Practical prompt patterns for recommendation prompts and preference elicitation
- Proven input/output schemas for predictable JSON outputs and validation
- Step-by-step micro-app build: design, prompt, RAG, personalization, validation, metrics
- Examples for ChatGPT and Claude, plus deployment & security notes
Core concepts (quick primer)
Before diving into patterns, keep these concepts front-of-mind:
- Hard constraints: Must be satisfied (e.g., budget, location, compliance)
- Soft preferences: Ranked desires (e.g., likes spicy food, prefers videos)
- Context: Session signals, recent interactions, corporate policies
- Personalization: Profile + history + embeddings to bias results
- Deterministic schema: Force JSON output so downstream code can parse reliably
Recipe overview: Build a recommender micro-app in 8 steps
- Define the use case and success metrics
- Design the input (preference elicitation) schema
- Design the output schema (machine-parseable JSON)
- Compose layered prompts: system, context, examples, final instruction
- Integrate RAG (domain docs + embeddings) where needed
- Orchestrate: validation, fallback, rejection reasons
- Instrument metrics and monitoring
- Ship as a micro-app: web widget, Slack bot, or internal desktop agent
Step 1 — Define the use case + metrics
Pick a narrow domain: internal training courses, vendor selection for procurement, onboarding checklists, or team lunch suggestions. Define measurable KPIs such as:
- Conversion rate (accepted recommendation / suggestions shown)
- Time-to-decision (seconds saved)
- Constraint-violation rate (should be 0 for hard constraints)
- User satisfaction (thumbs up/down + free-text feedback)
Step 2 — Input schema: preference elicitation patterns
Design a minimal input schema that captures hard constraints, soft preferences, and contextual signals. Use short field names, typed values and enums for validation.
Example input schema (JSON)
{
"user_id": "string",
"session_id": "string",
"hard_constraints": {
"budget": {"currency": "GBP", "max": 50},
"region": "EMEA",
"compliance_tags": ["SOC2"]
},
"soft_preferences": [
{"key": "format", "value": "video", "weight": 0.6},
{"key": "topic", "value": "cloud security", "weight": 0.9}
],
"context": {
"recent_clicks": ["network-security-guide.pdf"],
"team": "platform",
"device": "desktop"
}
}Key design choices:
- Separate hard_constraints and soft_preferences so LLM logic can treat them differently.
- Use numeric weight to express importance; 0.0–1.0 scale is intuitive.
- Include a brief context snapshot to bias results.
Step 3 — Output schema: make LLM results machine-friendly
Always require a strict JSON output with a reason field and confidence estimate. This allows your micro-app to validate and explain decisions to users and auditors.
Canonical output schema
{
"recommendations": [
{
"id": "string",
"title": "string",
"score": 0.0,
"primary_reason": "string",
"explainability": {
"matched_hard_constraints": ["budget", "region"],
"matched_preferences": [{"key":"topic","value":"cloud security","weight":0.9}],
"fallbacks": ["no video available, suggested article instead"]
},
"metadata": {"duration_mins": 45, "format": "video", "provider": "LMS"}
}
],
"summary": "string",
"errors": [],
"model_version": "string",
"response_time_ms": 0
}Enforcing this schema lets you implement automated validation before showing results. If the LLM returns a violation (e.g., recommending an item above budget), your app can reject and re-prompt.
Step 4 — Prompt patterns: templates that work
Use layered prompts: system instruction for behavior, context for state, examples for format, then the request. Below are three patterns used across recommender micro-apps.
Pattern A — Preference-first recommendation (single-turn)
Best for small sets (10-50 items) where embeddings/RAG aren’t necessary.
System: You are a concise internal recommender. Always return EXACT JSON matching the schema: (...insert schema...).
Context: {user profile + inventory list}
User: Given the input, return up to 5 recommendations that satisfy ALL hard_constraints. Rank by combined score (constraints satisfied + preference weight). Include explainability entries. Do NOT include extra keys.Pattern B — Clarify-then-recommend (interactive)
Useful when preferences are sparse or ambiguous. The agent asks 1–2 clarifying questions then finalizes recommendations.
System: Behavior: if hard_constraints are clear, proceed. If preferences are missing or conflicting, ask only 1 clarifying question.
User: Input includes: {hard_constraints, soft_preferences}
Agent: If ok -> return JSON recommendations. Otherwise -> return JSON with {"clarify": true, "question": "Which format do you prefer: video or article?"}.Pattern C — RAG + Personalized bias
For domain-heavy domains — legal, procurement, technical docs. Combine a retrieval step that returns top-K docs with the prompt. Include user embedding similarity to bias ranking.
System: Use retrieved_docs (array) and user_profile_similarity (0-1) to score items. Prioritize items that cite retrieved_docs and match user similarity & preferences. Output strict JSON.Step 5 — Example prompts: ChatGPT vs Claude
Both models support structured responses; pick the one that fits your compliance and cost needs. Below are concise examples for each.
ChatGPT style (system + user call, enforcing JSON via function-calling or response format)
System: You are "ReccoBot" for internal training. REQUIRED_OUTPUT_SCHEMA: {...insert canonical output schema...}
User: Input: {...insert input JSON...}
User: Return EXACT JSON. If constraints can't be satisfied, include errors array with reasons.Anthropic Claude style (explicit JSON format enforcement)
Instruction: Provide recommendations as JSON matching the schema below. Use only the fields listed. Don't add commentary.
Context: (retrieved docs)
Data: {...input JSON...}
Response format: JSON only.Tip: add a final line like "If you cannot meet constraints, return errors array and no recommendations" — this avoids hallucinated results.
Step 6 — Orchestration: validation, fallbacks and re-prompting
Micro-app reliability comes from an orchestrator that validates the model output, handles violations, and decides when to re-prompt or escalate to humans. Consider developer productivity and cost tradeoffs outlined in developer productivity reports.
- Validate JSON schema server-side; reject any unexpected types.
- If a recommended item violates hard constraints, call the model again with an explicit negation instruction: "Do not suggest items over £X."
- Use a fallback ranking (deterministic filter) as a safety net when model confidence is low.
- Store provenance: model_version, prompt_hash, retrieved_doc_ids, and response_time for audits.
Step 7 — Personalization strategies
Personalization must balance relevance, privacy and cost. Consider identity risk and data handling best-practices described in identity risk guidance.
- Short-term session memory: Keep session-level interactions available to the prompt (last 3 interactions).
- Long-term profile vectors: Store user embeddings for preference vectors; include similarity score in the prompt to bias results.
- Content-level signals: Use document embeddings and RAG to ground recommendations in up-to-date internal assets.
- Decay & exploration: Add an exploration weight so you occasionally recommend new items for discovery.
Sample personalization input fragment
{
"user_vector_similarity": 0.83,
"recently_accepted_topics": ["kubernetes","observability"],
"last_accept_time": "2026-01-10T10:32:00Z"
}Step 8 — Metrics, monitoring and A/B
Instrument at three levels:
- Model-level: latency, token usage, failure rate, confidence score distribution.
- Business-level: accept rate, time-to-decision, task completion uplift.
- Quality-level: constraint-violation rate, hallucination incidents, manual override count.
Run A/B tests with different prompt templates, weighting functions, and RAG context sizes to measure cost vs performance. In 2026 many teams found a sweet spot by combining a small (top-3) retrieved docs + a lightweight personalization vector to reduce token costs while preserving relevance.
Real-world example: Internal Training Course Recommender (end-to-end)
Use case: Recommend internal training (video/article) to an engineer with budget constraints and time availability.
Input example
{
"user_id":"u123",
"hard_constraints": {"max_time_mins": 60, "budget": {"max": 30}},
"soft_preferences": [{"key":"format","value":"video","weight":0.8},{"key":"level","value":"intermediate","weight":0.7}],
"context": {"recent_views":["intro-to-mesh"], "team":"platform"}
}Prompt (pattern B — clarify then recommend)
System: You MUST return JSON matching schema. If multiple formats, prefer format with highest weight. Ask at most 1 clarifying question.
User: Given input, if max_time_mins < 30 and preference format=video, ask: "Short videos under 30 mins or articles ok?" Otherwise return recommendations.Expected output (truncated)
{
"recommendations": [
{"id":"c789","title":"Observability Patterns (Video)","score":0.92,
"primary_reason":"Matches video preference, duration 45m < max_time 60m","explainability":{...},"metadata":{"duration_mins":45}}
],
"summary":"1 recommended course fits constraints",
"errors":[]
}
When the LLM returns valid JSON with explainability, the micro-app shows the recommendation with a CTA and stores telemetry.
Advanced strategies & anti-patterns
Strategies
- Prompt ensembles: Run two short prompts — one optimizing for hard constraints, one for serendipity — then merge results via deterministic logic.
- Constraint-first filters: Pre-filter candidate pool programmatically before prompting to reduce model errors and token cost.
- Context compression: Use retrieval + summarization to include only the most relevant doc snippets in the prompt.
Anti-patterns
- Relying on free-text responses only — hard to validate and brittle.
- Feeding full corpora into the prompt rather than using RAG and embeddings.
- Expecting the model to enforce complex business rules without programmatic validation.
Security, compliance and cost controls
Internal micro-apps often surface sensitive signals. Follow these best practices:
- PII minimization: Strip or pseudonymize user identifiers before sending to external LLMs — a key step to reduce identity risk.
- On-prem or private endpoints: Use on-prem LLMs or private Claude/ChatGPT enterprise endpoints when dealing with regulated data; consult indexing & edge manuals for best practices.
- Rate & token limits: Enforce per-user quotas and caching of repeated prompts to control costs.
- Audit logs: Persist prompt_hash, model_version, retrieved_doc_ids and response JSON for audits and model debugging.
Testing prompts: unit tests and golden outputs
Treat prompts like code. Build a test-suite with golden inputs and expected JSON outputs. Run nightly checks to detect model drift — for example, when a model replaces numeric types with strings or breaks the schema. Pair tests with developer productivity tooling and CI signals described in developer productivity reports.
Deployment patterns for micro-apps
- Slack/Teams quick-action: Thin orchestrator service + modal that collects inputs, calls LLM, validates JSON, returns result card.
- Web widget: React component that gathers preferences, calls backend orchestrator for LLM calls and validation.
- Desktop agent: Local agent (e.g., Anthropic Cowork-style) with file-system access for advanced RAG on internal docs — requires strict ACLs.
Case study: 2-week MVP at an enterprise (summary)
Team: Internal tools + 1 ML engineer. Goal: recommend vendor options for small purchases. Approach:
- Defined hard constraints (max spend, approved vendors) and soft preferences (lead time, sustainability score).
- Pre-filtered vendor DB, used top-5 vendor rows as candidates, passed them and user profile to Claude with explicit JSON schema.
- Built validation layer; constraint-violation rate dropped to 0.5% after two prompt iterations.
Outcome: Decision latency reduced from 3 days to 4 hours (user research + approvals), adoption by procurement pilot team 42% month one.
Future predictions (2026+)
- Micro-app marketplaces: Internal micro-app registries where business users share vetted recommenders and prompt templates.
- Stronger schema enforcement: LLMs will adopt native JSON-schema bindings reducing the need for repeated re-prompts.
- Smarter local agents: Tools like Cowork will enable desktop agents that combine local files with cloud LLMs for richer RAG while keeping data private. See indexing & edge guidance: Indexing Manuals for the Edge Era.
Checklist: Launch-ready recommender micro-app
- Defined KPIs and user flows
- Input and output JSON schemas implemented and validated
- Prompt templates covering common cases and clarifications
- RAG pipeline and personalization vectors configured
- Orchestrator with schema validation and fallback logic
- Telemetry for model & business metrics
- Security, PII handling and audit logging in place
Quick prompt templates (copy-paste)
Minimal single-turn (ChatGPT)
System: You are an internal recommender. Output valid JSON matching THIS SCHEMA: {...schema...}. Do NOT include text outside JSON.
User: Input: {...input JSON...}.
User: Return up to 5 recommendations that satisfy all hard_constraints. Rank by score and include explainability.Interactive clarifier
System: If preferences are missing or ambiguous, ask 1 clarifying question in JSON: {"clarify": true, "question":"..."}. Otherwise return recommendations JSON.
User: Input: {...}Closing: Actionable next steps
Start small: pick one narrow domain and build a two-screen micro-app. Use the input/output schemas in this cookbook and run 50–100 test cases. Measure constraint-violation rate and iterate the prompt until it’s under 1%.
Want templates, schema files and working prompt bundles? We maintain a library of production-ready prompt templates, JSON schemas and orchestration samples tailored for ChatGPT and Claude that you can fork and customize.
Call to action
Get the micro-app prompt template pack from bot365.co.uk or contact our team for a free 2-week pilot to convert one internal workflow into a production recommender micro-app — faster than you think. Ship smarter recommendations with fewer engineering hours.
Related Reading
- From Micro-App to Production: CI/CD & Governance for LLM-built Tools
- Indexing Manuals for the Edge Era (RAG & Edge guidance)
- Developer Productivity and Cost Signals in 2026
- Observability in 2026: Subscription Health & Auditing
- Dinner-Ready Lighting Scenes: 5 Presets to Switch the Mood in Seconds
- Budget Picks for Teen Gamers and Collectors: Pokémon ETBs, Magic TMNT Boxes and Why Price Drops Matter
- From Notebooks to Necklaces: How Scarcity and Celebrity Endorsement Create Must-Have Jewelry
- Collector Alert: Fallout Secret Lair Superdrop — What to Buy, What to Flip
- Build Custom LEGO Accessories with a Budget 3D Printer: Best Models and Printers Under $300