promptsrecommendationinternal tools

Prompt Engineering for Internal Recommendation Micro-Apps: A Cookbook

UUnknown

2026-02-08

11 min read

Practical prompt patterns and JSON schemas to build reliable internal recommender micro-apps that handle preferences, constraints and personalization.

Hook: Stop wasting weeks wiring recommendations — build micro-app recommenders that respect preferences, constraints and personalization

If you’re a developer or IT lead in 2026, you’ve felt the friction: long integration cycles, brittle conversational flows, and recommendation results that ignore a user’s hard constraints. Micro-apps powered by modern LLMs (ChatGPT, Claude and others) let teams ship internal recommender tools fast — but only if your prompt engineering and I/O schemas are solid.

Why this cookbook matters in 2026

Two trends accelerated in late 2025 and early 2026 that make this guide timely:

Vibe-coding and micro-apps: Non-developers are shipping internal micro-apps for narrow tasks — from Where2Eat-style dining helpers to HR training recommenders — cutting decision latency and engineering overhead.
LLM tooling advances: Anthropic’s Cowork and Claude Code expanded safe local/autonomous workflows, while ChatGPT and major LLMs improved structured-output controls and schema enforcement. That makes reliable recommendation micro-apps feasible in production.

“Once vibe-coding apps emerged, I started hearing about people with no tech backgrounds successfully building their own apps.” — on the micro-app trend

What you’ll get in this cookbook

Practical prompt patterns for recommendation prompts and preference elicitation
Proven input/output schemas for predictable JSON outputs and validation
Step-by-step micro-app build: design, prompt, RAG, personalization, validation, metrics
Examples for ChatGPT and Claude, plus deployment & security notes

Core concepts (quick primer)

Before diving into patterns, keep these concepts front-of-mind:

Hard constraints: Must be satisfied (e.g., budget, location, compliance)
Soft preferences: Ranked desires (e.g., likes spicy food, prefers videos)
Context: Session signals, recent interactions, corporate policies
Personalization: Profile + history + embeddings to bias results
Deterministic schema: Force JSON output so downstream code can parse reliably

Recipe overview: Build a recommender micro-app in 8 steps

Define the use case and success metrics
Design the input (preference elicitation) schema
Design the output schema (machine-parseable JSON)
Compose layered prompts: system, context, examples, final instruction
Integrate RAG (domain docs + embeddings) where needed
Orchestrate: validation, fallback, rejection reasons
Instrument metrics and monitoring
Ship as a micro-app: web widget, Slack bot, or internal desktop agent

Step 1 — Define the use case + metrics

Pick a narrow domain: internal training courses, vendor selection for procurement, onboarding checklists, or team lunch suggestions. Define measurable KPIs such as:

Conversion rate (accepted recommendation / suggestions shown)
Time-to-decision (seconds saved)
Constraint-violation rate (should be 0 for hard constraints)
User satisfaction (thumbs up/down + free-text feedback)

Step 2 — Input schema: preference elicitation patterns

Design a minimal input schema that captures hard constraints, soft preferences, and contextual signals. Use short field names, typed values and enums for validation.

Example input schema (JSON)

{
  "user_id": "string",
  "session_id": "string",
  "hard_constraints": {
    "budget": {"currency": "GBP", "max": 50},
    "region": "EMEA",
    "compliance_tags": ["SOC2"]
  },
  "soft_preferences": [
    {"key": "format", "value": "video", "weight": 0.6},
    {"key": "topic", "value": "cloud security", "weight": 0.9}
  ],
  "context": {
    "recent_clicks": ["network-security-guide.pdf"],
    "team": "platform",
    "device": "desktop"
  }
}

Key design choices:

Separate hard_constraints and soft_preferences so LLM logic can treat them differently.
Use numeric weight to express importance; 0.0–1.0 scale is intuitive.
Include a brief context snapshot to bias results.

Step 3 — Output schema: make LLM results machine-friendly

Always require a strict JSON output with a reason field and confidence estimate. This allows your micro-app to validate and explain decisions to users and auditors.

Canonical output schema

{
  "recommendations": [
    {
      "id": "string",
      "title": "string",
      "score": 0.0,
      "primary_reason": "string",
      "explainability": {
        "matched_hard_constraints": ["budget", "region"],
        "matched_preferences": [{"key":"topic","value":"cloud security","weight":0.9}],
        "fallbacks": ["no video available, suggested article instead"]
      },
      "metadata": {"duration_mins": 45, "format": "video", "provider": "LMS"}
    }
  ],
  "summary": "string",
  "errors": [],
  "model_version": "string",
  "response_time_ms": 0
}

Enforcing this schema lets you implement automated validation before showing results. If the LLM returns a violation (e.g., recommending an item above budget), your app can reject and re-prompt.

Step 4 — Prompt patterns: templates that work

Use layered prompts: system instruction for behavior, context for state, examples for format, then the request. Below are three patterns used across recommender micro-apps.

Pattern A — Preference-first recommendation (single-turn)

Best for small sets (10-50 items) where embeddings/RAG aren’t necessary.

System: You are a concise internal recommender. Always return EXACT JSON matching the schema: (...insert schema...).
Context: {user profile + inventory list}
User: Given the input, return up to 5 recommendations that satisfy ALL hard_constraints. Rank by combined score (constraints satisfied + preference weight). Include explainability entries. Do NOT include extra keys.

Useful when preferences are sparse or ambiguous. The agent asks 1–2 clarifying questions then finalizes recommendations.

System: Behavior: if hard_constraints are clear, proceed. If preferences are missing or conflicting, ask only 1 clarifying question.
User: Input includes: {hard_constraints, soft_preferences}
Agent: If ok -> return JSON recommendations. Otherwise -> return JSON with {"clarify": true, "question": "Which format do you prefer: video or article?"}.

Pattern C — RAG + Personalized bias

For domain-heavy domains — legal, procurement, technical docs. Combine a retrieval step that returns top-K docs with the prompt. Include user embedding similarity to bias ranking.

System: Use retrieved_docs (array) and user_profile_similarity (0-1) to score items. Prioritize items that cite retrieved_docs and match user similarity & preferences. Output strict JSON.

Step 5 — Example prompts: ChatGPT vs Claude

Both models support structured responses; pick the one that fits your compliance and cost needs. Below are concise examples for each.

ChatGPT style (system + user call, enforcing JSON via function-calling or response format)

System: You are "ReccoBot" for internal training. REQUIRED_OUTPUT_SCHEMA: {...insert canonical output schema...}
User: Input: {...insert input JSON...}
User: Return EXACT JSON. If constraints can't be satisfied, include errors array with reasons.

Anthropic Claude style (explicit JSON format enforcement)

Instruction: Provide recommendations as JSON matching the schema below. Use only the fields listed. Don't add commentary.
Context: (retrieved docs)
Data: {...input JSON...}
Response format: JSON only.

Tip: add a final line like "If you cannot meet constraints, return errors array and no recommendations" — this avoids hallucinated results.

Step 6 — Orchestration: validation, fallbacks and re-prompting

Micro-app reliability comes from an orchestrator that validates the model output, handles violations, and decides when to re-prompt or escalate to humans. Consider developer productivity and cost tradeoffs outlined in developer productivity reports.

Validate JSON schema server-side; reject any unexpected types.
If a recommended item violates hard constraints, call the model again with an explicit negation instruction: "Do not suggest items over £X."
Use a fallback ranking (deterministic filter) as a safety net when model confidence is low.
Store provenance: model_version, prompt_hash, retrieved_doc_ids, and response_time for audits.

Step 7 — Personalization strategies

Personalization must balance relevance, privacy and cost. Consider identity risk and data handling best-practices described in identity risk guidance.

Short-term session memory: Keep session-level interactions available to the prompt (last 3 interactions).
Long-term profile vectors: Store user embeddings for preference vectors; include similarity score in the prompt to bias results.
Content-level signals: Use document embeddings and RAG to ground recommendations in up-to-date internal assets.
Decay & exploration: Add an exploration weight so you occasionally recommend new items for discovery.

Sample personalization input fragment

{
  "user_vector_similarity": 0.83,
  "recently_accepted_topics": ["kubernetes","observability"],
  "last_accept_time": "2026-01-10T10:32:00Z"
}

Step 8 — Metrics, monitoring and A/B

Instrument at three levels:

Model-level: latency, token usage, failure rate, confidence score distribution.
Business-level: accept rate, time-to-decision, task completion uplift.
Quality-level: constraint-violation rate, hallucination incidents, manual override count.

Run A/B tests with different prompt templates, weighting functions, and RAG context sizes to measure cost vs performance. In 2026 many teams found a sweet spot by combining a small (top-3) retrieved docs + a lightweight personalization vector to reduce token costs while preserving relevance.

Real-world example: Internal Training Course Recommender (end-to-end)

Use case: Recommend internal training (video/article) to an engineer with budget constraints and time availability.

Input example

{
  "user_id":"u123",
  "hard_constraints": {"max_time_mins": 60, "budget": {"max": 30}},
  "soft_preferences": [{"key":"format","value":"video","weight":0.8},{"key":"level","value":"intermediate","weight":0.7}],
  "context": {"recent_views":["intro-to-mesh"], "team":"platform"}
}

System: You MUST return JSON matching schema. If multiple formats, prefer format with highest weight. Ask at most 1 clarifying question.
User: Given input, if max_time_mins < 30 and preference format=video, ask: "Short videos under 30 mins or articles ok?" Otherwise return recommendations.

Expected output (truncated)

{
  "recommendations": [
    {"id":"c789","title":"Observability Patterns (Video)","score":0.92,
     "primary_reason":"Matches video preference, duration 45m < max_time 60m","explainability":{...},"metadata":{"duration_mins":45}}
  ],
  "summary":"1 recommended course fits constraints",
  "errors":[]
}

When the LLM returns valid JSON with explainability, the micro-app shows the recommendation with a CTA and stores telemetry.

Advanced strategies & anti-patterns

Strategies

Prompt ensembles: Run two short prompts — one optimizing for hard constraints, one for serendipity — then merge results via deterministic logic.
Constraint-first filters: Pre-filter candidate pool programmatically before prompting to reduce model errors and token cost.
Context compression: Use retrieval + summarization to include only the most relevant doc snippets in the prompt.

Anti-patterns

Relying on free-text responses only — hard to validate and brittle.
Feeding full corpora into the prompt rather than using RAG and embeddings.
Expecting the model to enforce complex business rules without programmatic validation.

Security, compliance and cost controls

Internal micro-apps often surface sensitive signals. Follow these best practices:

PII minimization: Strip or pseudonymize user identifiers before sending to external LLMs — a key step to reduce identity risk.
On-prem or private endpoints: Use on-prem LLMs or private Claude/ChatGPT enterprise endpoints when dealing with regulated data; consult indexing & edge manuals for best practices.
Rate & token limits: Enforce per-user quotas and caching of repeated prompts to control costs.
Audit logs: Persist prompt_hash, model_version, retrieved_doc_ids and response JSON for audits and model debugging.

Testing prompts: unit tests and golden outputs

Treat prompts like code. Build a test-suite with golden inputs and expected JSON outputs. Run nightly checks to detect model drift — for example, when a model replaces numeric types with strings or breaks the schema. Pair tests with developer productivity tooling and CI signals described in developer productivity reports.

Deployment patterns for micro-apps

Slack/Teams quick-action: Thin orchestrator service + modal that collects inputs, calls LLM, validates JSON, returns result card.
Web widget: React component that gathers preferences, calls backend orchestrator for LLM calls and validation.
Desktop agent: Local agent (e.g., Anthropic Cowork-style) with file-system access for advanced RAG on internal docs — requires strict ACLs.

Case study: 2-week MVP at an enterprise (summary)

Team: Internal tools + 1 ML engineer. Goal: recommend vendor options for small purchases. Approach:

Defined hard constraints (max spend, approved vendors) and soft preferences (lead time, sustainability score).
Pre-filtered vendor DB, used top-5 vendor rows as candidates, passed them and user profile to Claude with explicit JSON schema.
Built validation layer; constraint-violation rate dropped to 0.5% after two prompt iterations.

Outcome: Decision latency reduced from 3 days to 4 hours (user research + approvals), adoption by procurement pilot team 42% month one.

Future predictions (2026+)

Micro-app marketplaces: Internal micro-app registries where business users share vetted recommenders and prompt templates.
Stronger schema enforcement: LLMs will adopt native JSON-schema bindings reducing the need for repeated re-prompts.
Smarter local agents: Tools like Cowork will enable desktop agents that combine local files with cloud LLMs for richer RAG while keeping data private. See indexing & edge guidance: Indexing Manuals for the Edge Era.

Checklist: Launch-ready recommender micro-app

Defined KPIs and user flows
Input and output JSON schemas implemented and validated
Prompt templates covering common cases and clarifications
RAG pipeline and personalization vectors configured
Orchestrator with schema validation and fallback logic
Telemetry for model & business metrics
Security, PII handling and audit logging in place

Quick prompt templates (copy-paste)

Minimal single-turn (ChatGPT)

System: You are an internal recommender. Output valid JSON matching THIS SCHEMA: {...schema...}. Do NOT include text outside JSON.
User: Input: {...input JSON...}.
User: Return up to 5 recommendations that satisfy all hard_constraints. Rank by score and include explainability.

Interactive clarifier

System: If preferences are missing or ambiguous, ask 1 clarifying question in JSON: {"clarify": true, "question":"..."}. Otherwise return recommendations JSON.
User: Input: {...}

Closing: Actionable next steps

Start small: pick one narrow domain and build a two-screen micro-app. Use the input/output schemas in this cookbook and run 50–100 test cases. Measure constraint-violation rate and iterate the prompt until it’s under 1%.

Want templates, schema files and working prompt bundles? We maintain a library of production-ready prompt templates, JSON schemas and orchestration samples tailored for ChatGPT and Claude that you can fork and customize.

Call to action

Get the micro-app prompt template pack from bot365.co.uk or contact our team for a free 2-week pilot to convert one internal workflow into a production recommender micro-app — faster than you think. Ship smarter recommendations with fewer engineering hours.

Unknown

Contributor

Senior editor and content strategist. Writing about technology, design, and the future of digital media. Follow along for deep dives into the industry's moving parts.