Prompt Engineering for Internal Recommendation Micro-Apps: A Cookbook
prompts, recommendation, internal tools


bot365
2026-02-08 12:00:00
11 min read

Practical prompt patterns and JSON schemas to build reliable internal recommender micro-apps that handle preferences, constraints and personalization.

Stop wasting weeks wiring recommendations: build micro-app recommenders that respect preferences, constraints and personalization.

If you’re a developer or IT lead in 2026, you’ve felt the friction: long integration cycles, brittle conversational flows, and recommendation results that ignore a user’s hard constraints. Micro-apps powered by modern LLMs (ChatGPT, Claude and others) let teams ship internal recommender tools fast — but only if your prompt engineering and I/O schemas are solid.

Why this cookbook matters in 2026

Two trends accelerated in late 2025 and early 2026 that make this guide timely:

  • Vibe-coding and micro-apps: Non-developers are shipping internal micro-apps for narrow tasks — from Where2Eat-style dining helpers to HR training recommenders — cutting decision latency and engineering overhead.
  • LLM tooling advances: Anthropic’s Cowork and Claude Code expanded safe local/autonomous workflows, while ChatGPT and major LLMs improved structured-output controls and schema enforcement. That makes reliable recommendation micro-apps feasible in production.
“Once vibe-coding apps emerged, I started hearing about people with no tech backgrounds successfully building their own apps.” — on the micro-app trend

What you’ll get in this cookbook

  • Practical prompt patterns for recommendation prompts and preference elicitation
  • Proven input/output schemas for predictable JSON outputs and validation
  • Step-by-step micro-app build: design, prompt, RAG, personalization, validation, metrics
  • Examples for ChatGPT and Claude, plus deployment & security notes

Core concepts (quick primer)

Before diving into patterns, keep these concepts front-of-mind:

  • Hard constraints: Must be satisfied (e.g., budget, location, compliance)
  • Soft preferences: Ranked desires (e.g., likes spicy food, prefers videos)
  • Context: Session signals, recent interactions, corporate policies
  • Personalization: Profile + history + embeddings to bias results
  • Deterministic schema: Force JSON output so downstream code can parse reliably

Recipe overview: Build a recommender micro-app in 8 steps

  1. Define the use case and success metrics
  2. Design the input (preference elicitation) schema
  3. Design the output schema (machine-parseable JSON)
  4. Compose layered prompts: system, context, examples, final instruction
  5. Integrate RAG (domain docs + embeddings) where needed
  6. Orchestrate: validation, fallback, rejection reasons
  7. Instrument metrics and monitoring
  8. Ship as a micro-app: web widget, Slack bot, or internal desktop agent

Step 1 — Define the use case + metrics

Pick a narrow domain: internal training courses, vendor selection for procurement, onboarding checklists, or team lunch suggestions. Define measurable KPIs such as:

  • Conversion rate (accepted recommendation / suggestions shown)
  • Time-to-decision (seconds saved)
  • Constraint-violation rate (should be 0 for hard constraints)
  • User satisfaction (thumbs up/down + free-text feedback)

Step 2 — Input schema: preference elicitation patterns

Design a minimal input schema that captures hard constraints, soft preferences, and contextual signals. Use short field names, typed values and enums for validation.

Example input schema (JSON)

{
  "user_id": "string",
  "session_id": "string",
  "hard_constraints": {
    "budget": {"currency": "GBP", "max": 50},
    "region": "EMEA",
    "compliance_tags": ["SOC2"]
  },
  "soft_preferences": [
    {"key": "format", "value": "video", "weight": 0.6},
    {"key": "topic", "value": "cloud security", "weight": 0.9}
  ],
  "context": {
    "recent_clicks": ["network-security-guide.pdf"],
    "team": "platform",
    "device": "desktop"
  }
}

Key design choices:

  • Separate hard_constraints and soft_preferences so LLM logic can treat them differently.
  • Use numeric weight to express importance; 0.0–1.0 scale is intuitive.
  • Include a brief context snapshot to bias results.
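
If your orchestrator runs on Python, input validation can be expressed directly against this schema. The sketch below assumes pydantic (v2) and mirrors the field names from the example above; it is a starting point, not a fixed contract.

from typing import List, Optional
from pydantic import BaseModel, Field

class Budget(BaseModel):
    currency: str = "GBP"
    max: float

class HardConstraints(BaseModel):
    budget: Optional[Budget] = None
    region: Optional[str] = None
    compliance_tags: List[str] = Field(default_factory=list)

class SoftPreference(BaseModel):
    key: str
    value: str
    weight: float = Field(ge=0.0, le=1.0)  # enforce the 0.0-1.0 scale up front

class SessionContext(BaseModel):
    recent_clicks: List[str] = Field(default_factory=list)
    team: Optional[str] = None
    device: Optional[str] = None

class RecommendationInput(BaseModel):
    user_id: str
    session_id: str
    hard_constraints: HardConstraints
    soft_preferences: List[SoftPreference] = Field(default_factory=list)
    context: SessionContext = Field(default_factory=SessionContext)

# RecommendationInput.model_validate(payload) raises a ValidationError on bad
# types or out-of-range weights, so malformed requests never reach the LLM.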

Step 3 — Output schema: make LLM results machine-friendly

Always require a strict JSON output with a reason field and confidence estimate. This allows your micro-app to validate and explain decisions to users and auditors.

Canonical output schema

{
  "recommendations": [
    {
      "id": "string",
      "title": "string",
      "score": 0.0,
      "primary_reason": "string",
      "explainability": {
        "matched_hard_constraints": ["budget", "region"],
        "matched_preferences": [{"key":"topic","value":"cloud security","weight":0.9}],
        "fallbacks": ["no video available, suggested article instead"]
      },
      "metadata": {"duration_mins": 45, "format": "video", "provider": "LMS"}
    }
  ],
  "summary": "string",
  "errors": [],
  "model_version": "string",
  "response_time_ms": 0
}

Enforcing this schema lets you implement automated validation before showing results. If the LLM returns a violation (e.g., recommending an item above budget), your app can reject and re-prompt.
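
A hedged sketch of that validation step, assuming the jsonschema package: OUTPUT_SCHEMA is the canonical schema above expressed as JSON Schema (abbreviated here), and the budget check treats a hypothetical metadata.price field as the item cost.

import json
from jsonschema import validate

OUTPUT_SCHEMA = {
    "type": "object",
    "required": ["recommendations", "summary", "errors"],
    "properties": {
        "recommendations": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["id", "title", "score", "primary_reason"],
                "properties": {"score": {"type": "number", "minimum": 0, "maximum": 1}},
            },
        },
        "summary": {"type": "string"},
        "errors": {"type": "array"},
    },
}

def parse_and_validate(raw_llm_text: str, max_budget: float) -> dict:
    """Parse the model response, enforce the schema, and reject hard-constraint violations."""
    data = json.loads(raw_llm_text)                 # raises on non-JSON output
    validate(instance=data, schema=OUTPUT_SCHEMA)   # raises ValidationError on shape drift
    for rec in data["recommendations"]:
        price = rec.get("metadata", {}).get("price", 0)   # "price" is an illustrative field
        if price > max_budget:
            # Hard constraints are enforced in code, never trusted to the prompt alone.
            raise ValueError(f"{rec['id']} exceeds budget {max_budget}")
    return data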

Step 4 — Prompt patterns: templates that work

Use layered prompts: system instruction for behavior, context for state, examples for format, then the request. Below are three patterns used across recommender micro-apps.

Pattern A — Preference-first recommendation (single-turn)

Best for small sets (10-50 items) where embeddings/RAG aren’t necessary.

System: You are a concise internal recommender. Always return EXACT JSON matching the schema: (...insert schema...).
Context: {user profile + inventory list}
User: Given the input, return up to 5 recommendations that satisfy ALL hard_constraints. Rank by combined score (constraints satisfied + preference weight). Include explainability entries. Do NOT include extra keys.

Pattern B — Clarify-then-recommend (interactive)

Useful when preferences are sparse or ambiguous. The agent asks 1–2 clarifying questions then finalizes recommendations.

System: Behavior: if hard_constraints are clear, proceed. If preferences are missing or conflicting, ask only 1 clarifying question.
User: Input includes: {hard_constraints, soft_preferences}
Agent: If ok -> return JSON recommendations. Otherwise -> return JSON with {"clarify": true, "question": "Which format do you prefer: video or article?"}.

Pattern C — RAG + Personalized bias

For knowledge-heavy domains such as legal, procurement and technical docs. Combine a retrieval step that returns the top-K docs with the prompt, and include user-embedding similarity to bias ranking.

System: Use retrieved_docs (array) and user_profile_similarity (0-1) to score items. Prioritize items that cite retrieved_docs and match user similarity & preferences. Output strict JSON.
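
Alongside Pattern C it helps to keep the final blend deterministic rather than leaving the weighting entirely to the model. A sketch, with illustrative field names and starting weights you would tune in Step 8:

def score_candidate(candidate: dict,
                    retrieved_doc_ids: set,
                    user_profile_similarity: float,
                    preference_score: float) -> float:
    """Blend retrieval grounding, profile similarity and soft-preference match."""
    cited = set(candidate.get("cited_doc_ids", []))        # docs this item cites
    grounding = len(cited & retrieved_doc_ids) / max(len(retrieved_doc_ids), 1)
    # 0.5 / 0.3 / 0.2 are starting weights for tuning, not model-recommended values.
    return 0.5 * grounding + 0.3 * user_profile_similarity + 0.2 * preference_score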

Step 5 — Example prompts: ChatGPT vs Claude

Both models support structured responses; pick the one that fits your compliance and cost needs. Below are concise examples for each.

ChatGPT style (system + user call, enforcing JSON via function-calling or response format)

System: You are "ReccoBot" for internal training. REQUIRED_OUTPUT_SCHEMA: {...insert canonical output schema...}
User: Input: {...insert input JSON...}
User: Return EXACT JSON. If constraints can't be satisfied, include errors array with reasons.
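
A minimal sketch of that call with the OpenAI Python SDK, using JSON mode via response_format; the model name is a placeholder and the schema/input arguments stand in for the JSON shown earlier.

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def recommend(input_payload: dict, output_schema: dict) -> dict:
    system = ('You are "ReccoBot" for internal training. '
              f'REQUIRED_OUTPUT_SCHEMA: {json.dumps(output_schema)}')
    response = client.chat.completions.create(
        model="gpt-4o-mini",                        # placeholder model name
        response_format={"type": "json_object"},    # JSON mode: output must be valid JSON
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": f"Input: {json.dumps(input_payload)}"},
            {"role": "user", "content": "Return EXACT JSON. If constraints can't be "
                                        "satisfied, include errors array with reasons."},
        ],
    )
    return json.loads(response.choices[0].message.content)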

Anthropic Claude style (explicit JSON format enforcement)

Instruction: Provide recommendations as JSON matching the schema below. Use only the fields listed. Don't add commentary.
Context: (retrieved docs)
Data: {...input JSON...}
Response format: JSON only.

Tip: add a final line like "If you cannot meet constraints, return errors array and no recommendations" — this avoids hallucinated results.
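
An equivalent sketch for Claude with the Anthropic Python SDK; the model id is a placeholder, and the system instruction folds in the tip above so constraint failures come back as an errors array rather than invented items.

import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def recommend_claude(input_payload: dict, retrieved_docs: list, schema: dict) -> dict:
    instruction = (
        "Provide recommendations as JSON matching the schema below. Use only the fields listed. "
        "Don't add commentary. If you cannot meet constraints, return an errors array and no "
        f"recommendations.\nSchema: {json.dumps(schema)}"
    )
    message = client.messages.create(
        model="claude-sonnet-4-20250514",   # placeholder model id
        max_tokens=1024,
        system=instruction,
        messages=[{
            "role": "user",
            "content": (f"Context: {retrieved_docs}\n"
                        f"Data: {json.dumps(input_payload)}\n"
                        "Response format: JSON only."),
        }],
    )
    return json.loads(message.content[0].text)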

Step 6 — Orchestration: validation, fallbacks and re-prompting

Micro-app reliability comes from an orchestrator that validates the model output, handles violations, and decides when to re-prompt or escalate to humans; a minimal version of this loop is sketched after the list below. Weigh the developer productivity and cost tradeoffs (see developer productivity reports) when deciding how many retries to allow.

  • Validate JSON schema server-side; reject any unexpected types.
  • If a recommended item violates hard constraints, call the model again with an explicit negation instruction: "Do not suggest items over £X."
  • Use a fallback ranking (deterministic filter) as a safety net when model confidence is low.
  • Store provenance: model_version, prompt_hash, retrieved_doc_ids, and response_time for audits.
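
A minimal sketch of that loop; call_model() and deterministic_fallback() are placeholders for your LLM client wrapper and your deterministic ranking, and parse_and_validate() is the schema check from Step 3.

def recommend_with_retries(input_payload: dict, max_attempts: int = 2) -> dict:
    negations = []   # explicit "do not suggest..." instructions accumulated across attempts
    for _ in range(max_attempts):
        raw = call_model(input_payload, extra_instructions=negations)   # placeholder client wrapper
        try:
            max_budget = input_payload["hard_constraints"]["budget"]["max"]
            return parse_and_validate(raw, max_budget=max_budget)       # schema + constraint checks (Step 3)
        except ValueError as violation:
            # Feed the violation back as an explicit negation instruction.
            negations.append(f"Do not suggest items that violate: {violation}")
    # Safety net: deterministic filter-and-rank when the model keeps violating constraints.
    return deterministic_fallback(input_payload)                        # placeholder fallback ranker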

Step 7 — Personalization strategies

Personalization must balance relevance, privacy and cost. Consider the identity-risk and data-handling best practices described in identity risk guidance.

  1. Short-term session memory: Keep session-level interactions available to the prompt (last 3 interactions).
  2. Long-term profile vectors: Store user embeddings for preference vectors; include similarity score in the prompt to bias results.
  3. Content-level signals: Use document embeddings and RAG to ground recommendations in up-to-date internal assets.
  4. Decay & exploration: Add an exploration weight so you occasionally recommend new items for discovery.

Sample personalization input fragment

{
  "user_vector_similarity": 0.83,
  "recently_accepted_topics": ["kubernetes","observability"],
  "last_accept_time": "2026-01-10T10:32:00Z"
}
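
How user_vector_similarity gets computed is up to your embedding store; a plain-Python sketch, assuming you already hold a user preference vector and an item (or candidate-set centroid) vector:

import math

def cosine_similarity(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def personalization_fragment(user_vec, item_vec, recent_topics, last_accept_time) -> dict:
    # Rounding keeps the prompt short without losing ranking signal.
    return {
        "user_vector_similarity": round(cosine_similarity(user_vec, item_vec), 2),
        "recently_accepted_topics": recent_topics,
        "last_accept_time": last_accept_time,
    }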

Step 8 — Metrics, monitoring and A/B

Instrument at three levels:

  • Model-level: latency, token usage, failure rate, confidence score distribution.
  • Business-level: accept rate, time-to-decision, task completion uplift.
  • Quality-level: constraint-violation rate, hallucination incidents, manual override count.

Run A/B tests with different prompt templates, weighting functions and RAG context sizes to measure cost vs performance. In 2026 many teams found a sweet spot by combining a small retrieved-docs window (top 3) with a lightweight personalization vector, reducing token costs while preserving relevance.
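
For the A/B side, deterministic bucketing keeps a user on the same prompt template across sessions so accept-rate comparisons stay clean. A sketch with illustrative variant names:

import hashlib

PROMPT_VARIANTS = ["preference_first_v1", "preference_first_v2_top3_rag"]

def assign_variant(user_id: str) -> str:
    """Stable assignment: the same user always lands in the same bucket."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % len(PROMPT_VARIANTS)
    return PROMPT_VARIANTS[bucket]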

Real-world example: Internal Training Course Recommender (end-to-end)

Use case: Recommend internal training (video/article) to an engineer with budget constraints and time availability.

Input example

{
  "user_id":"u123",
  "hard_constraints": {"max_time_mins": 60, "budget": {"max": 30}},
  "soft_preferences": [{"key":"format","value":"video","weight":0.8},{"key":"level","value":"intermediate","weight":0.7}],
  "context": {"recent_views":["intro-to-mesh"], "team":"platform"}
}

Prompt (pattern B — clarify then recommend)

System: You MUST return JSON matching schema. If multiple formats, prefer format with highest weight. Ask at most 1 clarifying question.
User: Given input, if max_time_mins < 30 and preference format=video, ask: "Short videos under 30 mins or articles ok?" Otherwise return recommendations.

Expected output (truncated)

{
  "recommendations": [
    {"id":"c789","title":"Observability Patterns (Video)","score":0.92,
     "primary_reason":"Matches video preference, duration 45m < max_time 60m","explainability":{...},"metadata":{"duration_mins":45}}
  ],
  "summary":"1 recommended course fits constraints",
  "errors":[]
}

When the LLM returns valid JSON with explainability, the micro-app shows the recommendation with a CTA and stores telemetry.

Advanced strategies & anti-patterns

Strategies

  • Prompt ensembles: Run two short prompts — one optimizing for hard constraints, one for serendipity — then merge results via deterministic logic.
  • Constraint-first filters: Pre-filter the candidate pool programmatically before prompting to reduce model errors and token cost (see the sketch after this list).
  • Context compression: Use retrieval + summarization to include only the most relevant doc snippets in the prompt.
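
A sketch of that constraint-first filter; the candidate fields (price, duration_mins, region) are illustrative attributes of your inventory rows:

def prefilter(candidates: list, hard: dict) -> list:
    """Drop anything that can never satisfy the hard constraints before the model sees it."""
    budget_max = hard.get("budget", {}).get("max", float("inf"))
    max_time = hard.get("max_time_mins", float("inf"))
    region = hard.get("region")
    return [
        c for c in candidates
        if c.get("price", 0) <= budget_max
        and c.get("duration_mins", 0) <= max_time
        and (region is None or c.get("region") == region)
    ]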

Anti-patterns

  • Relying on free-text responses only — hard to validate and brittle.
  • Feeding full corpora into the prompt rather than using RAG and embeddings.
  • Expecting the model to enforce complex business rules without programmatic validation.

Security, compliance and cost controls

Internal micro-apps often surface sensitive signals. Follow these best practices:

  • PII minimization: Strip or pseudonymize user identifiers before sending to external LLMs — a key step to reduce identity risk.
  • On-prem or private endpoints: Use on-prem LLMs or private Claude/ChatGPT enterprise endpoints when dealing with regulated data; consult indexing & edge manuals for best practices.
  • Rate & token limits: Enforce per-user quotas and caching of repeated prompts to control costs.
  • Audit logs: Persist prompt_hash, model_version, retrieved_doc_ids and response JSON for audits and model debugging.
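
A sketch of the audit record itself; hashing the prompt ties a stored response back to the exact prompt text without persisting potentially sensitive prompt contents:

import hashlib
import json
import time

def build_audit_record(prompt_text: str, model_version: str,
                       retrieved_doc_ids: list, response_json: dict,
                       response_time_ms: int) -> dict:
    return {
        "prompt_hash": hashlib.sha256(prompt_text.encode()).hexdigest(),
        "model_version": model_version,
        "retrieved_doc_ids": retrieved_doc_ids,
        "response": json.dumps(response_json),
        "response_time_ms": response_time_ms,
        "logged_at": time.time(),
    }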

Testing prompts: unit tests and golden outputs

Treat prompts like code. Build a test-suite with golden inputs and expected JSON outputs. Run nightly checks to detect model drift — for example, when a model replaces numeric types with strings or breaks the schema. Pair tests with developer productivity tooling and CI signals described in developer productivity reports.
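
A sketch of such a golden test with pytest; call_model() is your LLM client wrapper, parse_and_validate() is the Step 3 check, and the tests/golden layout is illustrative:

import json
import pathlib
import pytest

GOLDEN_DIR = pathlib.Path("tests/golden")   # one sub-folder per case: input.json + expected.json

@pytest.mark.parametrize("case_dir", sorted(GOLDEN_DIR.iterdir()))
def test_golden_case(case_dir):
    payload = json.loads((case_dir / "input.json").read_text())
    expected = json.loads((case_dir / "expected.json").read_text())
    raw = call_model(payload)   # your LLM client wrapper
    result = parse_and_validate(raw, max_budget=payload["hard_constraints"]["budget"]["max"])
    # Compare structure, ids and types rather than exact wording, so stylistic drift doesn't fail the build.
    assert {r["id"] for r in result["recommendations"]} == {r["id"] for r in expected["recommendations"]}
    # Catches the numeric-types-replaced-with-strings drift mentioned above.
    assert all(isinstance(r["score"], (int, float)) for r in result["recommendations"])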

Deployment patterns for micro-apps

  • Slack/Teams quick-action: Thin orchestrator service + modal that collects inputs, calls LLM, validates JSON, returns result card.
  • Web widget: React component that gathers preferences, calls backend orchestrator for LLM calls and validation.
  • Desktop agent: Local agent (e.g., Anthropic Cowork-style) with file-system access for advanced RAG on internal docs — requires strict ACLs.

Case study: 2-week MVP at an enterprise (summary)

Team: Internal tools + 1 ML engineer. Goal: recommend vendor options for small purchases. Approach:

  1. Defined hard constraints (max spend, approved vendors) and soft preferences (lead time, sustainability score).
  2. Pre-filtered vendor DB, used top-5 vendor rows as candidates, passed them and user profile to Claude with explicit JSON schema.
  3. Built validation layer; constraint-violation rate dropped to 0.5% after two prompt iterations.

Outcome: decision latency fell from 3 days to 4 hours (including user research and approvals), and the procurement pilot team reached 42% adoption in the first month.

Future predictions (2026+)

  • Micro-app marketplaces: Internal micro-app registries where business users share vetted recommenders and prompt templates.
  • Stronger schema enforcement: LLMs will adopt native JSON-schema bindings reducing the need for repeated re-prompts.
  • Smarter local agents: Tools like Cowork will enable desktop agents that combine local files with cloud LLMs for richer RAG while keeping data private. See indexing & edge guidance: Indexing Manuals for the Edge Era.

Checklist: Launch-ready recommender micro-app

  • Defined KPIs and user flows
  • Input and output JSON schemas implemented and validated
  • Prompt templates covering common cases and clarifications
  • RAG pipeline and personalization vectors configured
  • Orchestrator with schema validation and fallback logic
  • Telemetry for model & business metrics
  • Security, PII handling and audit logging in place

Quick prompt templates (copy-paste)

Minimal single-turn (ChatGPT)

System: You are an internal recommender. Output valid JSON matching THIS SCHEMA: {...schema...}. Do NOT include text outside JSON.
User: Input: {...input JSON...}.
User: Return up to 5 recommendations that satisfy all hard_constraints. Rank by score and include explainability.

Interactive clarifier

System: If preferences are missing or ambiguous, ask 1 clarifying question in JSON: {"clarify": true, "question":"..."}. Otherwise return recommendations JSON.
User: Input: {...}

Closing: Actionable next steps

Start small: pick one narrow domain and build a two-screen micro-app. Use the input/output schemas in this cookbook and run 50–100 test cases. Measure constraint-violation rate and iterate the prompt until it’s under 1%.

Want templates, schema files and working prompt bundles? We maintain a library of production-ready prompt templates, JSON schemas and orchestration samples tailored for ChatGPT and Claude that you can fork and customize.

Call to action

Get the micro-app prompt template pack from bot365.co.uk or contact our team for a free 2-week pilot to convert one internal workflow into a production recommender micro-app — faster than you think. Ship smarter recommendations with fewer engineering hours.
