LLM Applications & Workflows

hw08 #FIXME:URL

When to Use LLMs

Before diving into agents, RAG, and workflows — the most important skill is knowing when to use LLMs and when not to.

Good Fits for LLMs

Text summarization and transformation: Condense documents while preserving key information
Structured data extraction: Convert unstructured text to structured formats (JSON, tables)
Content classification: Categorize by type, topic, sentiment
Question answering over documents: Answer questions based on provided context
Draft generation with review: First drafts that humans refine

Poor Fits for LLMs

Conversely, some tasks look like they should work but consistently produce poor results:

Precise calculations: Use tools (calculators, code) instead
Factual retrieval without verification: LLMs may hallucinate
Real-time data without external connection: Models have knowledge cutoffs
High-stakes autonomous decisions: Require human oversight
Deterministic logic: Use rule engines instead

Reference Card: LLM Decision Framework

Question	Yes →	No →
Can you describe the task clearly?	Good candidate	Clarify requirements first
Are errors catchable?	Proceed with validation	Add human review or avoid
Can you validate outputs?	Automate with checks	Use expert oversight
Do you have domain expertise to evaluate?	LLM amplifies your skill	Risk of undetected errors

Common Failure Modes

!!! warning If you don’t know how to do something yourself, you won’t know if an LLM is doing it well. LLMs amplify expertise — they don’t replace it.

Understanding how LLMs fail helps you design better systems and set appropriate expectations. Fighting With AI covers these patterns in depth with actionable mitigation strategies.

Reference Card: Failure Modes & Mitigations

Failure Mode	What Happens	Mitigation
Hallucinations	Fabricated citations, confident incorrect answers	RAG, fact-checking, citations, temperature=0, training data curation
Prompt injection	User input overrides system instructions	Input sanitization, delimiters, XML tags
Inconsistency	Same input → different outputs	temperature=0, seeded states, validation
Math errors	Arithmetic fails silently, especially multi-step unit conversions	Tool use (not guaranteed); or LLM extracts values, Python computes
Context overflow	Important information at edges gets lost	Strategic positioning, chunking, hierarchical summarization
Task/expertise mismatch	User can’t identify LLM errors	Expert review, reference materials, limit autonomy

Prompt Injection

Models may treat user content as instructions. The defense: separate system instructions from user content using roles and delimiters (XML tags like <user_input>...</user_input>). Guardrails (covered in Workflows below) validate both inputs and outputs.

Math Errors

LLMs approximate numbers through pattern matching — they don’t execute arithmetic. Multi-step calculations with unit conversions (e.g., mcg/kg/min → mL/hr) fail more often than simple ones. Never rely on LLM arithmetic for critical values — use the LLM to extract values, then compute with Python. We’ll see code for this in the Deterministic Steps pattern below.

Practical Recommendations

Start Small

Every major provider offers a model hierarchy — start with the smallest model that handles your task and only upgrade when needed. Each tier is roughly 5–10x cheaper than the one above it:

OpenAI: GPT-X.2 → GPT-X-mini → GPT-X-nano
Anthropic: Claude Opus → Claude Sonnet → Claude Haiku
Google: Gemini Pro → Gemini Flash → Gemini Flash Lite

Self-hosted options (free, private): Ollama (desktop) and PocketPal (mobile) let you run models locally — no API costs, no usage limits, ideal for sensitive data prototyping.

Testing & Validation

Start simple:

Test on 5–10 representative examples first
Manually review outputs
Try edge cases (missing data, unusual formats)
Incorporate failures into few-shot examples

Red flags to watch for:

Inconsistent outputs for similar inputs
Made-up citations or facts
Missing required information
Wrong format or structure

Choose tasks that you can meaningfully oversee. Think of LLMs as prolific interns — productive but requiring supervision.

Reference Card: Getting Started Checklist

Step	Action
1. Prototype	Use a mini/nano model (gpt-5-mini, Claude Haiku, Gemini Flash)
2. Test	Run 5–10 representative examples, manually review outputs
3. Edge cases	Try missing data, unusual formats, adversarial inputs
4. Iterate	Incorporate failures into few-shot examples or guardrails
5. Upgrade	Switch to a larger model only if the smaller one can’t handle it
6. Monitor	Track costs, latency, and output quality in production

Agentic LLMs

You can send a prompt and get a response. Now: what can you build with it?

An agent is an LLM that can take actions — not just generate text, but use tools, gather information, and iterate toward a goal. When Claude Code reads your files, decides what to edit, runs tests, and loops until the bug is fixed — that’s an agent. When ChatGPT searches the web, reads results, and synthesizes an answer — that’s an agent too.

The key difference from a chatbot: an agent has a goal and takes actions to achieve it. It decides what to do next based on what it observes, rather than waiting for you to tell it each step. This autonomy is what makes agents powerful — and what makes them tricky to get right.

Traditional vs Agentic LLM Use

Traditional	Agentic
Single request → single response	Multi-turn, self-guided iterations
User provides all context	Agent gathers information as needed
Fixed output	Iterates until task complete
No tool access	Can invoke external functions

Key Characteristics of Agents

Autonomy: Agent decides next steps based on observations
Tool use: Can invoke external functions (search, database queries, calculators)
Iteration: Loops until task complete or max steps reached
State management: Maintains context across multiple actions

The Agent Loop

Plan → Act → Observe → Reflect → (repeat)

This loop naturally extends chain-of-thought reasoning — instead of reasoning in a single generation, the agent reasons across multiple steps, each grounded in real observations rather than generated all at once.

Here’s what that looks like for a real task:

Task: "Find recent papers on treatment X and summarize findings"
    ↓
1. Agent searches literature database (tool call)
    ↓
2. Agent reads top 3 papers (tool call)
    ↓
3. Agent synthesizes findings
    ↓
4. Agent checks if answer is complete
    ↓
   If not → searches for more specific info
    ↓
5. Returns final summary

Reference Card: Agent Components

Component	Purpose
Planner	Breaks task into steps
Memory	Stores conversation history and intermediate results
Tools	External functions the agent can call
Executor	Runs tools and collects results
Reflector	Evaluates progress, decides whether to continue or return

Code Snippet: Simple Agent Loop

This is what’s happening under the hood when tools like Claude Code or ChatGPT work on multi-step tasks:

from openai import OpenAI
client = OpenAI()

def agent_loop(task, tools, max_steps=10):
    messages = [{"role": "user", "content": task}]

    for step in range(max_steps):
        # PLAN/ACT: send conversation so far, let the model decide what to do
        response = client.chat.completions.create(
            model="gpt-5.2", messages=messages, tools=tools
        )
        reply = response.choices[0].message
        messages.append(reply)

        # DONE? if the model didn't ask to call any tools, it's finished
        if not reply.tool_calls:
            return reply.content

        # OBSERVE: execute each tool the model requested, feed results back
        for call in reply.tool_calls:
            result = run_tool(call)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": str(result)})
        # loop back → model sees the tool results and decides next step

    return "Max steps reached"

Building an Agent

You’ve seen the concepts — now here’s what defining an agent looks like in code. The OpenAI Agents SDK wraps the agent loop, tool dispatch, and message management into a clean API:

Code Snippet: OpenAI Agents SDK

# pip install openai-agents pydantic
from pydantic import BaseModel
from agents import Agent, Runner, function_tool, set_default_openai_client
from openai import OpenAI

# Point the SDK at any OpenAI-compatible API (OpenRouter, local Ollama, etc.)
set_default_openai_client(OpenAI(base_url="https://openrouter.ai/api/v1"))

@function_tool
def calculate_bmi(weight_kg: float, height_m: float) -> str:
    """Calculate BMI from weight and height."""
    bmi = weight_kg / (height_m ** 2)
    return f"BMI: {bmi:.1f}"

# Structured output — agent must return data matching this schema
class PatientReport(BaseModel):
    bmi: float
    category: str
    recommendation: str

agent = Agent(
    name="Health Assistant",
    instructions="You help with health data analysis. Use tools for calculations.",
    tools=[calculate_bmi],
    output_type=PatientReport,  # forces structured JSON output
)

# max_turns limits how many agent loop iterations (tool calls) before stopping
result = Runner.run_sync(agent, "Calculate BMI for a 75kg patient who is 1.75m tall",
                         max_turns=10)
print(result.final_output)  # PatientReport object

The SDK handles the agent loop automatically — you define tools, instructions, and the agent figures out the rest. Key parameters:

output_type: Forces the agent to return structured data matching a Pydantic model — the same pattern you saw with schema-based prompting in Lecture 7, but now enforced by the framework.
max_turns: Limits how many iterations the agent loop can run. Without it, an agent could loop indefinitely if it keeps calling tools. Set it based on task complexity — simple tasks need 5–10 turns, complex investigations might need 30–50.
set_default_openai_client: Points the SDK at any OpenAI-compatible API endpoint (OpenRouter, a local Ollama server, etc.) instead of the default OpenAI API.

More framework options in the Workflow section below.

Prompting Techniques for Agents

In Lecture 7 you learned chain-of-thought and prompt chaining. Agents extend these with patterns designed for multi-step, tool-using workflows:

Reference Card: Agentic Prompting Patterns

Pattern	How It Works	When to Use
ReAct (Reason+Act)	Interleave reasoning with tool calls: think → act → observe → think again	Any agent that uses tools — the default agent pattern
Self-consistency	Generate multiple reasoning paths, vote on the most common answer	High-stakes decisions where confidence matters
Reflection	Agent critiques its own output, surfaces uncertainty and assumptions	Complex tasks where errors are costly
Tree of Thought	Explore multiple solution branches, prune unpromising paths	Planning and multi-step reasoning tasks

!!! warning “Reasoning” in LLMs is not thinking. Models like o1/o3 use chain-of-thought at inference time, which can improve results on some tasks — but it doesn’t always help, it’s always more expensive, and it can create a false sense of confidence. See Apple’s “Illusion of Thinking” research.

LIVE DEMO!

Retrieval-Augmented Generation (RAG)

The core problem with LLMs: they only know what was in their training data, and they’ll confidently make things up when they don’t know. RAG (Retrieval-Augmented Generation) solves this by giving the model relevant documents at query time — instead of hoping the model knows something, you look it up first and include it in the prompt.

Why RAG?

Reduces hallucinations: Responses grounded in retrieved documents
Provides sources: Can cite specific documents
Keeps information current: Update documents without retraining
Domain adaptation: Use your own documents without fine-tuning

The RAG Pipeline

The pipeline:

Query → Embed → Retrieve Similar Chunks → Add to Prompt → Generate Response

Reference Card: RAG Pipeline

Component	Details
Signature	`query → embed → retrieve → augment → generate`
Purpose	Ground LLM responses in retrieved documents to reduce hallucination
Embed	Convert query to vector using same model as document embeddings
Retrieve	Find top-k similar chunks from vector store (ChromaDB, FAISS, etc.)
Augment	Insert retrieved chunks into system prompt as context
Generate	LLM produces response grounded in provided context

Code Snippet: Simple RAG Pipeline

from sentence_transformers import SentenceTransformer
import chromadb
from openai import OpenAI

embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
llm_client = OpenAI()
db = chromadb.Client()
collection = db.create_collection("docs")

def index_documents(documents):
    """Add documents to the vector store. In practice, split long documents
    into chunks first (e.g., by paragraph or fixed token count)."""
    embeddings = embedding_model.encode(documents).tolist()
    collection.add(
        documents=documents,
        embeddings=embeddings,
        ids=[f"doc_{i}" for i in range(len(documents))]
    )

def rag_query(question, n_results=3):
    """Retrieve relevant chunks and generate a grounded response."""
    query_embedding = embedding_model.encode([question]).tolist()
    results = collection.query(query_embeddings=query_embedding, n_results=n_results)

    context = "\n\n".join(results['documents'][0])

    response = llm_client.chat.completions.create(
        model="gpt-5-mini",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n{context}"},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

Model Context Protocol (MCP)

MCP provides a standardized way to connect LLMs to external data sources and tools. Instead of writing custom integrations for each tool, MCP offers pre-built servers that expose capabilities in a consistent format.

!!! note MCP servers often run as Node.js processes. Install Node.js (brew install node on macOS, or nodejs.org) to use them.

Why MCP?

Standardization: Same interface for files, databases, APIs, web scraping
Reusability: Pre-built servers for common tools (GitHub, Slack, Postgres, etc.)
Security: Consistent authentication and permission model
Discovery: LLMs can discover available tools dynamically

How MCP Works

┌─────────────┐    MCP Protocol    ┌─────────────┐
│  LLM/Agent  │ ◄───────────────► │  MCP Server │ ◄──► External Service
└─────────────┘                    └─────────────┘
     Your code connects here            Pre-built or custom

MCP Server exposes tools and resources via a standard protocol
Your code connects to the server and discovers available capabilities
LLM receives tool definitions and can invoke them through your code

MCP fits naturally with agents: MCP servers are the tools that agents can call.

Function Calling

Whether you’re writing tools by hand or discovering them via MCP, the underlying mechanism is the same: function calling. You define functions using JSON schemas, and the model can invoke them. This serves two purposes:

Tool use: The agent chooses to call external functions (search, calculate, query a database). This is what makes agents agentic.
Structured output: You force the model to return data matching a specific schema — formatting, not action. You saw this in Lecture 7 with schema-based prompting; function calling guarantees compliance.

Both use the same API (tools parameter). Tool use is about action; structured output is about format.

Reference Card: MCP & Function Calling

Component	Details
MCP Server	Process that exposes tools/resources (e.g., filesystem server, database server)
MCP Tool	Function the LLM can invoke (e.g., `read_file`, `query_database`)
MCP Resource	Data the LLM can read (e.g., file contents, API responses)
MCP Transport	How client and server communicate (stdio, HTTP)
Function calling definition	JSON schema with properties and types
`tool_choice`	`"auto"` (model decides) or forced (specific function)
Tool use pattern	Model chooses tool → your code executes → result fed back
Structured output pattern	Model forced to return data matching schema

Code Snippet: Defining an MCP Server

An MCP server exposes Python functions as tools. The @mcp.tool() decorator + type hints are all it takes — the protocol handles schema generation, discovery, and transport:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Health Tools")

@mcp.tool()
def calculate_bmi(weight_kg: float, height_m: float) -> str:
    """Calculate BMI from weight and height."""
    bmi = weight_kg / (height_m ** 2)
    return f"BMI: {bmi:.1f}"

mcp.run()

Code Snippet: Configuring MCP Servers

In practice, MCP servers are configured in JSON — tools like Claude Code, Cursor, and ChatGPT read this config to connect to servers automatically:

{
  "mcp": {
    "servers": {
      "filesystem": {
        "command": "npx",
        "args": ["@modelcontextprotocol/server-filesystem", "/path/to/data"]
      },
      "postgres": {
        "command": "npx",
        "args": ["@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
      }
    }
  }
}

Once configured, the LLM can discover and call any tools the server exposes. For example, the filesystem server exposes read_file, write_file, list_directory, and search_files — the same tools Claude Code uses to navigate your codebase.

Common MCP Servers

Category	Server	Tools Exposed
File systems	`@modelcontextprotocol/server-filesystem`	`read_file`, `write_file`, `list_directory`
Databases	`@modelcontextprotocol/server-postgres`	`query`, `list_tables`, `describe_table`
Web	`@modelcontextprotocol/server-puppeteer`	`navigate`, `screenshot`, `click`, `fill`
Code	`@modelcontextprotocol/server-github`	Repository operations

LIVE DEMO!!

Workflow Orchestration Patterns

An agent without a workflow is a loose cannon — powerful but unpredictable. Real tasks span multiple steps and decision points. Workflows provide structure for complex LLM applications, turning ad-hoc agent behavior into something reliable, auditable, and cost-effective.

Why Workflows?

Reliability: Each step is simple, testable, debuggable
Cost control: Use small models for simple steps, large models only when needed
Auditability: Track which step failed, inspect intermediate outputs
Safety: Add guardrails, validation, and human checkpoints
State management: Maintain context across steps, remember decisions, handle partial failures
Error handling: Retries, fallbacks, human-in-the-loop checkpoints
Observability: Inspect intermediate outputs, debug chains, trace failures to specific steps

Two implementation approaches:

Visual builders (e.g., OpenAI Agent Builder): Good for rapid prototyping and demos
Code-first (e.g., Agents SDK, LangGraph): Better for complex logic, version control, and production systems

Pattern: Prompt Chaining

Why not put everything in one big prompt? Because each step in a chain is simpler, more testable, and produces an intermediate artifact you can inspect. If step 2 fails, you know exactly where — and you can fix that step without touching the others. Chaining also lets you use different models or temperatures per step (e.g., a cheap model for extraction, an expensive one for synthesis).

We’ll define a simple llm_call() wrapper here and reuse it throughout the rest of this lecture:

Code Snippet: Prompt Chain

from openai import OpenAI

client = OpenAI()

def llm_call(prompt: str) -> str:
    """Simple wrapper for OpenAI chat completion."""
    response = client.chat.completions.create(
        model="gpt-5-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def extract_classify_summarize(document: str) -> dict:
    """Chain of LLM calls: extract → classify → summarize."""
    entities = llm_call(f"Extract all medical entities from this text as a list:\n{document}")
    classified = llm_call(f"Classify these entities by type (condition, medication, procedure):\n{entities}")
    summary = llm_call(f"Write a brief clinical summary based on:\n{classified}")

    return {"entities": entities, "classified": classified, "summary": summary}

Workflow Patterns

Most agent builders represent workflows as a graph of nodes. The exact names vary by framework, but the building blocks are consistent:

Model call: A single LLM prompt/response step
Tool call: Deterministic function execution (search, database, calculator)
Router/logic: Branching based on classification or criteria
Guardrail/validator: Check input/output for safety, schema, or quality
Human approval: Pause for review before high-stakes actions
Parallel fan-out: Run independent steps concurrently and merge results

Pattern: Guardrails

Concept: Input/output monitors that enforce safety and compliance rules

Common guardrails:

PII/PHI detection: Flag or redact protected health information
Hallucination detection: Check if claims are grounded in source text
Jailbreak detection: Identify prompt injection attempts
Format validation: Ensure structured outputs meet schema

Reference Card: Common Guardrails

Guardrail	Purpose
PII/PHI detection	Flag or redact Protected Health Information or Personally Identifiable Information
Hallucination detection	Check if claims are grounded in source text
Jailbreak detection	Identify prompt injection attempts
Format validation	Ensure structured outputs meet schema
Content filtering	Block inappropriate content

Code Snippet: Guardrails (Input/Output Check)

import re

def check_for_phi(text: str) -> list[str]:
    """Check for common PHI patterns. Production systems use NLP models
    (e.g., Presidio, clinical NER) for more robust detection."""
    patterns = {
        'SSN': r'\b\d{3}-\d{2}-\d{4}\b',
        'phone': r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b',
        'MRN': r'\b(MRN|Medical Record)[\s:#]*\d+\b',
    }
    return [name for name, pat in patterns.items() if re.search(pat, text, re.IGNORECASE)]

def safe_llm_call(prompt: str) -> str:
    """Wrap llm_call() with input and output guardrails."""
    if found := check_for_phi(prompt):
        raise ValueError(f"PHI detected in input: {found}")
    output = llm_call(prompt)
    if found := check_for_phi(output):
        raise ValueError(f"PHI detected in output: {found}")
    return output

Pattern: Routing & Logic

Concept: Conditional branching based on content or criteria

Logic nodes:

If/else: Route based on classification or criteria
While loops: Repeat until condition met (e.g., all sections complete)
Human approval: Pause for review before high-stakes action

Pattern: Deterministic Steps

Concept: Integrate rule-based logic alongside LLM calls. Use LLMs for what they’re good at (language), use code for what it’s good at (math, lookups, validation).

Use cases: Known logic that does not require LLM flexibility — dose calculations, date arithmetic, database lookups, schema validation.

Code Snippet: LLM Extracts, Python Validates

import json

REQUIRED_FIELDS = {"diagnosis": str, "medications": list, "allergies": list}

def extract_and_validate(clinical_note: str) -> dict:
    """LLM extracts structured data; Python validates the result."""
    raw = llm_call(f"Extract diagnosis, medications, and allergies as JSON:\n{clinical_note}")
    data = json.loads(raw)

    # DETERMINISTIC: Validate structure (never trust LLM output blindly)
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"Missing required field: {field}")
        if not isinstance(data[field], expected_type):
            raise TypeError(f"{field} must be {expected_type.__name__}")

    return data

Pattern: Parallelization

Concept: Run independent LLM tasks simultaneously

Speed:

Divide-and-conquer: Split subtasks, execute in parallel, combine results
First-to-finish: Start the same task with different strategies, accept the first completed

Confidence:

Run task multiple times with different prompts or models
Choose winner or synthesize results

Pattern: Orchestrator-Workers

Concept: A central agent breaks a task into subtasks and delegates each to a specialist worker. The orchestrator coordinates results.

Use cases:

Multi-section reports (each section handled by a worker)
Medication extraction plus interaction checking plus summary generation
Research tasks requiring multiple queries

Pattern: Evaluator-Optimizer

Concept: Generate a response, evaluate its quality (with a second LLM call or deterministic checks), then refine until it meets a threshold.

Use cases:

Clinical note generation with completeness validation
Extraction tasks with accuracy scoring
Draft letters that must meet strict formatting

Pattern: Human-in-the-loop

Concept: Pause for human review before high-stakes actions. The workflow continues only after explicit approval.

Common checkpoints:

Diagnosis confirmation
Prescription recommendations
Sending patient communications

Agent & Workflow Frameworks

Framework	Focus	Notes
OpenAI Agents SDK	Agent building with tools, handoffs, guardrails, tracing	Primary framework for this course. Has Agent Builder GUI.
LangChain / LangGraph	Chains, agents, stateful graphs	Widely used, steeper learning curve. Good for custom workflows.
AutoGen (Microsoft)	Multi-agent conversations	Research-oriented, good for multi-agent patterns
smolagents (Hugging Face)	Lightweight agents	Minimal, good for quick prototyping

Reference Card: Workflow Patterns

Pattern	When to Use	Key Benefit
Prompt Chaining	Sequential multi-step processing	Each step simple and testable
Guardrails	Safety-critical applications	Enforce compliance rules
Deterministic Steps	Math, lookups, exact logic	Correctness guarantees
Orchestrator-Workers	Complex tasks needing specialization	Divide and conquer
Evaluator-Optimizer	Quality-sensitive outputs	Iterative refinement
Routing	Variable task types	Match task to best handler

The Recurring Theme

These are bias machines. They learn from whatever data and labels we give them. Neural networks (and LLMs) absorb whatever biases exist in their training data. If we’re lucky, we might guess at the biases we introduce — but not always.

If you don’t know how to do something yourself, you won’t know if an LLM is doing it well. Domain expertise is the irreplaceable ingredient.

Links

Prompt Engineering Guides

Frameworks

MCP

Self-Hosting & Tools

Healthcare AI

Developer Tools

Cookbooks & Guides

Workflow Orchestrators

Papers

When to Use LLMs

Good Fits for LLMs

Poor Fits for LLMs

Reference Card: LLM Decision Framework

Common Failure Modes

Reference Card: Failure Modes & Mitigations

Prompt Injection

Math Errors

Practical Recommendations

Start Small

Testing & Validation

Reference Card: Getting Started Checklist

Agentic LLMs

Traditional vs Agentic LLM Use

Key Characteristics of Agents

The Agent Loop

Reference Card: Agent Components

Code Snippet: Simple Agent Loop

Building an Agent

Code Snippet: OpenAI Agents SDK

Prompting Techniques for Agents

Reference Card: Agentic Prompting Patterns

LIVE DEMO!

Retrieval-Augmented Generation (RAG)

Why RAG?

The RAG Pipeline

Reference Card: RAG Pipeline

Code Snippet: Simple RAG Pipeline

Model Context Protocol (MCP)

Why MCP?

How MCP Works

Function Calling

Reference Card: MCP & Function Calling

Code Snippet: Defining an MCP Server

Code Snippet: Configuring MCP Servers

Common MCP Servers

LIVE DEMO!!

Workflow Orchestration Patterns

Why Workflows?

Pattern: Prompt Chaining

Code Snippet: Prompt Chain

Workflow Patterns

Pattern: Guardrails

Reference Card: Common Guardrails

Code Snippet: Guardrails (Input/Output Check)

Pattern: Routing & Logic

Pattern: Deterministic Steps

Code Snippet: LLM Extracts, Python Validates

Pattern: Parallelization

Pattern: Orchestrator-Workers

Pattern: Evaluator-Optimizer

Pattern: Human-in-the-loop

Agent & Workflow Frameworks

Reference Card: Workflow Patterns

The Recurring Theme

LIVE DEMO!!!