🏛️ AI Product Design Patterns

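Q: Does the Model Routing classifier itself cost tokens?

Yes, but the classification prompt is short (< 100 tokens), and with GPT-4o-mini each call costs only about $0.00005. Compared with the 60%+ that routing saves, the classifier's cost is negligible.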

Human-in-the-Loop, Fallback, Model Routing: engineering patterns for building reliable AI products.

Level: Advanced | Tags: design-patterns, architecture, production

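🏛️ Why do we need design patterns?

AI models are not 100% reliable: they hallucinate, make mistakes, slow down, and go offline. Design patterns keep your AI product running stably despite these imperfections.

💡 In one sentence: AI design patterns extend classic software design patterns to AI applications. The core problem they solve: "The AI is unreliable, but the product must be reliable."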

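🔄 Pattern 1: Human-in-the-Loop

Have a human step in to review at critical points instead of automating everything.

Three intervention strategies

Strategy	How it works	Best for
Always Review	A human confirms every AI output	Healthcare, legal, finance
Confidence Gate	A human is involved only when confidence is low	Customer support, classification
Random Audit	AI results are spot-checked at random	High-volume automation

Implementing a Confidence Gate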

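import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ai_with_human_review(question, confidence_threshold=0.7):
    """Hand off to a human automatically when confidence is low."""

    # Ask for an answer plus a self-assessed confidence score.
    # Note: a self-reported score is a rough signal, not a calibrated probability.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"""Answer the following question and rate your confidence.

Question: {question}

Reply in JSON:
{{"answer": "your answer", "confidence": 0.0 to 1.0, "reason": "why you chose this confidence"}}"""
        }],
        response_format={"type": "json_object"}
    )

    result = json.loads(response.choices[0].message.content)

    # Treat a missing or malformed score as zero so the case goes to a human
    try:
        confidence = float(result.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0

    if confidence >= confidence_threshold:
        return {"type": "auto", "answer": result.get("answer", "")}
    else:
        # Not confident enough → hand off to a human
        return {
            "type": "human_review",
            "ai_draft": result.get("answer", ""),
            "confidence": confidence,
            "reason": result.get("reason", "")
        }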
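
The Random Audit strategy from the table above can be sketched in a few lines. This is a minimal illustration, not part of the original article: the names `needs_audit` and `handle_output`, the 5% default rate, and the plain list used as a review queue are all assumptions.

```python
import random


def needs_audit(audit_rate=0.05, rng=random):
    """Return True for roughly `audit_rate` of calls."""
    return rng.random() < audit_rate


def handle_output(ai_answer, review_queue, audit_rate=0.05, rng=random):
    """Return the answer to the caller and copy a random sample to the review queue."""
    if needs_audit(audit_rate, rng):
        # Hypothetical queue: in production this might be a ticket system or a database table
        review_queue.append(ai_answer)
    return ai_answer
```

Unlike the Confidence Gate, the user is never blocked here. The audit only measures quality after the fact, which is why it suits high-volume automation.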

🛟 Pattern 2: Graceful Degradation

When the AI fails, don't show the user an error page. Have a backup plan.

Degradation waterfall

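import logging

logger = logging.getLogger(__name__)

# call_ai() and search_faq_database() are application helpers defined elsewhere.


async def smart_response(question):
    """Multi-level degradation strategy."""

    # Level 1: full answer from the primary model
    try:
        answer = await call_ai(question, model="gpt-4o")
        return {"level": "full_ai", "answer": answer}
    except Exception:
        # Log instead of silently swallowing, so outages stay visible
        logger.warning("gpt-4o call failed, degrading", exc_info=True)

    # Level 2: a cheaper model
    try:
        answer = await call_ai(question, model="gpt-4o-mini")
        return {"level": "fallback_model", "answer": answer}
    except Exception:
        logger.warning("gpt-4o-mini call failed, degrading", exc_info=True)

    # Level 3: direct match against the FAQ database
    faq_answer = search_faq_database(question)
    if faq_answer:
        return {"level": "faq_match", "answer": faq_answer}

    # Level 4: canned reply (last line of defense)
    return {
        "level": "static_fallback",
        "answer": "Sorry, the system is busy right now. Please try again later or contact support: support@company.com"
    }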
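
The waterfall above only degrades when a call raises. A slow model is also a failure mode, so one possible extension is to put a time limit on each level. The sketch below is an assumption-laden illustration: `first_successful`, the 5-second default, and the two stub models are hypothetical names, not part of the original code.

```python
import asyncio


async def first_successful(question, handlers, timeout_s=5.0):
    """Try each (level, handler) pair in order and return the first result
    that arrives within `timeout_s` seconds without raising."""
    for level, handler in handlers:
        try:
            answer = await asyncio.wait_for(handler(question), timeout=timeout_s)
            return {"level": level, "answer": answer}
        except Exception:
            # A timeout is raised as an Exception too, so slow calls also land here
            continue
    return {"level": "static_fallback",
            "answer": "Sorry, the system is busy. Please try again later."}


# Hypothetical stand-ins for real model calls
async def slow_model(question):
    await asyncio.sleep(1.0)
    return "slow answer"


async def fast_model(question):
    return f"quick answer to: {question}"


result = asyncio.run(first_successful(
    "hi", [("primary", slow_model), ("fallback", fast_model)], timeout_s=0.05))
print(result["level"])
```

Here the primary stub exceeds the time limit, so the fallback stub answers instead.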

💸 Pattern 3: Model Routing

Not every question needs the most expensive model. Start with a cheap one and upgrade only when needed.

Router implementation

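import asyncio

from langchain_openai import ChatOpenAI  # assumes LangChain's OpenAI integration


class ModelRouter:
    """Pick a model automatically based on question complexity."""

    def __init__(self):
        self.classifier = ChatOpenAI(model="gpt-4o-mini")

    async def route(self, question):
        # Use the cheap model to judge complexity (ainvoke is the async variant of invoke)
        classification = await self.classifier.ainvoke([{
            "role": "user",
            "content": f"""Classify the complexity of the following question (simple/moderate/complex):
Question: {question}
Answer with one word only: simple, moderate, or complex"""
        }])

        complexity = classification.content.strip().lower()

        model_map = {
            "simple": "gpt-4o-mini",       # $0.15/M input tokens → simple Q&A
            "moderate": "gpt-4o",           # $2.50/M input tokens → general tasks
            "complex": "gpt-4o",            # $2.50/M input tokens → complex reasoning
        }

        # Unrecognized labels fall back to the stronger model
        selected = model_map.get(complexity, "gpt-4o")
        return selected


# Usage (await needs a running event loop, so wrap the calls in a coroutine)
async def main():
    router = ModelRouter()
    model = await router.route("What's the weather like today?")           # expected: gpt-4o-mini
    model = await router.route("Analyze the risk factors in this report")  # expected: gpt-4o

asyncio.run(main())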

Cost impact

Traffic mix	Monthly calls	All GPT-4o	With Router	Savings
70% simple + 30% complex	100K	$250	$100	60%
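
The table's figures can be roughly reproduced with a few lines of arithmetic. The sketch below rests on assumptions that the article does not state: about 1,000 input tokens per call (the value that makes 100K GPT-4o calls cost $250 at $2.50/M) and output tokens ignored. The function name `monthly_cost` is hypothetical.

```python
# Assumption (not from the article): ~1,000 input tokens per call, output tokens ignored
PRICE_PER_M = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}  # USD per 1M input tokens
CLASSIFIER_COST_PER_CALL = 0.00005  # USD, the per-call figure quoted in the article


def monthly_cost(calls, tokens_per_call, simple_share, use_router):
    """Estimate the monthly bill with or without routing."""
    def cost(model, n_calls):
        return n_calls * tokens_per_call * PRICE_PER_M[model] / 1_000_000

    if not use_router:
        return cost("gpt-4o", calls)
    simple_calls = calls * simple_share
    complex_calls = calls - simple_calls
    # With routing, every call also pays for one classification
    return (cost("gpt-4o-mini", simple_calls)
            + cost("gpt-4o", complex_calls)
            + calls * CLASSIFIER_COST_PER_CALL)


baseline = monthly_cost(100_000, 1_000, 0.7, use_router=False)
routed = monthly_cost(100_000, 1_000, 0.7, use_router=True)
savings = 1 - routed / baseline
print(f"baseline ${baseline:.2f}, routed ${routed:.2f}, savings {savings:.0%}")
```

Under these assumptions the routed cost comes out near $90, a bit better than the table's rounded $100 and 60%. Real savings depend on output length and on how often the classifier picks the cheap model.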

🗄️ Pattern 4: Semantic Cache

Return a cached result for similar questions to avoid repeated API calls. Under the hood it relies on embeddings to compute semantic similarity.

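# get_embedding(), cosine_similarity() and call_ai() are helpers assumed to be defined elsewhere.

class SemanticCache:
    """Cache AI answers by semantic similarity."""

    def __init__(self, threshold=0.92):
        self.threshold = threshold
        self.cache = []  # [(embedding, question, answer)]

    def get(self, question):
        q_embed = get_embedding(question)

        # Linear scan; returns the first entry at or above the threshold
        for cached_embed, cached_q, cached_answer in self.cache:
            similarity = cosine_similarity(q_embed, cached_embed)
            if similarity >= self.threshold:
                return cached_answer  # cache hit

        return None  # cache miss

    def set(self, question, answer):
        q_embed = get_embedding(question)
        self.cache.append((q_embed, question, answer))

# Usage
cache = SemanticCache(threshold=0.92)

def ask_ai(question):
    # Check the cache first
    cached = cache.get(question)
    if cached is not None:  # an empty-string answer still counts as a hit
        return cached  # no chat API call needed

    # Cache miss → call the API
    answer = call_ai(question)
    cache.set(question, answer)
    return answer

# "How do I return an item?" and "What is the return process?" are semantically close,
# so the second call hits the cache if their similarity reaches the threshold
ask_ai("How do I return an item?")      # calls the API
ask_ai("What is the return process?")   # expected: cache hit, no chat API call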
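
The cache above leaves `cosine_similarity` undefined and returns the first entry that clears the threshold. As a minimal sketch, here is a pure-Python version of the similarity function plus a lookup that returns the best match instead. The `best_match` helper is an assumption added for illustration. A production cache would more likely use NumPy or a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction
    return dot / (norm_a * norm_b)


def best_match(query_embed, entries, threshold=0.92):
    """Return the answer of the most similar entry at or above the threshold, else None.

    `entries` is a list of (embedding, question, answer) tuples.
    """
    best_score, best_answer = threshold, None
    for embed, _question, answer in entries:
        score = cosine_similarity(query_embed, embed)
        if score >= best_score:
            best_score, best_answer = score, answer
    return best_answer
```

Picking the best match rather than the first avoids returning a merely acceptable answer when a closer one sits later in the list.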

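🔍 Pattern 5: Guardrail Pipeline

Add safety checks on both the input side and the output side of the AI.

User input
  ↓
[Input guardrails] → length limits / injection detection / PII de-identification
  ↓
[AI model] → generate answer
  ↓
[Output guardrails] → hallucination detection / PII filtering / toxicity check
  ↓
[Quality gate] → confidence too low? Hand off to a human
  ↓
Reply to user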
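
Part of the pipeline above can be sketched with the standard library alone. The example covers only the length limit, a naive injection check, and email redaction on both sides. Hallucination detection, toxicity checks, and the quality gate are left out. Every name, the 2,000-character limit, and the pattern list are illustrative assumptions, and keyword matching like this is easy to bypass.

```python
import re

MAX_INPUT_CHARS = 2000  # hypothetical limit
INJECTION_PATTERNS = [  # illustrative phrases only, far from exhaustive
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
]
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


class GuardrailError(Exception):
    """Raised when input is rejected before reaching the model."""


def redact_pii(text):
    """Replace email addresses with a placeholder (emails only in this sketch)."""
    return EMAIL_RE.sub("[EMAIL]", text)


def input_guard(text):
    if len(text) > MAX_INPUT_CHARS:
        raise GuardrailError("input too long")
    if any(p.search(text) for p in INJECTION_PATTERNS):
        raise GuardrailError("possible prompt injection")
    return redact_pii(text)


def output_guard(text):
    return redact_pii(text)


def guarded_call(user_input, model_fn):
    """Run input guard → model → output guard; reject bad input with a safe message."""
    try:
        clean_input = input_guard(user_input)
    except GuardrailError:
        return "Sorry, I can't process that request."
    return output_guard(model_fn(clean_input))
```

`model_fn` stands in for whatever function calls the model, so the guards can be tested without any API access.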

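📊 Pattern 6: A/B Testing AI

Test different prompts, models, or parameters, and let the data decide which is better. It works even better when paired with the LLM evaluation pattern.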

import hashlib

class AIExperiment:
    """A/B test for an AI feature."""

    def __init__(self):
        self.variants = {
            "control": {
                "model": "gpt-4o",
                "prompt": "You are a customer support assistant. Answer concisely in Traditional Chinese.",
                "metrics": {"total": 0, "positive_feedback": 0}
            },
            "treatment": {
                "model": "gpt-4o",
                "prompt": "You are a warm, friendly support agent. Empathize with the user first, then solve the problem.",
                "metrics": {"total": 0, "positive_feedback": 0}
            }
        }

    def get_variant(self, user_id):
        """Assign each user to the same variant every time, based on user_id."""
        # Built-in hash() is salted per process for strings, so the split would
        # change across restarts; a SHA-256 digest is stable.
        digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
        variant = "treatment" if digest[0] % 2 == 0 else "control"
        return variant

    def record_feedback(self, variant, is_positive):
        self.variants[variant]["metrics"]["total"] += 1
        if is_positive:
            self.variants[variant]["metrics"]["positive_feedback"] += 1

    def report(self):
        for name, v in self.variants.items():
            total = v["metrics"]["total"]
            pos = v["metrics"]["positive_feedback"]
            rate = pos / total * 100 if total > 0 else 0
            print(f"{name}: {rate:.1f}% positive feedback ({pos}/{total})")
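
Comparing two raw percentages does not say whether the gap is more than noise. One common way to check is a two-proportion z-test, sketched below. This is an addition, not something the article's class provides: the function name is hypothetical, both totals are assumed to be greater than zero, and the normal approximation is only reasonable with fairly large samples.

```python
import math


def two_proportion_z_test(pos_a, total_a, pos_b, total_b):
    """Return (z, two-sided p-value) for the difference between two rates.

    Assumes total_a > 0 and total_b > 0.
    """
    p_a, p_b = pos_a / total_a, pos_b / total_b
    pooled = (pos_a + pos_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0, 1.0  # both rates are 0% or 100%: no evidence of a difference
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Example with made-up counts: control 400/1000 vs treatment 460/1000
z, p = two_proportion_z_test(400, 1000, 460, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (0.05 is a common cutoff) suggests the treatment's feedback rate genuinely differs from the control's.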

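📋 Pattern selection guide

Your problem	Recommended pattern
AI answers must not be wrong	Human-in-the-Loop
The API may go down	Graceful Degradation
Costs are too high	Model Routing + Semantic Cache
Worried about safety	Guardrail Pipeline
Not sure which prompt is better	A/B Testing
All of the above	Combine them ✅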

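❓ FAQ

Do small projects need these design patterns too?

Internal tools or MVPs can skip them at first and call the API directly. But any AI feature used by external users needs at least Graceful Degradation (never surface raw errors to users) and basic input filtering.

What hit rate can a Semantic Cache expect?

It depends on the scenario. Customer support (highly repetitive questions) typically sees a 40-60% hit rate. Open-ended Q&A (diverse questions) may see only 5-10%. A threshold set too low returns irrelevant cached answers, so 0.90 or higher is recommended.

Does the Model Routing classifier itself cost tokens?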