Optimize the script and add another style of prompt

2026-03-26 23:19:55 +08:00
parent 14cad19e58
commit d81b3166e6
4 changed files with 380 additions and 158 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,3 @@
 **/output/*
 **/output_qwen/*
+**/__pycache__/*
--- a/config.yaml
+++ b/config.yaml
@@ -30,7 +30,7 @@ image:
                                            #   auto: CUDA/MPS→bfloat16, XPU→float16, CPU→float32
  device: "auto"                            # auto | cuda | xpu | mps | cpu
                                            #   auto: 自动检测可用设备（cuda > xpu > mps > cpu）
-  size_preset: "phone_hd"                     # 尺寸预设（优先于 height/width），可选值：
+  size_preset: "phone"                     # 尺寸预设（优先于 height/width），可选值：
                                            #   square       — 1024×1024  正方形（默认）
                                            #   phone        — 576×1024   手机壁纸 9:16
                                            #   phone_hd     — 768×1344   手机壁纸 9:16 高清
@@ -53,6 +53,7 @@ image:
  prompt_language: "zh"                     # zh | en — 发送给 Z-Image-Turbo 的 prompt 语言
                                            #   zh: 使用中文 prompt（Qwen3 中文编码器原生支持）
                                            #   en: 使用英文 prompt
+  style_variants: 2                         # 每分镜实际出图的画风数：1 仅首套，2 两套均生成（LLM 仍输出两套文案）
  style_preference: ""                      # 风格期望（可选，留空则由 LLM 根据诗意自动选择）
                                            #   可选值示例：水墨写意 / 青绿山水 / 工笔花鸟 / 工笔重彩
                                            #             文人画 / 泼墨大写意 / 浅绛山水
--- a/poetry_to_image.py
+++ b/poetry_to_image.py
@@ -119,136 +119,84 @@ def load_config(config_path: str = "config.yaml") -> dict:
 # ---------------------------------------------------------------------------
 SYSTEM_PROMPT = """\
-# Role（角色设定）
+# Role
-你是一位顶级的中国古典文学泰斗，同时也是一位精通 AI 文本到图像生成（Text-to-Image）\
+你是一位顶级的中国古典文学泰斗，同时也是精通 AI 图像生成（Text-to-Image，特别是 Z-Image-Turbo/Midjourney 等扩散模型）底层逻辑的顶级提示词工程师。
-底层逻辑的顶级提示词工程师（Prompt Engineer）。\
+你对古诗词中的"意境"、"留白"、"虚实"有深刻理解，并能将这些抽象概念精准转化为扩散模型能够识别的**高权重视觉参数**（如明确的光影走向、材质纹理、构图视角、笔触细节）。
 你对中国古诗词中的"意境"、"留白"、"虚实相生"有极其深刻的理解，\
 并且知道如何将这些抽象的美学概念转化为扩散模型（Diffusion Models）能够精准识别的\
 视觉特征参数（如光影、材质、构图、渲染引擎词汇）。
-# Objective（工作目标）
+# Objective
-你的任务是接收用户输入的古诗词，严格按照"四段式思维链"将其转化为最高质量的图像生成提示词。\
+接收用户输入的古诗词，严格按照"四段式思维链"转化为最高质量的图像生成提示词。你需要能够探索全诗连贯意象，将诗句转化为1张或多张分镜，并为每张分镜提供2种不同且极其细致的传统绘画风格提示词，确保画面不仅贴合诗意，而且具备极高的艺术审美与画面张力。
 你需要具备探索长诗或多句诗词连贯多图意象的能力，\
 确保最终生成的单张或多张分镜图像能够完美传达原诗的意境，而不只是生硬的元素堆砌。
 # Workflow（强制执行四段式思维链）
-对于用户的每一次输入，你必须严格按顺序在内部执行以下四个步骤，缺一不可：
+对于每一次输入，必须在内部严格执行以下步骤，结果最终输出为 JSON：
 ## 第一步：意境与分镜逻辑判断
-
+- 若全诗时空统一，生成【单幅画面】。
-重要分析全诗的时空连贯性：
+- 若存在明显的视角切换（远近/高低）、时间推移（朝暮）或场景跳跃，拆分为 2-4 幅【分镜序列】。相邻且意境连贯的诗句应合并。
 - 如果全诗描绘的是同一时间、同一地点的统一场景，生成【单幅画面】
 - 如果诗句间存在明显的视角切换（如远景切特写）、时间推移（如白天到黑夜）或场景跳跃，\
 按内在逻辑拆分为 2 到 4 幅画面的【分镜序列】
 - 意境连贯的相邻诗句应合并为一幅，避免碎片化
 ## 第二步：意境深度解析
 分析每个分镜：核心情感基调、季节时间、天气状态、意境类型与情感张力。
-针对每一个分镜（或单幅画面），分析：
+## 第三步：现代文视觉脚本扩写（核心视觉转义）
- 核心情感基调（苍凉悲壮 / 空灵婉约 / 萧瑟肃杀 / 雄浑壮阔 / 闲适恬淡 / 凄婉哀怨等）
+将分镜扩写为极具画面感的视觉脚本，**必须将抽象词汇翻译为肉眼可见的物理细节**：
- 季节时间与天气状态
+- **主体与动作**：人物姿态/服饰/微表情，核心景物的精确形态。
- "意境"类型与情感张力
+- **配景与层次**：前景、中景、远景的具体构成，建立空间纵深。
 - **光线与色彩**：必须明确光源（如斜侧逆光、清冷月光、丁达尔效应/体积光）、色调（冷暖对比、低饱和度等）。
 - **气候与动态**：风的方向、云雾的形态（流云/贴水薄雾）、水波的纹理（波光粼粼/惊涛骇浪）。
 - **构图与尺度**：必须写明镜头视角（超大远景 / 黄金分割构图 / 仰视等），大远景必须加入尺度参照（远帆、飞鸟剪影、孤亭）以体现宏大感。
-## 第三步：现代文视觉转义
+## 第四步：图像 Prompt 生成（双画风）
 基于第三步的视觉脚本，为每一分镜设计 **2套** 彼此不同类别的中国传统画风（如：水墨写意 vs 青绿山水；工笔重彩 vs 浅绛山水）。
-将每一个分镜扩写为极具画面感的现代文视觉脚本。\
+### 🎨 提示词（Prompt）构建法则（极其重要）
-你必须大胆发挥想象力，补全诗句中省略的视觉细节，明确写出：
+1. **中英文对应与结构**：`prompt` 和 `prompt_en` 必须是一段连续、自然流畅的描述（**绝对禁止出现[ ] 括号或标签名**）。
- **主体景物**：人物姿态、动作、表情、服饰；核心景物的具体形态
+2. **英文生图语法强化 (`prompt_en`)**：英文提示词对模型影响最大，结构必须为：
- **配景与地理环境**：山川、水域、植被、建筑等空间层次
+   `[画风约束词] +[画面主体与动作] + [环境与空间层次] + [光影与气候细节] +[笔触/色彩/媒介质感] + [顶级画质词]`。
- **光线条件**：斜阳逆光、清冷月光、破晓微光、黄昏余晖等
+3. **拒绝空洞抽象**：不要只写"sorrowful atmosphere"或"philosophical depth"；必须用"withered lotus stalks bending in the cold wind, subdued blue-gray color palette"来表现抽象感。
- **天气效果**：晨雾弥漫、细雨如织、大雪纷飞、长风浩荡等
+4. **高质量风格约束词表（必须从以下挑选或组合并在 prompt 结尾处体现）**：
- **画面构图**：大远景 / 中景 / 特写 / 俯瞰 / 平视等
+   - **水墨写意**：Traditional Chinese ink wash painting, freehand brushwork (Xieyi), negative space, ethereal mist, varied ink tones, rhythmic brush strokes.
   - **青绿山水**：Traditional Chinese blue-green landscape painting, mineral pigments, azurite and malachite tones, gold foil accents, majestic momentum.
   - **工笔重彩**：Chinese meticulous heavy-color painting (Gongbi), rich saturated pigments, elaborate fine line drawing, opulent details, highly decorative.
   - **浅绛山水**：Light crimson landscape painting, ochre wash, sparse and distant, elegant and refined, minimalist composition.
-## 第四步：图像生成 Prompt 生成
+### 长度与质量标准
 - `prompt`（中文）：150 - 250 字。
 - `prompt_en`（英文）：100 - 200 词，多使用形容词+名词的词组（如 `volumetric lighting`, `cinematic lighting`, `intricate details`, `masterpiece, 8k resolution, best quality`）。
-基于第三步的现代文视觉脚本，为每一个分镜生成精确的图像 Prompt。
+# Output Format (JSON Only)
-### Prompt 结构（必须遵循）
+严格输出以下 JSON 结构，不要包含任何多余解释：
 每个 Prompt 必须涵盖以下六大要素，按顺序自然融合为一段连贯流畅的描述文字：
 1. 画面主体：核心人物 / 景物及其状态
 2. 环境背景：空间层次、地理环境、建筑植被
 3. 场景光影：具体光源、光线方向、明暗对比
 4. 气候与氛围：天气、季节、情感色彩
 5. 艺术风格与媒介：中国传统画风关键词 + 媒介质感
 6. 图像质量词：masterpiece, 8k resolution, highly detailed 等
 【极其重要】最终输出的 prompt 和 prompt_en 必须是自然流畅的连续段落，\
 绝对不要使用方括号 [] 标注要素名称，不要出现类似"[画面主体：...]"的格式标签。\
 六大要素是你内部的组织逻辑，输出时必须将它们无缝融合为一段完整的、富有画面感的描述。
 ### Prompt 长度要求
 Z-Image-Turbo 非常适合处理包含丰富细节的长描述提示词：
 - 中文 Prompt：80-250 字
 - 英文 Prompt：80-200 词
 ### 风格约束（极其重要）
 Z-Image-Turbo 不支持负面提示词（Negative Prompts），所有约束必须以正向描述表达。\
 为确保生成"古诗词意境"而非现代写实照片，你必须在 Prompt 末尾加上强有力的风格约束词。\
 以下是可根据诗意灵活选用的风格约束：
 | 风格 | Prompt 约束词 |
 |------|-------------|
 | 水墨写意 | Traditional Chinese ink wash painting (中国传统水墨画), freehand brushwork (写意), \
 negative space (留白), ethereal atmosphere (空灵的氛围) |
 | 青绿山水 | Traditional Chinese blue-green landscape painting (青绿山水), mineral pigments (石青石绿), \
 golden and jade-like tones (金碧辉煌) |
 | 工笔花鸟 | Chinese meticulous brushwork (工笔), fine detailed rendering (精细渲染), \
 delicate line drawing (细腻勾勒) |
 | 工笔重彩 | Chinese meticulous heavy-color painting (工笔重彩), rich saturated pigments (浓墨重色), \
 elaborate detail (华丽精细) |
 | 文人画 | Chinese literati painting (文人画), poetry-calligraphy-painting unity (诗书画印一体), \
 lofty elegance (意趣高远) |
 | 泼墨大写意 | Splash ink painting (泼墨大写意), bold expressive brushstrokes (墨色淋漓), \
 majestic momentum (气势磅礴) |
 | 浅绛山水 | Light crimson landscape painting (浅绛山水), ochre wash (赭石淡彩), \
 sparse and distant (萧疏清远) |
 通用质量约束词（所有风格都应附加）：\
 masterpiece, 8k resolution, highly detailed, cinematic composition
 如果用户指定了风格期望，请优先使用用户指定的风格。\
 如果用户未指定风格，请根据诗意自动选择最契合的传统画风。
 ### 中文 Prompt 要求
 - 使用中国传统绘画的专业术语
 - 具体且富有画面感，避免抽象空泛的概念
 - 末尾必须附加风格约束词和质量约束词
 ### 英文 Prompt 要求
 - 中文 Prompt 的忠实翻译与适配，保持相同的画面内容和风格意图
 - 使用对应的英文艺术术语
 - 自然流畅的英文表达，非逐字翻译
 - 末尾必须附加英文风格约束词和质量约束词
 # Rules（输出规则）
 严格按照以下 JSON 格式输出结果，不要输出任何与格式无关的文字。\
 四段式思维链的推理过程请融入到对应的 JSON 字段中：
 ```json
 {
  "title": "诗词标题",
  "author": "作者",
  "dynasty": "朝代",
-  "genre": "体裁（如：五言绝句、七言律诗、词·水调歌头等）",
+  "genre": "体裁",
-  "analysis": "第一步【分镜逻辑判断】的理由 + 第二步【意境深度解析】的综合分析：包含分镜拆分依据、整首诗的意境类型、核心情感基调、时空特征（中文，3-5句话）",
+  "analysis": "时空统一，全诗描绘大漠孤烟的壮阔黄昏，故作为单幅画面。意境雄浑苍凉，核心情感是孤寂与壮美的交织。画面需重点表现极致的几何对比（直烟与圆日）和宏大的空间尺度。",
  "images":[
    {
-      "scene": "这幅画对应的诗句（原文）",
+      "scene": "大漠孤烟直，长河落日圆。",
-      "description": "第三步【现代文视觉转义】的完整输出：极具画面感的视觉脚本，包含主体景物、配景、光线、天气、构图等所有视觉细节（中文，100-200字）",
+      "description": "超大远景构图。前景是连绵起伏的金黄色沙丘，沙纹在夕阳斜照下呈现出明暗交界的锋利边缘。中景一条宽阔的河流蜿蜒折射着波光。视线中央，一道笔直的白色烽烟冲天而起，没有一丝风。远景的地平线上，一轮巨大、血红的落日正悬挂在长河尽头。冷暖色调形成强烈对比，画面极具几何雄浑之美，远空有几只渺小的飞鸟剪影作为尺度参照。",
-      "style": "选用的画风（中文名称，如：水墨写意、青绿山水、工笔花鸟等）",
+      "variants":[
-      "prompt": "第四步生成的中文 Prompt，自然融合六大要素为连续流畅的段落（禁止使用方括号标注），末尾附加风格约束词和质量词，80-250字",
+        {
-      "prompt_en": "Step 4 English Prompt, naturally blending all six elements into a fluent paragraph (NO square brackets), ending with style and quality keywords, 80-200 words"
+          "style": "浅绛山水",
          "style_rationale": "浅绛山水的赭石淡彩能完美表现大漠黄昏的苍茫与孤寂感，线条萧疏清远。",
          "prompt": "一幅传统的中国浅绛山水画，超大远景构图。画面中央是一片连绵起伏的沙丘，沙纹细腻，远方一条宽阔的长河蜿蜒流淌。长河尽头的地平线上悬挂着一轮巨大的血红色落日。一道笔直的白色烽烟从烽火台冲天而起，直入云霄。天空中点缀着几只微小的飞鸟剪影，凸显出大漠的浩瀚无垠。画面采用赭石淡彩着色，夕阳的余晖给沙丘和河面染上一层凄美的暖光。留白与虚实相生，意境苍凉雄浑。杰作，8k分辨率，极致细节，电影级光影，最高画质。",
          "prompt_en": "A masterpiece of traditional Chinese light crimson landscape painting (Qianjiang), ultra-wide panoramic shot. Endless rolling sand dunes with delicate ripples in the foreground. A wide, majestic river winds its way through the vast desert. At the distant horizon of the river, a giant, blood-red setting sun hangs low. A single, perfectly straight column of white smoke rises directly into the sky from an ancient beacon tower. Tiny silhouettes of flying birds in the vast sky provide a sense of grand scale. Colored with subtle ochre wash and pale warm tones. The golden hour lighting casts long dramatic shadows on the dunes. Ethereal atmosphere, negative space, sparse and distant, traditional Chinese brushwork, masterpiece, 8k resolution, highly detailed, cinematic lighting, breathtaking scenery."
        },
        {
          "style": "泼墨大写意",
          "style_rationale": "通过墨色的酣畅淋漓与狂放笔触，强化沙漠与落日之间的磅礴气势与浑厚张力。",
          "prompt": "一幅气势磅礴的中国泼墨大写意画。用浓淡相宜的泼墨挥洒出连绵不绝的苍茫大漠与雄浑山势，笔触狂放且充满力量。一条留白形成的长河贯穿画面，河面波光隐约。长河尽头，用朱砂重彩点染出一轮巨大而耀眼的落日，与周围的黑白墨色形成极具视觉冲击力的红黑对比。一道用枯笔飞白表现的笔直烽烟直刺苍穹。画面充满墨色淋漓的律动感，光影粗犷，意境苍茫悲壮。杰作，8k画质，令人惊叹的笔触细节，艺术珍品。",
          "prompt_en": "A majestic traditional Chinese splash ink painting (Da Xieyi), majestic momentum. Bold, expressive, and sweeping ink brushstrokes create the vast, endless desert landscape and rugged terrain. A wide river is formed by masterful use of negative space, flowing through the center. At the end of the river, a massive, vibrant vermilion red setting sun is painted with heavy pigments, creating a striking contrast against the monochromatic black and gray ink wash. A straight column of smoke rises to the sky, rendered with dry brush techniques (Feibai). Dynamic ink splashes, rhythmic brushstrokes, bold black-and-red color contrast, atmospheric and dramatic lighting, masterpiece, 8k resolution, highly detailed, traditional Chinese art museum quality."
        }
      ]
    }
-```\
+  ]
 }\
 """
@@ -266,7 +214,7 @@ def _build_user_message(poem: str, cfg: dict) -> str:
        f"{style_line}\n\n"
        f"请严格按照 System Prompt 的要求，首先进行【意境与分镜逻辑判断】，"
        f"随后针对单幅或多幅分镜依次输出对应的【意境深度解析】、"
-        f"【现代文视觉转义】以及最终的【图像生成 Prompt】。"
+        f"【现代文视觉转义】，并为**每一幅分镜**输出 **2 套**不同传统画风的【图像生成 Prompt】（`variants` 数组）。"
    )
@@ -277,7 +225,7 @@ def analyze_poetry(poem: str, cfg: dict) -> dict:
    client = OpenAI(
        base_url=llm_cfg["base_url"],
        api_key=llm_cfg["api_key"],
-        timeout=60,
+        timeout=120,
    )
    style_pref = cfg["image"].get("style_preference", "").strip()
@@ -320,6 +268,33 @@ def analyze_poetry(poem: str, cfg: dict) -> dict:
    return result
 def _normalize_scene_variants(img_info: dict, max_variants: int) -> list[tuple[str, dict]]:
    """从单条分镜解析待生成的画风变体，供绘图循环使用。
    返回 [(文件名标签如 v01, variant 字典), ...]。
    兼容旧版 JSON（无 variants 数组时退回顶层 prompt / prompt_en）。
    """
    max_variants = max(1, min(2, int(max_variants)))
    raw = img_info.get("variants")
    collected: list[dict] = []
    if isinstance(raw, list):
        for v in raw:
            if isinstance(v, dict) and (v.get("prompt") or v.get("prompt_en")):
                collected.append(v)
    if collected:
        return [(f"v{idx:02d}", collected[idx - 1]) for idx in range(1, min(len(collected), max_variants) + 1)]
    legacy = {
        "style": img_info.get("style", ""),
        "style_rationale": "",
        "prompt": img_info.get("prompt", ""),
        "prompt_en": img_info.get("prompt_en", ""),
    }
    if legacy["prompt"] or legacy["prompt_en"]:
        return [("v01", legacy)]
    return []
 def display_analysis(analysis: dict) -> None:
    """友好地展示 LLM 的分析结果。"""
    print(f"\n{'='*60}")
@@ -334,13 +309,38 @@ def display_analysis(analysis: dict) -> None:
    for i, img in enumerate(analysis["images"], 1):
        print(f"{'─'*50}")
        print(f"🖼  第 {i} 幅  |  {img['scene']}")
        desc = img.get("description", "")
        if desc:
            print(f"   中文描述：{desc}")
        vlist = img.get("variants")
        if isinstance(vlist, list) and vlist:
            for vi, v in enumerate(vlist, 1):
                print(f"   ─ 画风方案 {vi}：{v.get('style', '未指定')}")
                if v.get("style_rationale"):
                    print(f"     说明：{v['style_rationale']}")
                zh = v.get("prompt") or ""
                if zh:
                    tail = "..." if len(zh) > 120 else ""
                    print(f"     Prompt(zh)：{zh[:120]}{tail}")
                en = v.get("prompt_en") or ""
                if en:
                    tail = "..." if len(en) > 120 else ""
                    print(f"     Prompt(en)：{en[:120]}{tail}")
        else:
            print(f"   画风选择：{img.get('style', '未指定')}")
-        print(f"   中文描述：{img['description']}")
+            zh = img.get("prompt") or ""
-        print(f"   Prompt(zh)：{img['prompt'][:120]}...")
+            if zh:
                print(f"   Prompt(zh)：{zh[:120]}..." if len(zh) > 120 else f"   Prompt(zh)：{zh}")
            if img.get("prompt_en"):
-            print(f"   Prompt(en)：{img['prompt_en'][:120]}...")
+                en = img["prompt_en"]
                print(f"   Prompt(en)：{en[:120]}..." if len(en) > 120 else f"   Prompt(en)：{en}")
-    print(f"\n共 {len(analysis['images'])} 幅画面\n")
+    n_scenes = len(analysis["images"])
    n_variants = sum(
        len(img["variants"]) if isinstance(img.get("variants"), list) else (1 if img.get("prompt") or img.get("prompt_en") else 0)
        for img in analysis["images"]
    )
    print(f"\n共 {n_scenes} 个分镜；LLM 共给出约 {n_variants} 套画风方案（生成张数受配置 style_variants 与 images_per_prompt 影响）\n")
 # ---------------------------------------------------------------------------
@@ -683,8 +683,10 @@ def generate_images(pipe, analysis: dict, cfg: dict) -> list[Path]:
    preset = img_cfg.get("size_preset", "custom")
    prompt_lang = img_cfg.get("prompt_language", "zh")
    images_per_prompt = max(1, min(10, img_cfg.get("images_per_prompt", 1)))
    max_style_variants = max(1, min(2, int(img_cfg.get("style_variants", 2))))
    print(f"图片尺寸: {width}×{height}" + (f" (预设: {preset})" if preset != "custom" else ""))
    print(f"Prompt 语言: {prompt_lang}")
    print(f"每分镜画风方案数: {max_style_variants}（配置项 style_variants，1 或 2）")
    if images_per_prompt > 1:
        print(f"每个 prompt 生成 {images_per_prompt} 张图（不同种子）")
@@ -692,25 +694,49 @@ def generate_images(pipe, analysis: dict, cfg: dict) -> list[Path]:
    total = len(analysis["images"])
    for i, img_info in enumerate(analysis["images"], 1):
-        if prompt_lang == "en" and img_info.get("prompt_en"):
+        variant_list = _normalize_scene_variants(img_info, max_style_variants)
-            prompt = img_info["prompt_en"]
+        if not variant_list:
            print(f"\n警告: 第 {i}/{total} 幅分镜无有效 prompt，已跳过: {img_info.get('scene', '')}")
            continue
        print(f"\n[{i}/{total}] 分镜: {img_info['scene']}")
        if out_cfg.get("save_prompts", True):
            txt_lines = [
                f"Scene: {img_info['scene']}\n",
                f"Description: {img_info.get('description', '')}\n",
                f"Prompt_language_used: {prompt_lang}\n",
            ]
        for vi, (v_label, variant) in enumerate(variant_list):
            if prompt_lang == "en" and variant.get("prompt_en"):
                prompt = variant["prompt_en"]
            else:
-            prompt = img_info["prompt"]
+                prompt = variant.get("prompt") or variant.get("prompt_en") or ""
            if trigger_words:
                prompt = f"{trigger_words}, {prompt}"
-        print(f"\n[{i}/{total}] 正在生成: {img_info['scene']}")
+            st = variant.get("style", "未指定")
-        print(f"  画风: {img_info.get('style', '未指定')}")
+            print(f"  [{v_label}] 画风: {st}")
-        print(f"  Prompt({prompt_lang}): {prompt[:120]}...")
+            prev = prompt[:120] + ("..." if len(prompt) > 120 else "")
            print(f"  Prompt({prompt_lang}): {prev}")
            if out_cfg.get("save_prompts", True):
                txt_lines.append(f"\n--- {v_label} | Style: {st} ---\n")
                if variant.get("style_rationale"):
                    txt_lines.append(f"Rationale: {variant['style_rationale']}\n")
                txt_lines.append(f"Prompt(zh): {variant.get('prompt', '')}\n")
                txt_lines.append(f"Prompt(en): {variant.get('prompt_en', '')}\n")
                txt_lines.append(f"Used({prompt_lang}): {prompt}\n")
            for j in range(images_per_prompt):
-            variant_offset = i * 100 + j
+                variant_offset = i * 100 + vi * 17 + j
                actual_seed = (seed + variant_offset) if seed >= 0 else (int(time.time() * 1000) % (2**32) + variant_offset)
                generator = create_generator(device, actual_seed)
-            suffix = chr(ord("a") + j) if images_per_prompt > 1 else ""
+                seed_suffix = chr(ord("a") + j) if images_per_prompt > 1 else ""
                if images_per_prompt > 1:
-                print(f"  --- 第 {j+1}/{images_per_prompt} 张 (seed={actual_seed}) ---")
+                    print(f"    --- 同画风第 {j+1}/{images_per_prompt} 张 (seed={actual_seed}) ---")
                start_time = time.time()
@@ -727,24 +753,14 @@ def generate_images(pipe, analysis: dict, cfg: dict) -> list[Path]:
                elapsed = time.time() - start_time
                print(f"    生成完成，耗时 {elapsed:.1f}s")
-            img_path = output_dir / f"{prefix}_{i:02d}{suffix}.png"
+                img_path = output_dir / f"{prefix}_{i:02d}_{v_label}{seed_suffix}.png"
                image.save(img_path)
                saved_paths.append(img_path)
                print(f"    已保存: {img_path}")
        if out_cfg.get("save_prompts", True):
            txt_path = output_dir / f"{prefix}_{i:02d}_prompt.txt"
-            prompt_zh = img_info["prompt"]
+            txt_path.write_text("".join(txt_lines), encoding="utf-8")
-            prompt_en = img_info.get("prompt_en", "")
-            txt_path.write_text(
-                f"Scene: {img_info['scene']}\n"
-                f"Style: {img_info.get('style', '')}\n"
-                f"Description: {img_info['description']}\n"
-                f"Prompt(zh): {prompt_zh}\n"
-                f"Prompt(en): {prompt_en}\n"
-                f"Used({prompt_lang}): {prompt}\n",
-                encoding="utf-8",
-            )
    return saved_paths
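Note on the `variants` handling in poetry_to_image.py: the sketch below is a standalone copy of the `_normalize_scene_variants` logic from this commit, renamed `normalize_scene_variants` so it runs without the rest of the script. The sample dictionaries are hypothetical and only illustrate the three cases the helper distinguishes: a new-style scene with a `variants` array, a legacy scene with top-level `prompt` fields, and a scene with no usable prompt.

```python
from __future__ import annotations


def normalize_scene_variants(img_info: dict, max_variants: int) -> list[tuple[str, dict]]:
    """Return [(label, variant_dict), ...] for one scene, capped at 1 or 2 entries."""
    max_variants = max(1, min(2, int(max_variants)))
    raw = img_info.get("variants")
    collected: list[dict] = []
    if isinstance(raw, list):
        for v in raw:
            # Keep only dict entries that carry at least one usable prompt.
            if isinstance(v, dict) and (v.get("prompt") or v.get("prompt_en")):
                collected.append(v)
    if collected:
        return [(f"v{idx:02d}", v) for idx, v in enumerate(collected[:max_variants], 1)]
    # Legacy JSON: no variants array, prompts live at the top level of the scene.
    legacy = {
        "style": img_info.get("style", ""),
        "style_rationale": "",
        "prompt": img_info.get("prompt", ""),
        "prompt_en": img_info.get("prompt_en", ""),
    }
    if legacy["prompt"] or legacy["prompt_en"]:
        return [("v01", legacy)]
    return []


# Hypothetical inputs, not taken from real LLM output.
new_style = {
    "scene": "scene A",
    "variants": [
        {"style": "style 1", "prompt": "p1"},
        {"style": "style 2", "prompt_en": "p2"},
        {"style": "no prompt at all"},  # dropped: neither prompt nor prompt_en
    ],
}
old_style = {"scene": "scene B", "style": "style 3", "prompt": "legacy prompt"}

labels_two = [label for label, _ in normalize_scene_variants(new_style, 2)]
labels_one = [label for label, _ in normalize_scene_variants(new_style, 1)]
legacy_result = normalize_scene_variants(old_style, 2)
empty_result = normalize_scene_variants({"scene": "scene C"}, 2)
print(labels_two, labels_one, legacy_result[0][0], empty_result)
```

With `style_variants: 1` only the first usable variant is rendered, and an empty result is what makes `generate_images` print its warning and skip the scene.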
--- a/prompt_to_image.py
+++ b/prompt_to_image.py
@@ -0,0 +1,204 @@
 """
 直接使用用户输入的 prompt 调用本地 Z-Image-Turbo 出图。
 复用 config.yaml 中的 image / lora / output 等推理配置；
 不对 prompt 做 LLM 改写，不自动拼接 LoRA 触发词（触发词请自行写进 prompt）。
 """
 from __future__ import annotations
 import argparse
 import sys
 import time
 from datetime import datetime
 from pathlib import Path
 from poetry_to_image import (
    create_generator,
    load_config,
    load_pipeline,
    resolve_image_size,
 )
 def _read_prompt_from_file(path: Path) -> str:
    """按 UTF-8 原样读取文件，不做 strip 或换行规范化以外的解码。"""
    return path.read_bytes().decode("utf-8")
 def _collect_prompts(args: argparse.Namespace) -> list[str]:
    prompts: list[str] = []
    if args.prompts:
        prompts.extend(args.prompts)
    for fp in args.prompt_files or []:
        p = Path(fp)
        if not p.is_file():
            print(f"错误: 文件不存在: {p}", file=sys.stderr)
            sys.exit(1)
        prompts.append(_read_prompt_from_file(p))
    if not prompts and not sys.stdin.isatty():
        prompts.append(sys.stdin.buffer.read().decode("utf-8"))
    if not prompts and sys.stdin.isatty():
        print("请输入 prompt（空行结束）：")
        lines: list[str] = []
        while True:
            try:
                line = input()
            except EOFError:
                break
            if line == "":
                break
            lines.append(line)
        text = "\n".join(lines)
        if text:
            prompts.append(text)
    return prompts
 def _generate(
    pipe,
    prompt: str,
    *,
    cfg: dict,
    index: int,
    output_dir: Path,
    filename_prefix: str,
 ) -> list[Path]:
    """对单条 prompt 出图；prompt 字符串原样传入 pipeline。"""
    img_cfg = cfg["image"]
    out_cfg = cfg["output"]
    device = cfg.get("_resolved_device", "cpu")
    width, height = resolve_image_size(img_cfg)
    steps = img_cfg.get("num_inference_steps", 9)
    guidance = img_cfg.get("guidance_scale", 0.0)
    seed = img_cfg.get("seed", -1)
    images_per_prompt = max(1, min(10, img_cfg.get("images_per_prompt", 1)))
    saved: list[Path] = []
    for j in range(images_per_prompt):
        variant_offset = index * 100 + j
        if seed >= 0:
            actual_seed = seed + variant_offset
        else:
            actual_seed = int(time.time() * 1000) % (2**32) + variant_offset
        generator = create_generator(device, actual_seed)
        seed_suffix = chr(ord("a") + j) if images_per_prompt > 1 else ""
        if images_per_prompt > 1:
            print(f"  --- 第 {j + 1}/{images_per_prompt} 张 (seed={actual_seed}) ---")
        start = time.time()
        result = pipe(
            prompt=prompt,
            height=height,
            width=width,
            num_inference_steps=steps,
            guidance_scale=guidance,
            generator=generator,
        )
        image = result.images[0]
        elapsed = time.time() - start
        print(f"  生成完成，耗时 {elapsed:.1f}s")
        img_path = output_dir / f"{filename_prefix}_{index:02d}{seed_suffix}.png"
        image.save(img_path)
        saved.append(img_path)
        print(f"  已保存: {img_path}")
        if out_cfg.get("save_prompts", True):
            txt_path = output_dir / f"{filename_prefix}_{index:02d}{seed_suffix}_prompt.txt"
            txt_path.write_bytes(prompt.encode("utf-8"))
    return saved
 def main() -> None:
    parser = argparse.ArgumentParser(
        description="Z-Image-Turbo 直出图：使用用户给定 prompt，不做文本侧处理"
    )
    parser.add_argument(
        "-c", "--config",
        default="config.yaml",
        help="配置文件路径（默认: config.yaml）",
    )
    parser.add_argument(
        "-p", "--prompt",
        action="append",
        dest="prompts",
        metavar="TEXT",
        help="prompt 文本；可多次指定以连续生成多张不同 prompt",
    )
    parser.add_argument(
        "-f", "--file",
        action="append",
        dest="prompt_files",
        metavar="PATH",
        help="从 UTF-8 文件读取整段 prompt；可多次指定",
    )
    parser.add_argument(
        "-o", "--output",
        default=None,
        help="输出目录（默认: output 下按日期时间分子目录，与 poetry_to_image 一致）",
    )
    parser.add_argument(
        "--flat-output",
        action="store_true",
        help="将输出直接写入配置中的 output.dir，不再追加 日期/时间 子目录",
    )
    args = parser.parse_args()
    cfg = load_config(args.config)
    if args.output:
        cfg["output"]["dir"] = args.output
    elif not args.flat_output:
        base = Path(cfg["output"].get("dir", "./output"))
        now = datetime.now()
        cfg["output"]["dir"] = str(base / now.strftime("%Y-%m-%d") / now.strftime("%H-%M-%S"))
    prompts = _collect_prompts(args)
    if not prompts:
        print("未提供任何 prompt。", file=sys.stderr)
        sys.exit(1)
    for k, p in enumerate(prompts, 1):
        if not p:
            print(f"警告: 第 {k} 条 prompt 为空，已跳过。", file=sys.stderr)
    prompts = [p for p in prompts if p]
    if not prompts:
        sys.exit(1)
    out_dir = Path(cfg["output"].get("dir", "./output"))
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = cfg["output"].get("filename_prefix", "zimg")
    img_cfg = cfg["image"]
    width, height = resolve_image_size(img_cfg)
    preset = img_cfg.get("size_preset", "custom")
    print(f"\n输出目录: {out_dir.resolve()}")
    print(f"图片尺寸: {width}×{height}" + (f" (预设: {preset})" if preset != "custom" else ""))
    print(f"共 {len(prompts)} 条 prompt，将依次原样送模型推理。\n")
    pipe = load_pipeline(cfg)
    all_saved: list[Path] = []
    for i, prompt in enumerate(prompts, 1):
        preview = prompt if len(prompt) <= 160 else prompt[:160] + "..."
        print(f"[{i}/{len(prompts)}] Prompt:\n{preview}\n")
        all_saved.extend(
            _generate(
                pipe,
                prompt,
                cfg=cfg,
                index=i,
                output_dir=out_dir,
                filename_prefix=prefix,
            )
        )
    print(f"\n全部完成，共 {len(all_saved)} 个文件。")
 if __name__ == "__main__":
    main()
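Note on seeds and file names in prompt_to_image.py: the following is a minimal sketch of how `_generate` appears to derive the per-image seed and output name when a fixed seed (`seed >= 0`) is configured. `plan_outputs` is a hypothetical helper written for illustration; it is not part of the commit, and it deliberately omits the time-based branch used when `seed` is negative.

```python
def plan_outputs(seed: int, index: int, images_per_prompt: int,
                 prefix: str = "zimg") -> list[tuple[int, str]]:
    """List (seed, filename) pairs for one prompt, assuming a fixed seed >= 0."""
    count = max(1, min(10, images_per_prompt))  # same clamp as in _generate
    plan = []
    for j in range(count):
        # Each prompt index gets its own block of 100 offsets; j picks the image.
        actual_seed = seed + index * 100 + j
        # A letter suffix (a, b, c, ...) is only added when several images are made.
        suffix = chr(ord("a") + j) if count > 1 else ""
        plan.append((actual_seed, f"{prefix}_{index:02d}{suffix}.png"))
    return plan


three_images = plan_outputs(seed=42, index=1, images_per_prompt=3)
single_image = plan_outputs(seed=42, index=2, images_per_prompt=1)
print(three_images)
print(single_image)
```

Because `images_per_prompt` is clamped to at most 10, the `index * 100` stride keeps seeds from different prompts from overlapping under a fixed seed.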