Clean up Markdown markers returned by the model by MoeclubM · Pull Request #72 · 666ghj/MiroFish

MoeclubM · 2026-02-15T05:27:05Z

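When generating the ontology structure, some models (minimax-m2.1, minimax-m2.5, glm-4.7, and glm-5 were tested and all behave the same way) do not seem to honor the json_object format. They return JSON wrapped in a Markdown code block, which makes json.loads() fail with a parse error.

This PR strips the Markdown code-block markers and adds a try/except so that a parse failure no longer causes a 500.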

#64
#58
#48
Not sure whether all of these are caused by the same problem.

Clean up LLM responses by removing Markdown code fences (```json and ```), trimming whitespace, and then parsing the cleaned string as JSON. If json.loads fails, raise a ValueError with the invalid payload. This prevents crashes when the model wraps JSON in code blocks instead of returning raw JSON (previously returned json.loads(response) directly).
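The cleanup described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper names `strip_code_fence` and `parse_fenced_json` are hypothetical, and the sketch assumes the response carries at most one fence at each end.

```python
import json
import re

# Hypothetical helpers, not the PR's actual code.
# `{3} matches the three backticks that open or close a Markdown fence.
_FENCE_START = re.compile(r"^`{3}(?:json)?\s*", re.IGNORECASE)
_FENCE_END = re.compile(r"\s*`{3}$")


def strip_code_fence(text: str) -> str:
    """Remove one leading and one trailing Markdown fence, if present."""
    cleaned = text.strip()
    cleaned = _FENCE_START.sub("", cleaned)
    cleaned = _FENCE_END.sub("", cleaned)
    return cleaned.strip()


def parse_fenced_json(text: str):
    """Strip fences, then parse; a parse failure becomes a ValueError."""
    cleaned = strip_code_fence(text)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError as exc:
        # Only a short preview goes into the message (200 is arbitrary).
        raise ValueError(f"invalid JSON from LLM: {cleaned[:200]}") from exc
```

A response that is already raw JSON passes through unchanged, so the helper is safe to apply to every model's output.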

Copilot

Pull request overview

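This PR addresses the problem that some large models, even with response_format={"type":"json_object"}, still return JSON wrapped in a Markdown code block, which makes the backend's json.loads() fail and triggers a 500.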

Changes:

Add logic in chat_json to strip Markdown code-block fences (``` / ```json)
Wrap JSON parsing in try/except and convert parse failures into a more explicit exception


Copilot · 2026-02-15T05:29:54Z

backend/app/utils/llm_client.py

+        except json.JSONDecodeError:
+            raise ValueError(f"LLM返回的JSON格式无效: {cleaned_response}")


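The exception message embeds the full cleaned_response in the ValueError. Because upper-layer APIs (for example the graph-related endpoints) return str(e)/traceback to the frontend and also log it, this leaks the model output directly (it may contain fragments of user-uploaded documents or other sensitive information) and can make the response body very large. Consider returning only a truncated preview (for example the first N characters) and writing the full content only to the server log or an internal debug field.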
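One way to follow this suggestion is sketched below. The logger name, the 200-character limit, and the function names are assumptions made for the illustration; none of them come from the repository.

```python
import json
import logging

logger = logging.getLogger("llm_client")  # logger name is an assumption

PREVIEW_CHARS = 200  # arbitrary cut-off chosen for this sketch


def preview(text: str, limit: int = PREVIEW_CHARS) -> str:
    """Return at most `limit` characters, marking any truncation."""
    if len(text) <= limit:
        return text
    return f"{text[:limit]}... [{len(text) - limit} more chars]"


def loads_with_safe_error(cleaned_response: str):
    """Parse JSON; on failure, log the full payload but raise a short message."""
    try:
        return json.loads(cleaned_response)
    except json.JSONDecodeError as exc:
        # The full payload goes to the server log only.
        logger.error("Invalid JSON from LLM: %s", cleaned_response)
        # The exception message, which may reach the client, gets a preview.
        raise ValueError(
            f"LLM returned invalid JSON: {preview(cleaned_response)}"
        ) from exc
```

Whether the full payload belongs even in the server log depends on how sensitive the uploaded documents are; the preview limit can be applied there as well.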

Copilot · 2026-02-15T05:29:54Z

backend/app/utils/llm_client.py

+            return json.loads(cleaned_response)
+        except json.JSONDecodeError:
+            raise ValueError(f"LLM返回的JSON格式无效: {cleaned_response}")


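chat_json is annotated as returning Dict[str, Any], but json.loads can produce non-object types such as list, str, or number. Existing callers generally use .get(...), which would crash at runtime. Consider validating after loads that the result is a dict, and raising a more explicit exception otherwise.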

Suggested change

-            return json.loads(cleaned_response)
-        except json.JSONDecodeError:
-            raise ValueError(f"LLM返回的JSON格式无效: {cleaned_response}")
+            data = json.loads(cleaned_response)
+        except json.JSONDecodeError:
+            raise ValueError(f"LLM返回的JSON格式无效: {cleaned_response}")
+        if not isinstance(data, dict):
+            raise ValueError(f"LLM返回的JSON必须是对象类型，但实际为: {cleaned_response}")
+        return data
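To see why the type check matters, the following standalone sketch shows that json.loads accepts top-level values that are not objects. `loads_object` is a hypothetical name used only here.

```python
import json
from typing import Any, Dict


def loads_object(raw: str) -> Dict[str, Any]:
    """Parse `raw` and insist that the top-level value is a JSON object."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    return data


# Valid JSON that is not an object: json.loads accepts all of these,
# and none of the resulting Python values has a .get method.
for raw in ("[1, 2]", '"text"', "42", "null"):
    parsed = json.loads(raw)
    print(type(parsed).__name__, hasattr(parsed, "get"))
```

Without the check, a caller that does `result.get(...)` on such a value fails with an AttributeError far from the place where the bad response was received.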

Copilot · 2026-02-15T05:29:55Z

backend/app/utils/llm_client.py

+        except json.JSONDecodeError:
+            raise ValueError(f"LLM返回的JSON格式无效: {cleaned_response}")


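The current code catches json.JSONDecodeError and then discards the original exception's position and cause, which makes format problems harder to debug. Consider using except json.JSONDecodeError as e, including key details such as e.msg, e.lineno, and e.colno in the new exception, and chaining the exception (raise ... from e) to preserve the stack.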

Suggested change

-        except json.JSONDecodeError:
-            raise ValueError(f"LLM返回的JSON格式无效: {cleaned_response}")
+        except json.JSONDecodeError as e:
+            raise ValueError(
+                f"LLM返回的JSON格式无效: {cleaned_response} "
+                f"(错误: {e.msg}, 行: {e.lineno}, 列: {e.colno})"
+            ) from e
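A small standalone sketch shows what chaining preserves. `parse_or_raise` is a hypothetical name, and the exact wording of e.msg varies between Python versions, so no output is shown.

```python
import json


def parse_or_raise(raw: str):
    """Parse JSON; re-raise failures as ValueError with position details."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(
            f"invalid JSON ({e.msg}, line {e.lineno}, column {e.colno})"
        ) from e


try:
    parse_or_raise('{"a": 1,}')  # trailing comma is not valid JSON
except ValueError as err:
    print(err)                           # message with line and column
    print(type(err.__cause__).__name__)  # the original JSONDecodeError
```

Because of `from e`, the original JSONDecodeError stays reachable as `__cause__` and appears in the traceback as the direct cause.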

666ghj · 2026-03-05T16:07:48Z

I tested this and it still seems to return a 500. Did it fix the problem when you debugged it?

666ghj · 2026-03-05T16:19:41Z

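I found the cause.

MiniMax M2.5 is a reasoning model. Even when called through the OpenAI-compatible API, its content field includes the <think>...</think> chain-of-thought content. The actual response looks roughly like this: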

  <think>
  The user needs me to generate an ontology structure... let me analyze these documents...
  </think>
  ```json
  {"entity_types": [...], "edge_types": [...]}
  ```

chat_json() calls json.loads() on the entire content, so the leading <think> content makes parsing fail with a JSONDecodeError. Nothing upstream catches it, so the request ends in a 500.

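Other reasoning APIs keep the thinking separate from the final reply. OpenAI's reasoning models (such as o1 and o3) do not put it in content, and OpenAI-compatible APIs such as DeepSeek's return it in a dedicated field:

Thinking process → reasoning_content (separate field)
Final reply → content (clean, result only)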

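MiniMax M2.5, when called through the OpenAI-compatible API, by default puts the thinking directly into the content field, wrapped in <think> tags and mixed with the actual answer.

The MiniMax documentation mentions this as well: with the native OpenAI API, the MiniMax-M2.5 model's content field contains the <think> tag content.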
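Given a response shaped like the one above, a pre-processing step could look roughly like this. It is a sketch under assumptions, not the code from the fixing commit: `strip_reasoning` is a hypothetical name, and the sketch only handles complete <think>...</think> blocks (a response truncated inside the block would still fail to parse).

```python
import json
import re

# Matches a complete <think>...</think> block, including newlines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
# `{3} matches the three backticks of a Markdown fence at either end.
_FENCE = re.compile(r"^`{3}(?:json)?\s*|\s*`{3}$", re.IGNORECASE)


def strip_reasoning(content: str) -> str:
    """Drop <think> blocks, then any surrounding Markdown code fence."""
    without_think = _THINK_BLOCK.sub("", content).strip()
    return _FENCE.sub("", without_think).strip()


fence = "`" * 3  # built this way to keep literal fences out of the example
raw = (
    "<think>\nThe user wants an ontology structure...\n</think>\n"
    f"{fence}json\n"
    '{"entity_types": [], "edge_types": []}\n'
    f"{fence}"
)
print(json.loads(strip_reasoning(raw)))
```

The order matters: the fence patterns are anchored to the start and end of the string, so the <think> block has to be removed first.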

The latest commit fixes this.

Some models (MiniMax M2.5, GLM-4.7, GLM-5) don't respect json_object format and return markdown-wrapped JSON, causing json.loads() to fail.

Changes:
- Add json_utils.py with clean_llm_json_response(), parse_llm_json()
- Auto-strip markdown code blocks (```json ... ```)
- Provide safe_parse_llm_json() for non-throwing usage
- Update llm_client.py to use new helper

Related: 666ghj#72, 666ghj#64, 666ghj#58, 666ghj#48
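The commit message mentions a non-throwing helper but does not show its signature. The sketch below illustrates the general idea only; the name `safe_parse` and its `default` parameter are assumptions, not the API of json_utils.py.

```python
import json
from typing import Any, Optional


def safe_parse(text: Optional[str], default: Optional[Any] = None) -> Any:
    """Return the parsed JSON value, or `default` if parsing fails."""
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        # TypeError covers non-string input such as None.
        return default


# The caller supplies the fallback instead of handling an exception.
ontology = safe_parse("not json", default={})
print(ontology.get("entity_types", []))
```

A non-throwing variant suits call sites that can continue with a fallback value; code paths that must report the failure are better served by the raising version.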

Gresdy · 2026-03-10T02:58:06Z

With the latest code, I still get a 500 error today when using minimax2.5.

[backend] [02:52:07] INFO: 调用 LLM 生成本体定义...
[backend] 192.168.65.1 - - [10/Mar/2026 02:52:09] "POST /api/graph/ontology/generate HTTP/1.1" 500 -

Copilot AI review requested due to automatic review settings February 15, 2026 05:27

dosubot bot added the size:S label (This PR changes 10-29 lines, ignoring generated files.) Feb 15, 2026

Copilot started reviewing on behalf of MoeclubM February 15, 2026 05:27

Copilot AI reviewed Feb 15, 2026

MoeclubM wants to merge 1 commit into 666ghj:main from MoeclubM:main


4 participants

