iEgit

reopened from #4125

This PR improves the JSON parsing logic in StructuredOutputParser.ts to handle malformed JSON responses from LLMs more effectively.

Changes:
• Fixed JSON extraction logic:
• Previously, JSON extraction relied on splitting text using code block markers (```json).
• Now, it uses a regex (/(?:^[^{[])|(?:[^}\]]$)/g) to remove any leading or trailing non-JSON content, ensuring a more robust parsing approach.

Why this change?
• The previous approach failed in some edge cases where JSON responses did not follow the expected format.
• The new regex-based approach ensures that only valid JSON content is passed to jsonrepair, reducing the likelihood of parsing errors.
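A minimal sketch of the edge-stripping idea described above, not the code from this PR. The regex as quoted in the description matches at most one character at each end, so this sketch assumes `*` quantifiers were intended (Markdown rendering may have dropped them). The helper name `stripNonJsonEdges` and the sample input are hypothetical, and `JSON.parse` stands in for the `jsonrepair` step to keep the example dependency-free.

```typescript
// Assumption: quantified form of the regex quoted in the PR description.
// First alternative: everything before the first `{` or `[`.
// Second alternative: everything after the last `}` or `]`.
const NON_JSON_EDGES = /(?:^[^{[]*)|(?:[^}\]]*$)/g;

// Hypothetical helper: strips leading and trailing non-JSON text.
function stripNonJsonEdges(text: string): string {
  return text.replace(NON_JSON_EDGES, "");
}

// Illustrative LLM output with chatter on both sides of a complete JSON object.
const raw = 'Sure, here is your JSON output:\n\n{\n  "result": "string"\n}\nHope this helps!';

const candidate = stripNonJsonEdges(raw);
// JSON.parse is used here in place of jsonrepair.
const parsed = JSON.parse(candidate);
console.log(parsed.result);
```

One caveat under the same assumption: if the text contains no closing `}` or `]` at all, the second alternative matches from the first brace to the end of the input and the helper returns an empty string, so an object that is missing its closing brace never reaches the repair step.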

Testing:
• Manually tested with different malformed JSON outputs to verify correct parsing.
• Ensured existing functionality remains intact.

@iEgit changed the title from "enhance json parses" to "enhanced json parsing" on Mar 12, 2025
@HenryHengZJ

Do you have some examples where the old approach is not able to parse, but solvable with the newer approach?

@iEgit

Yes, basically it breaks if the LLM outputs something like this:

sure, here's you json output:

{
  "result": "string"

In our case it was even worse: we provided few-shot examples, and the model repeated part of the prompt and only then provided the JSON response.
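To illustrate why unfenced output like this trips a fence-based extractor, here is a rough sketch. It assumes the earlier logic split the text on code block markers as the PR description says. The function `extractByFence`, the helper `tryParse`, and both sample strings are hypothetical and not taken from the repository.

```typescript
// Three backticks, built programmatically so this snippet can sit inside a fenced block.
const FENCE = "`".repeat(3);

// Assumed shape of a fence-splitting extractor: take the text after the first fence marker.
function extractByFence(text: string): string {
  const trimmed = text.trim();
  if (!trimmed.includes(FENCE)) return trimmed; // no fence: the whole text is passed on
  return trimmed.split(new RegExp(FENCE + "(?:json)?"))[1] ?? "";
}

// Returns the parsed value, or undefined when the candidate is not valid JSON.
function tryParse(candidate: string): unknown {
  try {
    return JSON.parse(candidate);
  } catch {
    return undefined;
  }
}

// A well-behaved, fenced response parses fine.
const fenced = `${FENCE}json\n{"result": "string"}\n${FENCE}`;
// A chatty, unfenced response keeps its preamble and fails to parse.
const chatty = 'sure, here is your json output:\n\n{"result": "string"}';

const fencedResult = tryParse(extractByFence(fenced));
const chattyResult = tryParse(extractByFence(chatty));
console.log(fencedResult, chattyResult);
```

In this sketch the fenced input yields an object while the chatty input yields `undefined`, because the preamble is handed to the parser unchanged.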

@iEgit force-pushed the enhance-json-parsing branch from 0c0e9d0 to e28224b on April 7, 2025 15:24