
dwisiswant0 commented Sep 17, 2025

Function signature:

-	html_escape(arg1 interface{}) interface{}
+	html_escape(s string, optionalConvertAllChars bool) string

Parameters

  • s (string): The input string to be HTML escaped
  • optionalConvertAllChars (bool, optional): When true, converts every character to a character reference. Named entities are used where one exists (e.g. "<" becomes "&lt;" and "/" becomes "&sol;"), and numeric entities are used otherwise (e.g. "b" becomes "&#98;"). When false or omitted, performs HTML escaping over a wider range of characters than the standard html.EscapeString function. It uses the full W3C entity map to convert characters like "é" to "&eacute;" and "α" to "&alpha;", and it also escapes basic HTML characters like "<" to "&lt;". Escaping is the inverse of unescaping, so html_unescape(html_escape(s, true)) == s always holds.
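A minimal Go sketch of the two modes, assuming a tiny hand-picked entity table in place of the full W3C map. htmlEscapeSketch and namedEntities are hypothetical names used only for illustration, not the projectdiscovery/utils API. The rule "named entity when available, numeric reference otherwise (only when converting all characters)" is inferred from the html_escape output in the test run, where "&lt;" and "&sol;" appear next to "&#98;".

```go
package main

import (
	"fmt"
	"strings"
)

// namedEntities is a hypothetical, deliberately tiny subset of the W3C
// named character reference table; the real implementation uses the full map.
var namedEntities = map[rune]string{
	'&':  "&amp;",
	'<':  "&lt;",
	'>':  "&gt;",
	'"':  "&quot;",
	'\'': "&apos;",
	'/':  "&sol;",
	'é':  "&eacute;",
	'α':  "&alpha;",
}

// htmlEscapeSketch (hypothetical) mimics html_escape(s, convertAll):
//   - runes with a named entity are always replaced by that entity;
//   - when convertAll is true, every other rune becomes a decimal
//     numeric reference (&#N;);
//   - when convertAll is false, every other rune is copied unchanged.
func htmlEscapeSketch(s string, convertAll bool) string {
	var b strings.Builder
	for _, r := range s {
		if name, ok := namedEntities[r]; ok {
			b.WriteString(name)
		} else if convertAll {
			fmt.Fprintf(&b, "&#%d;", r)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(htmlEscapeSketch("<body>test</body>", false))
	fmt.Println(htmlEscapeSketch("<body>test</body>", true))
	fmt.Println(htmlEscapeSketch("café α", false))
}
```

For the <body>test</body> input, the first two printed lines are identical to the two html_escape results in the test run; the third prints caf&eacute; &alpha;. A real implementation would swap the small map for the complete entity table.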

Closes #259
Blocked by projectdiscovery/utils#681

TODO:

  • Bump github.com/projectdiscovery/utils.
  • Remove replace directive.

$ go test -run "^TestDslExpressions/html_(un|escape)" -v
=== RUN   TestDslExpressions
=== RUN   TestDslExpressions/html_escape("<body>test</body>")
html_escape("<body>test</body>"): 	 &lt;body&gt;test&lt;&sol;body&gt;
=== RUN   TestDslExpressions/html_unescape("&lt;&#98;&#111;&#100;&#121;&gt;&#116;&#101;&#115;&#116;&lt;&sol;&#98;&#111;&#100;&#121;&gt;")
html_unescape("&lt;&#98;&#111;&#100;&#121;&gt;&#116;&#101;&#115;&#116;&lt;&sol;&#98;&#111;&#100;&#121;&gt;"): 	 <body>test</body>
=== RUN   TestDslExpressions/html_unescape("&#x3c;&#x62;&#x6f;&#x64;&#x79;&#x3e;&#x74;&#x65;&#x73;&#x74;&#x3c;&#x2f;&#x62;&#x6f;&#x64;&#x79;&#x3e;")
html_unescape("&#x3c;&#x62;&#x6f;&#x64;&#x79;&#x3e;&#x74;&#x65;&#x73;&#x74;&#x3c;&#x2f;&#x62;&#x6f;&#x64;&#x79;&#x3e;"): 	 <body>test</body>
=== RUN   TestDslExpressions/html_unescape("&lt;body&gt;test&lt;/body&gt;")
html_unescape("&lt;body&gt;test&lt;/body&gt;"): 	 <body>test</body>
=== RUN   TestDslExpressions/html_escape("<body>test</body>",_true)
html_escape("<body>test</body>", true): 	 &lt;&#98;&#111;&#100;&#121;&gt;&#116;&#101;&#115;&#116;&lt;&sol;&#98;&#111;&#100;&#121;&gt;
--- PASS: TestDslExpressions (0.00s)
    --- PASS: TestDslExpressions/html_escape("<body>test</body>") (0.00s)
    --- PASS: TestDslExpressions/html_unescape("&lt;&#98;&#111;&#100;&#121;&gt;&#116;&#101;&#115;&#116;&lt;&sol;&#98;&#111;&#100;&#121;&gt;") (0.00s)
    --- PASS: TestDslExpressions/html_unescape("&#x3c;&#x62;&#x6f;&#x64;&#x79;&#x3e;&#x74;&#x65;&#x73;&#x74;&#x3c;&#x2f;&#x62;&#x6f;&#x64;&#x79;&#x3e;") (0.00s)
    --- PASS: TestDslExpressions/html_unescape("&lt;body&gt;test&lt;/body&gt;") (0.00s)
    --- PASS: TestDslExpressions/html_escape("<body>test</body>",_true) (0.00s)
PASS
ok  	github.com/projectdiscovery/dsl	0.013s


Mzack9999 left a comment


implementation: lgtm

Todo:

dwisiswant0 force-pushed the dwisiswant0/refactor/adds-optionalConvertAllChars-for-html-escape branch from 1242280 to 777c0f1 on October 2, 2025 13:20
dwisiswant0 marked this pull request as ready for review on October 2, 2025 13:21
dwisiswant0 requested a review from Mzack9999 on October 2, 2025 13:21

Successfully merging this pull request may close these issues.

Add HTML encoding to each character of the string