The Zombie driver fails when a URL contains "high bytes", i.e. non-ASCII characters. The following example is a valid URL containing a Hungarian word with accented characters:
https://hu.wikipedia.org/wiki/Műemlék
Desktop browsers and the Mink Goutte driver translate the high bytes correctly:
https://hu.wikipedia.org/wiki/M%C5%B1eml%C3%A9k
The Zombie driver sends the string as-is to JavaScript, and characters above 0x7f are then mangled somewhere in Zombie:
https://hu.wikipedia.org/wiki/Mqeml\xe9k
It's a bit strange how the characters get mangled:
- letter é becomes \xe9, which is its character code in ISO-8859-1
- letter ű becomes q, because this character does not exist in that code page
Characters that don't exist in ISO-8859-1 are replaced with regular letters such as q, so the damage is irreversible.
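One possible explanation, consistent with the output above but not confirmed against Zombie's source, is that each character's Unicode code point is cut down to its lowest 8 bits. The sketch below only reproduces the observed symptom; the function name `truncate_to_low_byte` is hypothetical and is not part of Zombie or Mink.

```python
def truncate_to_low_byte(text: str) -> bytes:
    """Keep only the low 8 bits of every code point (assumed failure mode)."""
    return bytes(ord(ch) & 0xFF for ch in text)


# é is U+00E9, so its low byte is 0xE9 (the same value as in ISO-8859-1).
# ű is U+0171, so its low byte is 0x71, which is ASCII "q".
mangled = truncate_to_low_byte("Műemlék")
print(mangled)  # b'Mqeml\xe9k'
```

Under this assumption the q is not arbitrary: it is what remains of U+0171 after the high bits are dropped, which is also why the original character cannot be recovered.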
The example shows that desktop browsers translate non-ASCII characters to percent-encoded bytes of their UTF-8 encoding:
- letter é becomes %C3%A9
- letter ű becomes %C5%B1
That's correct; web servers expect URLs in this form.
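For reference, the expected translation can be reproduced with Python's standard library. This is only an illustration of the encoding that browsers apply, not a patch for the driver; the helper name `percent_encode_path` is hypothetical.

```python
from urllib.parse import quote


def percent_encode_path(segment: str) -> str:
    """Percent-encode the UTF-8 bytes of non-ASCII characters in a path."""
    # "/" is left untouched so that path separators survive.
    return quote(segment, safe="/", encoding="utf-8")


url = "https://hu.wikipedia.org/wiki/" + percent_encode_path("Műemlék")
print(url)  # https://hu.wikipedia.org/wiki/M%C5%B1eml%C3%A9k
```

Encoding the URL this way before it reaches the JavaScript side would keep the string pure ASCII, so there would be no high bytes left to mangle.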