UTF-8 Encoding for Older Browsers

While working on a web-based application that needed to support Netscape Communicator 4.x+ and Microsoft Internet Explorer 5.x+, I discovered that the older versions of these browsers had poor support for UTF-8 encoding. I needed to make form field entries URL-safe while also supporting multiple languages. The JavaScript escape() function escapes ASCII characters that are not valid in URLs, but it does not handle Unicode characters well. To make matters worse, the browsers were inconsistent: in IE, escape() would generate a string that looked like %unnnn, where each n is a hexadecimal digit. The correct encoding should follow RFC 2279 and consist of one pair of hexadecimal digits per UTF-8 byte, like %nn%nn. Netscape 4 would just treat the characters as ASCII, which resulted in lost accents and umlauts.
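As a rough illustration of the difference, the snippet below compares escape() with UTF-8 percent-encoding for two sample characters. The characters are my own choice, and the results in the comments are what a current JavaScript engine produces, not output captured from IE 5 or Netscape 4.

```javascript
// U+00FC (u with umlaut) and U+20AC (euro sign), written as \u escapes so the
// example does not depend on the encoding of the file that contains it.
var umlaut = "\u00FC";
var euro = "\u20AC";

// escape() works on 16-bit code units, not on UTF-8 bytes.
var legacyUmlaut = escape(umlaut);            // "%FC": one Latin-1 byte, not UTF-8
var legacyEuro = escape(euro);                // "%u20AC": the non-standard %unnnn form

// UTF-8 percent-encoding emits one %nn pair per byte.
var utf8Umlaut = encodeURIComponent(umlaut);  // "%C3%BC"
var utf8Euro = encodeURIComponent(euro);      // "%E2%82%AC"
```

A server that decodes query strings as UTF-8 will read the last two values correctly, but it has no standard way to interpret %FC or %u20AC.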

The encodeURIComponent() function, introduced in IE 5.5, Netscape 6, and Mozilla, does exactly what is needed. However, because that function is unavailable in Netscape 4.x and IE 5, a different solution was needed for those browsers. All JavaScript strings are Unicode, so I expected that it would be possible to encode them properly. Thankfully, someone saw my plea for help and sent me some helpful example code.

Demo

The following form uses either the built-in browser function encodeURIComponent() or, where that is unavailable, a custom replacement called encodeURIComponentNew() to properly escape characters for use in the URL.
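The choice between the two functions can be made with a feature test. The sketch below is one possible way to do it; pickEncoder is a hypothetical helper name, not something taken from the demo form, and the fallback is passed in as an argument so that the snippet stands on its own.

```javascript
// Return the built-in encoder when the browser provides one, otherwise
// return the supplied fallback. pickEncoder is a hypothetical name.
function pickEncoder(fallback) {
  // typeof does not throw on an undeclared identifier, so this test is
  // safe even in browsers that lack encodeURIComponent().
  if (typeof encodeURIComponent != "undefined") {
    return encodeURIComponent;
  }
  return fallback;
}

// Usage, assuming a custom encoder named encodeURIComponentNew is defined:
//   var encode = pickEncoder(encodeURIComponentNew);
//   var url = "search?q=" + encode(fieldValue);
```

Testing for the function itself, instead of sniffing the browser name and version, keeps the code working in browsers that were not considered when it was written.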

The encodeURIComponentNew() function always encodes to UTF-8, regardless of the charset specified by the page. This may be desirable behavior, even for browsers that have a built-in function.
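To show why the result does not depend on the page's charset, here is a minimal sketch of how such a function could be written. It is my own illustration, not the example code mentioned earlier: only the name encodeURIComponentNew comes from this page, while hexByte and UNRESERVED are hypothetical helpers. It reads 16-bit code units with charCodeAt(), combines surrogate pairs, and writes out the UTF-8 bytes by hand.

```javascript
// Characters that the built-in encodeURIComponent() leaves unescaped.
var UNRESERVED = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" +
                 "0123456789-_.!~*'()";

// Format one byte (0..255) as an uppercase %nn pair.
function hexByte(b) {
  var HEX = "0123456789ABCDEF";
  return "%" + HEX.charAt(b >> 4) + HEX.charAt(b & 15);
}

function encodeURIComponentNew(s) {
  var out = "";
  for (var i = 0; i < s.length; i++) {
    var c = s.charCodeAt(i);

    // Combine a high/low surrogate pair into one code point above U+FFFF.
    if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.length) {
      var low = s.charCodeAt(i + 1);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
        i++;
      }
    }

    if (c < 0x80) {
      // ASCII: copy unreserved characters, escape everything else.
      var ch = String.fromCharCode(c);
      out += (UNRESERVED.indexOf(ch) >= 0) ? ch : hexByte(c);
    } else if (c < 0x800) {
      // Two-byte UTF-8 sequence.
      out += hexByte(0xC0 | (c >> 6)) +
             hexByte(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
      // Three-byte UTF-8 sequence.
      out += hexByte(0xE0 | (c >> 12)) +
             hexByte(0x80 | ((c >> 6) & 0x3F)) +
             hexByte(0x80 | (c & 0x3F));
    } else {
      // Four-byte UTF-8 sequence.
      out += hexByte(0xF0 | (c >> 18)) +
             hexByte(0x80 | ((c >> 12) & 0x3F)) +
             hexByte(0x80 | ((c >> 6) & 0x3F)) +
             hexByte(0x80 | (c & 0x3F));
    }
  }
  return out;
}
```

One known difference from the built-in function: an unpaired surrogate is encoded here as a three-byte sequence, whereas encodeURIComponent() throws a URIError for it.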