HTML URL Encoding

What is URL Encoding

According to RFC 3986, the characters in a URL are limited to a defined set of reserved and unreserved US-ASCII characters; no other characters are allowed. URLs often need to carry characters outside that set, however, so those characters must be converted to a valid US-ASCII form for worldwide interoperability. URL encoding, also known as percent-encoding, is the process of encoding URL information so that it can be safely transmitted over the internet.

To map the wide range of characters used worldwide, a two-step process is used:

  • First, the data is converted to a sequence of bytes using the UTF-8 character encoding.
  • Then, each byte that does not correspond to a character in the unreserved set is percent-encoded as %HH, where HH is the hexadecimal value of the byte.

For example, the string: François would be encoded as: Fran%C3%A7ois

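The letter ç (c-cedilla) is a Latin-script letter outside the US-ASCII range. Its UTF-8 encoding is the two bytes C3 A7, which is why it appears as %C3%A7 in the encoded string.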
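
The two-step process above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the helper name percent_encode and the constant UNRESERVED are made up for this example, and the result is compared against the standard library's urllib.parse.quote as a sanity check.

```python
import string
from urllib.parse import quote

# Unreserved set per RFC 3986: letters, digits, hyphen, period, underscore, tilde.
UNRESERVED = (string.ascii_letters + string.digits + "-._~").encode("ascii")

def percent_encode(text: str) -> str:
    """Hypothetical helper: percent-encode every byte outside the unreserved set."""
    out = []
    for byte in text.encode("utf-8"):   # step 1: convert the text to UTF-8 bytes
        if byte in UNRESERVED:          # step 2: unreserved bytes are kept as-is
            out.append(chr(byte))
        else:                           # every other byte becomes %HH
            out.append(f"%{byte:02X}")
    return "".join(out)

print(percent_encode("François"))       # Fran%C3%A7ois
print(quote("François", safe=""))       # Fran%C3%A7ois (standard library)
```

In everyday code, calling quote with safe="" is usually enough; the hand-written loop is shown only to make the two steps visible.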


Reserved Characters

Certain characters are reserved or restricted from use in a URL because they may (or may not) be defined as delimiters by the generic syntax or by a particular URL scheme. For example, the forward slash / is used to separate different parts of a URL.

If the data for a URL component contains a character that would conflict with a reserved character defined as a delimiter in the URL scheme, the conflicting character must be percent-encoded before the URL is formed. The reserved characters in a URL, together with their percent-encoded forms, are:

!    #    $    &    '    (    )    *    +    ,    /    :    ;    =    ?    @    [    ]
%21  %23  %24  %26  %27  %28  %29  %2A  %2B  %2C  %2F  %3A  %3B  %3D  %3F  %40  %5B  %5D
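
As a rough illustration of why reserved characters in data must be escaped, the Python sketch below encodes a query value that contains & and = before placing it in a URL. The host example.com and the parameter name q are placeholders chosen for this example only.

```python
from urllib.parse import quote, unquote

RESERVED = "!#$&'()*+,/:;=?@[]"

# safe="" makes quote() escape "/" too, which it leaves untouched by default.
encoded = {ch: quote(ch, safe="") for ch in RESERVED}
print(encoded["/"], encoded["&"], encoded["="])   # %2F %26 %3D

# A value containing delimiters is escaped before the URL is assembled,
# so "&" and "=" are not mistaken for query-string separators.
value = "rock&roll=fun"
url = "https://example.com/search?q=" + quote(value, safe="")
print(url)                            # https://example.com/search?q=rock%26roll%3Dfun

# Decoding restores the original value.
print(unquote("rock%26roll%3Dfun"))   # rock&roll=fun
```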

Unreserved Characters

Characters that are allowed in a URL but do not have a reserved purpose are called unreserved. These include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde. The following table lists all the unreserved characters in a URL:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789-_.~
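
A quick way to confirm that unreserved characters need no encoding is to pass them through an encoder and check that nothing changes. The sketch below uses Python's urllib.parse.quote; it assumes Python 3.7 or later, where the tilde is treated as unreserved.

```python
import string
from urllib.parse import quote, unquote

UNRESERVED = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_.~"

# Unreserved characters pass through the encoder unchanged.
unchanged = quote(UNRESERVED, safe="") == UNRESERVED
print(unchanged)                      # True

# Anything else is escaped, for example a percent sign or a space.
print(quote("100% sure", safe=""))    # 100%25%20sure

# A percent-encoded unreserved character decodes back to itself.
print(unquote("%7Euser"))             # ~user
```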
