
URL Encoding

What Is a URL?

URLs are complex data types that carry a lot of information and can be hard to handle. This article explains how to deal with some tricky but not uncommon cases. Every field of type URL, such as the link field, lists both valid and invalid example URLs in its own section.

The image above is borrowed from Mozilla MDN [1].

Rules

Before we explain how to encode URLs, here is what we expect of the URLs you send us:

  • They must follow RFC 3986 [2]
  • They must be absolute (we do not accept relative URLs)
  • They must have a scheme (http or https)
  • They must have an authority (usually your domain)
  • They must have a path (a single product or image is never placed in the root of a domain)
  • They may have parameters, but the parameter values must be properly URL encoded
  • They may have anchors (also called fragments), but the anchors must be properly URL encoded
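As a rough illustration of the rules above, the following Python sketch checks a URL for the most basic violations. The function name `check_url_rules` and its messages are made up for this example, and it does not prove full RFC 3986 compliance; it only catches the obvious problems.

```python
from urllib.parse import urlsplit


def check_url_rules(url: str) -> list[str]:
    """Return a list of rule violations; an empty list means the basic checks passed."""
    problems = []
    parts = urlsplit(url)

    # The URL must be absolute with an http or https scheme.
    if parts.scheme not in ("http", "https"):
        problems.append("scheme must be http or https")
    # The authority is usually the domain.
    if not parts.netloc:
        problems.append("authority (domain) is missing")
    # A product or image never lives in the root of a domain.
    if parts.path in ("", "/"):
        problems.append("path is missing")
    # Anything outside plain ASCII still needs to be encoded.
    if not url.isascii():
        problems.append("contains non-ASCII characters that need encoding")
    # Whitespace is ASCII but must be percent encoded as well.
    if any(ch.isspace() for ch in url):
        problems.append("contains unencoded whitespace")
    return problems
```

An empty result only means that none of these simple checks failed; a relative URL such as `/products/lamp` is reported as missing both its scheme and its authority.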

Encoding

To make a long RFC specification short, do this when passing URLs to us:

  • Ensure that the domain contains only ASCII characters; Unicode characters should be encoded [3]
  • Ensure that all path sections contain only ASCII characters; Unicode characters should be encoded [4]
  • Ensure that all parameter keys contain only ASCII characters; Unicode characters should be encoded [4]
  • Ensure that all parameter values contain only ASCII characters; Unicode characters should be encoded [4]
  • Ensure that all anchor values contain only ASCII characters; Unicode characters should be encoded [4]

TIP: Use an encoding tool

To test conversion and encoding, you can use one of the many available online tools, for example Coder's Toolbox for encoding URL values and the Internationalized Domain Name (IDN) Conversion Tool for converting domain names.

Example

If we look at a product URL like this:

https://mittföretag.com/categories/överlevnadsutrustning/super ficklampa?strength=extra-bright!#buy—now

we would expect it in this format:

https://xn--mittfretag-icb.com/categories/%C3%B6verlevnadsutrustning/super%20ficklampa?strength=extra-bright!#buy%E2%80%94now
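The conversion above can be sketched in Python using only the standard library. This is a minimal sketch under several assumptions: `encode_url` is a hypothetical helper name, the input is a raw URL that has not been encoded yet (an existing `%` would be encoded again), the URL carries no user name or password, and `!` is kept unescaped only to match the example.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left unescaped in addition to letters, digits and "_.-~".
# Assumption: "!" is kept as-is to match the example URL above.
SAFE = "!"


def encode_query(query: str) -> str:
    """Encode names and values separately, leaving the "&" and "=" separators alone."""
    pairs = []
    for pair in query.split("&"):
        name, sep, value = pair.partition("=")
        pairs.append(quote(name, safe=SAFE) + sep + quote(value, safe=SAFE))
    return "&".join(pairs)


def encode_url(raw: str) -> str:
    """Encode a raw (not yet encoded) absolute URL into plain ASCII."""
    parts = urlsplit(raw)
    if parts.hostname is None:
        raise ValueError("URL has no authority")

    # Domain: convert Unicode labels to their ASCII (IDN) form.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    # Path: percent encode each section, keeping the "/" separators.
    path = "/".join(quote(section, safe=SAFE) for section in parts.path.split("/"))

    # Anchor: percent encode the whole fragment.
    fragment = quote(parts.fragment, safe=SAFE)

    return urlunsplit((parts.scheme, netloc, path, encode_query(parts.query), fragment))
```

Calling `encode_url` on the product URL above returns the expected format shown. Python's built-in `idna` codec implements the older IDNA 2003 rules, so names with unusual characters may convert differently than in a dedicated IDN tool.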

Breakdown

In the example above the following encodings have taken place.

Domain name

mittföretag.com to xn--mittfretag-icb.com, according to IDN [3]

Path

/categories/överlevnadsutrustning/super ficklampa to /categories/%C3%B6verlevnadsutrustning/super%20ficklampa, according to percent encoding [4]

Parameter values

extra-bright! to extra-bright!, since there is nothing to do: all characters are already plain ASCII

Do not encode parameter separators

If a question mark ? or an ampersand & is meant to separate parameters, it should not be encoded. But if it is part of a parameter name or value, it should be encoded. Compare:

?movietitle=Godfather&rating=5 

# should be unchanged when encoded

?movietitle=Godfather&rating=5

vs

?movietitle=Tom & Jerry&rating=5

# the ampersand in Tom & Jerry should be encoded, but not the ampersand separating the rating parameter

?movietitle=Tom%20%26%20Jerry&rating=5

It is a common mistake to simply encode the whole query string. Instead, encode the parameter names and parameter values separately before combining them.
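A short Python sketch of the difference, using the standard library's `urlencode`; the helper name `build_query` is only illustrative. Passing `quote_via=quote` makes spaces come out as `%20` rather than `+`, matching the example above.

```python
from urllib.parse import quote, urlencode


def build_query(params: list[tuple[str, str]]) -> str:
    """Encode each name and value separately, then join them with "=" and "&"."""
    return "?" + urlencode(params, quote_via=quote)


# Correct: only the ampersand inside the value is encoded.
good = build_query([("movietitle", "Tom & Jerry"), ("rating", "5")])
# good == "?movietitle=Tom%20%26%20Jerry&rating=5"

# Common mistake: encoding the whole string also encodes the separators.
bad = "?" + quote("movietitle=Tom & Jerry&rating=5", safe="")
# bad == "?movietitle%3DTom%20%26%20Jerry%26rating%3D5"
```

In the second result the `=` and `&` separators are gone, so a receiving system would no longer see two separate parameters.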

Anchor

buy—now to buy%E2%80%94now, according to percent encoding [4]. The dash is the Unicode character U+2014, called em dash.

Invisible characters and dashes are hard

Unicode contains many different representations of whitespace, dashes, and other characters that are hard to tell apart. If you URL encode everything programmatically, you ensure that things work properly.

If you suspect that any of these characters exist in your URL, paste it into an encoding tool like Coder's Toolbox and see whether it gets percent encoded.
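If you would rather check this in code, one possible approach is to list every non-ASCII character together with its official Unicode name. The sketch below uses Python's `unicodedata` module; the function name `find_non_ascii` is made up for this example.

```python
import unicodedata


def find_non_ascii(url: str) -> list[tuple[int, str, str]]:
    """Return (position, code point, Unicode name) for every non-ASCII character."""
    return [
        (index, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for index, ch in enumerate(url)
        if ord(ch) > 127
    ]
```

For the anchor in the example above, this reports the character at position 3 as `U+2014`, named `EM DASH`, which makes a look-alike dash or an invisible space easy to spot.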

Why So Strict?

The URLs passed in for your products will be sent to many systems (ours, partners', and customer devices). Some of these systems are liberal in accepting Unicode characters and other symbols directly (browsers are very liberal), but others are not (for example, older phones). We don't want to break the experience for any customer who is using our product to find your products and make a purchase. Hence we require well-formed URLs that work across the broadest range of devices and systems.

References

Footnotes

  1. "What is a URL?" by Mozilla Contributors, licensed under CC-BY-SA 2.5

  2. RFC 3986

  3. Internationalizing Domain Names in Applications

  4. Wikipedia: Percent Encoding