URL Encoding
URLs are complex data types that carry a lot of information and can be hard to handle. This article explains how to deal with some tricky but not uncommon cases. Every field of type URL, such as the link field, lists both valid and invalid example URLs in its own section.
The image above is borrowed from Mozilla MDN [1].
Rules
Before we explain how to encode URLs, here is what we expect of the URLs you send us:
- They must follow RFC 3986 [2]
- They must be absolute (we do not accept relative URLs)
- They must have a scheme (http/https)
- They must have an authority (usually your domain)
- They must have a path (a single product or image is never placed in the root of a domain)
- They may have parameters, but the parameter values must be properly URL encoded
- They may have anchors (also called fragments), but the anchors must be properly URL encoded
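The structural rules above can be checked mechanically before a URL is submitted. The following is a minimal sketch using Python's standard library; the function name `check_url_rules` and the wording of the messages are made up for illustration, and the check covers only the rules listed here, not full RFC 3986 validation.

```python
from urllib.parse import urlsplit


def check_url_rules(url):
    """Return a list of rule violations; an empty list means the URL passes.

    Hypothetical helper: it only checks the rules listed in this article.
    """
    problems = []
    parts = urlsplit(url)

    # An absolute URL needs a scheme, and only http/https are accepted.
    if parts.scheme not in ("http", "https"):
        problems.append("scheme must be http or https")

    # The authority is the domain part (urlsplit calls it netloc).
    if not parts.netloc:
        problems.append("authority (domain) is missing")

    # A product or image is never placed in the root of a domain.
    if parts.path in ("", "/"):
        problems.append("path must point below the domain root")

    # Anything outside ASCII still needs to be encoded.
    if not url.isascii():
        problems.append("non-ASCII characters must be encoded")

    return problems


print(check_url_rules("https://example.com/products/1?color=red#top"))
print(check_url_rules("/products/1"))
```

A relative URL such as `/products/1` fails both the scheme and the authority check, which is why relative URLs are rejected.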
Encoding
To make a long RFC specification short, do the following when passing URLs to us:
- Ensure that the domain contains only ASCII characters; Unicode characters should be encoded [3]
- Ensure that all path sections contain only ASCII characters; Unicode characters should be encoded [4]
- Ensure that all parameter keys contain only ASCII characters; Unicode characters should be encoded [4]
- Ensure that all parameter values contain only ASCII characters; Unicode characters should be encoded [4]
- Ensure that all anchor values contain only ASCII characters; Unicode characters should be encoded [4]
To test conversion and encoding you can use one of the many available online tools, for example Coder's Toolbox for encoding URL values and the Internationalized Domain Name (IDN) Conversion Tool for converting domain names.
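The checklist above could also be applied in code rather than by hand. Below is a rough sketch in Python using only the standard library; `encode_url` and `encode_query` are hypothetical names, and the sketch assumes that the input is a raw, not yet encoded URL (an already encoded URL would have its `%` signs encoded a second time) and that it carries no user name or password. Python's built-in `idna` codec implements the older IDNA 2003 rules, which is an assumption that may not hold for every domain.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_query(query):
    """Percent-encode each parameter key and value separately."""
    pairs = []
    for pair in query.split("&"):
        key, sep, value = pair.partition("=")
        # "!" is kept as-is in values, matching the example in this article.
        pairs.append(quote(key, safe="") + sep + quote(value, safe="!"))
    return "&".join(pairs)


def encode_url(url):
    """Encode a raw URL component by component (illustrative sketch)."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL has no domain: " + url)

    # Domain: convert Unicode labels to their ASCII (xn--) form.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    # Path, parameters and anchor: percent-encode as UTF-8.
    path = quote(parts.path, safe="/")
    query = encode_query(parts.query)
    fragment = quote(parts.fragment, safe="!")

    return urlunsplit((parts.scheme, host, path, query, fragment))


print(encode_url("https://example.com/my products/caf\u00e9?size=extra large"))
```

Encoding each component on its own keeps the separators (`/`, `?`, `&`, `=`, `#`) intact while everything between them is made ASCII-safe.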
Example
Take a product URL like this:
https://mittföretag.com/categories/överlevnadsutrustning/super ficklampa?strength=extra-bright!#buy—now
We would expect it in this format:
https://xn--mittfretag-icb.com/categories/%C3%B6verlevnadsutrustning/super%20ficklampa?strength=extra-bright!#buy%E2%80%94now
Breakdown
In the example above, the following encodings have taken place.
Domain name
mittföretag.com to xn--mittfretag-icb.com, according to IDN [3]
Path
/categories/överlevnadsutrustning/super ficklampa to /categories/%C3%B6verlevnadsutrustning/super%20ficklampa, according to percent-encoding [4]
Parameter values
extra-bright! to extra-bright!, since there is nothing to do: all characters are already plain ASCII
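One way to double-check an encoded URL is to decode each component again and compare it with what you started from. The sketch below does this in Python for the encoded example URL; `decode_parts` is a hypothetical helper name, and the built-in `idna` codec is assumed to be sufficient for the domain in question.

```python
from urllib.parse import unquote, urlsplit


def decode_parts(encoded_url):
    """Split an encoded URL and decode every component back to Unicode."""
    parts = urlsplit(encoded_url)
    return {
        # Domain: ASCII (xn--) form back to Unicode.
        "domain": parts.hostname.encode("ascii").decode("idna"),
        # Path, parameters and anchor: undo the percent-encoding.
        "path": unquote(parts.path),
        "parameters": unquote(parts.query),
        "anchor": unquote(parts.fragment),
    }


encoded = (
    "https://xn--mittfretag-icb.com/categories/%C3%B6verlevnadsutrustning/"
    "super%20ficklampa?strength=extra-bright!#buy%E2%80%94now"
)
for name, value in decode_parts(encoded).items():
    print(name, "->", value)
```

If the decoded components match the original URL, the encoding was applied to the right parts and nothing was encoded twice.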