Validating HTTP(S) URLs in Ruby on Rails
By Martijn Storck
This post demonstrates a way to validate URLs in ActiveRecord objects that goes beyond a simple regular expression and is both safe and configurable. The goal is to validate that a social media URL in a user profile is a valid https:// URL that links to a given domain, so that users can only post a link to the domain the field is intended for. Let’s first explore why a regular expression is a bad idea.
The naive approach
In Rails, Active Record Validation Helpers are used to validate the state of objects before they are persisted to the database to maintain integrity and correctness of data. Rails comes with a decent selection of Validation Helpers to allow for validation of, for example, presence, format and uniqueness of attributes. These helpers can be used to generate a naive validation using a regular expression. Something like:
class User < ApplicationRecord
  validates :linkedin_url, format: %r{\Ahttp(s)://.*?\.linkedin\.com/}
end
Unfortunately, Xkcd 1171 applies, as always. It turns out users can easily exploit this to insert malicious URLs into our database, as long as they contain .linkedin.com/ somewhere after the scheme, for example in the path. Of course, a more complex regular expression can be made up to combat this, but why go through the trouble?
The Ruby standard library comes with an excellent URI parser already, accessible through
URI.parse. The source code of the actual parser comes in at 120 lines, which confirms
that a simple regular expression won’t cut it here.
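To make the problem concrete, the following sketch runs the naive pattern against a hypothetical malicious URL and then shows what URI.parse reports for the same string. The host evil.example and the constant names are assumptions chosen for illustration only.

```ruby
require "uri"

# The naive pattern from the model above.
NAIVE_PATTERN = %r{\Ahttp(s)://.*?\.linkedin\.com/}

# Hypothetical attacker-controlled URL: the real host is evil.example,
# and ".linkedin.com/" only appears in the path.
MALICIOUS_URL = "https://evil.example/.linkedin.com/"

# The regular expression accepts the URL...
puts "regex accepts: #{NAIVE_PATTERN.match?(MALICIOUS_URL)}"

# ...while the parser exposes the host a browser would actually visit.
puts "parsed host:   #{URI.parse(MALICIOUS_URL).host}"
```

The pattern only looks at the string as a whole, whereas the parser separates scheme, host and path, which is what the validation actually needs.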
A better URL validator for Rails
As you would expect, Rails provides an excellent API to create custom validations. We will use this to build a URL validator that:
- Relies on Ruby’s built-in RFC3986 parser
- Enables host/domain allow-listing
- Uses regular expressions only when appropriate
The validator code can be placed in app/validators/url_validator.rb.
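A minimal sketch of such a validator might look like the following. It is an illustration under stated assumptions, not a definitive implementation: the UrlCheck helper module is a hypothetical name introduced here so the core logic can run with the standard library alone, and the order of the checks is a design choice. The error keys match the ones mentioned in this post.

```ruby
require "uri"

# Hypothetical helper (not part of Rails): returns nil when the URL is
# acceptable, or one of the error keys :invalid, :https_only or
# :host_not_allowed otherwise.
module UrlCheck
  def self.error_for(value, host: nil)
    uri = URI.parse(value.to_s)

    return :https_only unless uri.scheme == "https"
    return :invalid if uri.host.to_s.empty?
    return :host_not_allowed if host && !host.match?(uri.host)

    nil
  rescue URI::InvalidURIError
    # URI.parse raises for strings that are not valid URIs at all.
    :invalid
  end
end

# The guard only exists so this sketch also loads outside a Rails app.
if defined?(ActiveModel::EachValidator)
  class UrlValidator < ActiveModel::EachValidator
    def validate_each(record, attribute, value)
      # options[:host] is nil when the validator is used as `url: true`.
      error = UrlCheck.error_for(value, host: options[:host])
      record.errors.add(attribute, error) if error
    end
  end
end
```

Because the host pattern is only ever matched against the parsed host, a URL that merely mentions the allowed domain in its path is rejected.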
The validator is based on ActiveModel::EachValidator, which is the base class for all built-in Active Model validators. This allows the validator to be instantiated as follows in the model:
class User < ApplicationRecord
  validates :linkedin_url, url: { host: /linkedin\.com\Z/ }
end
The url: option makes Rails look for the UrlValidator class. The validator parses the URL and, if successful, checks the scheme and host attributes to ensure the URL is valid for the intended purpose. If this is not the case, it adds an error to the attribute. These errors are translated in the activerecord.errors.models.user.(https_only|host_not_allowed|invalid) I18n keys.
If only validation for a valid HTTPS URL is required, without the allow-list, url: true can be used instead of the options hash.
The case for regular expressions
There is some irony in using regular expressions right after referring to that Xkcd comic, but it is a good option here for the flexibility offered:
- check match for a domain: /example\.org\Z/i
- check exact match for the host name: /\Awww\.example\.org\Z/
- match multiple domains: /(youtube\.com|youtu\.be)\Z/
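The three patterns above can be tried against a few sample host names. The sketch below is illustrative only: the PATTERNS hash, the allowed? helper and the host names are assumptions, not part of the validator.

```ruby
# The three host patterns from the list above, keyed by purpose.
PATTERNS = {
  domain:   /example\.org\Z/i,
  exact:    /\Awww\.example\.org\Z/,
  multiple: /(youtube\.com|youtu\.be)\Z/
}.freeze

# Hypothetical helper: does the named pattern accept this host?
def allowed?(pattern_name, host)
  PATTERNS.fetch(pattern_name).match?(host)
end

[
  [:domain,   "blog.EXAMPLE.org"],
  [:exact,    "www.example.org"],
  [:exact,    "blog.example.org"],
  [:multiple, "youtu.be"],
  [:multiple, "vimeo.com"]
].each do |name, host|
  puts format("%-9s %-18s %s", name, host, allowed?(name, host))
end
```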
In order to implement these options without regular expressions, multiple options would be required along with support for both strings and arrays of strings as parameters. Based on that it can be stated that regular expressions have merit here. However, with the current code, there is a risk that the user forgets an end-of-string anchor (\z or \Z), leading to new possible exploits.
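The effect of a missing anchor is easy to demonstrate. The sketch below compares an unanchored pattern with an anchored one on hypothetical host names. It also includes a stricter variant that requires the match to start at the beginning of the host or right after a dot. That variant is a suggestion added here, not something the validator enforces, and it guards against look-alike hosts such as notlinkedin.com.

```ruby
# Patterns under comparison. STRICT is an assumed, stricter alternative.
UNANCHORED = /linkedin\.com/
ANCHORED   = /linkedin\.com\z/
STRICT     = /(\A|\.)linkedin\.com\z/

# Hypothetical host names: one legitimate, two attacker-controlled.
HOSTS = [
  "www.linkedin.com",
  "linkedin.com.evil.example",
  "notlinkedin.com"
].freeze

HOSTS.each do |host|
  puts format(
    "%-26s unanchored=%-5s anchored=%-5s strict=%s",
    host,
    UNANCHORED.match?(host),
    ANCHORED.match?(host),
    STRICT.match?(host)
  )
end
```

Only the legitimate host passes the strict pattern, while the unanchored pattern accepts all three.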
So even with this improved URL validator, Xkcd 1171 looms. Be vigilant!
Business photo created by 8photo - nl.freepik.com