Considering the number of URL shortening services, I decided to write a short review of the subject. URL shorteners were made popular by micro-blogging services such as Twitter, which limit the number of characters in a message.
So, here’s what’s covered: URL redirection, the model behind a shortener, a PHP implementation, and pitfalls and misuse.
Update (2011-07-29):
Reorganized code. It’s now divided into classes, easier for understanding and reuse. Using PDO for database connection instead of mysql methods. Added QR Code for generated shortlinks. This tutorial won’t change anymore, since it’s on the basic level it needs to be. For more information, you can follow the URL Shortening project I started, named BooRL.
URL Redirection
There are many uses for URL redirection: moving to a new domain, logging links, manipulating search engines, or buying similar domain names to catch typos. It is important to keep SEO (Search Engine Optimization) in mind when doing this. There are several ways to do it:
Javascript Redirect
You can redirect visitors using JavaScript code within the HTML. Search engines don’t like this method. Imagine you have an indexed page about something. It would be easy to add JavaScript later that redirects to another website (Viagra, Casino, Poker…).
<script type="text/javascript"> window.location = "http://www.google.com"; </script>
Parked Domains
You could park an additional domain and point its DNS to your original site’s hosting. Whenever someone types the additional domain, your main site is opened. This is also not recommended, because search engines penalize duplicate content. Another drawback is that every domain gets its own page ranking, instead of only the original website being ranked.
HTTP Redirects
In the HTTP protocol used by the World Wide Web, a redirect is a response with a status code beginning with 3 that induces a browser to go to another location.
- 300 multiple choices (e.g. offer different languages)
- 301 moved permanently
- 302 found (originally temporary redirect, but now commonly used to specify redirection for unspecified reason)
- 303 see other (e.g. for results of cgi-scripts)
- 304 not modified
- 305 use proxy
- 307 temporary redirect
Out of these, 301 and 302 are the most commonly used. As far as I know, 302 is handled well by Google, which means it will transfer your page ranking to the redirected page, but other search engines won’t be so nice.
So here we are, left with the only right way from an SEO perspective: the 301 redirect.
Model
We’re gonna use HTTP 301 redirect. Let’s discuss the model. Every URL Shortener service has a domain and a key. For example http://goo.gl/4uHOQ has:
- domain: goo.gl
- key: 4uHOQ
The key is different for every URL. If you are making a simple URL redirection service, you can return the same key for the same long URL. If you want to add URL logging for each user, with some analytics, this won’t work, but we’re gonna stick to the simple model, for the sake of understanding.
We’re gonna store each URL in a database. We need one (database) table:
id | long_url |
---|---|
1 | http://www.codeden.net/ |
2 | http://www.codeden.net/2011/05/skype-login-window-disappears/ |
3 | http://www.codeden.net/2011/05/extracting-pixel-values-from-videos-using-opencv/ |
Now we have a not-so-bad mapping of long URLs into numbers. This way, using an 8-character key, we can represent 10^8 (100 million) long URLs. It is clear that this isn’t the best choice, and that a good URL shortening service would hit the cap in only a few days. But let’s say we map the id to another numeral system. The id is in base 10, which means we can use 10 symbols in each position. If we map it into a system with a larger base, e.g. 64, we get 64 symbols in each position, and the total number of length-8 keys is 64^8 (about 2.8 × 10^14).
Implementation
Let’s put all this together using PHP. First, we need a way to do a 301 redirect:
header('HTTP/1.1 301 Moved Permanently');
header('Location: ' . $url);
exit();
where $url is the webpage we are redirecting to.
Then we need a number converter (more a mapping) I posted a few weeks earlier.
We’re going to extend our database table and keep the mapped value together with the long URL, instead of decoding the key back to decimal on every lookup, since storage space is a lot cheaper than processing power.
id | short_code | long_url |
---|---|---|
1 | A | http://www.codeden.net/ |
2 | B | http://www.codeden.net/2011/05/skype-login-window-disappears/ |
3 | C | http://www.codeden.net/2011/05/extracting-pixel-values-from-videos-using-opencv/ |
... | ... | ... |
129231 | eiO | http://www.codeden.net/category/random/ |
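The number converter itself is not reproduced in this post, so here is a minimal sketch of what such a mapping could look like. The function names and the exact alphabet are assumptions: the order of the letters is chosen so that the output agrees with the example rows in the table above (1 maps to A, 129231 maps to eiO), while the symbols placed at positions 0 and 63 are a guess.

```php
<?php
// Hypothetical 64-symbol alphabet (an assumption, not the original converter.php).
// Position 0 is '+', positions 1-26 are A-Z, 27-52 are a-z, 53-62 are 0-9, 63 is '='.
const SHORT_CODE_ALPHABET = '+ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789=';

// Convert a non-negative decimal id into its base-64 short code.
function toShortCode(int $id): string
{
    if ($id < 0) {
        throw new InvalidArgumentException('id must not be negative');
    }
    $alphabet = SHORT_CODE_ALPHABET;
    $base = strlen($alphabet);
    $code = '';
    do {
        // The remainder selects the next symbol, least significant first
        $code = $alphabet[$id % $base] . $code;
        $id = intdiv($id, $base);
    } while ($id > 0);
    return $code;
}

// Convert a short code back into the decimal id.
function fromShortCode(string $code): int
{
    if ($code === '') {
        throw new InvalidArgumentException('code must not be empty');
    }
    $alphabet = SHORT_CODE_ALPHABET;
    $base = strlen($alphabet);
    $id = 0;
    foreach (str_split($code) as $symbol) {
        $position = strpos($alphabet, $symbol);
        if ($position === false) {
            throw new InvalidArgumentException('symbol not in alphabet: ' . $symbol);
        }
        $id = $id * $base + $position;
    }
    return $id;
}

echo toShortCode(1), "\n";      // prints A
echo toShortCode(129231), "\n"; // prints eiO
```

Since the short code is stored next to the long URL, only the first direction is needed at insert time; the reverse function is included to show that the mapping loses no information.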
SQL for the database:
CREATE DATABASE IF NOT EXISTS `shortener`;

CREATE TABLE IF NOT EXISTS `shortener`.`mapping` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `short_code` VARCHAR(10) NOT NULL,
  `long_url` TEXT NOT NULL,
  `insert_date` DATETIME NULL,
  PRIMARY KEY (`id`),
  INDEX `short_code` (`short_code` ASC),
  INDEX `long_url` (`long_url`(20) ASC)
) ENGINE = MyISAM
  DEFAULT CHARACTER SET = utf8
  COLLATE = utf8_general_ci;
The indexes on short_code and long_url are there for faster select queries. MyISAM is chosen because no referential integrity constraints are needed and it is fast. Keep in mind that MyISAM does not support transactions, so the transaction calls in the class further down only take effect if the table uses a transactional engine such as InnoDB.
When someone goes to www.sho.rt/someKey, we should look up short_code someKey in the database and load the long_url into $url in the PHP redirect page.
The easiest way is to redirect everything to a single page, which will extract the key, ask the database for long_url and run the redirect.
I have already written about how to set up Apache HTTP Server with mod_rewrite (which comes bundled with it) and redirect all requests to a single page.
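For reference, a minimal sketch of such a rewrite setup. It assumes Apache with mod_rewrite enabled, an .htaccess file in the document root that is allowed to override the rewrite settings, and index.php as the single page. This is an assumed configuration, not necessarily the one from the earlier post:

```apache
# Send every request that is not an existing file or directory to index.php
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.php [L]
```

The two conditions keep real files (images, stylesheets) reachable, while anything else, including a short code, ends up at the single page.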
URL Shortener class:
// Number converter
include('converter.php');

/**
 * URL Shortener class
 */
class Shortener
{
    // Database holder
    private $database;

    // Short code regular expression: matches any character outside the mapping alphabet
    private $keyRegex = '/[^A-Za-z0-9\+\=]/';

    // URL regular expression
    private $urlRegex = '/^(https?|ftp):\/\/[A-Za-z0-9_\-]+(\.[A-Za-z0-9_\-]+)+(\/|(\/[A-Za-z0-9_\-\?\+\=\&\.]+)+)?\/?$/';

    // Connect to database on construction
    public function __construct()
    {
        $this->database = new PDO('mysql:host=localhost;dbname=shortener', 'root', '');
        // Report SQL errors as exceptions instead of failing silently
        $this->database->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    }

    /**
     * Get long URL for given key
     *
     * @param key - the long URL short code
     * @return Long URL
     */
    public function getLongURL($key)
    {
        // Validate the key to contain only characters used in the mapping
        if (preg_match($this->keyRegex, $key)) {
            throw new Exception('Key contains characters that are not allowed!');
        }

        // Search for the key in the database; a prepared statement keeps
        // the key out of the SQL text
        $statement = $this->database->prepare('SELECT long_url FROM mapping WHERE BINARY short_code = ?');
        $statement->execute(array($key));
        $result = $statement->fetchAll();

        if (sizeof($result) == 1) {
            $url = $result[0]['long_url'];
        } else {
            throw new Exception('Key invalid!');
        }
        return $url;
    }

    /**
     * Shortens an URL if it doesn't exist, otherwise returns short code
     *
     * @param url - URL to be shortened
     * @return Short Code for given URL
     */
    public function insertNewURL($url)
    {
        // Validate entered URL
        if (!preg_match($this->urlRegex, $url)) {
            throw new Exception('You have entered an invalid URL.');
        }

        /*
         * This is a potential pitfall, since it can be misused
         * for attacking a HTTP server (Denial-of-Service)
         */
        // Check if the URL exists
        if (!get_headers($url)) {
            throw new Exception('URL doesn\'t exist.');
        }

        $this->database->beginTransaction();
        try {
            // Search for the URL in the database
            $statement = $this->database->prepare('SELECT short_code FROM mapping WHERE BINARY long_url = ?');
            $statement->execute(array($url));
            $result = $statement->fetchAll();

            if (sizeof($result) == 1) {
                // If found, return the existing short code
                $short = $result[0]['short_code'];
            } else {
                // Create a new short_code for the given URL
                $result = $this->database->query('SELECT COALESCE(MAX(id + 1), 1) FROM mapping')->fetchAll();
                if (sizeof($result) != 1) {
                    throw new Exception('Can\'t get id.');
                }
                $id = intval($result[0][0]);

                // Convert the ID to a new base
                $short = NumberConverter::fromDecimalToBase($id, 64);

                // Insert the new URL data into the database
                $statement = $this->database->prepare(
                    'INSERT INTO mapping (id, short_code, long_url, insert_date) VALUES (?, ?, ?, CURDATE())'
                );
                if (!$statement->execute(array($id, $short, $url))) {
                    throw new Exception('Could not add the URL.');
                }
            }
            // Commit in both cases so no transaction is left open
            $this->database->commit();
        } catch (Exception $e) {
            $this->database->rollBack();
            throw $e;
        }
        return $short;
    }
}
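The URL regular expression in the class is fairly strict (it rejects ports and percent-encoded characters, for example). As a possible alternative, here is a sketch that relies on PHP's built-in filter_var with an explicit list of allowed schemes. The function name isShortenableUrl is hypothetical, and the scheme list simply mirrors the schemes accepted by the regular expression above:

```php
<?php
// Hypothetical helper: accept only syntactically valid http, https or ftp URLs.
function isShortenableUrl(string $url): bool
{
    // FILTER_VALIDATE_URL checks the general URL syntax
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // The scheme is checked separately so that schemes such as
    // javascript: or file: are never stored
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, array('http', 'https', 'ftp'), true);
}

var_dump(isShortenableUrl('http://www.codeden.net/')); // bool(true)
var_dump(isShortenableUrl('javascript:alert(1)'));     // bool(false)
```

This only validates the shape of the URL. Whether the target actually exists is a separate question, which the class answers with get_headers.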
Don’t forget that an interface for adding new URLs to the database is also needed. On top of that, the page shows QR codes for shortened URLs using the QR class.
Here’s the PHP for index.php put together:
// URL Shortener class
include("shortener.php");
// QR Class
include("qr.php");

// Instantiate Shortener
$shortener = new Shortener();

// Extract the key (explode() is used because split() was removed in PHP 7)
$key = explode("/", $_SERVER['REQUEST_URI']);
$key = $key[sizeof($key) - 1];

try {
    // If there is a key supplied, try to find it in the database,
    // otherwise show the page for shortening URLs
    if (strlen($key) > 0) {
        // Get Long URL for the given key
        $url = $shortener->getLongURL($key);

        // Redirect
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: ' . $url);
        exit();
    } else {
        // Show a simple form with URL field and a button
        echo '<html>';
        echo '<head>';
        echo '<title>URL Shortener</title>';
        echo '</head>';
        echo '<body>';
        echo '<form name="input" action="" method="post">';
        echo 'URL: <input type="text" name="url" /> <input type="submit" ';
        echo 'value="Shorten" />';
        echo '</form>';

        // Read the submitted URL, if the form was submitted at all
        $url = isset($_POST['url']) ? $_POST['url'] : '';

        // Add the URL to the database if it doesn't exist,
        // otherwise return the shortcode of the long_url
        if (strlen($url) > 0) {
            // Get domain
            $domain = $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];

            // Create new short code or get old if it already exists
            $shortCodeURL = $domain . $shortener->insertNewURL($url);

            // Show the shortcode (escaped, since it ends up in HTML)
            echo htmlspecialchars($shortCodeURL) . '<br />';

            // Get QR Image for the generated code
            $qr = QR::getQRforURL($shortCodeURL, 200);

            // Show image
            echo '<img src="' . htmlspecialchars($qr) . '" alt="QR" />';
        }
        echo '</body>';
        echo '</html>';
    }
} catch (Exception $e) {
    // Catch exceptions if any arise and show a message
    echo "Error! " . htmlspecialchars($e->getMessage());
    die();
}
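One detail of the key extraction in index.php: taking the last segment of REQUEST_URI also picks up any query string, so a request like /4uHOQ?ref=mail would be looked up with the wrong key. A small sketch of a more forgiving extraction, assuming a hypothetical helper named extractKey, could look like this:

```php
<?php
// Hypothetical helper: return the last path segment of a request URI,
// ignoring any query string. Returns '' when there is no key.
function extractKey(string $requestUri): string
{
    // parse_url() separates the path from the query string
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    $segments = explode('/', $path);
    return end($segments);
}

echo extractKey('/4uHOQ?ref=mail'), "\n"; // prints 4uHOQ
```

An empty result keeps the behaviour of the page above: no key means the shortening form is shown.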
Pitfalls and misuse
As said earlier, this is a very simple shortener. Here are a few things to keep in mind if you are going to build a commercial one:
- Get a short domain, not over 5-6 characters including the dot (.)
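- If using PHP, keep in mind that the maximum integer value is 2147483647 on 32-bit builds (9223372036854775807 on 64-bit builds, see PHP_INT_MAX), and that PHP has no unsigned integer type. The id column, declared as a signed MySQL INT, has the same 2147483647 limit (4294967295 for INT UNSIGNED)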
- Need a more complex database (users, link analytics)
- A way to deal with decayed links i.e. links that haven’t been used for some time
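To get a feel for the integer limit in the list above, the following sketch counts how many base-64 characters a given id needs. The helper name codeLength is hypothetical and the function is only an illustration of the arithmetic:

```php
<?php
// Hypothetical helper: number of symbols a non-negative id occupies in the given base.
function codeLength(int $id, int $base = 64): int
{
    if ($id < 0 || $base < 2) {
        throw new InvalidArgumentException('id must be >= 0 and base must be >= 2');
    }
    $length = 1;
    // Every division by the base strips one symbol from the number
    while ($id >= $base) {
        $id = intdiv($id, $base);
        $length++;
    }
    return $length;
}

echo codeLength(2147483647), "\n"; // prints 6
```

So even the largest 32-bit id fits into a 6-character key, which means the integer range, not the key length, is the first limit a growing service would run into.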
Misuse
When URL shorteners appeared, they were often used to disguise the underlying address. You have no idea where a short link leads until you click it. Popular services like Facebook and Digg added a prefetch operation which shows some of the link’s content, so you can’t get fooled.
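A very reduced version of such a preview is to look at the redirect response of the short link and show its target without visiting it. The sketch below only covers the parsing step: findRedirectTarget is a hypothetical helper that takes raw response header lines (the format PHP's get_headers() returns) and extracts the Location value. The sample headers are made up for illustration.

```php
<?php
// Hypothetical helper: return the value of the first Location header
// in a list of raw header lines, or null if there is none.
function findRedirectTarget(array $headerLines): ?string
{
    foreach ($headerLines as $line) {
        if (!is_string($line)) {
            continue;
        }
        // Header names are case-insensitive, hence stripos()
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}

// Made-up response of a shortener answering with a 301 redirect
$sample = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://www.codeden.net/',
    'Content-Length: 0',
);
echo findRedirectTarget($sample), "\n"; // prints http://www.codeden.net/
```

In a live setting the header lines would have to be fetched first. Note that get_headers() follows redirects by default, so the request would need to be configured not to follow them; that part is left out here.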
That’s all and thanks for reading folks.
The mysql extension is out of date and on its way to deprecation. PDO with prepared statements can be easier, safer and more performant.
The singleton pattern isn’t advantageous here and complicates unit testing.
Thanks for the feedback. Will try to update it as soon as possible.