Let's get started with the easy stuff
Let's find a good URL. I want something with my name in it, although chadkrause.com is too long.
chadk.com is taken. So is chad.com. chadk.co is available, but it's $15 up front and $32 to renew. I don't know if I want to spend that much. chadk.org and chadk.net are available, but I don't like .org for things that are not organizations, and .net is lame. Everything with just 'chad' is taken already, so that's out.
Well, I got impatient and just spent the $15 for chadk.co. Now there's no going back unless I want to waste money.
Let's think some things out
A shortened URL has to actually be short to be useful. chadk.co is 7 characters long, 8 including the dot. The URL itself should be easy to remember. One of my complaints with other URL shorteners is that they require case matching. The tradeoff is code length and the number of URLs you can have versus ease of remembering. For this project, I think 4-5 characters would be sufficient. Using only lowercase letters and numbers, you get:
a-z + 0-9 = 26 + 10 = 36
36^3 = 46656
36^4 = 1679616
36^5 = 60466176
However, we might not want to use all letters and numbers. For example, 0 and o are easy to confuse, as are l and 1. Swear words could also be produced. I won't provide a swear word filter, because the probability is low, and I just don't care enough. And it'll be funny the day someone stumbles upon this and creates a shortened URL with a swear word in it.
So, with that in mind, we have:
a-z + 0-9 - 'o' - 'l' = 34
34^3 = 39304
34^4 = 1336336
34^5 = 45435424
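A quick way to double-check those numbers is to let a script do the counting. This is just a sketch in Python for illustration (the project itself is PHP); the names `code_space`, `full`, and `reduced` are made up here and are not part of the project.

```python
import string

def code_space(alphabet, min_len=3, max_len=5):
    """Map each code length to the number of distinct codes of that length."""
    n = len(set(alphabet))
    return {length: n ** length for length in range(min_len, max_len + 1)}

full = string.ascii_lowercase + string.digits           # a-z + 0-9, 36 symbols
reduced = "".join(ch for ch in full if ch not in "ol")  # drop 'o' and 'l', 34 symbols

for name, alphabet in (("full", full), ("reduced", reduced)):
    print(name, len(alphabet), code_space(alphabet))
# full 36 {3: 46656, 4: 1679616, 5: 60466176}
# reduced 34 {3: 39304, 4: 1336336, 5: 45435424}
```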
To be honest, I'll realistically never get even 39,304 URLs shortened. But I'm going to go with the 4-character option anyway. This makes the overall URL 13 characters long, with only 4 characters to actually remember. Example: chadk.co/ab15
Afterthought: I could use emojis in the URL. However, no.
Creating the Random URL
This seems trivial: just create a random string of characters. And you would be right, until you start filling up the short codes. Let's say I start with 3 characters. If I fill 75% of the possibilities, there is a good chance (75%, in fact, if codes are drawn uniformly at random) that the next random short code will already be used. I could check; however, there are penalties for doing so, and eventually, when all but one short code is taken, there will be a lot of checks (on average as many as there are codes) and a lot of time spent looking for that last one.
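To put numbers on the retry problem: if codes are drawn uniformly at random, the chance of a collision equals the fraction already used, so the average number of draws until a free code turns up is total / free. A rough sketch in Python, for illustration only; `expected_attempts` and `draw_unused` are hypothetical helpers, and `used` stands in for whatever lookup the database would do.

```python
import random

def expected_attempts(total, used):
    """Average number of uniform random draws needed to hit an unused code."""
    free = total - used
    if free <= 0:
        raise ValueError("no free codes left")
    return total / free

def draw_unused(alphabet, length, used, rng=random):
    """Draw random codes until one is not in `used`; return (code, attempts)."""
    if len(used) >= len(set(alphabet)) ** length:
        raise ValueError("no free codes left")  # assumes `used` holds only valid codes
    attempts = 0
    while True:
        attempts += 1
        code = "".join(rng.choice(alphabet) for _ in range(length))
        if code not in used:
            return code, attempts

total = 34 ** 3                                  # 3-character codes, 34 symbols
print(expected_attempts(total, total * 3 // 4))  # 75% full -> 4.0
print(expected_attempts(total, total - 1))       # one code left -> 39304.0
```

So at 75% full the retries are still cheap (4 draws on average); it is only near the very end that the random approach falls apart.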
One solution is to create all the possible short codes, shuffle them, and insert them into the database. That's not a terrible idea; however, it'll make the database large right off the bat, which I don't really like. I don't know exactly how the performance would work out, especially if it's properly indexed, but my gut feeling doesn't like it. On the other hand, picking a code would be O(1), because I could just take the next empty row, which is easy since the rows will be in order by rowId in the database. I'll probably go with this method.
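Here is roughly what the pre-generated pool could look like. It is a sketch only, written in Python with an in-memory SQLite database so it runs anywhere; the table name `links`, its columns, and the functions `build_pool`, `shorten`, and `resolve` are all assumptions for illustration, not the project's actual schema or code.

```python
import itertools
import random
import sqlite3

# Alphabet from above: a-z and 0-9 without 'o' and 'l' (34 symbols).
ALPHABET = "abcdefghijkmnpqrstuvwxyz0123456789"

def build_pool(conn, length, seed=None):
    """Create every code of the given length, shuffle them, store them in that order."""
    codes = ["".join(chars) for chars in itertools.product(ALPHABET, repeat=length)]
    random.Random(seed).shuffle(codes)
    conn.execute(
        "CREATE TABLE links (id INTEGER PRIMARY KEY, code TEXT UNIQUE NOT NULL, url TEXT)"
    )
    conn.executemany("INSERT INTO links (code) VALUES (?)", ((c,) for c in codes))
    conn.commit()
    return len(codes)

def shorten(conn, url):
    """Claim the unused row with the lowest id for this URL and return its code."""
    row = conn.execute(
        "SELECT id, code FROM links WHERE url IS NULL ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        raise RuntimeError("all short codes are used")
    conn.execute("UPDATE links SET url = ? WHERE id = ?", (url, row[0]))
    conn.commit()
    return row[1]

def resolve(conn, code):
    """Return the URL stored for a code, or None if the code is unknown or unused."""
    row = conn.execute("SELECT url FROM links WHERE code = ?", (code,)).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
print(build_pool(conn, 2))  # 2-character codes keep the demo small: 34^2 = 1156 rows
code = shorten(conn, "https://example.com/some/long/path")
print(code, "->", resolve(conn, code))
```

One caveat: the `SELECT ... WHERE url IS NULL` lookup has to skip past the rows already used unless the next free id is stored somewhere or a suitable index exists, and two simultaneous requests could claim the same row without a transaction or lock. Both are left out to keep the sketch short.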
How to redirect properly
Redirecting in PHP is easy: send a Location header with header() and exit.
However, there are different redirect types: 301, 307, 308.
Let's not reinvent the wheel. I'm going to look at bit.ly and goo.gl and see what they use.
Okay, just found out goo.gl is out. Even more reason to create my own.
Bit.ly uses a '301 Moved Permanently' redirect.
That is what I'll use.
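For reference, here is what the three redirect types look like on the wire. This is a Python sketch for illustration only, not the PHP the site will use; `redirect_response` and `REDIRECTS` are made-up names, and the notes on each status are the standard HTTP meanings.

```python
from http import HTTPStatus

# The three redirect codes mentioned above, with their standard meanings.
REDIRECTS = {
    HTTPStatus.MOVED_PERMANENTLY: "permanent; clients may change POST to GET",
    HTTPStatus.TEMPORARY_REDIRECT: "temporary; method and body are preserved",
    HTTPStatus.PERMANENT_REDIRECT: "permanent; method and body are preserved",
}

def redirect_response(target, status=HTTPStatus.MOVED_PERMANENTLY):
    """Return the raw HTTP/1.1 response text for a redirect to `target`."""
    status = HTTPStatus(status)  # accepts plain ints such as 301 as well
    if status not in REDIRECTS:
        raise ValueError(f"not one of the redirect types considered here: {status.value}")
    if "\r" in target or "\n" in target:
        raise ValueError("target must not contain line breaks")  # header injection guard
    return (
        f"HTTP/1.1 {status.value} {status.phrase}\r\n"
        f"Location: {target}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )

print(redirect_response("https://example.com/some/long/path"))
```

Because a 301 is permanent, browsers are allowed to cache it, so a short code that has been visited may keep pointing to its old target in that browser even if the database entry changes later.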
Other considerations before I start
- Do I cache results?
- Do I start another thread to add to the database?
- Do I go with the route that adds roughly 1.3 million rows to the database?
- Do I waste computation time finding a unique random short code?
- Do I add a reCAPTCHA?
- Do I limit how many short codes an IP address can take?
- Do I censor some links?