Design - URL Shortener

FUNCTIONAL REQUIREMENTS -

  • Get a short URL for a given long URL
  • Redirect a short URL to its long URL

NON-FUNCTIONAL REQUIREMENTS - 

  • Low latency
  • High Availability

API ENDPOINTS -

  • POST v1/data/shorten
  • GET v1/{shortUrl}
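
To make the two endpoints concrete, here is a minimal in-memory sketch of what each one does. The function names, the dictionary store, the response shape and the choice of a 302 status are all assumptions made for illustration; they are not part of the design above, and the short code is simply passed in because ID generation is covered later.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory store standing in for the real database.
_store: Dict[str, str] = {}


def shorten(long_url: str, short_code: str) -> Dict[str, str]:
    """Sketch of POST v1/data/shorten: save the mapping, return the short code."""
    _store[short_code] = long_url
    return {"shortUrl": short_code}


def redirect(short_code: str) -> Tuple[int, Optional[str]]:
    """Sketch of GET v1/{shortUrl}: return an HTTP status and the redirect target."""
    long_url = _store.get(short_code)
    if long_url is None:
        return 404, None
    # 302 is assumed here; 301 would let browsers cache the redirect permanently.
    return 302, long_url
```

A 302 keeps every redirect going through the service, which helps if clicks need to be counted, while a 301 reduces load because clients can cache it.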

How to shorten the URL?

Let us assume the shortened URL consists of the characters (a-z), (A-Z) and (0-9).
So, the number of available characters is 26+26+10 = 62.

Let's say our URL shortener can generate 1160 URLs/second. So, it can generate 1160*3600*24 = about 100 million URLs per day. Let's say we want to store the URLs for 10 years. Therefore, 100 million * 365 * 10 = 365 billion records.

So, from this estimation we can infer the minimum length n of the shortened URL -
62^n >= 365 billion. Since 62^6 is about 56.8 billion (too small) and 62^7 is about 3.5 trillion, n comes out to be 7.
Therefore, a URL of length 7 is more than enough to store 365 billion records.
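
The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are made up for this example.

```python
SECONDS_PER_DAY = 3600 * 24
ALPHABET_SIZE = 26 + 26 + 10  # a-z, A-Z, 0-9


def total_records(urls_per_second: int, years: int) -> int:
    """Number of URLs created over the retention period."""
    urls_per_day = urls_per_second * SECONDS_PER_DAY
    return urls_per_day * 365 * years


def min_code_length(records: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Smallest n such that alphabet_size ** n >= records."""
    length = 1
    while alphabet_size ** length < records:
        length += 1
    return length


records = total_records(urls_per_second=1160, years=10)
print(records)                   # 365817600000, roughly 365 billion
print(min_code_length(records))  # 7
```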

DESIGN - 

The main problem we face here is how to generate IDs. One way is to use the auto-increment feature of an RDBMS, but that puts too much load on the database when traffic is high. We also cannot generate IDs independently on each server, because there can be multiple instances of the service and there is a high chance of collision.

One solution is to have a ticket server generate the IDs, but we won't discuss that as it leads to a single point of failure. Another approach is to go with Twitter's Snowflake unique ID generation algorithm, which generates unique numeric IDs (handling our use case), but we'll talk about this in some other article.

A more go-to solution would be to use Redis for its ability to hand out a unique number within a range provided to it (let's say from 1 to 1 million). But now we face a problem: all the service instances will call this one Redis, so it will be under a huge amount of load, and it is also a single point of failure.
One can argue for running multiple Redis instances, but then we may get collisions between them. Even if we give each Redis a different range, adding a new Redis later becomes a huge complication.




ZooKeeper - Apache ZooKeeper is a distributed coordination service designed to manage and coordinate large clusters of distributed applications. It serves as a centralised service for maintaining configuration information, naming, synchronisation, and group services within a distributed system.

It will provide ranges to different servers (like 1-500, 501-1000, ...). It will also assign a new range to a server once that server has used up its current one.
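
The range idea can be sketched as follows. Note that this does not talk to ZooKeeper at all: `RangeCoordinator` is a hypothetical in-memory stand-in for it, and `IdGenerator`, `base62_encode` and the range size of 500 are likewise names and values chosen just for this example. Each server takes a range, hands out IDs from it locally, and asks for a fresh range only when it runs out.

```python
import string
import threading
from typing import Tuple

# a-z, A-Z, 0-9: the 62 characters used for short codes.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def base62_encode(number: int) -> str:
    """Convert a non-negative numeric ID into a Base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class RangeCoordinator:
    """In-memory stand-in for the coordinator: hands out non-overlapping ranges."""

    def __init__(self, range_size: int = 500) -> None:
        self._range_size = range_size
        self._next_start = 1
        self._lock = threading.Lock()

    def next_range(self) -> Tuple[int, int]:
        with self._lock:
            start = self._next_start
            self._next_start += self._range_size
        return start, start + self._range_size - 1


class IdGenerator:
    """One per application server; fetches a new range when the current one is used up."""

    def __init__(self, coordinator: RangeCoordinator) -> None:
        self._coordinator = coordinator
        self._current, self._end = coordinator.next_range()

    def next_id(self) -> int:
        if self._current > self._end:
            self._current, self._end = self._coordinator.next_range()
        value = self._current
        self._current += 1
        return value


coordinator = RangeCoordinator(range_size=500)
server_a = IdGenerator(coordinator)  # gets 1-500
server_b = IdGenerator(coordinator)  # gets 501-1000
print(base62_encode(server_a.next_id()), base62_encode(server_b.next_id()))
```

Because each server only contacts the coordinator once per range, the coordinator sees one request per 500 IDs rather than one per URL.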

Cassandra - We are using a NoSQL database here because of its high availability and easy scalability.

If a server goes down, what will happen to the IDs in its range that were not yet assigned?

It's okay even if we lose a couple of thousand tokens in the range, because Base62 of length 7 (62^7, about 3.5 trillion values) has far more numbers than the 365 billion our use case needs.
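
To put a number on that headroom, here is a rough calculation. The range size of 1 million is only an assumed example value, and the function name is made up for this sketch.

```python
CODE_SPACE = 62 ** 7        # 3,521,614,606,208 possible 7-character codes
REQUIRED = 365_000_000_000  # ten-year estimate from above


def affordable_lost_ranges(range_size: int) -> int:
    """How many entire ranges could be wasted before the code space runs short."""
    spare = CODE_SPACE - REQUIRED
    return spare // range_size


# Even if every crash wasted a whole range of 1 million IDs,
# we could afford more than 3 million such crashes.
print(affordable_lost_ranges(1_000_000))  # 3156614
```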
