Design - Booking System

We need to design a booking system like booking.com/airbnb. First we look at the requirements:

FUNCTIONAL REQUIREMENTS:

Hotel-

Onboarding
Updates
See the booking details

User-

Search hotels
Book rooms
Check bookings

Analytics
Dynamic pricing

NON-FUNCTIONAL REQUIREMENTS:

Moderate latency - It's ideal to have low latency when user makes a reservation, but it's acceptable if the system takes a few seconds to book the same room,
High Availability
High Consistency
High Scalability

API Signatures:

We'll have basic CRUD service (Create, Read, Update, Delete)

Hotel-related APIs:

GET /v1/hotels/ID
POST /v1/hotels
PUT /v1/hotels/ID
DELETE /v1/hotels/ID

Room-related APIs:

GET /v1/hotels/ID/rooms/ID
POST /v1/hotels/ID/rooms
PUT /v1/hotels/ID/rooms/ID
DELETE /v1/hotels/ID /rooms/ID

Reservation-related APIs:

GET /v1/reservations (get reservation history)
GET /v1/reservations/ID (detailed information about a reservation)
POST /v1/reservations/ID (add new reservation)
DELETE /v1/reservations/ID (cancel reservation)

Choice of Database:

RDBMS provides us with ACID properties which are important for a reservation system. (We'll see later on). There is also a clear relation in the data which a RDBMS can easily structure.

For search feature we'll use ElasticSearch/Solr and for checking the reservation history we'll use Cassandra to reduce the load on RDBMS and keeping the RDBMS only for active bookings.

DESIGN:

DESIGN OVERVIEW:

Hotel UI: This is the UI for hotel manager, if they want add any hotel or room they will use this UI.

Hotel Service: It handles the onboarding and management of hotels. Hotel Data is a relational data so we'll store this data in the MySql Cluster.

All the updates and addition in the hotel database will be pushed to Kafka Cluster and there will be multiple consumers sitting on the Kafka Cluster which will then process the information.

CDN: All the hotel and rooms images will be stored in the CDN and the reference to the images will be stored in the Hotel MySql Cluster.

Search Consumer: This Search consumer pulls the payload from Kafka and then renders the data and puts it into ElasticSearch. The payload can be addition of a room in hotel or removing a room from the hotel as it is already booked.

Elastic Search: The idea to use elastic search is that it supports fuzzy search. Even if user types an incorrect spelling we would still be able to show them the hotels, ensuring good user experience.

Booking UI: It is the main UI for user though which it can log in, search, book and view past reservations.

Search Service: This service powers the search functionality on our application. The user talks to this service from Booking UI and it displays the list of available hotels and rooms.

Booking Service: Booking request comes to through Booking UI and this service talks to the payment service and Booking MySql Cluster to handle the booking. It also puts the booking details to Kafka.

We have used 2 different MySql Cluster as it'll give us the ability to scale them independently according to our use case.

Archival Service: We are storing only the live data in MySql i.e. booking that are done but payment is not yet completed. Once a booking moves to a terminal state i.e. Cancelled or Done, it removes the data from MySql and stores it into Cassandra (More writes low reads).

Redis: It sits on to of MySql cluster and acts a cache to reduce load on the MySql. It will be write-through cache so whenever a new booking takes place it'll update the cache.

Booking Management Service: This service calls talks to redis for active booking and cassandra for historical bookings.

Spark Streaming & Hadoop Cluster: We'll use it for analytical purposes, it'll read all the events that are pushed to Kafka. Spark will read all the events and then push the data to hadoop cluster for reporting.

DESIGN DEEP-DIVE:

Primary keys & foreign keys are represented by blue colour.

Hotel DB-

hotel - [hotel_id, name, desc, address, location, image_url]
rooms - [room_id, hotel_id, is_active, quantity, price_min, price_max]

We have two columns price_min and price_max which will be set by the hotel manager. The spark job will run the analytical jobs on the bookings and set the price of the hotel b/w these ranges according to the demand (to initialise a price we can set it to a average of max and min price)

We have not added a redis cluster on this MySql Database because it is not a critical service for user. So, to reduce the cost we have only added the redis on Booking MySql cluster.

Booking DB-

available_rooms - [room_id, date, total_qty, total_reserved]
booking - [booking_id, hotel_id, room_id, start_date, end_date, qty, status]
status can be [reserved, booked, cancelled, completed]. Initially user will be in reserved status then after the payment confirmation it can move to cancelled or booked status. When the user exhausts the cancellation period it changes the status to confirmed.

How to ensure that there will be no collisions while booking a same room by multiple users?

We'll query the available_rooms table with condition total_qty - total_reserved >= qty_from_user within the date range provided by the user. To resolve conflicts we have 3 options-

Pessimistic Locking: Pessimistic locking is a strategy where a database actively locks records as soon as a user accesses them to prevent other users from making conflicting updates.
When a user retrieves a record with a pessimistic lock, the database prevents other users from updating or accessing the locked data until the first user releases it. This often involves either a shared lock (read lock) or an exclusive lock (write lock).

Pros: Good for High-contention scenarios where multiple users or processes are likely to modify the same records.
Cons: It can lead to deadlocks.

Optimistic Locking: Optimistic locking is a strategy that allows multiple users to access data simultaneously but checks for conflicts only at the time of committing a transaction.
Optimistic locking uses a versioning mechanism (often a version number or timestamp) that’s updated each time a record is modified. When a user tries to update a record, the system checks if the version matches the latest version in the database. If it does, the update proceeds; if not, the update fails, indicating that another user has modified the data since it was read.

Pros: No need to lock the database resource and good for Low-contention environments where data conflicts are rare.
Cons: Poor performance for high data contention

Database Constraints: Database constraints are rules enforced by the database to maintain data integrity, ensuring that the data remains accurate, valid, and consistent.

Pros: Easy to implement and goos for low-contention environment.
Cons: Not all database support constraints.

So we'll proceed with Optimistic locking as it suits best for our use-case as hotel booking conflicts belongs to low-contention environment.

How to update the booking status from reserved to booked/cancelled?

To understand this we'll got through the whole flow of when user books a room.

Check available rooms: available_rooms table will be queried and if the rooms are available we lock the row through optimistic locking
Insert the booking and reduce available rooms: We'll update the row with added quantity and increment the row version.
Put in Redis with TTL: We cannot keep the room reserved for an infinite amount of time. So we implement TTL(Time to live) of Redis. We put the booking_id with expiry_time (say like 5 mins after the booking is done). If we get success confirmation from payment service then we update the status to booked otherwise we'll get a callback from Redis after the expiry time or we can get a failure confirmation from payment service, then we'll update the status as cancelled. After we change the status to cancelled we move to available_room table and update the total_reserved column.

Now what if we get the payment notification after the Redis callback?

There can be two response from payment service -

Payment Failure: In this case, we don't do anything as we have already set the status to cancelled.
Payment Success: We can proceed with two approaches in this case.

We can revert the payment to user and send them a notification that we were not to proceed with the booking due to some error.
We can check the available_rooms table and if we have the rooms available we can proceed once again with the booking.

OPTIMIZATIONS:

TTL is not a precise measure it has some delay in callback from the expiry time. This is because how Redis has implemented TTL (which we'll talk in some other article). So we can implement a priority_queue inside Redis and have a poller that pulls the top elements of queue every 1 second but this approach comes with some cost as there will be high CPU utilisation on cron side as well as Redis side.
We can remove the key from Redis after we get the payment confirmation before it expires in Redis. It'll free up memory.

Hemant's Blog