My wife manages the family’s short-term apartment rental business in Budapest and maintains a detailed Google Sheet for all bookings. This weekend, she asked me if I could help calculate a monthly utilization and a profit share report for this year (2024). Calculating these metrics can be cumbersome with traditional data analytics software even for moderately sized datasets. This is especially true when bookings span multiple months and pricing varies based on factors like weekends. The request made me contemplate the beauty and expressiveness of q programming language and inspired me to write this article.

The data model

Each row corresponds to a booking. Among other fields (like channel, tax, gross income), she stores the start date, the number of nights and the net profit of the overall stay for each booking.

The problem

For each month, calculate

  • the utilization ratio, which equals the number of taken days divided by the number of days of the month. For example, if 20 days were booked in April, then April’s utilization rate is 21/30 = 70%.
  • the profit distribution, i.e. what percentage of the total profit came from each month. For example, if the profit in 2024 is 10000 USD and January contributed 800 USD to this amount then she would like to see an 8% profit rate for January. For simplicity, we assume that the profit is spread evenly across the booked days. In reality, the weekends are e.g. 20% more expensive than weekdays. We will see how easily kdb+ can cope with this extra complexity.

A booking may span over multiple months. For example, checking in on the 29 of March for 7 days means 3 days in March and 4 days in April.

First attempt with Google Sheets

Intuitively, we need to work with lists of dates. Google Sheets can generate a sequence of numbers by function SEQUENCE but adding (up to 31) new columns would pollute the sheet. I don’t see an elegant way of solving the problem with Google Sheets although I am open to conviction.

kdb+ and vector programming

Let us create a mock table for demonstration purposes.

q)t:flip `startdate`daycount`profit!flip ((2024.01.08;4;110);(2024.02.12;2;60);(2024.02.20;12;380);(2024.03.10;15;400);(2024.05.15;50;1900))
q)t
startdate  daycount profit
--------------------------
2024.01.08 4        110   
2024.02.12 2        60    
2024.02.20 12       380   
2024.03.10 15       400   
2024.05.15 50       1900  

We can get the list of booked days by generating a sequence of numbers starting from zero (function til) that we add to the start date. q is a vector programming language so adding a scalar to a list works as expected:

q)2024.01.08 + til 4
2024.01.08 2024.01.09 2024.01.10 2024.01.11

Unlike ANSI SQL, kdb+ supports storing lists in table cells. Let us extend the original table using the each adverb for explicit iteration over the rows

q)update dates: startdate + til each daycount from t
startdate  daycount profit dates                                                                                                                           ..
-----------------------------------------------------------------------------------------------------------------------------------------------------------..
2024.01.08 4        110    2024.01.08 2024.01.09 2024.01.10 2024.01.11                                                                                     ..
2024.02.12 2        60     2024.02.12 2024.02.13                                                                                                           ..
2024.02.20 12       380    2024.02.20 2024.02.21 2024.02.22 2024.02.23 2024.02.24 2024.02.25 2024.02.26 2024.02.27 2024.02.28 2024.02.29 2024.03.01 2024.03..
2024.03.10 15       400    2024.03.10 2024.03.11 2024.03.12 2024.03.13 2024.03.14 2024.03.15 2024.03.16 2024.03.17 2024.03.18 2024.03.19 2024.03.20 2024.03..
2024.05.15 50       1900   2024.05.15 2024.05.16 2024.05.17 2024.05.18 2024.05.19 2024.05.20 2024.05.21 2024.05.22 2024.05.23 2024.05.24 2024.05.25 2024.05..```

Monthly utilization ratio

We only care about the months, so we convert dates column to months ($) and create a new table that only contains the relevant data:

q)t1: select month: raze `mm$startdate + til each daycount from t
q)t1
month
-----
1    
1    
1    
1    
2    
2    
2    
2    
2    
2    
2    
2    
2    
..

a simple group-by select statement returns the number of takes days by month

q)select daycount:count i by month from t1
month| daycount
-----| --------
1    | 4       
2    | 12      
3    | 17      
5    | 17      
6    | 30      
7    | 3  

To get the ratios, we need to store the number of days of all months. A dictionary does the job

q)monthlength:(1 + til 12)!31 29 31 30 31 30 31 31 30 31 30 31
q)monthlength
1 | 31
2 | 29
3 | 31
4 | 30
5 | 31
6 | 30
7 | 31
8 | 31
9 | 30
10| 31
11| 30
12| 31

We can avoid hardcoding the data and handle non-leap years by generating all days and grouping by month.

q)count each group `mm$2024.01.01 + til 2025.01.01-2024.01.01
1 | 31
2 | 29
3 | 31
4 | 30
5 | 31
6 | 30
7 | 31
8 | 31
9 | 30
10| 31
11| 30
12| 31

q expressions can smoothly be injected into q-sql statements.

q)select utilization_ratio: (count month)%first monthlength month by month from t1
month| utilization_ratio
-----| -----------------
1    | 0.1290323        
2    | 0.4137931        
3    | 0.5483871        
5    | 0.5483871        
6    | 1                
7    | 0.09677419

We made use of the vectorian nature of q again. A dictionary is indexed by a list then a list of values is returned.

Profit distribution

If the profit is shared evenly across the booked days, then we can create a new table that stores the daily profit and the list of months for each booking.

q)select daily_profit: profit%daycount, month: `mm$startdate+til each daycount from t
daily_profit month                                                                                               
-----------------------------------------------------------------------------------------------------------------
27.5         1 1 1 1i                                                                                            
30           2 2i                                                                                                
31.66667     2 2 2 2 2 2 2 2 2 2 3 3i                                                                            
26.66667     3 3 3 3 3 3 3 3 3 3 3 3 3 3 3i                                                                      
38           5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7i

The daily_profit column contains scalars, and the remaining (single) columns contain lists. Function ungroup performs the inverse operation of group and creates multiple rows for each group

q)t2:ungroup select daily_profit: profit%daycount, month: `mm$startdate+til each daycount from t
q)t2
daily_profit month
------------------
27.5         1    
27.5         1    
27.5         1    
27.5         1    
30           2    
30           2    
31.66667     2    
31.66667     2    
31.66667     2    
31.66667     2    
31.66667     2    
31.66667     2    
31.66667     2      
..

The final steps require a basic aggregation and nested select statements.

q)select month, profit_rate: daily_profit%sum daily_profit from select sum daily_profit by month from t2
month profit_rate
-----------------
1     0.03859649 
2     0.1321637  
3     0.1625731  
5     0.2266667  
6     0.4        
7     0.04 

Taking weekends into account

Weekends are 20% more expensive than weekdays. How do we decide if a date is a weekday in q? All dates are represented by an integer, 0 representing 2001.01.01. Modulo calculation works seamlessly with dates, so we can write a function that returns true if a given date is Saturday or Sunday.

q)isWeekend: {(x mod 7) in 0 1}
q)isWeekend 2024.11.17
1b
q)isWeekend 2024.11.18
0b

How fantastic the vectorial nature of q is? If your custom function is made of operators that support implicit iteration then so does your custom function. You don’t need ot write a single line of code. It does not matter if you pass a scalar, a list, a dictionary, etc the function’s output conforms to the input.

q)isWeekend 2024.11.17 + til 3
100b

We can convert boolean values to weights 1.2 and 1 using an (vectorial) if-then-else statement (denoted by ?) or a dictionary or making use of implicit boolean casts in arithmetic operations.

q)?[isWeekend 2024.11.17 + til 3;1.2;1]
1.2 1 1
q)(10b!1.2 1) isWeekend 2024.11.17 + til 3
1.2 1 1
q)1 + 0.2 * isWeekend 2024.11.17 + til 3
1.2 1 1

Let us wrap getting the relative weights into a function

q)getWeights: {w: 1+0.2 * isWeekend x; w%sum w}

We can update the query that generates internal table t2 by calling getWeights for each booking.

q)t2:ungroup select daily_profit: profit * getWeights each dates, month: `mm$dates from select profit, dates: startdate+til each daycount from t
q)t2
daily_profit month
------------------
27.5         1    
27.5         1    
27.5         1    
27.5         1    
30           2    
30           2    
30.15873     2    
30.15873     2    
30.15873     2    
30.15873     2    
36.19048     2    
36.19048     2    
30.15873     2    
..

All in one

We can put the two metrics into one table either by employing a left-join or preferably calculating the two metrics in one go

q)(select month, profit_rate: daily_profit%sum daily_profit from select sum daily_profit by month from t2) lj select utilization_ratio: (count month)%first monthlength month by month from t1
month profit_rate utilization_ratio
-----------------------------------
1     0.03859649  0.1290323        
2     0.1311055   0.4137931        
3     0.1636313   0.5483871        
5     0.2247475   0.5483871        
6     0.4040404   1                
7     0.03787879  0.09677419    

q)update profit_rate: profit%sum profit from select profit: sum daily_profit, utilization_ratio: (count month)%first monthlength month by month from t2
month| profit   utilization_ratio profit_rate
-----| --------------------------------------
1    | 110      0.1290323         0.03859649 
2    | 373.6508 0.4137931         0.1311055  
3    | 466.3492 0.5483871         0.1636313  
5    | 640.5303 0.5483871         0.2247475  
6    | 1151.515 1                 0.4040404  
7    | 107.9545 0.09677419        0.03787879

Conclusion

kdb+ is an outstanding, general-purpose tool for data analysis. The vector programming, temporal arithmetics, q-sql and the built-in function all play well together to create a robust, easy-to-maintain, easy-to-read analytical solution.

If you liked this article, then set a trip to Budapest, book my wife’s apartment and enjoy reading Q for Mortals in a tranquil environment 😎