Time windows
In stream processing, time windows are time intervals based on which we can divide events and perform data computations.
RisingWave supports three types of time windows:
- Tumble windows
- Hop windows
- Session windows
For tumbling and hopping windows, RisingWave supports them by time window functions, tumble()
and hop()
respectively. For session window, RisingWave supports it by a special type of window function frame, i.e. SESSION
frame.
Time window function
Time window functions are used in the FROM clause. They take a table/source/materialized view, a time column and some other arguments as input, and assign each input row a time window by augmenting the row with two new columns: window_start
and window_end
. The two augmented columns represent the start and end of time windows respectively.
tumble()
time window function
Tumbling windows are contiguous non-overlapping time intervals.
The syntax of the tumble()
is as follows:
- table_or_source specifies the table/source/materialized view that needs to be assigned with time windows.
- time_col specifies the column to determine time windows on. It can be in either timestamp or timestamp with time zone format.
Example of timestamp with time zone format:
2022-01-01 10:00:00+00:00
. - window_size specifies the size of each time window. It should be a constant value of
INTERVAL
type. Example:INTERVAL '2 MINUTES'
. The standard SQL format, which places time units outside of quotation marks (for example,INTERVAL '2' MINUTE
), is also supported. - offset is an optional parameter that allows you to shift the starting point of each time window.
Example:
window_size = INTERVAL '10 MINUTES', offset = INTERVAL '2 MINUTES'
will yield time window starts like2022-01-01 00:12:00+00:00
.
Suppose that we have a table, taxi_trips
, that consists of these columns: trip_id
, taxi_id
, completed_at
, distance
, and duration
.
Here is an example that uses the tumbling time window function.
The result looks like this:
hop()
time window function
The hop()
time window function also assigns each row a time window with a fixed size, which is very similar to tumble()
, except that the assigned time windows may overlap.
See below for the syntax of the hop()
time window function.
- table_or_source specifies the table/source/materialized view that needs to be assigned with time windows.
- time_col specifies the column to determine time windows on. It can be in either timestamp or timestamp with time zone format.
Example of timestamp with time zone format:
2022-01-01 10:00:00+00:00
. - hop_size specifies the size of each hop, window_size specifies the size of each time window. Both should be constant values of
INTERVAL
type. For example:INTERVAL '2 MINUTES'
. The standard SQL format, which places time units outside of quotation marks (for example,INTERVAL '2' MINUTE
), is also supported. - offset is an optional parameter that allows you to shift the starting point of each time window.
Example:
window_size = INTERVAL '10 MINUTES', offset = INTERVAL '2 MINUTES'
will yield time window starts like2022-01-01 00:12:00+00:00
.
Here is an example.
The result looks like the table below. Note that the number of rows in the result of a hop window function is N times the number of rows in the original table, where N is the window size divided by the hop size.
Session windows
In RisingWave, session windows are supported by a special type of window function frame: SESSION
frame. You can refer to Window function calls for detailed syntax.
Currently, SESSION
frame is only supported in batch mode and emit-on-window-close streaming mode.
When using session windows, you can achieve the effect that is very similar to tumble()
and hop()
time window functions, that is, to assign each row a time window by augmenting it with window_start
and window_end
. Here is an example:
Given the following table data:
And the following query:
The result looks like this:
Window aggregations
Let’s see how we can perform time window aggregations.
Tumble window aggregations
Below is an example of tumble window aggregation. In this example, we want to get the number of trips and the total distance for each tumbling window (2 minutes).
The result looks like this:
Hop window aggregations
Below is an example of hopping window aggregation. In this example, we want to get the number of trips and the total distance within a two-minute window every minute.
The result looks like this:
Session window aggregations
Below is an example of aggregation over session windows. In this example, we want to get the number of unique products viewed by each user in session gapped by 5 minutes interval, based on the example data in previous Session windows section.
The result looks like this:
Window joins
You can join a time window with a table, or another time window that is of the same type and has the same time attributes.
Joins with tables
Let’s see how you can join a time window with a table.
Suppose that you have a simple table taxi_simple
that has the following data:
You can join it with a time window:
The result looks like this:
Window joins
You can join two tumble time windows to get both trip and fare information. The corresponding tables are taxi_trips
and taxi_fare
.
The taxi_fare
table has the following data:
You can join two time windows:
The result looks like this.