Maintain wide table with table sinks
This guide introduces how to maintain a wide table whose columns come from different sources. Traditional data warehouses or ETL use a join query for this purpose. However, streaming join brings issues such as low efficiency and high memory consumption.
In some cases with limitation, use the CREATE SINK INTO TABLE and ON CONFLICT clause can save the resources and achieve high efficiency.
Merge multiple sinks with the same primary key
Keep in mind that the ON CONFLICT
clause does not affect the update or delete events, the sinks should be forced to be append-only. Otherwise, the delete or update events from any sink will delete the regarding row.
Enrich data with foreign keys in Star/Snowflake schema model
With star schema, the data is constructed with a central fact table surrounded by several related dimension tables. Each dimension table is joined to the fact table through a foreign key relationship. Given that the join key is the primary key of the dimension tables, we can rewrite the query as a series of sink into table.
The example above and the following SQL with left join operation are completely equivalent.
But maintaining wide table with table sinks can save the resources and achieve high efficiency.
Furthermore, for the large dimension table, we can use Temporal Join as the partial join to reduce the streaming state and improve performance.