Ingest data from Google Cloud Storage
Use the SQL statement below to connect RisingWave to a Google Cloud Storage source.
Syntax
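The statement has roughly the shape sketched below. This outline is assembled from the parameters documented on this page and the general form of CREATE SOURCE in RisingWave, so treat it as a guide rather than the authoritative grammar. The name source_name and the placeholder values are illustrative, and the placement of the INCLUDE clause before WITH is an assumption.

```sql
-- Outline only: square brackets mark optional parts, braces mark alternatives.
CREATE SOURCE [ IF NOT EXISTS ] source_name
schema_definition
[ INCLUDE { file | offset } [ AS column_name ] ]
WITH (
   connector = 'gcs',
   connector_parameter = 'value', ...
)
FORMAT data_format ENCODE data_encode (
   without_header = 'true' | 'false',
   delimiter = 'delimiter'
);
```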
schema_definition:
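A schema definition is a parenthesized list of columns. The sketch below assumes the usual RisingWave column-list form; column_name and data_type are placeholders.

```sql
-- Outline only: one or more columns, with an optional primary key.
(
   column_name data_type [ PRIMARY KEY ], ...
   [ PRIMARY KEY ( column_name, ... ) ]
)
```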
Connector parameters
Field | Notes |
---|---|
gcs.bucket_name | Required. The name of the bucket the data source is stored in. |
gcs.credential | Required. Base64-encoded credential key obtained from the GCS service account key JSON file. To obtain this JSON file, see the service account key guides in the GCS documentation. |
gcs.service_account | Optional. The service account of the target GCS source. If neither gcs.credential nor Application Default Credentials (ADC) is specified, the credentials are derived from the service account. |
match_pattern | Conditional. This field is used to find object keys in the bucket that match the given pattern. Standard Unix-style glob syntax is supported. |
compression_format | Optional. This field specifies the compression format of the file being read. You can define compression_format in the CREATE TABLE statement. When set to gzip or gz, the file reader reads all files with the .gz suffix. When set to None or not defined, the file reader will automatically read and decompress .gz and .gzip files. |
refresh.interval.sec | Optional. The interval, in seconds, between file-listing operations. It determines how long it can take for new files to be discovered. Default: 60. |
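As a minimal sketch of how the connector parameters above fit together, the statement below reads gzip-compressed JSON files and lists the bucket every 30 seconds. The table name, columns, bucket name, and credential value are hypothetical placeholders, and passing the interval as a quoted string is an assumption.

```sql
-- Hypothetical table reading gzip-compressed JSON objects from GCS.
CREATE TABLE gcs_events (
    id int,
    name varchar
)
WITH (
    connector = 'gcs',
    gcs.bucket_name = 'example-bucket',               -- placeholder bucket
    gcs.credential = '<base64-encoded-key>',          -- placeholder, not a real key
    match_pattern = '*.json.gz',                      -- glob over object keys
    compression_format = 'gzip',                      -- read files with the .gz suffix
    refresh.interval.sec = '30'                       -- list files every 30 seconds
) FORMAT PLAIN ENCODE JSON;
```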
Other parameters
Field | Notes |
---|---|
data_format | Supported data format: PLAIN. |
data_encode | Supported data encodes: CSV, JSON, PARQUET. |
without_header | This field applies only to the CSV encode. It indicates whether the files lack a header row: when 'true', the first line is read as data; when 'false', the first line is treated as a header. Accepted values: 'true', 'false'. Default: 'true'. |
delimiter | How RisingWave splits contents. For JSON encode, the delimiter is \n; for CSV encode, the delimiter can be a comma (,), a semicolon (;), or a tab (E'\t'). |
Additional columns
Field | Notes |
---|---|
file | Optional. Contains the name of the file that the current record comes from. |
offset | Optional. Contains the byte offset (record offset for Parquet files) at which the current record begins. |
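The additional columns can be requested when the source or table is created. The sketch below assumes the INCLUDE ... AS clause that RisingWave uses for connector metadata columns, placed before WITH. The table name, column aliases, bucket name, and credential value are hypothetical.

```sql
-- Hypothetical table that records where each row came from.
CREATE TABLE gcs_events_with_origin (
    id int,
    name varchar
)
INCLUDE file AS source_file        -- name of the file the record comes from
INCLUDE offset AS source_offset    -- offset at which the record begins
WITH (
    connector = 'gcs',
    gcs.bucket_name = 'example-bucket',          -- placeholder bucket
    gcs.credential = '<base64-encoded-key>',     -- placeholder, not a real key
    match_pattern = '*.json'
) FORMAT PLAIN ENCODE JSON;
```

Because files are not read in a guaranteed order, these two columns can help trace a row back to its origin after a recovery.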
Loading order of GCS files
The GCS connector does not guarantee that files are read sequentially.
For example, suppose RisingWave reads file F1 up to offset O1 and then crashes. After RisingWave recovers and rebuilds the task queue, the next task is not guaranteed to resume reading file F1.
Examples
Here are examples of connecting RisingWave to a GCS source to read data from individual streams.
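The following sketches cover the CSV and Parquet encodes using the parameters described on this page. The table names, columns, bucket name, and credential value are hypothetical placeholders, so adjust them to your own bucket and schema.

```sql
-- CSV: files have no header row and use a comma as the delimiter.
CREATE TABLE gcs_users_csv (
    id int,
    name varchar,
    age int
)
WITH (
    connector = 'gcs',
    gcs.bucket_name = 'example-bucket',          -- placeholder bucket
    gcs.credential = '<base64-encoded-key>',     -- placeholder, not a real key
    match_pattern = '*.csv'
) FORMAT PLAIN ENCODE CSV (
    without_header = 'true',
    delimiter = ','
);

-- Parquet: no encode options are set in this sketch.
CREATE TABLE gcs_users_parquet (
    id int,
    name varchar,
    age int
)
WITH (
    connector = 'gcs',
    gcs.bucket_name = 'example-bucket',          -- placeholder bucket
    gcs.credential = '<base64-encoded-key>',     -- placeholder, not a real key
    match_pattern = '*.parquet'
) FORMAT PLAIN ENCODE PARQUET;
```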