Google Cloud Storage

Google Cloud Long-Term Storage

Synopsis

Creates a target that writes log messages to Google Cloud Storage buckets with support for various file formats, authentication methods, and multipart uploads. The target handles large file uploads efficiently with configurable rotation based on size or event count.

Schema

- name: <string>
  description: <string>
  type: gcs
  pipelines: <pipeline[]>
  status: <boolean>
  properties:
    key: <string>
    secret: <string>
    project_id: <string>
    region: <string>
    endpoint: <string>
    part_size: <numeric>
    bucket: <string>
    buckets:
      - bucket: <string>
        name: <string>
        format: <string>
        compression: <string>
        extension: <string>
        schema: <string>
    name: <string>
    format: <string>
    compression: <string>
    extension: <string>
    schema: <string>
    max_size: <numeric>
    batch_size: <numeric>
    timeout: <numeric>
    field_format: <string>
    interval: <string|numeric>
    cron: <string>
    debug:
      status: <boolean>
      dont_send_logs: <boolean>

Configuration

The following fields are used to define the target:

| Field | Required | Default | Description |
|---|---|---|---|
| name | Y | | Target name |
| description | N | - | Optional description |
| type | Y | | Must be gcs |
| pipelines | N | - | Optional post-processor pipelines |
| status | N | true | Enable/disable the target |

Google Cloud Storage Credentials

| Field | Required | Default | Description |
|---|---|---|---|
| key | N* | - | Google Cloud Storage HMAC access key ID for authentication |
| secret | N* | - | Google Cloud Storage HMAC secret access key for authentication |
| project_id | Y | - | Google Cloud project ID |
| region | N | us-central1 | Google Cloud region (e.g., us-central1, europe-west1, asia-east1) |
| endpoint | N | https://storage.googleapis.com | Custom GCS-compatible endpoint URL |

* = Conditionally required. HMAC credentials (key and secret) are required unless using service account authentication with Application Default Credentials.

Connection

| Field | Required | Default | Description |
|---|---|---|---|
| part_size | N | 5 | Multipart upload part size in megabytes (minimum 5 MB) |
| timeout | N | 30 | Connection timeout in seconds |
| field_format | N | - | Data normalization format. See applicable Normalization section |

Files

| Field | Required | Default | Description |
|---|---|---|---|
| bucket | N* | - | Default GCS bucket name (acts as catch-all when buckets is also specified) |
| buckets | N* | - | Array of bucket configurations for file distribution |
| buckets.bucket | Y | - | GCS bucket name |
| buckets.name | Y | - | File name template |
| buckets.format | N | "json" | Output format: json, multijson, avro, parquet |
| buckets.compression | N | - | Compression algorithm. See Compression below |
| buckets.extension | N | Matches format | File extension override |
| buckets.schema | N** | - | Schema definition file path (required for Avro and Parquet formats) |
| name | N | "vmetric.{{.Timestamp}}.{{.Extension}}" | Default file name template (used with bucket for catch-all) |
| format | N | "json" | Default output format (used with bucket for catch-all) |
| compression | N | - | Default compression (used with bucket for catch-all) |
| extension | N | Matches format | Default file extension (used with bucket for catch-all) |
| schema | N | - | Default schema path (used with bucket for catch-all) |
| max_size | N | 0 | Maximum file size in bytes before rotation |
| batch_size | N | 100000 | Maximum number of messages per file |

* = Either bucket or buckets must be specified.

** = Conditionally required for Avro and Parquet formats when using buckets.

Note: When a file reaches max_size, it is uploaded to GCS and a new file is created. Set max_size to 0 to allow unlimited file size.
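
As an illustrative sketch of the rotation settings, the fragment below limits files to 256 MB or 50,000 messages. The target name, bucket name, and chosen limits are hypothetical, and how the two limits interact (for example, whether the first one reached triggers rotation) is an assumption not confirmed here. Credentials are omitted on the assumption that Application Default Credentials are available.

```yaml
targets:
  - name: rotating_gcs              # hypothetical target name
    type: gcs
    properties:
      project_id: "my-project-123456"
      bucket: "datastream-logs"
      max_size: 268435456           # 256 MB = 256 * 1024 * 1024 bytes
      batch_size: 50000             # at most 50,000 messages per file
```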

Scheduler

| Field | Required | Default | Description |
|---|---|---|---|
| interval | N | realtime | Execution frequency. See Interval for details |
| cron | N | - | Cron expression for scheduled execution. See Cron for details |
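
A minimal sketch of a scheduled target follows. It assumes that cron accepts a standard five-field expression; the exact syntax supported is described in the Cron section, so treat the value shown as an assumption. The target and bucket names are hypothetical.

```yaml
targets:
  - name: hourly_gcs                # hypothetical target name
    type: gcs
    properties:
      project_id: "my-project-123456"
      bucket: "datastream-logs"
      cron: "0 * * * *"             # assumed five-field syntax: minute 0 of every hour
```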

Debug Options

| Field | Required | Default | Description |
|---|---|---|---|
| debug.status | N | false | Enable debug logging |
| debug.dont_send_logs | N | false | Process logs but don't send to target (testing) |

Details

The Google Cloud Storage target writes log data to GCS buckets in several file formats. GCS is designed for 99.999999999% annual durability, provides strong consistency for read-after-write operations, and integrates with Google Cloud's security and analytics ecosystem.

Authentication Methods

The target supports HMAC credentials (access key and secret key) for S3-compatible API access. When deployed on Google Cloud infrastructure, it can instead use service account authentication through Application Default Credentials, with no explicit credentials in the configuration. HMAC keys can be created through the Google Cloud Console for programmatic access.

IAM Permissions

The service account requires the following IAM role:

| IAM Role | Role ID | Purpose |
|---|---|---|
| Storage Object Creator | roles/storage.objectCreator | Upload (create) objects in GCS buckets |

Minimum permissions: storage.objects.create
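
If a narrower custom role is preferred over the predefined one, the minimum permission above could be captured in a role definition file such as the sketch below, which can be passed to `gcloud iam roles create ROLE_ID --project=PROJECT_ID --file=role.yaml`. The title and description are hypothetical. If uploads fail with a permission error under such a role, the predefined Storage Object Creator role is the safer choice.

```yaml
# role.yaml: custom IAM role definition (illustrative sketch)
title: "DataStream GCS Writer"              # hypothetical title
description: "Create objects in GCS buckets"
stage: "GA"
includedPermissions:
  - storage.objects.create                  # minimum permission listed above
```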

Storage Classes

Google Cloud Storage supports multiple storage classes for cost optimization:

| Storage Class | Use Case |
|---|---|
| Standard | Frequently accessed data |
| Nearline | Data accessed less than once per month |
| Coldline | Data accessed less than once per quarter |
| Archive | Data accessed less than once per year |

Available Regions

Google Cloud Storage is available in multiple regions worldwide:

| Region Code | Location |
|---|---|
| us-central1 | Iowa, USA |
| us-east1 | South Carolina, USA |
| us-west1 | Oregon, USA |
| europe-west1 | Belgium |
| europe-west2 | London, UK |
| europe-west3 | Frankfurt, Germany |
| asia-east1 | Taiwan |
| asia-northeast1 | Tokyo, Japan |
| asia-southeast1 | Singapore |
| australia-southeast1 | Sydney, Australia |

Templates

The following template variables can be used in file names:

| Variable | Description | Example |
|---|---|---|
| {{.Year}} | Current year | 2024 |
| {{.Month}} | Current month | 01 |
| {{.Day}} | Current day | 15 |
| {{.Timestamp}} | Current timestamp in nanoseconds | 1703688533123456789 |
| {{.Format}} | File format | json |
| {{.Extension}} | File extension | json |
| {{.Compression}} | Compression type | zstd |
| {{.TargetName}} | Target name | my_logs |
| {{.TargetType}} | Target type | gcs |
| {{.Table}} | Bucket name | logs |
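
To illustrate how these variables combine, the sketch below builds a date-partitioned object name. The target and bucket names are hypothetical, and the expanded name in the comment simply substitutes the example values from the table rather than showing real output.

```yaml
targets:
  - name: my_logs                   # matches the {{.TargetName}} example value
    type: gcs
    properties:
      project_id: "my-project-123456"
      bucket: "datastream-logs"
      name: "{{.TargetName}}/{{.Year}}/{{.Month}}/{{.Day}}/{{.Timestamp}}.{{.Extension}}"
      # With the example values from the table, this would expand to:
      #   my_logs/2024/01/15/1703688533123456789.json
```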

Multipart Upload

Large files are uploaded automatically with the multipart upload protocol, using the part size set by the part_size parameter. The default part size of 5 MB balances upload efficiency against memory usage.
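
As a rough sketch of how part size relates to file size, the fragment below pairs 16 MB parts with a 512 MB rotation limit, so a full-size file would be sent in about 512 / 16 = 32 parts. The target name, bucket name, and values are hypothetical, and the actual part count depends on how the target buffers and compresses data.

```yaml
targets:
  - name: large_upload_gcs          # hypothetical target name
    type: gcs
    properties:
      project_id: "my-project-123456"
      bucket: "datastream-logs"
      part_size: 16                 # 16 MB per part (minimum is 5 MB)
      max_size: 536870912           # 512 MB = 512 * 1024 * 1024 bytes
```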

Multiple Buckets

A single target can write to multiple GCS buckets with different configurations, which enables data distribution strategies (e.g., raw data to one bucket, processed data to another).

Schema Requirements

Avro and Parquet formats require schema definition files. Schema files must be accessible at the path specified in the schema parameter during target initialization.
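
A minimal sketch of an Avro target with a schema file follows. The schema path, target name, and bucket name are hypothetical; the only requirement taken from the text is that the file be readable at that path when the target initializes.

```yaml
targets:
  - name: avro_gcs                  # hypothetical target name
    type: gcs
    properties:
      project_id: "my-project-123456"
      bucket: "datastream-logs"
      name: "events-{{.Timestamp}}.avro"
      format: "avro"
      schema: "/etc/datastream/schemas/events.avsc"   # hypothetical path
```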

Integration with Google Cloud

GCS integrates seamlessly with other Google Cloud services including BigQuery for analytics, Cloud Functions for serverless processing, and Cloud Logging for centralized logging.

Examples

Basic Configuration

The minimum configuration for a JSON GCS target:

targets:
  - name: basic_gcs
    type: gcs
    properties:
      key: "GOOG1EXAMPLE1234567890ABCDEFGHIJ"
      secret: "abcdefghijklmnopqrstuvwxyz1234567890ABCD"
      project_id: "my-project-123456"
      bucket: "datastream-logs"

Service Account Authentication

Configuration using Application Default Credentials:

targets:
  - name: gcs_service_account
    type: gcs
    properties:
      project_id: "my-project-123456"
      region: "us-central1"
      bucket: "datastream-logs"

Pipeline-Based Routing

Dynamic bucket routing using pipeline processors to analyze log content and route to appropriate buckets:

targets:
  - name: smart_routing_gcs
    type: gcs
    pipelines:
      - dynamic_routing
    properties:
      key: "GOOG1EXAMPLE1234567890ABCDEFGHIJ"
      secret: "abcdefghijklmnopqrstuvwxyz1234567890ABCD"
      project_id: "my-project-123456"
      region: "us-central1"
      buckets:
        - bucket: "security-events"
          name: "security-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "json"
        - bucket: "application-events"
          name: "app-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "json"
        - bucket: "system-events"
          name: "system-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "json"
      bucket: "other-events"
      name: "other-{{.Timestamp}}.json"
      format: "json"

pipelines:
  - name: dynamic_routing
    processors:
      - set:
          field: "_vmetric.bucket"
          value: "security-events"
          if: "ctx.event_type == 'security'"
      - set:
          field: "_vmetric.bucket"
          value: "application-events"
          if: "ctx.event_type == 'application'"
      - set:
          field: "_vmetric.bucket"
          value: "system-events"
          if: "ctx.event_type == 'system'"

Multiple Buckets with Catch-All

Configuration for routing different log types to specific buckets with a catch-all for unmatched logs:

targets:
  - name: multi_bucket_routing
    type: gcs
    properties:
      key: "GOOG1EXAMPLE1234567890ABCDEFGHIJ"
      secret: "abcdefghijklmnopqrstuvwxyz1234567890ABCD"
      project_id: "my-project-123456"
      region: "us-central1"
      buckets:
        - bucket: "security-logs"
          name: "security-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "json"
        - bucket: "application-logs"
          name: "app-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "json"
      bucket: "general-logs"
      name: "general-{{.Timestamp}}.json"
      format: "json"

Multiple Buckets with Different Formats

Configuration for distributing data across multiple GCS buckets with different formats:

targets:
  - name: multi_bucket_export
    type: gcs
    properties:
      key: "GOOG1EXAMPLE1234567890ABCDEFGHIJ"
      secret: "abcdefghijklmnopqrstuvwxyz1234567890ABCD"
      project_id: "my-project-123456"
      region: "europe-west1"
      buckets:
        - bucket: "raw-data-archive"
          name: "raw-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "multijson"
          compression: "gzip"
        - bucket: "analytics-data"
          name: "analytics-{{.Year}}/{{.Month}}/{{.Day}}/data_{{.Timestamp}}.parquet"
          format: "parquet"
          schema: "<schema definition>"
          compression: "snappy"

Parquet Format

Configuration for daily partitioned Parquet files:

targets:
  - name: parquet_analytics
    type: gcs
    properties:
      key: "GOOG1EXAMPLE1234567890ABCDEFGHIJ"
      secret: "abcdefghijklmnopqrstuvwxyz1234567890ABCD"
      project_id: "my-project-123456"
      region: "us-west1"
      bucket: "analytics-lake"
      name: "events/year={{.Year}}/month={{.Month}}/day={{.Day}}/part-{{.Timestamp}}.parquet"
      format: "parquet"
      schema: "<schema definition>"
      compression: "snappy"
      max_size: 536870912

High Reliability

Configuration with a checkpoint pipeline, a 60-second timeout, and 10 MB upload parts:

targets:
  - name: reliable_gcs
    type: gcs
    pipelines:
      - checkpoint
    properties:
      key: "GOOG1EXAMPLE1234567890ABCDEFGHIJ"
      secret: "abcdefghijklmnopqrstuvwxyz1234567890ABCD"
      project_id: "my-project-123456"
      region: "us-east1"
      bucket: "critical-logs"
      name: "logs-{{.Timestamp}}.json"
      format: "json"
      timeout: 60
      part_size: 10

With Field Normalization

Configuration that normalizes field names to a standard format (here, cim):

targets:
  - name: normalized_gcs
    type: gcs
    properties:
      key: "GOOG1EXAMPLE1234567890ABCDEFGHIJ"
      secret: "abcdefghijklmnopqrstuvwxyz1234567890ABCD"
      project_id: "my-project-123456"
      region: "europe-west2"
      bucket: "normalized-logs"
      name: "logs-{{.Timestamp}}.json"
      format: "json"
      field_format: "cim"

BigQuery Integration

Configuration for staging compressed, date-partitioned files for a BigQuery data lake:

targets:
  - name: bigquery_ready
    type: gcs
    properties:
      key: "GOOG1EXAMPLE1234567890ABCDEFGHIJ"
      secret: "abcdefghijklmnopqrstuvwxyz1234567890ABCD"
      project_id: "my-project-123456"
      region: "us-central1"
      bucket: "bigquery-staging"
      name: "bq-import/{{.Year}}/{{.Month}}/{{.Day}}/data-{{.Timestamp}}.json"
      format: "json"
      compression: "gzip"
      max_size: 1073741824

Debug Configuration

Configuration with debugging enabled:

targets:
  - name: debug_gcs
    type: gcs
    properties:
      key: "GOOG1EXAMPLE1234567890ABCDEFGHIJ"
      secret: "abcdefghijklmnopqrstuvwxyz1234567890ABCD"
      project_id: "my-project-123456"
      region: "asia-east1"
      bucket: "test-logs"
      name: "test-{{.Timestamp}}.json"
      format: "json"
      debug:
        status: true
        dont_send_logs: true