Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


ETL - Data aggregation during data load to Snowflake using Snowpipe

I am evaluating Snowflake for a reporting use case and am considering Snowpipe for ETL. Data is ingested from S3 and contains information about user sessions captured at regular intervals. In Snowflake, I want to store this data in aggregated form. According to the documentation, Snowflake supports only basic transformations when copying data from an S3 stage into Snowflake tables, and it does not support GROUP BY or JOIN during that copy.

I am new to ETL and Snowflake. One approach I was considering is to load the raw, detailed data from the stage into a temporary table in Snowflake, then run the aggregations (GROUP BY and JOIN) on that temporary table to load the data into the final fact tables. Is this the correct approach for implementing complex transformations?
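
The two-step idea (land the raw rows unchanged, then aggregate them with ordinary SQL into a fact table) can be tried out locally. The following is a minimal sketch that uses Python's built-in sqlite3 module purely as a stand-in for Snowflake; the table names (`raw_sessions`, `fact_sessions_daily`), the columns, and the sample rows are hypothetical.

```python
import sqlite3


def load_and_aggregate(rows):
    """Land raw session rows in a staging table, then aggregate into a fact table.

    sqlite3 is only a local stand-in for Snowflake; the table and column
    names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        # Staging table: raw per-interval session snapshots, loaded as-is.
        cur.execute(
            "CREATE TABLE raw_sessions ("
            "user_id TEXT, session_date TEXT, duration_sec INTEGER)"
        )
        # Fact table: one aggregated row per user per day.
        cur.execute(
            "CREATE TABLE fact_sessions_daily ("
            "user_id TEXT, session_date TEXT, "
            "snapshots INTEGER, total_duration_sec INTEGER)"
        )
        # Step 1: load without transformation (the part Snowpipe would do).
        cur.executemany("INSERT INTO raw_sessions VALUES (?, ?, ?)", rows)
        # Step 2: the GROUP BY that COPY cannot do, written into the fact table.
        cur.execute(
            "INSERT INTO fact_sessions_daily "
            "SELECT user_id, session_date, COUNT(*), SUM(duration_sec) "
            "FROM raw_sessions GROUP BY user_id, session_date"
        )
        conn.commit()
        return cur.execute(
            "SELECT * FROM fact_sessions_daily ORDER BY user_id, session_date"
        ).fetchall()
    finally:
        conn.close()


rows = [
    ("u1", "2024-01-01", 60),
    ("u1", "2024-01-01", 30),
    ("u2", "2024-01-01", 45),
]
print(load_and_aggregate(rows))
# [('u1', '2024-01-01', 2, 90), ('u2', '2024-01-01', 1, 45)]
```

A JOIN (for example against a user dimension table) would go into the same INSERT ... SELECT statement; the point is that the complex logic runs as a separate SQL step after the load rather than inside the COPY.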



1 Reply


Temporary tables in Snowflake exist only for the duration of the session in which they were created. This means that you won't be able to use one as the target of a Snowpipe.

Instead of a temporary table, point Snowpipe to a transient table to store the raw data, and then truncate the table after some period of time. Transient tables have no Fail-safe period (and at most one day of Time Travel), so this reduces storage costs. Personally, I'd keep the data in the transient table for as long as possible, provided that it is not cost-prohibitive, to account for late-arriving data and similar issues.
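
A rough sketch of how these pieces might fit together is below. It only assembles Snowflake SQL statements as strings and does not connect to Snowflake. Everything specific in it is an assumption for illustration: the object names (`raw_sessions`, `sessions_stage`, `raw_sessions_pipe`, `fact_sessions_daily`, `aggregate_sessions_daily`, warehouse `my_wh`), the JSON file format and field names, the nightly 02:00 UTC schedule, and the 7-day retention window. `AUTO_INGEST = TRUE` also assumes S3 event notifications are configured for the stage's bucket. The final DELETE is a retention-window variation on the periodic truncation described above, swapped in so that recent raw rows stay available for late data.

```python
def snowflake_pipeline_sql(retention_days: int = 7) -> list:
    """Build (not run) Snowflake SQL for: S3 stage -> Snowpipe -> transient table -> fact table.

    All object names (raw_sessions, sessions_stage, raw_sessions_pipe,
    fact_sessions_daily, aggregate_sessions_daily, my_wh) are hypothetical.
    """
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return [
        # Raw landing table. TRANSIENT: no Fail-safe, so cheaper storage.
        """CREATE TRANSIENT TABLE IF NOT EXISTS raw_sessions (
    user_id STRING,
    session_ts TIMESTAMP_NTZ,
    duration_sec NUMBER,
    loaded_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP()
)""",
        # Aggregated fact table used for reporting.
        """CREATE TABLE IF NOT EXISTS fact_sessions_daily (
    user_id STRING,
    session_date DATE,
    snapshots NUMBER,
    total_duration_sec NUMBER
)""",
        # Snowpipe: only simple per-column transformations (casts) in the COPY.
        """CREATE PIPE IF NOT EXISTS raw_sessions_pipe AUTO_INGEST = TRUE AS
COPY INTO raw_sessions (user_id, session_ts, duration_sec)
FROM (
    SELECT $1:user_id::STRING, $1:session_ts::TIMESTAMP_NTZ, $1:duration_sec::NUMBER
    FROM @sessions_stage
)
FILE_FORMAT = (TYPE = 'JSON')""",
        # Scheduled task: the GROUP BY that COPY cannot do, run over yesterday's rows.
        """CREATE TASK IF NOT EXISTS aggregate_sessions_daily
    WAREHOUSE = my_wh
    SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
INSERT INTO fact_sessions_daily (user_id, session_date, snapshots, total_duration_sec)
SELECT user_id, session_ts::DATE, COUNT(*), SUM(duration_sec)
FROM raw_sessions
WHERE session_ts::DATE = DATEADD(day, -1, CURRENT_DATE())
GROUP BY user_id, session_ts::DATE""",
        # Tasks are created suspended; resume it so the schedule takes effect.
        "ALTER TASK aggregate_sessions_daily RESUME",
        # Retention: keep recent raw rows (for late data), drop older ones.
        "DELETE FROM raw_sessions WHERE loaded_at < "
        f"DATEADD(day, -{retention_days}, CURRENT_TIMESTAMP())",
    ]


for statement in snowflake_pipeline_sql(retention_days=7):
    print(statement + ";\n")
```

The DELETE would itself need to run periodically (for example from a second task), and a plain `TRUNCATE TABLE raw_sessions` is the simpler option if late data is not a concern. Keeping a window of raw rows means a day's aggregate can be recomputed if data for it arrives after the nightly run.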



...