Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
663 views
in Technique[技术] by (71.8m points)

hadoop - Migrate hive table to Google BigQuery

I am trying to design a sort of data pipeline to migrate my Hive tables into BigQuery. Hive is running on an Hadoop on premise cluster. This is my current design, actually, it is very easy, it is just a shell script:

for each table source_hive_table {

  • INSERT overwrite table target_avro_hive_table SELECT * FROM source_hive_table;
  • Move the resulting avro files into google cloud storage using distcp
  • Create first BQ table: bq load --source_format=AVRO your_dataset.something something.avro
  • Handle any casting issue from BigQuery itself, so selecting from the table just written and handling manually any casting

}

Do you think it makes sense? Is there any better way, perhaps using Spark? I am not happy about the way I am handling the casting, I would like to avoid creating the BigQuery table twice.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
Waitting for answers

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...