Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
747 views
in Technique[技术] by (71.8m points)

sql - Store multiple elements in json files in AWS Athena

I have some json files stored in a S3 bucket , where each file has multiple elements of same structure. For example,

[{"eventId":"1","eventName":"INSERT","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"New item!","Id":101}},{"eventId":"2","eventName":"MODIFY","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}},{"eventId":"3","eventName":"REMOVE","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}}]

I want to create a table in Athena corresponding to above data.

The query I wrote for creating the table:

CREATE EXTERNAL TABLE IF NOT EXISTS sampledb.elb_logs2 (
  `eventId` string,
  `eventName` string,
  `eventVersion` string,
  `eventSource` string,
  `awsRegion` string,
  `image` map<string,string> 
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1',
  'field.delim' = ' '
) LOCATION 's3://<bucketname>/';

But if I do a SELECT query as follows,

SELECT * FROM sampledb.elb_logs4;

I get the following result:

1   {"eventid":"1","eventversion":"1.0","image":{"id":"101","message":"New item!"},"eventsource":"aws:dynamodb","eventname":"INSERT","awsregion":"us-west-2"}   {"eventid":"2","eventversion":"1.0","image":{"id":"101","message":"This item has changed"},"eventsource":"aws:dynamodb","eventname":"MODIFY","awsregion":"us-west-2"}   {"eventid":"3","eventversion":"1.0","image":{"id":"101","message":"This item has changed"},"eventsource":"aws:dynamodb","eventname":"REMOVE","awsregion":"us-west-2"}   

The entire content of the json file is picked as one entry here.

How can I read each element of json file as one entry?

Edit: How can I read each subcolumn of image, i.e., each element of the map?

Thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Question1: Store multiple elements in json files for AWS Athena

I need to rewrite my json file as

{"eventId":"1","eventName":"INSERT","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"New item!","Id":101}}, {"eventId":"2","eventName":"MODIFY","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}}, {"eventId":"3","eventName":"REMOVE","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}}

That means

Remove the square brackets [ ] Keep each element in one line

{.....................}
{.....................}
{.....................}

Question2. Access nonlinear json attributes

CREATE EXTERNAL TABLE IF NOT EXISTS <tablename> (
  `eventId` string,
  `eventName` string,
  `eventVersion` string,
  `eventSource` string,
  `awsRegion` string,
  `image` struct <`Id` : string,
                  `Message` : string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1',
   "dots.in.keys" = "true"
) LOCATION 's3://exampletablewithstream-us-west-2/';

Query:

select image.Id, image.message from <tablename>;

Ref:

http://engineering.skybettingandgaming.com/2015/01/20/parsing-json-in-hive/

https://github.com/rcongiu/Hive-JSON-Serde#mapping-hive-keywords


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...