Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


amazon web services - Redshift. Convert comma delimited values into rows

I am wondering how to convert comma-delimited values into rows in Redshift. I am afraid that my own solution isn't optimal. Please advise. I have a table in which one of the columns holds comma-separated values. For example:

I have:

user_id|user_name|user_action
-----------------------------
1      | Shone   | start,stop,cancell...

I would like to see

user_id|user_name|parsed_action 
------------------------------- 
1      | Shone   | start        
1      | Shone   | stop         
1      | Shone   | cancell      
....

1 Reply


A slight improvement over the existing answer is to use a second "numbers" table that enumerates every possible list position (1 up to the maximum list length) and then cross join against it, which makes the query more compact.

Redshift does not have a straightforward method for creating a numbers table that I am aware of, but we can use a bit of a hack from https://www.periscope.io/blog/generate-series-in-redshift-and-mysql.html to create one using row numbers.

Specifically, if we assume the number of rows in cmd_logs is at least the maximum number of items in any user_action value (the maximum number of commas plus one), we can create a numbers table by counting rows. To start, let's assume there are at most 99 commas in the user_action column, so 100 numbers are enough:

select 
  (row_number() over (order by true))::int as n
into numbers
from cmd_logs
limit 100;

If we want to get fancy, we can compute the number of commas from the cmd_logs table to create a more precise set of rows in numbers:

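select
  n::int
into numbers
from
  -- one row number per row of cmd_logs
  (select 
      row_number() over (order by true) as n
   from cmd_logs) as row_numbers
cross join
  -- largest comma count found in any user_action value
  (select 
      max(regexp_count(user_action, '[,]')) as max_num 
   from cmd_logs) as comma_counts
where
  -- a list with k commas has k + 1 items
  n <= max_num + 1;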
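
The sizing rule and the row-count assumption above can be checked with a small Python sketch. It is not Redshift code: the helper names and the two sample user_action values are hypothetical, and it only mimics the fact that counting rows can never produce more numbers than cmd_logs has rows.

```python
def required_numbers(actions):
    """Rows needed in `numbers`: the largest comma count plus one."""
    return max(a.count(",") for a in actions) + 1


def build_numbers(row_count, needed):
    """Numbering the rows of a table yields at most `row_count` values."""
    return list(range(1, min(row_count, needed) + 1))


# Hypothetical user_action values, one per row of cmd_logs.
actions = ["start,stop,cancell", "start"]

needed = required_numbers(actions)             # longest list has 3 items
numbers = build_numbers(len(actions), needed)  # only 2 rows to count
short_by = needed - len(numbers)               # positions that would be missed

print(needed, numbers, short_by)  # 3 [1, 2] 1
```

Here the table has two rows but the longest list has three items, so position 3 would never be generated and "cancell" would be dropped without any error. If cmd_logs might be that small, the numbers table needs another source of rows.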

Once there is a numbers table, we can do:

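select
  user_id, 
  user_name, 
  split_part(user_action,',',n) as parsed_action 
from
  cmd_logs
cross join
  numbers
where
  -- split_part returns an empty string when n is past the end of the list,
  -- so this filter drops the surplus (row, n) pairs produced by the cross join
  split_part(user_action,',',n) is not null
  and split_part(user_action,',',n) != '';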
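
To try the final query without a Redshift cluster, here is a minimal sketch that runs it in SQLite from Python. It rests on several assumptions: SQLite has no split_part, so a Python function is registered as a rough stand-in (1-based, empty string when n is out of range); the numbers table is filled by hand with 1 to 3; and the second user, "Ana", is made up for illustration.

```python
import sqlite3


def split_part(text, delimiter, n):
    """Rough stand-in for Redshift's SPLIT_PART: 1-based, '' when out of range."""
    if text is None:
        return None
    parts = text.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


conn = sqlite3.connect(":memory:")
conn.create_function("split_part", 3, split_part)

# Sample data: the row from the question plus one hypothetical user.
conn.execute("create table cmd_logs (user_id int, user_name text, user_action text)")
conn.executemany(
    "insert into cmd_logs values (?, ?, ?)",
    [(1, "Shone", "start,stop,cancell"), (2, "Ana", "start")],
)

# Numbers table filled directly; 3 covers the longest list in the sample.
conn.execute("create table numbers (n int)")
conn.executemany("insert into numbers values (?)", [(i,) for i in range(1, 4)])

# Same shape as the query above, with an ORDER BY added for stable output.
rows = conn.execute(
    """
    select user_id, user_name, split_part(user_action, ',', n) as parsed_action
    from cmd_logs
    cross join numbers
    where split_part(user_action, ',', n) is not null
      and split_part(user_action, ',', n) != ''
    order by user_id, n
    """
).fetchall()

for row in rows:
    print(row)
# (1, 'Shone', 'start')
# (1, 'Shone', 'stop')
# (1, 'Shone', 'cancell')
# (2, 'Ana', 'start')
```

The cross join produces six (row, n) pairs here; the two surplus pairs for the single-item list come back as empty strings and are filtered out by the where clause.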
