Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
671 views
in Technique[技术] by (71.8m points)

hive - SQL most recent using row_number() over partition

I'm working with some web clicks data, and am just looking for the most recent page_name with the user_id visited (by a timestamp). Using the below code, the user_id is repeated and page_name with shown, with sorted descending. However, I would just like recent_click always = 1. The query when complete will be used as a subquery in a larger query.

Here is my current code:

 SELECT user_id,
 page_name,
 row_number() over(partition by session_id order by ts desc) as recent_click
 from clicks_data;

 user_id |  page_name  |  recent_click
 --------+-------------+--------------
 0001    |  login      |  1
 0001    |  login      |  2
 0002    |  home       |  1
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You should be able to move your query to a subquery and add where criteria:

SELECT user_id, page_name, recent_click
FROM (
  SELECT user_id,
         page_name,
         row_number() over (partition by session_id order by ts desc) as recent_click
  from clicks_data
) T
WHERE recent_click = 1

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...