Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
564 views
in Technique[技术] by (71.8m points)

sql - Bigquery Lag navigation function ignore nulls

I have a table as below:

WITH
  table1 AS (
  SELECT
    TIMESTAMP '2016-10-18 2:51:45' AS ts,
    'AB4' AS packId,
    FALSE AS stat,
    'x' AS position
  UNION ALL
  SELECT
    TIMESTAMP '2016-10-18 2:54:11',
    'AB4',
    FALSE,
    'x'
  UNION ALL
  SELECT
    TIMESTAMP '2016-10-18 3:01:17',
    'AB4',
    FALSE,
    'x'
  UNION ALL
  SELECT
    TIMESTAMP '2016-10-18 3:05:42',
    'AB4',
    TRUE,
    NULL
  UNION ALL
  SELECT
    TIMESTAMP '2016-10-18 3:06:24',
    'AB4',
    FALSE,
    'x'
  UNION ALL
  SELECT
    TIMESTAMP '2016-10-18 3:06:36',
    'AB4',
    TRUE,
    'y'
  UNION ALL
  SELECT
    TIMESTAMP '2016-10-18 3:07:41',
    'AB4',
    FALSE,
    'x')
SELECT
  ts,
  packId,
  stat,
  position,
  LAG(position) OVER prev_pos AS previous_position1,
  LAG(position,2) OVER prev_pos AS previous_position2
FROM
  table1
WINDOW
  prev_pos AS (
  PARTITION BY
    packId
  ORDER BY
    ts)

And results look like this:

enter image description here

I would like to get previous value (from column position) which is not null , as shown in the result table, I want those null values be the previous non-null which is 'x' for both of them in this example. (There are thousands of rows with null positions so last non-null position for current row could be thousand rows before it )

In summary: how to ignore nulls for Lag in bigquery?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Below is for BigQuery Standard SQL

WITH table1 AS (
  SELECT TIMESTAMP '2016-10-18 2:51:45' AS ts, 'AB4' AS packId, FALSE AS stat, 'x3' AS position UNION ALL
  SELECT TIMESTAMP '2016-10-18 2:54:11', 'AB4', FALSE, 'x2' UNION ALL
  SELECT TIMESTAMP '2016-10-18 3:01:17', 'AB4', FALSE, 'x1' UNION ALL
  SELECT TIMESTAMP '2016-10-18 3:05:42', 'AB4', TRUE, NULL UNION ALL
  SELECT TIMESTAMP '2016-10-18 3:06:24', 'AB4', FALSE, 'x' UNION ALL
  SELECT TIMESTAMP '2016-10-18 3:06:36', 'AB4', TRUE, 'y' UNION ALL
  SELECT TIMESTAMP '2016-10-18 3:07:41', 'AB4', FALSE, 'x'
)
SELECT
  ts,
  packId,
  stat,
  position,
  NTH_VALUE(position, 1 IGNORE NULLS) OVER prev_pos AS previous_position1,
  NTH_VALUE(position, 2 IGNORE NULLS) OVER prev_pos AS previous_position2
FROM  table1
WINDOW prev_pos AS 
  (PARTITION BY packId ORDER BY ts DESC ROWS BETWEEN 1 FOLLOWING and UNBOUNDED FOLLOWING)
-- ORDER BY ts     

with output

enter image description here

As you can see no subqueries needed - and the ONLY change to your original query is

  1. replacing LAG(position) and LAG(position,2) with respectively - NTH_VALUE(position, 1 IGNORE NULLS) and NTH_VALUE(position, 2 IGNORE NULLS) and

  2. adding proper framing clause to prev_pos window


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...