Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
350 views
in Technique[技术] by (71.8m points)

Spark SQL explode referencing

It is not clear to me how can you refer to the exploded column in the same subquery, and I am not sure what to search for to get more explanation.

SELECT explode(kit) exploded,
    exploded [0]
FROM tabla

When we are referring from the outer query, it is super clear because the exploded column already exists. What am I missing?

select exploded [0] from
(SELECT explode(kit) exploded
FROM tabla)

enter image description here


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The physical plans of these two queries, and also the lateral view query, are identical. Apparently, the analyzed logical plan of the first query is identical to the lateral view query. After optimization, the logical plans of all three queries became identical.


Create dataframe:

df = spark.range(1).selectExpr("array(array(1,2),array(3,4)) kit")

First query:

spark.sql('select explode(kit) exploded, exploded[0] from tabla').explain(True)
== Parsed Logical Plan ==
'Project ['explode('kit) AS exploded#308, unresolvedalias('exploded[0], None)]
+- 'UnresolvedRelation [tabla]

== Analyzed Logical Plan ==
exploded: array<int>, exploded[0]: int
Project [exploded#309, exploded#309[0] AS exploded[0]#310]
+- Generate explode(kit#292), false, [exploded#309]
   +- SubqueryAlias tabla
      +- Project [array(array(1, 2), array(3, 4)) AS kit#292]
         +- Range (0, 1, step=1, splits=Some(24))

== Optimized Logical Plan ==
Project [exploded#309, exploded#309[0] AS exploded[0]#310]
+- Generate explode([[1,2],[3,4]]), [0], false, [exploded#309]
   +- Project [[[1,2],[3,4]] AS kit#292]
      +- Range (0, 1, step=1, splits=Some(24))

== Physical Plan ==
*(2) Project [exploded#309, exploded#309[0] AS exploded[0]#310]
+- Generate explode([[1,2],[3,4]]), false, [exploded#309]
   +- *(1) Project [[[1,2],[3,4]] AS kit#292]
      +- *(1) Range (0, 1, step=1, splits=24)

Second query: using subquery

spark.sql('select exploded[0] from (select explode(kit) exploded from tabla)').explain(True)
== Parsed Logical Plan ==
'Project [unresolvedalias('exploded[0], None)]
+- 'SubqueryAlias __auto_generated_subquery_name
   +- 'Project ['explode('kit) AS exploded#313]
      +- 'UnresolvedRelation [tabla]

== Analyzed Logical Plan ==
exploded[0]: int
Project [exploded#314[0] AS exploded[0]#315]
+- SubqueryAlias __auto_generated_subquery_name
   +- Project [exploded#314]
      +- Generate explode(kit#292), false, [exploded#314]
         +- SubqueryAlias tabla
            +- Project [array(array(1, 2), array(3, 4)) AS kit#292]
               +- Range (0, 1, step=1, splits=Some(24))

== Optimized Logical Plan ==
Project [exploded#314[0] AS exploded[0]#315]
+- Generate explode([[1,2],[3,4]]), [0], false, [exploded#314]
   +- Project [[[1,2],[3,4]] AS kit#292]
      +- Range (0, 1, step=1, splits=Some(24))

== Physical Plan ==
*(2) Project [exploded#314[0] AS exploded[0]#315]
+- Generate explode([[1,2],[3,4]]), false, [exploded#314]
   +- *(1) Project [[[1,2],[3,4]] AS kit#292]
      +- *(1) Range (0, 1, step=1, splits=24)

Third query: using lateral view

spark.sql('select exploded[0] from tabla lateral view explode(kit) as exploded').explain(True)
== Parsed Logical Plan ==
'Project [unresolvedalias('exploded[0], None)]
+- 'Generate 'explode('kit), false, as, ['exploded]
   +- 'UnresolvedRelation [tabla]

== Analyzed Logical Plan ==
exploded[0]: int
Project [exploded#353[0] AS exploded[0]#354]
+- Generate explode(kit#292), false, as, [exploded#353]
   +- SubqueryAlias tabla
      +- Project [array(array(1, 2), array(3, 4)) AS kit#292]
         +- Range (0, 1, step=1, splits=Some(24))

== Optimized Logical Plan ==
Project [exploded#353[0] AS exploded[0]#354]
+- Generate explode([[1,2],[3,4]]), [0], false, as, [exploded#353]
   +- Project [[[1,2],[3,4]] AS kit#292]
      +- Range (0, 1, step=1, splits=Some(24))

== Physical Plan ==
*(2) Project [exploded#353[0] AS exploded[0]#354]
+- Generate explode([[1,2],[3,4]]), false, [exploded#353]
   +- *(1) Project [[[1,2],[3,4]] AS kit#292]
      +- *(1) Range (0, 1, step=1, splits=24)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...