The physical plans of these two queries, and also the lateral view query, are identical. Apparently, the analyzed logical plan of the first query is identical to the lateral view query. After optimization, the logical plans of all three queries became identical.
Create dataframe:
df = spark.range(1).selectExpr("array(array(1,2),array(3,4)) kit")
First query:
spark.sql('select explode(kit) exploded, exploded[0] from tabla').explain(True)
== Parsed Logical Plan ==
'Project ['explode('kit) AS exploded#308, unresolvedalias('exploded[0], None)]
+- 'UnresolvedRelation [tabla]
== Analyzed Logical Plan ==
exploded: array<int>, exploded[0]: int
Project [exploded#309, exploded#309[0] AS exploded[0]#310]
+- Generate explode(kit#292), false, [exploded#309]
+- SubqueryAlias tabla
+- Project [array(array(1, 2), array(3, 4)) AS kit#292]
+- Range (0, 1, step=1, splits=Some(24))
== Optimized Logical Plan ==
Project [exploded#309, exploded#309[0] AS exploded[0]#310]
+- Generate explode([[1,2],[3,4]]), [0], false, [exploded#309]
+- Project [[[1,2],[3,4]] AS kit#292]
+- Range (0, 1, step=1, splits=Some(24))
== Physical Plan ==
*(2) Project [exploded#309, exploded#309[0] AS exploded[0]#310]
+- Generate explode([[1,2],[3,4]]), false, [exploded#309]
+- *(1) Project [[[1,2],[3,4]] AS kit#292]
+- *(1) Range (0, 1, step=1, splits=24)
Second query: using subquery
spark.sql('select exploded[0] from (select explode(kit) exploded from tabla)').explain(True)
== Parsed Logical Plan ==
'Project [unresolvedalias('exploded[0], None)]
+- 'SubqueryAlias __auto_generated_subquery_name
+- 'Project ['explode('kit) AS exploded#313]
+- 'UnresolvedRelation [tabla]
== Analyzed Logical Plan ==
exploded[0]: int
Project [exploded#314[0] AS exploded[0]#315]
+- SubqueryAlias __auto_generated_subquery_name
+- Project [exploded#314]
+- Generate explode(kit#292), false, [exploded#314]
+- SubqueryAlias tabla
+- Project [array(array(1, 2), array(3, 4)) AS kit#292]
+- Range (0, 1, step=1, splits=Some(24))
== Optimized Logical Plan ==
Project [exploded#314[0] AS exploded[0]#315]
+- Generate explode([[1,2],[3,4]]), [0], false, [exploded#314]
+- Project [[[1,2],[3,4]] AS kit#292]
+- Range (0, 1, step=1, splits=Some(24))
== Physical Plan ==
*(2) Project [exploded#314[0] AS exploded[0]#315]
+- Generate explode([[1,2],[3,4]]), false, [exploded#314]
+- *(1) Project [[[1,2],[3,4]] AS kit#292]
+- *(1) Range (0, 1, step=1, splits=24)
Third query: using lateral view
spark.sql('select exploded[0] from tabla lateral view explode(kit) as exploded').explain(True)
== Parsed Logical Plan ==
'Project [unresolvedalias('exploded[0], None)]
+- 'Generate 'explode('kit), false, as, ['exploded]
+- 'UnresolvedRelation [tabla]
== Analyzed Logical Plan ==
exploded[0]: int
Project [exploded#353[0] AS exploded[0]#354]
+- Generate explode(kit#292), false, as, [exploded#353]
+- SubqueryAlias tabla
+- Project [array(array(1, 2), array(3, 4)) AS kit#292]
+- Range (0, 1, step=1, splits=Some(24))
== Optimized Logical Plan ==
Project [exploded#353[0] AS exploded[0]#354]
+- Generate explode([[1,2],[3,4]]), [0], false, as, [exploded#353]
+- Project [[[1,2],[3,4]] AS kit#292]
+- Range (0, 1, step=1, splits=Some(24))
== Physical Plan ==
*(2) Project [exploded#353[0] AS exploded[0]#354]
+- Generate explode([[1,2],[3,4]]), false, [exploded#353]
+- *(1) Project [[[1,2],[3,4]] AS kit#292]
+- *(1) Range (0, 1, step=1, splits=24)