spark
Much of the code here comes from Spark: The Definitive Guide.
Using ipython3 with pyspark
export PYSPARK_DRIVER_PYTHON=ipython3
explain: view the execution plan, similar to EXPLAIN in MySQL
In [1]: df = spark.range(10).toDF("N")
In [2]: df.explain()
== Physical Plan ==
*(1) Project [id#0L AS N#2L]
+- *(1) Range (0, 10, step=1, splits=8)
Get the maximum value of a column
from pyspark.sql.functions import max
flightData2015.select(max("count")).take(1)
group by and sum