How to Transpose Spark/PySpark DataFrame
What is Transpose?
The transpose of a DataFrame is a new DataFrame whose rows are the columns of the original DataFrame. (This makes the columns of the new DataFrame the rows of the original.)
Python's pandas library provides a built-in transpose function. Spark's Scala API, however, has no predefined function that transposes a Spark DataFrame.
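For comparison, here is a minimal sketch of the pandas built-in mentioned above. The data is made up purely for illustration.

```python
import pandas as pd

# Hypothetical data, made up for illustration only.
df = pd.DataFrame(
    {"Small": [10, 40], "Large": [20, 50]},
    index=["Pen", "Pencil"],
)

# DataFrame.transpose() (or the shorthand df.T) swaps rows and columns:
# the index labels become column names and the column names become the index.
transposed = df.transpose()
print(transposed)
```

Note that pandas transposes around the index, so there is no separate pivot column to name.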
Transpose in Spark (Scala)
We have written a generic transpose method (named TransposeDF) that can be used to transpose a Spark DataFrame.
This method takes three parameters.
- The first parameter is the Input DataFrame.
- The second parameter is the sequence of all columns except the pivot column.
- The third parameter is the pivot column.
The pivot column is easiest to understand through an example.
Let's take a Spark DataFrame and transpose it into another DataFrame using the TransposeDF method.
productQtyDF is a DataFrame that contains the quantity of each product. Let's transpose productQtyDF into productTypeDF using the TransposeDF method, which gives us the quantity per type.
The pivot column in the above example is Products. Let's call the TransposeDF method.
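To make the role of the pivot column concrete, the following plain-Python sketch mimics the transformation on a list of dictionaries, without Spark. The data, the column names Small, Medium and Large, and the helper transpose_rows are all made up for illustration; only Products as the pivot column comes from the example above.

```python
# Hypothetical stand-in for productQtyDF: one row per product,
# one quantity column per type. All values are made up.
product_qty = [
    {"Products": "Pen", "Small": 10, "Medium": 20, "Large": 30},
    {"Products": "Pencil", "Small": 40, "Medium": 50, "Large": 60},
]


def transpose_rows(rows, columns, pivot_col):
    """Return one output row per name in `columns`.

    Each distinct value of `pivot_col` in the input becomes a column
    in the output, and each name in `columns` becomes a row.
    """
    result = []
    for name in columns:
        # The first field keeps the pivot column's name and holds the old column name.
        out = {pivot_col: name}
        for row in rows:
            out[row[pivot_col]] = row[name]
        result.append(out)
    return result


# Hypothetical stand-in for productTypeDF: one row per type, one column per product.
product_type = transpose_rows(product_qty, ["Small", "Medium", "Large"], "Products")
for row in product_type:
    print(row)
```

The values of the pivot column (the product names) become the new column headers, while the remaining column names become the rows.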
Transpose in PySpark
We can use the same transpose method with a PySpark DataFrame as well. To use this method in PySpark, use the method below.
To use it in your code, pass the same parameters and values as for the Scala method.
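A minimal PySpark sketch of such a method might look like the following. It assumes one common approach: unpivot the non-pivot columns with Spark SQL's stack function, then group by the old column name and pivot on the pivot column. The names transpose_df, build_stack_expr and demo, and the sample data, are illustrative assumptions, not the original listing.

```python
def build_stack_expr(columns):
    """Build a Spark SQL stack() expression that unpivots `columns`.

    stack() emits one row per listed column with two output fields,
    named col0 (the column name) and col1 (the column value) by default.
    """
    pairs = ", ".join(f"'{c}', `{c}`" for c in columns)
    return f"stack({len(columns)}, {pairs})"


def transpose_df(df, columns, pivot_col):
    """Sketch of a transpose: `columns` become rows, values of `pivot_col` become columns.

    Assumes every column in `columns` has the same data type (stack() requires it)
    and that each value of `pivot_col` appears in a single row.
    """
    # Imported here so build_stack_expr can be used without Spark installed.
    from pyspark.sql import functions as F

    # Step 1: unpivot to long format (pivot value, column name, column value).
    long_df = df.selectExpr(pivot_col, build_stack_expr(columns))

    # Step 2: one output row per old column name, one output column per pivot value.
    return (
        long_df.groupBy("col0")
        .pivot(pivot_col)
        .agg(F.first("col1"))
        .withColumnRenamed("col0", pivot_col)
    )


def demo(spark):
    """Run the sketch on made-up data; `spark` is an existing SparkSession."""
    product_qty_df = spark.createDataFrame(
        [("Pen", 10, 20, 30), ("Pencil", 40, 50, 60)],
        ["Products", "Small", "Medium", "Large"],
    )
    product_type_df = transpose_df(
        product_qty_df, ["Small", "Medium", "Large"], "Products"
    )
    product_type_df.show()
```

Calling demo(spark) with an active SparkSession prints the transposed table. Row order after groupBy is not guaranteed, so add an explicit orderBy if the order matters.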