How to Transpose Spark/PySpark DataFrame

Nikhil Suthar
Mar 31, 2020


What is Transpose?

The transpose of a DataFrame is a new DataFrame whose rows are the columns of the original DataFrame (which makes the columns of the new DataFrame the rows of the original).

The Python pandas library provides a built-in transpose function. Spark (Scala), however, has no predefined function that transposes a Spark DataFrame.
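For comparison, here is a minimal sketch of the pandas built-in. The product names and quantities are made-up sample data, used only for illustration.

```python
import pandas as pd

# Hypothetical sample data: quantity per product (rows) and size (columns).
product_qty = pd.DataFrame(
    {"Small": [10, 5], "Medium": [20, 15], "Large": [30, 25]},
    index=["Shirt", "Pant"],
)

# .T is shorthand for .transpose(): rows become columns and vice versa.
product_type = product_qty.T

print(product_type)
```

After the transpose, the sizes are the row labels and the products are the columns.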

Spark DataFrame Transpose

Transpose in Spark (Scala)

Below is a generic transpose method (named TransposeDF) that can be used to transpose a Spark DataFrame. Click here to get complete details of the method.

This method takes three parameters.

  1. The first parameter is the input DataFrame.
  2. The second parameter is the sequence of all columns except the pivot column.
  3. The third parameter is the pivot column.

What the pivot column is can be understood from the example below.

Let's take a Spark DataFrame and transpose it into another DataFrame using the TransposeDF method above.

productQtyDF is a DataFrame that contains the quantity of each product. Let's transpose productQtyDF into productTypeDF using the TransposeDF method, which gives us the quantity for each type.

The pivot column in the above example is Products. Let's call the TransposeDF method.
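To make the role of the pivot column concrete, here is a small language-neutral illustration in plain Python (no Spark involved). The rows, the size columns (Small, Medium, Large), and the quantities are hypothetical sample data. It also assumes that the first column of the result keeps the name of the pivot column.

```python
# Hypothetical rows of productQtyDF: one row per product, one column per size.
product_qty_rows = [
    {"Products": "Shirt", "Small": 10, "Medium": 20, "Large": 30},
    {"Products": "Pant", "Small": 5, "Medium": 15, "Large": 25},
]

pivot_col = "Products"                  # its values become the new column names
columns = ["Small", "Medium", "Large"]  # every column except the pivot column

# One output row per original column; one output column per pivot value.
product_type_rows = [
    {pivot_col: c, **{row[pivot_col]: row[c] for row in product_qty_rows}}
    for c in columns
]

for row in product_type_rows:
    print(row)
```

Each value of the pivot column (Shirt, Pant) turns into a column, and each of the remaining columns turns into a row.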

Transpose in PySpark

We can use the same transpose method with a PySpark DataFrame as well. To use it in PySpark, use the method below.

To use it in the code:

All the parameters and values are the same as for the Scala method.
