How to Transpose Spark/PySpark DataFrame

Published in The Art of Data Engineering

2 min read · Mar 31, 2020

What is Transpose?
The transpose of a DataFrame is a new DataFrame whose rows are the columns of the original DataFrame, which in turn makes the columns of the new DataFrame the rows of the original.
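
As a minimal illustration of that definition, here is a plain-Python sketch that uses no Spark at all. The sample values are made up for the example.

```python
# Rows of a small table; each inner list is one row.
rows = [
    [1, 2, 3],
    [4, 5, 6],
]

# zip(*rows) groups the i-th element of every row, so it yields the columns.
# Turning each column into a list gives the rows of the transposed table.
transposed = [list(column) for column in zip(*rows)]

print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```

The 2 x 3 table becomes a 3 x 2 table: every original column is now a row.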

The Python pandas library provides a built-in transpose function. In Spark with Scala, however, there is no predefined function that can transpose a Spark DataFrame.

Transpose in Spark (Scala)

We have written a generic transpose method, named TransposeDF, that can be used to transpose a Spark DataFrame. Click here to get complete details of the method.

This method takes three parameters.

The first parameter is the input DataFrame.
The second parameter is the sequence of all column names except the pivot column.
The third parameter is the pivot column.
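
The source of TransposeDF is not reproduced in this text, so what follows is only a PySpark sketch of one common way to write a method with the same three parameters: unpivot the listed columns with Spark SQL's stack() generator, then pivot on the pivot column. The names transpose_df, build_stack_expr, col_name and col_value are hypothetical. The sketch assumes that all non-pivot columns share one data type (stack() requires this) and that every pivot value appears only once.

```python
def build_stack_expr(columns):
    """Build a Spark SQL stack() expression that unpivots `columns`
    into (col_name, col_value) pairs, one output row per listed column."""
    pairs = ", ".join(f"'{name}', `{name}`" for name in columns)
    return f"stack({len(columns)}, {pairs}) as (col_name, col_value)"


def transpose_df(df, columns, pivot_col):
    """Hypothetical PySpark sketch with the three parameters described above.

    df        -- input pyspark.sql.DataFrame
    columns   -- names of every column except the pivot column
    pivot_col -- column whose values become the new column headers
    """
    # Step 1: unpivot to long form: (pivot_col, col_name, col_value).
    long_df = df.selectExpr(f"`{pivot_col}`", build_stack_expr(columns))
    # Step 2: pivot back so each distinct pivot_col value becomes a column.
    # first() is enough here because each (col_name, pivot value) pair
    # is assumed to occur exactly once.
    return (
        long_df.groupBy("col_name")
        .pivot(pivot_col)
        .agg({"col_value": "first"})
        .withColumnRenamed("col_name", pivot_col)
    )


# The generated expression for two columns named a and b:
print(build_stack_expr(["a", "b"]))
```

A call would look like transpose_df(df, ["a", "b"], "key"), where "key" is the pivot column and the list names every other column.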

The following example shows what the pivot column is.

Let’s take a Spark DataFrame and transpose it into another DataFrame using the TransposeDF method.

productQtyDF is a DataFrame that contains the quantity for each product. Let's transpose productQtyDF into productTypeDF using the TransposeDF method, which gives us the quantity for each type.

The pivot column in the above example is Products. Let's call the TransposeDF method.
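
To make the role of the pivot column concrete, here is a plain-Python sketch of the same reshaping. The rows and the Small, Medium and Large column names are hypothetical stand-ins, not the actual contents of productQtyDF; only the Products column name comes from the text.

```python
# Hypothetical stand-in for productQtyDF: one row per product.
product_qty = [
    {"Products": "Orange", "Small": 50, "Medium": 40, "Large": 30},
    {"Products": "Beans", "Small": 80, "Medium": 60, "Large": 20},
]
pivot_col = "Products"
columns = ["Small", "Medium", "Large"]  # every column except the pivot column

# Each former column becomes a row, and each value of the pivot column
# becomes a column header in the transposed result.
product_type = [
    {pivot_col: name, **{row[pivot_col]: row[name] for row in product_qty}}
    for name in columns
]

for row in product_type:
    print(row)
```

The pivot column keeps its name in the result, but it now holds the former column names, while its former values (Orange, Beans) have become the new columns.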

Written by Nikhil Suthar