This example is also available at the Spark Scala GitHub Project for reference.

In order to select the first N columns, you can use df.columns to get all the column names of the DataFrame and use the slice() method to keep the first N of them. The snippet below selects the first 3 columns.

    df.select(df.columns.slice(0,3).map(m=>col(m)):_*).show()

To select columns by position or index, first get all column names using df.columns and then pick a name by its index, or use slice() to get the names between a start and an end position. The end position is exclusive, so the snippet below selects the columns at indices 2 and 3, that is, the 3rd and 4th columns (index starts from zero).

    df.select(df.columns.slice(2,4).map(m=>col(m)):_*).show()

You can use df.colRegex() to select columns based on a regular expression. The example below shows all columns whose name contains the string name.

    df.select(df.colRegex("`^.*name*`")).show()

Below are some examples of how to select DataFrame columns whose names start with or end with a given string.

    df.select(df.columns.filter(f=>f.startsWith("first")).map(m=>col(m)):_*)
    df.select(df.columns.filter(f=>f.endsWith("name")).map(m=>col(m)):_*)

When you are processing structured data, most of the time you will have Spark DataFrame columns nested in a struct type (StructType); to select a nested column you need to use an explicit column qualifier. If you are new to Spark and have not learned StructType yet, I would recommend skipping the rest of this section or first reading Understand Spark StructType before you proceed.

First, let’s create a new DataFrame with a nested struct type and select two of its nested fields. Here data2 and schema stand for the row data and the StructType schema, which are defined outside this excerpt.

    val df2 = spark.createDataFrame((data2),schema)
    df2.select("name.firstname","name.lastname").show(false)
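The slice() and filter() calls in this section operate on an ordinary Scala array of column names, so their behavior can be checked without a Spark session. The sketch below is a minimal illustration under that assumption: the array `columns` is a hypothetical stand-in for what df.columns might return, and the object name `ColumnNameSketch` is made up for this example.

```scala
// Plain-Scala sketch: no Spark required.
// `columns` is a hypothetical stand-in for the Array[String] returned by df.columns.
object ColumnNameSketch {
  val columns: Array[String] = Array("firstname", "lastname", "country", "state")

  // slice(from, until): `from` is inclusive, `until` is exclusive.
  val firstThree: Array[String] = columns.slice(0, 3)      // indices 0, 1, 2
  val thirdAndFourth: Array[String] = columns.slice(2, 4)  // indices 2, 3

  // A single position is plain array indexing (zero-based).
  val fourth: String = columns(3)

  // Name-based filtering uses ordinary String predicates.
  val startsWithFirst: Array[String] = columns.filter(_.startsWith("first"))
  val endsWithName: Array[String] = columns.filter(_.endsWith("name"))

  def main(args: Array[String]): Unit = {
    println(firstThree.mkString(", "))      // firstname, lastname, country
    println(thirdAndFourth.mkString(", "))  // country, state
    println(fourth)                         // state
    println(startsWithFirst.mkString(", ")) // firstname
    println(endsWithName.mkString(", "))    // firstname, lastname
  }
}
```

Each resulting array of names can then be mapped to columns with col() and passed to select(), as in the snippets above.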
You can also refer to columns through the DataFrame itself.

    df.select(df("firstname"),df("lastname")).show()

Using the col function, use alias() to give a column an alias name.

    import org.apache.spark.sql.functions.col
    df.select(col("firstname").alias("fname"),col("lastname")).show()

To select all columns of a Spark DataFrame, use df.columns to get all the column names as an array of strings, convert it to an array of Column objects using Scala's map(), and finally pass it to select().

    df.select(df.columns.map(m=>col(m)):_*).show()

Sometimes you may have to select columns whose names are held in an Array, List or Seq of String. The example below shows how to do this with a list, where listCols is a List of column names.

    df.select(listCols.map(m=>col(m)):_*).show()
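The `:_*` suffix used in these snippets is ordinary Scala varargs expansion: select() accepts a variable number of Column arguments, and `:_*` passes a collection as if its elements had been written out one by one. The sketch below illustrates only that language feature, without Spark. The method `pick` is a hypothetical stand-in for a varargs method, and the contents of `listCols` are assumed for the example.

```scala
// Plain-Scala sketch of varargs expansion: no Spark required.
object VarargsSketch {
  // Hypothetical stand-in for a varargs method such as select(cols: Column*).
  // It just wraps each name so the effect of the call is visible.
  def pick(cols: String*): Seq[String] = cols.map(c => s"col($c)")

  // Assumed list of column names, for illustration only.
  val listCols: List[String] = List("lastname", "country")

  // `: _*` expands the list into individual arguments...
  val expanded: Seq[String] = pick(listCols: _*)

  // ...so it is equivalent to writing the arguments out by hand.
  val literal: Seq[String] = pick("lastname", "country")

  def main(args: Array[String]): Unit = {
    println(expanded.mkString(", "))  // col(lastname), col(country)
    println(expanded == literal)      // true
  }
}
```

Without the `:_*` suffix the collection would be passed as a single argument, which does not match a varargs parameter of the element type.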