COSC 3365 - Homework 9

Spark cannot infer the DataFrame schema from the following type of file, even when it is properly formed:

.xlsx

Which of the following is not true of DataFrames?

They are very similar to document-based databases

Column operations in Spark include:

Arithmetic operations

The Catalyst Optimizer is an extensible framework for SparkSQL that optimizes query results in a Dataset or DataFrame.

False

Column operations in Spark include arithmetic operations, but do not have data testing operations and sorting functions.

False

The company CrowdStrike uses Apache Spark to store data in the cloud.

False

Datasets are collections of objects whose type is defined at the time of execution.

False

GraphX is a framework used in Spark to make visualizations of the data.

False

In DataFrames, each row can contain a collection of values of specific types, such as integers, floats, or strings, but not a collection of arrays or lists.

False

Spark Streaming is a Spark API that can be used to create applications that stream data into the cloud.

False

Spark is written in Java and can be used by coding in Scala or Python.

False

What does "left_outer" join in the command "customersDF.join(zipCodeDF, customersDF("City")===zipCodeDF("City"), "left_outer").show()" do?

Lists all customers and adds the zip code to those customers whose City value matches a row in zipCodeDF; customers with a null or unmatched City get a null zip code
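
The join behavior described above can be modeled without Spark. The sketch below is plain Python, not the Spark API: the sample rows, the `Zip` column name, and the `left_outer_join` helper are all hypothetical. It also merges the two City columns into one, which is a simplification of what the Spark expression join returns.

```python
# Plain-Python model of a left outer join on "City" (illustrative data, not Spark).
customers = [
    {"Name": "Ana", "City": "Houston"},
    {"Name": "Bo", "City": "Austin"},
    {"Name": "Cy", "City": None},
]
zip_codes = [
    {"City": "Houston", "Zip": "77004"},
]


def left_outer_join(left, right, key):
    """Keep every left row; fill right-hand columns with None when nothing matches."""
    right_columns = [c for c in right[0] if c != key] if right else []
    joined = []
    for l_row in left:
        # A null key never matches anything, mirroring SQL null semantics.
        matches = [
            r for r in right
            if l_row[key] is not None and r[key] == l_row[key]
        ]
        if matches:
            for r_row in matches:
                joined.append({**l_row, **{c: r_row[c] for c in right_columns}})
        else:
            joined.append({**l_row, **{c: None for c in right_columns}})
    return joined


result = left_outer_join(customers, zip_codes, "City")
for row in result:
    print(row)
```

Every customer appears in the result. Only the row whose City matches an entry in the zip code list gets a zip code; the unmatched city and the null city both end up with `None`.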

Which of the following is not a main component in Big Data Analysis?

MATLAB

By using DataFrame actions you cannot:

Select columns dynamically according to a condition

What is the result after submitting the following command: dataframe.select("col1", "col3").where("col1>0").take(3).show() ?

Show col1 and col3 of first 3 rows where col1 is positive
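
The select, filter, and take steps in that command can be traced with a small plain-Python model. The data and the `select`, `where`, and `take` helpers are hypothetical stand-ins, not Spark functions. In Spark itself, `take(3)` returns a local collection of rows rather than a DataFrame, so `limit(3)` is the call that can be followed by `show()`.

```python
# Plain-Python model of select -> where -> take (illustrative data, not Spark).
rows = [
    {"col1": -2, "col2": "a", "col3": 0.5},
    {"col1": 4, "col2": "b", "col3": 1.5},
    {"col1": 0, "col2": "c", "col3": 2.5},
    {"col1": 7, "col2": "d", "col3": 3.5},
    {"col1": 1, "col2": "e", "col3": 4.5},
    {"col1": 9, "col2": "f", "col3": 5.5},
]


def select(data, *columns):
    """Keep only the named columns of every row."""
    return [{c: r[c] for c in columns} for r in data]


def where(data, predicate):
    """Keep only the rows for which the predicate holds."""
    return [r for r in data if predicate(r)]


def take(data, n):
    """Return the first n rows as a plain list."""
    return data[:n]


first_three = take(where(select(rows, "col1", "col3"), lambda r: r["col1"] > 0), 3)
for row in first_three:
    print(row)
```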

Which of the following commands cannot be used to get the first row of the DataFrame?

show(1)

Which of the following is not a Spark API?

Spark Core

Which of the following is the main entry point of Spark for working with structured data?

SparkSQL

DataFrame actions generate a new output, while transformations transform an existing DataFrame.

True
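
The split between transformations and actions can be illustrated with a toy class. `MiniFrame` is a hypothetical stand-in written for this sketch, not a Spark type, and it evaluates eagerly, so it does not model Spark's lazy evaluation. It only shows the difference in return types: a transformation hands back another frame, an action hands back plain output.

```python
class MiniFrame:
    """Toy stand-in for a DataFrame: an immutable sequence of dict rows."""

    def __init__(self, rows):
        self._rows = tuple(rows)

    # Transformations: each returns a new MiniFrame and leaves the original untouched.
    def where(self, predicate):
        return MiniFrame(r for r in self._rows if predicate(r))

    def select(self, *columns):
        return MiniFrame({c: r[c] for c in columns} for r in self._rows)

    # Actions: each returns plain output rather than another MiniFrame.
    def count(self):
        return len(self._rows)

    def collect(self):
        return list(self._rows)


people = MiniFrame([
    {"name": "Ana", "age": 34},
    {"name": "Bo", "age": 27},
    {"name": "Cy", "age": 41},
])

over_30 = people.where(lambda r: r["age"] > 30).select("name")  # still a MiniFrame
print(over_30.count())    # action: an integer
print(over_30.collect())  # action: a list of rows
```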

In Python, there is an additional way to select the columns of a DataFrame: like the attributes of an object

True

In Python, DataFrame columns may be accessed the same way as object attributes, like this: DF.field

True
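
One way a Python object can offer both `DF["field"]` and `DF.field` access is by defining `__getitem__` and `__getattr__`. The `ColumnFrame` class below is a hypothetical sketch of that idea in plain Python. It is not how PySpark is implemented, and it returns raw lists where PySpark would return column objects.

```python
class ColumnFrame:
    """Toy column store that supports frame["name"] and frame.name."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, name):
        return self._columns[name]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so real attributes win.
        columns = self.__dict__.get("_columns", {})
        try:
            return columns[name]
        except KeyError:
            raise AttributeError(name) from None


frame = ColumnFrame({"city": ["Houston", "Austin"], "zip": ["77004", "73301"]})
print(frame["city"])
print(frame.city)
```

Attribute access cannot reach a column whose name is not a valid Python identifier or collides with an existing attribute, which is why the bracket form remains the general one.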

In Python, you cannot use Datasets, as it is a dynamically typed language.

True

In Spark, you can generate a DataFrame from a custom list and then rename all the columns to define your own schema

True
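
The idea of building a frame from a list and then renaming every column can be sketched in plain Python. The helpers `frame_from_list` and `rename_all` are hypothetical and are not Spark functions. The placeholder names `_1`, `_2` mirror the defaults PySpark assigns to tuples when no schema is given, and the rename step plays the role that `toDF` with new column names plays in Spark.

```python
def frame_from_list(rows):
    """Build rows with positional placeholder names _1, _2, ... (no schema given)."""
    return [{f"_{i + 1}": value for i, value in enumerate(row)} for row in rows]


def rename_all(frame, new_names):
    """Return a new frame whose columns are renamed by position."""
    renamed = []
    for row in frame:
        if len(row) != len(new_names):
            raise ValueError("expected one new name per column")
        renamed.append(dict(zip(new_names, row.values())))
    return renamed


data = [("Ana", 34), ("Bo", 27)]
unnamed = frame_from_list(data)
people = rename_all(unnamed, ["name", "age"])
print(unnamed)
print(people)
```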

Scala is a functional programming language that runs on the Java Virtual Machine

True

Spark Applications can perform large scale data processing such as extract, transform and load (ETL)

True

Spark has been gaining ground on MapReduce because of its faster processing, lower latency, and data streaming abilities.

True

Spark MLlib is a machine learning library for building Spark machine learning applications on big data

True

The Spark Core contains the basic functionality of Spark, including components for task scheduling, memory management, fault recovery, and interacting with storage systems.

True

What is the correct command to read all json files in a given directory?

spark.read.json("directory/*.json")

Which of the following is not a DataFrame transformation?

collect

