Apache Spark Sample Test: Sample Questions

Question:
Apache Spark supports:

1.Batch processing

2.Stream processing

3. Graph processing

4.All of the above

Posted Date:-2022-04-06 13:02:04


Question:
Can you combine Apache Spark libraries (for example MLlib, GraphX, Spark SQL and DataFrames) in the same application?

1.yes

2.no

3.none

4.None of These

Posted Date:-2022-04-06 12:02:14


Question:
flatMap transforms an RDD of length N into another RDD of length M. Which of the following is true for N and M?
a. N>M

b. N<M

c. N<=M

1. Either a or b

2.Either b or c

3.Either a or c

4.None of the above

Posted Date:-2022-04-06 13:04:12
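
The relationship between N and M can be explored with a small sketch. This is a plain-Python model of the flatMap idea (apply a function that returns a sequence, then flatten one level), not the Spark API itself; `flat_map` and the sample data are hypothetical names chosen for illustration.

```python
from itertools import chain

def flat_map(func, items):
    """Plain-Python stand-in for flatMap: apply func, then flatten one level."""
    return list(chain.from_iterable(func(x) for x in items))

lines = ["a b c", "d", ""]          # N = 3 input elements

# Each element may yield zero, one or many outputs, so M is not tied to N.
expanded = flat_map(lambda s: s.split(), lines)   # more outputs than inputs
emptied = flat_map(lambda s: [], lines)           # fewer outputs than inputs
unchanged = flat_map(lambda s: [s], lines)        # same number of outputs
```

Under these assumptions the output length depends entirely on what the supplied function returns for each element.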


Question:
For a multiclass classification problem, which algorithm is not a solution?

1.Naive Bayes

2.Random Forests

3. Logistic Regression

4.Decision Trees

Posted Date:-2022-04-06 12:17:02


Question:
For a regression problem, which algorithm is not a solution?

1.Logistic Regression

2.Ridge Regression

3.Decision Trees

4.Gradient-Boosted Trees

Posted Date:-2022-04-06 12:18:05


Question:
How many SparkContexts can be active per JVM?

1.More than one

2.Only one

3.Not specific

4.None of the above

Posted Date:-2022-04-06 12:06:34


Question:
How many tasks does Spark run on each partition?

1. Any number of tasks

2.one

3.More than one but fewer than five

4.None of These

Posted Date:-2022-04-06 12:08:44


Question:
How much faster can Apache Spark potentially run batch-processing programs when processed in memory than MapReduce can?

1. 10 times faster

2.20 times faster

3. 100 times faster

4.200 times faster

Posted Date:-2022-04-06 11:55:57


Question:
In how many ways can an RDD be created?

1.4

2.3

3.2

4.1

Posted Date:-2022-04-06 12:07:11


Question:
For which of the following actions is the result not returned to the driver?

1.collect()

2. top()

3.countByValue()

4.foreach()

Posted Date:-2022-04-06 13:13:48
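
The difference between an action that hands data back and one that only runs side effects can be sketched as follows. This is a plain-Python model with hypothetical helper names (`foreach`, `collect`), not Spark code; on a real cluster the side effects of foreach happen on the executors, so appending to a driver-side list as done here would not work the same way.

```python
def foreach(func, items):
    """Stand-in for foreach: run func on every element for its side effects only."""
    for item in items:
        func(item)
    # No return statement: the caller receives None, not the data.

def collect(items):
    """Stand-in for collect(): hand every element back to the caller."""
    return list(items)

seen = []
foreach_result = foreach(seen.append, [3, 1, 2])
collect_result = collect([3, 1, 2])
```

In this model `foreach_result` carries no data at all, while `collect_result` holds the full dataset.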


Question:
In which of the following cases do we keep the data in-memory?

1. Iterative algorithms

2. Interactive data mining tools

3. Both of the above

4.None of These

Posted Date:-2022-04-06 12:30:35


Question:
The shortcomings of Hadoop MapReduce were overcome by Spark RDD through:

1.Lazy-evaluation

2.DAG

3. In-memory processing

4.All of the above

Posted Date:-2022-04-06 12:39:39


Question:
The write operation on an RDD is:

1. Fine-grained

2. Coarse-grained

3. Either fine-grained or coarse-grained

4. Neither fine-grained nor coarse-grained

Posted Date:-2022-04-06 12:33:21


Question:
What are the features of Spark RDD?

 

1.In-memory computation

2. Lazy evaluations

3.Fault Tolerance

4.All of the above

Posted Date:-2022-04-06 12:05:40


Question:
What is an action in Spark RDD?

1.The ways to send result from executors to the driver

2.Takes RDD as input and produces one or more RDD as output.

3.Creates one or many new RDDs

4.All of the above

Posted Date:-2022-04-06 12:36:38


Question:
When does Apache Spark evaluate an RDD?

1.Upon action

2.Upon transformation

3.On both transformation and action

4.None of the above

Posted Date:-2022-04-06 12:31:48
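
Lazy evaluation can be illustrated with Python's own lazy `map`. This is only an analogy under stated assumptions, not Spark code: the lazy iterator plays the role of a transformation, and `list(...)` plays the role of an action; `traced_double` is a hypothetical helper that records when it actually runs.

```python
calls = []

def traced_double(x):
    calls.append(x)      # record each time the function really executes
    return 2 * x

data = [1, 2, 3]

pipeline = map(traced_double, data)   # "transformation": only describes the work
calls_before_action = list(calls)     # snapshot taken before anything is forced

result = list(pipeline)               # "action": forces the computation to run
```

The snapshot taken before the final step shows that describing the pipeline did not execute the function.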


Question:
Which of the following is a tool provided by the machine learning library (MLlib)?

 

1.Persistence

2. Utilities like linear algebra, statistics

3.Pipelines

4.All of the above

Posted Date:-2022-04-06 12:20:04


Question:
Which of the following is a transformation?

 

1.take(n)

2.top()

3. countByValue()

4.mapPartitionsWithIndex()

Posted Date:-2022-04-06 13:11:35


Question:
Which of the following is an action?

1.union(dataset)

2.intersection(other-dataset)

3.distinct()

4.countByValue()

Posted Date:-2022-04-06 13:12:57
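
One way to tell the two kinds of operation apart is by what they give back: another dataset, or a concrete result. The sketch below is a plain-Python model with hypothetical helpers (`union`, `count_by_value`), not the Spark API.

```python
from collections import Counter
from itertools import chain

def union(left, right):
    """Transformation stand-in: returns another lazy dataset, not a result."""
    return chain(left, right)

def count_by_value(items):
    """Action stand-in: consumes the dataset and returns a local dict of counts."""
    return dict(Counter(items))

combined = union(["a", "b"], ["a"])   # still a dataset; nothing counted yet
counts = count_by_value(combined)     # concrete value handed to the caller
```

Here `combined` is still something to iterate over, while `counts` is an ordinary dictionary.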


Question:
Which of the following is false for Apache Spark?

1. It provides high-level API in Java, Python, R, Scala

2. It can be integrated with Hadoop and can process existing Hadoop HDFS data

3.Spark is an open source framework which is written in Java

4.Spark is 100 times faster than Bigdata Hadoop

Posted Date:-2022-04-06 12:21:50


Question:
Which of the following is not a function of SparkContext in Apache Spark?

1. Entry point to Spark SQL

2.To access various services

3.To set the configuration

4.To get the current status of Spark Application

Posted Date:-2022-04-06 12:04:49


Question:
Which of the following is not a transformation?

1.flatMap

2.map

3.reduce

4.filter

Posted Date:-2022-04-06 12:09:33


Question:
Which of the following is not an action?

 

1.collect()

2.take(n)

3.top()

4.map()

Posted Date:-2022-04-06 12:10:45


Question:
Which of the following is not true for the map() operation?

1.map transforms an RDD of length N into another RDD of length N.

2. In the map operation, the developer can define custom business logic.

3. It is applied to each element of the RDD and returns the result as a new RDD.

4.map allows returning 0, 1 or more elements from the map function.

Posted Date:-2022-04-06 13:03:12


Question:
Which of the following is open-source?

1. Apache Spark

2.Apache Hadoop

3.Apache Flink

4.All of the above

Posted Date:-2022-04-06 13:01:05


Question:
Which of the following is the entry point of a Spark application?

1.SparkSession

2.SparkContext

3. Neither of the two

4.Only 1

Posted Date:-2022-04-06 12:59:04


Question:
Which of the following is the entry point of Spark SQL?

 

1.SparkSession

2. SparkContext

3.Both 1 and 2

4.None

Posted Date:-2022-04-06 13:00:10


Question:
Which of the following is the reason Spark is faster than MapReduce?

1. DAG execution engine and in-memory computation

2.Support for different language APIs like Scala, Java, Python and R

3.RDDs are immutable and fault-tolerant

4.None of the above

Posted Date:-2022-04-06 12:00:24


Question:
Which of the following is true about DataFrames?

1.DataFrames provide a more user-friendly API than RDDs.

2.The DataFrame API provides compile-time type safety.

3.Both of the above

4.None of the above

Posted Date:-2022-04-06 12:19:11


Question:
Which of the following is true about a narrow transformation?

1. The data required for the computation resides on multiple partitions.

2.The data required for the computation resides on a single partition.

3. Both of the above

4.None

Posted Date:-2022-04-06 12:37:50


Question:
Which of the following is true about a wide transformation?

1. The data required for the computation resides on multiple partitions.

2. The data required for the computation resides on a single partition.

3.Both 1 and 2

4.Neither of the two

Posted Date:-2022-04-06 12:38:47
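
The partition dependency that separates the two kinds of transformation can be sketched with lists standing in for partitions. This is a plain-Python model, not Spark code; `group_by_key`, the toy partitioner based on `ord`, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict

# Two input "partitions" of (key, value) records.
partitions = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]

# Per-element mapping: each output partition is built from exactly one
# input partition, so no data has to move between partitions.
mapped = [[(k, v * 10) for k, v in part] for part in partitions]

def group_by_key(parts, num_out):
    """Grouping stand-in: every output partition may need records from
    every input partition, which is what forces a shuffle."""
    out = [defaultdict(list) for _ in range(num_out)]
    for part in parts:
        for k, v in part:
            out[ord(k[0]) % num_out][k].append(v)   # toy deterministic partitioner
    return [dict(d) for d in out]

grouped = group_by_key(partitions, 2)
```

In this model the values for key "a" start out in two different input partitions and end up together in one output partition.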


Question:
Which of the following is true for RDD?

1.RDD is a programming paradigm

2.RDD in Apache Spark is an immutable collection of objects

3.It is a database

4.None of the above

Posted Date:-2022-04-06 12:13:03


Question:
Which of the following is true for RDD?

1. We can operate Spark RDDs in parallel with a low-level API

2. RDDs are similar to the table in a relational database

3. It allows processing of a large amount of structured data

4.It has built-in optimization engine

Posted Date:-2022-04-06 12:29:09


Question:
Which of the following is true for Spark Core?

1. It is the kernel of Spark

2.It enables users to run SQL / HQL queries on top of Spark.

3.It is the scalable machine learning library which delivers efficiencies

4.It drastically improves the performance of iterative algorithms.

Posted Date:-2022-04-06 12:23:53


Question:
Which of the following is true for Spark MLlib?

 

1.Provides an execution platform for all the Spark application

2. It is the scalable machine learning library which delivers efficiencies

3. It enables powerful interactive analytics applications across live streaming data

4.All of the above

Posted Date:-2022-04-06 12:26:00


Question:
Which of the following is true for SparkR?

1. It allows data scientists to analyze large datasets and interactively run jobs

2.It is the kernel of Spark

3. It is the scalable machine learning library which delivers efficiencies

4.It enables users to run SQL / HQL queries on top of Spark.

Posted Date:-2022-04-06 12:24:58


Question:
Which of the following is true for Spark Shell?

 

1.It helps Spark applications to easily run on the command line of the system

2.It runs/tests application code interactively

3.It allows reading from many types of data sources

4.All of the above

Posted Date:-2022-04-06 12:27:13


Question:
Which of the following is true for Spark SQL?

1. It is the kernel of Spark

2. Provides an execution platform for all the Spark applications

3. It enables users to run SQL / HQL queries on top of Spark.

4.It enables users to run SQL / HQL queries on top of Spark.

Posted Date:-2022-04-06 12:22:48


Question:
Which of the following uses Spark Core's fast scheduling capability to perform streaming analytics?

1.RDD

2.GraphX

3.Spark Streaming

4. Spark R

Posted Date:-2022-04-06 11:58:02


Question:
You can connect an R program to a Spark cluster from:

1.RStudio

2.R Shell

3.Rscript

4.All of the above

Posted Date:-2022-04-06 12:11:42

