Databricks Interview Questions and Answers for Freshers & Experienced

List the stages of a CI/CD pipeline?

There are four stages in a CI/CD pipeline:

1. Source
2. Build
3. Staging
4. Production

Posted Date:- 2021-09-24 22:40:48

What are the things involved when pushing the data pipeline to a staging environment?

The four things involved when pushing the data pipeline to a staging environment are:

1. Notebooks
2. Libraries
3. Clusters and Jobs configuration
4. Results

What do clusters do at the network level?

At the network level, clusters connect to the control plane proxy during cluster creation.

What are the different elements to specify within the JSON request body while replacing an IP access list?

The different elements to specify within JSON request body while replacing an IP access list are:

1. label
2. list_type
3. ip_addresses
4. enabled
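
These four elements can be sketched as a JSON request body in Python (a minimal sketch; the label and CIDR range below are illustrative assumptions, not values from the original answer):

```python
import json

# Sketch of the JSON body for replacing an IP access list. The "label" and
# CIDR below are hypothetical examples; ALLOW and BLOCK are the two
# documented list types.
def build_ip_access_list_body(label, list_type, ip_addresses, enabled=True):
    if list_type not in ("ALLOW", "BLOCK"):
        raise ValueError("list_type must be ALLOW or BLOCK")
    return {
        "label": label,
        "list_type": list_type,
        "ip_addresses": ip_addresses,
        "enabled": enabled,
    }

body = build_ip_access_list_body("office-vpn", "ALLOW", ["192.168.100.0/22"])
print(json.dumps(body))
```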

Write the syntax to delete an IP access list.

The syntax to delete the IP access list is:

DELETE /ip-access-lists/<ip-access-list-id>
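
The call can be wrapped in a small Python sketch (the workspace URL, token, and list ID are hypothetical placeholders; the `requests` call is shown but not executed here):

```python
def ip_access_list_url(host, list_id):
    """Build the full endpoint URL for deleting one IP access list."""
    return f"{host.rstrip('/')}/api/2.0/ip-access-lists/{list_id}"

def delete_ip_access_list(host, token, list_id):
    # Requires `pip install requests`; not executed in this sketch.
    import requests
    return requests.delete(
        ip_access_list_url(host, list_id),
        headers={"Authorization": f"Bearer {token}"},
    )

# Hypothetical workspace URL and list ID:
print(ip_access_list_url("https://adb-1234.azuredatabricks.net", "my-list-id"))
```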

What are the rules to name a secret scope?

There are three main rules for naming a secret scope:

1. A secret scope name may contain only alphanumeric characters, dashes, underscores, and periods.
2. The name must not exceed 128 characters.
3. The name must be unique within the workspace.
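
The first two rules can be checked with a short Python sketch (the character set and length limit follow the rules above; uniqueness has to be verified against the workspace itself, so it is out of scope here):

```python
import re

# Only alphanumerics, dashes, underscores, and periods; at most 128 chars.
_SCOPE_NAME = re.compile(r"^[A-Za-z0-9._-]{1,128}$")

def is_valid_scope_name(name):
    return bool(_SCOPE_NAME.match(name))

print(is_valid_scope_name("prod.secrets_v1"))  # True
print(is_valid_scope_name("bad scope!"))       # False: space and "!"
print(is_valid_scope_name("x" * 129))          # False: too long
```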

What are the two types of secret scopes?

There are two types of secret scopes:

1. Databricks-backed scopes.
2. Azure Key Vault-backed scopes.

How to delete a Secret?

We can use the Azure Portal UI or the Azure SetSecret REST API to delete a secret from any scope that is backed by an Azure Key Vault.

What is the use of Secrets utility?

The Secrets utility (dbutils.secrets) is used to read secrets in jobs and notebooks.

Write a syntax to list secrets in a specific scope?

The syntax to list secrets in a specific scope is:

databricks secrets list --scope <scope-name>

What is a secret in databricks?

A secret is a key-value pair that stores secret material; it is identified by a unique key name within a secret scope. Each scope can hold up to 1000 secrets, and the maximum size of a secret value is 128 KB.

What is the use of widgets in databricks?

Widgets enable us to add parameters to our dashboards and notebooks. The widget API provides calls to create input widgets of various types, get their bound values, and remove them.

What is the use of %run?

The %run command runs another notebook inline within the current notebook and is commonly used to modularize code, for example by keeping shared functions in one notebook and running it from others.


What are the critical challenges for CI/CD while building a data pipeline?

The five critical challenges for CI/CD while building a data pipeline are:

1. Pushing the data pipeline to the environment of production.
2. Exploration of data.
3. Pushing the data pipeline to the staging environment.
4. Developing the unit tests iteratively.
5. Continuous build and integration.

What is a CD(Continuous Delivery)?

Continuous delivery (CD) builds on CI by promoting code changes to environments such as QA and staging after the build completes. It is also used to test new changes for stability, performance, and security.

What is the use of Continuous Integration?

Continuous Integration allows multiple developers to merge code changes into a central repository. Each merge triggers an automated build that compiles the code and executes the unit tests.

List the different types of cluster modes in Azure Databricks.

There are three cluster modes in Azure Databricks:

1. Single-node Cluster.
2. Standard Cluster.
3. High Concurrency Cluster.
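
As a rough illustration, the three modes might be expressed as Clusters API payloads like the following (field names follow the Databricks Clusters API; the runtime version and node type are hypothetical placeholders):

```python
# A Standard cluster: one driver plus worker nodes.
standard = {
    "cluster_name": "standard-demo",
    "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
    "node_type_id": "Standard_DS3_v2",    # placeholder Azure VM size
    "num_workers": 2,
}

# A Single-node cluster runs only a driver (zero workers) and sets a
# dedicated profile in its Spark configuration.
single_node = {
    **standard,
    "cluster_name": "single-node-demo",
    "num_workers": 0,
    "spark_conf": {
        "spark.master": "local[*]",
        "spark.databricks.cluster.profile": "singleNode",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}

# A High Concurrency cluster uses the "serverless" profile so several users
# can share it with resource isolation.
high_concurrency = {
    **standard,
    "cluster_name": "high-concurrency-demo",
    "spark_conf": {"spark.databricks.cluster.profile": "serverless"},
}

print(single_node["num_workers"], high_concurrency["spark_conf"])
```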

What can we do using the API or command-line interface?

Using the Databricks API or command-line interface, we can:

1. Schedule jobs.
2. Create, delete, or view jobs.
3. Run jobs immediately.
4. Make jobs dynamic by passing parameters at runtime.
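
Item 4 can be sketched with the request body of the Jobs API's run-now call (the job ID and parameter names below are hypothetical):

```python
import json

# Sketch of a run-now request body (POST /api/2.1/jobs/run-now): passing
# notebook parameters at runtime makes the job dynamic.
def run_now_body(job_id, notebook_params=None):
    body = {"job_id": job_id}
    if notebook_params:
        body["notebook_params"] = notebook_params
    return body

print(json.dumps(run_now_body(42, {"env": "staging"})))
```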

What is a databricks cluster?

A Databricks cluster is a set of computation resources and configurations on which we can run data engineering, data science, and data analytics workloads, such as production ETL pipelines, ad-hoc analytics, machine learning, and streaming analytics.

Write the syntax to connect an Azure storage account to Databricks.

dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"<conf-key>": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")}
)

What is GraphX?

Spark uses GraphX for graph processing to build and transform interactive graphs. The GraphX component enables programmers to reason about structured data at scale.

How to reuse the code in the azure notebook?

If we want to reuse code in an Azure notebook, we must import it into our notebook. We can import it in two ways: 1) if the code is in a different workspace, we have to package it as a module or JAR and then import it; 2) if the code is in the same workspace, we can import and reuse it directly.

What is a Recovery Services Vault?

A Recovery Services Vault (RSV) is where Azure backups are stored. We can easily configure the backup data using an RSV.

Why should one maintain backup Azure blob storage?

Even though blob storage supports data replication, it may not protect against application errors that can corrupt the entire dataset. For this reason, we need to maintain a backup of Azure blob storage.

What does Azure data lake do?

Azure Data Lake works with existing IT investments for identity, management, and security to simplify data management and governance. It also allows us to extend data applications by integrating with operational stores and data warehouses.

What is an azure data lake?

An Azure Data Lake is a public cloud service that enables developers, scientists, business professionals, and other Microsoft users to gain insight from vast and complex data sets.

Compare Azure Databricks and AWS Databricks

Azure Databricks is a well-integrated product combining Azure features and Databricks features. It is not a mere hosting of Databricks on the Azure platform: Microsoft features such as Active Directory authentication and integration with many Azure services make Azure Databricks the more tightly integrated product, whereas AWS Databricks is essentially Databricks hosted on the AWS cloud.

What is the category of Cloud service offered by Databricks? Is it SaaS or PaaS or IaaS?

The service offered by Databricks belongs to the Software as a Service (SaaS) category; its purpose is to exploit the power of Spark with clusters and managed storage. Users only have to change application configurations and start deploying their workloads.

What is the difference between Databricks and Azure Databricks?

Databricks unifies Apache Spark's data-processing power with ML-driven data science and data engineering techniques to manage the entire data lifecycle, from the ingestion state up to the consumption state.

Azure Databricks combines some of Azure's capabilities with the analytics features of Databricks to offer the best of both worlds to the end user. It uses Azure's own data extraction tool, Data Factory, to pull data from various sources and combines it with the AI-driven analytics capability of Databricks for transformation and loading. It also uses Microsoft Active Directory integration for authentication, along with other Azure and Microsoft features, to improve productivity.

Is Microsoft the owner of Databricks?

No. Databricks is an independent company, and its platform is built on open-source Apache Spark. Microsoft made a $250M investment in Databricks in 2019, and it integrated some Databricks services into its Azure cloud, releasing Azure Databricks in 2017. Similar tie-ups are in place with Amazon's AWS and Google's GCP.

What are the main types of cloud services?

1. Infrastructure as a service (IaaS)

It’s the first logical step in the cloud journey. Computer hardware and networking are hired from a cloud vendor, and the entire application environment, including the development and hosting of applications, has to be managed by the end consumer.

2. Software as a service (SaaS)

Infrastructure and application environment are provided by cloud vendors and the consumer will have to manage application settings and user authentication only.

3. Platform as a service (PaaS)

Infrastructure and Software development platforms are provided by cloud vendors and consumers will have to configure application settings, develop applications and host them in the cloud.

4. Serverless Computing

It’s an improved version of PaaS. Server scalability as the application grows is handled by the cloud vendor, so users don’t have to worry about it.

Is there no on-premises option for Databricks and is it available only in cloud?

Yes. Apache Spark, the base of Databricks, was offered as an on-premises solution, so in-house engineers could maintain the application locally along with the data. Databricks, however, is a cloud-native application, and users would face network issues accessing it against data held on local servers. Data inconsistency and workflow inefficiencies are the other factors weighing against an on-premises option for Databricks.

What is the difference between data warehouses and Data lakes?

A data warehouse mostly contains processed, structured data required for business analysis and is managed in-house with local skills; its structure cannot be changed easily.

Data lakes contain all data, including raw, old, and unstructured data; they can be scaled up easily, and the data model can be changed quickly. They are maintained by third-party tools, preferably in the cloud, and use parallel processing to crunch the data.

What does a Spark Engine do?

A Spark engine is responsible for scheduling, distributing, and monitoring the data application across the cluster.

What is the purpose of databricks runtime?

Databricks Runtime is the set of core software components that runs on clusters in the Databricks platform.

How to generate a personal access token in databricks?

We can generate a personal access token in four steps:

1. In the upper-right corner of the Databricks workspace, click the user profile icon.
2. Choose "User Settings."
3. Navigate to the "Access Tokens" tab.
4. Click the "Generate New Token" button and copy the token it produces.
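
Once generated, the token authenticates REST calls as a Bearer token. A minimal sketch (the token value is a placeholder, never a real credential):

```python
# Build the authorization header that Databricks REST calls expect when
# using a personal access token. The token here is a fake placeholder.
def auth_headers(token):
    return {"Authorization": f"Bearer {token}"}

headers = auth_headers("dapiXXXXXXXXXXXXXXXX")
print(headers)
```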

What are the different ETL operations done on data in Azure Databricks?

The different ETL operations performed on data in Azure Databricks are:

1. The data is transformed from Databricks to the data warehouse.
2. Blob storage is used to load the data.
3. Blob storage acts as temporary storage for the data.

What is the use of the databricks file system?

The Databricks File System (DBFS) is a distributed file system that ensures data reliability even after a cluster is terminated in Azure Databricks.

What is the use of Kafka?

Whenever Azure Databricks wants to collect or stream data, it connects to event hubs and sources like Kafka.

What are the different types of pricing tiers available in Databricks?

There are two types of pricing tiers available in Databricks:

1. Premium Tier
2. Standard Tier

Which SQL version is used by databricks?

Spark SQL implements a subset of ANSI SQL:2003.

Reference: https://spark.apache.org/releases/spark-release-2-0-0.html

Do we need to store the results of one action in other variables?

No, there is no need to store the results of one action in other variables.

List different types of caching?

There are four types of caching:

1. Data caching
2. Web caching
3. Distributed caching
4. Output or Application caching
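
Application caching can be illustrated in miniature with Python's built-in memoization (a generic sketch, not Databricks-specific):

```python
from functools import lru_cache

calls = 0  # counts how many times the "slow" computation really runs

# Memoize an expensive function so a repeated request is served from the
# cache instead of being recomputed.
@lru_cache(maxsize=None)
def expensive_lookup(key):
    global calls
    calls += 1
    return key.upper()  # stand-in for a slow fetch or computation

expensive_lookup("spark")
expensive_lookup("spark")  # cache hit: no second computation
print(calls)  # 1
```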

Is it ok to clear the cache?

Yes. A cache holds copies of data that can be re-fetched or recomputed from the original source, so clearing it causes no data loss; applications may simply run slower until the cache is rebuilt.

What is caching?

A cache is temporary storage. The process of storing data in this temporary storage is called caching.

Whenever you return to a recently used page, the browser will retrieve the data from the cache instead of recovering it from the server, which saves time and reduces the burden on the server.

Is Azure Databricks different from databricks?

Not exactly. Azure Databricks is an artificial intelligence service developed jointly by Microsoft and Databricks to drive innovation in data analytics, machine learning, and data engineering, so it combines the Databricks platform with Azure-specific capabilities.

What is DBU?

A Databricks Unit (DBU) is a normalized unit of processing power on the Databricks Unified platform, used to measure usage for pricing purposes.
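
DBU-based billing can be sketched as simple arithmetic (the rate below is a hypothetical illustration, not a real Databricks price):

```python
# Cost model sketch: cost = DBUs consumed per hour * hours run * price per DBU.
def dbu_cost(dbu_per_hour, hours, price_per_dbu):
    return dbu_per_hour * hours * price_per_dbu

# e.g. a cluster consuming 2 DBU/hour for 10 hours at an assumed $0.40/DBU
print(dbu_cost(2, 10, 0.40))
```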

What is Databricks?

Databricks is a cloud-based, industry-leading data engineering platform designed to process and transform huge volumes of data. It is one of the newer big data tools available on Azure.

R4R Team

R4R provides Databricks interview questions and answers for freshers and experienced candidates, along with mock tests and practice papers.