Microsoft Design and Implement Big Data Analytics Solutions - 070-475 Exam Practice Test
You have a Microsoft Azure Data Factory pipeline.
You discover that the pipeline fails to execute because data is missing.
You need to rerun the failure in the pipeline.
Which cmdlet should you use?
Correct Answer: B
A company named Fabrikam, Inc. has a web app hosted in Microsoft Azure. Millions of users visit the app daily.
All of the user visits are logged in Azure Blob storage. Data analysts at Fabrikam built a dashboard that processes the user visit logs.
Fabrikam plans to use an Apache Hadoop cluster on Azure HDInsight to process queries. The queries will access the data only once.
You need to recommend a query execution strategy.
What is the best strategy to recommend to achieve the goal?
More than one answer choice may achieve the goal. Select the
Correct Answer: D
You are designing an application that will perform real-time processing by using Microsoft Azure Stream Analytics.
You need to identify the valid outputs of a Stream Analytics job.
What are three possible outputs that you can use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
Correct Answer: B,D,E
You have a pipeline that contains an input dataset in Microsoft Azure Table Storage and an output dataset in Azure Blob storage. You have the following JSON data.

Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the JSON data.
NOTE: Each correct selection is worth one point.



Correct Answer:

Explanation

Box 1: Every three days at 10.00
anchorDateTime defines the absolute position in time used by the scheduler to compute dataset slice boundaries.
"frequency": "<Specifies the time unit for data slice production. Supported frequency: Minute, Hour, Day, Week, Month>",
"interval": "<Specifies the interval within the defined frequency. For example, frequency set to 'Hour' and interval set to 1 indicates that new data slices should be produced hourly>"
Box 2: Every minute up to three times.
retryInterval is the wait time between a failure and the next attempt. This setting applies to present time. If the previous try failed, the next try is after the retryInterval period.
Example: 00:01:00 (1 minute)
Example: If it is 1:00 PM right now, we begin the first try. If the duration to complete the first validation check is 1 minute and the operation failed, the next retry is at 1:00 + 1 min (duration) + 1 min (retry interval) = 1:02 PM.
For slices in the past, there is no delay. The retry happens immediately.
retryTimeout is the timeout for each retry attempt.
maximumRetry is the number of times to check for the availability of the external data.
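The retry arithmetic in the explanation above can be sketched in a few lines of Python. This is only an illustration of the timing rule as described here, not Data Factory code: the function name `retry_attempt_times`, the calendar date, and the assumption that every check fails and takes the same duration are all hypothetical. It treats maximumRetry as the total number of checks, as the explanation states.

```python
from datetime import datetime, timedelta


def retry_attempt_times(first_try, check_duration, retry_interval, maximum_retry):
    """Return the start time of each availability check.

    Assumes (for illustration) that every check fails and that each check
    takes the same amount of time. The next check starts after the previous
    check's duration plus the retry interval.
    """
    times = []
    start = first_try
    for _ in range(maximum_retry):
        times.append(start)
        start = start + check_duration + retry_interval
    return times


# Values from the explanation: first try at 1:00 PM, a 1-minute check,
# retryInterval of 00:01:00, and maximumRetry of 3. The date is arbitrary.
attempts = retry_attempt_times(
    first_try=datetime(2024, 1, 1, 13, 0),
    check_duration=timedelta(minutes=1),
    retry_interval=timedelta(minutes=1),
    maximum_retry=3,
)
for attempt in attempts:
    print(attempt.strftime("%H:%M"))
```

Under these assumptions the second check starts at 13:02, which matches the 1:02 PM figure in the worked example. For slices in the past the explanation says there is no delay, so this schedule applies only to present-time slices.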
You are planning a solution that will have multiple data files stored in Microsoft Azure Blob storage every hour. Data processing will occur once a day at midnight only.
You create an Azure data factory that has blob storage as the input source and an Azure HDInsight activity that uses the input to create an output Hive table.
You need to identify a data slicing strategy for the data factory.
What should you identify? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Correct Answer:

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Apache Spark system that contains 5 TB of data.
You need to write queries that analyze the data in the system. The queries must meet the following requirements:
* Use static data typing.
* Execute queries as quickly as possible.
* Have access to the latest language features.
Solution: You write the queries by using Python.
Correct Answer: B
You need to design the data load process from DB1 to DB2. Which data import technique should you use in the design?
Correct Answer: B
You have a Microsoft Azure data factory.
You assign administrative roles to the users in the following table.

You discover that several new data factory instances were created.
You need to ensure that only User5 can create a new data factory instance.
Which two roles should you change? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
Correct Answer: D,E