Amazon AWS Certified Data Engineer - Associate (DEA-C01) - Data-Engineer-Associate Exam Practice Test

A data engineer must orchestrate a data pipeline that consists of one AWS Lambda function and one AWS Glue job. The solution must integrate with AWS services.
Which solution will meet these requirements with the LEAST management overhead?
Correct Answer: D
A data engineer configures a large number of AWS Glue jobs that all start up around the same time. All the jobs run for less than 1 hour in the same subnet of the same VPC. All the AWS Glue jobs run on a G.1X worker type.
Some of the jobs occasionally fail with the following error: "The specified subnet does not have enough free addresses to satisfy the request." What is the likely root cause of the error?
Correct Answer: A
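The error message can be checked with simple arithmetic. The sketch below is an illustration only: it assumes each Glue worker takes roughly one private IP address from the subnet through an elastic network interface, and the CIDR block, job count, and worker count are hypothetical values that do not come from the question. AWS reserves five addresses in every subnet, so a /24 offers 251 usable addresses.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr: str) -> int:
    """Return how many addresses a subnet can hand out after AWS reservations."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def addresses_needed(concurrent_jobs: int, workers_per_job: int) -> int:
    """Assumption: each worker consumes about one IP address while the job runs."""
    return concurrent_jobs * workers_per_job


# Hypothetical numbers, chosen only to illustrate the shortfall.
subnet_cidr = "10.0.1.0/24"
free = usable_addresses(subnet_cidr)
needed = addresses_needed(concurrent_jobs=40, workers_per_job=10)

if needed > free:
    print(f"{subnet_cidr}: need {needed} addresses, only {free} usable")
else:
    print(f"{subnet_cidr}: {free} usable addresses cover the {needed} needed")
```

With these numbers the jobs would ask for 400 addresses against 251 available, which is the kind of shortfall that occurs when many jobs start at the same time in one subnet.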
A company stores customer data in an Amazon S3 bucket. The company must permanently delete all customer data that is older than 7 years.
Correct Answer: A
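The answer options are not reproduced here, so the following is only a sketch of one common way to expire old objects: an S3 Lifecycle configuration. It assumes 7 years is approximated as 7 x 365 days and that every object in the bucket is in scope. The rule ID is hypothetical. In a versioned bucket, expiring the current version only adds a delete marker, so the sketch also expires noncurrent versions.

```python
import json

DAYS_IN_SEVEN_YEARS = 7 * 365  # approximation that ignores leap days


def build_lifecycle_configuration(days: int) -> dict:
    """Build an S3 Lifecycle configuration that expires objects after `days`."""
    return {
        "Rules": [
            {
                "ID": "expire-customer-data",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix applies to all objects
                "Expiration": {"Days": days},
                # Needed for permanent removal when versioning is enabled.
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }


lifecycle = build_lifecycle_configuration(DAYS_IN_SEVEN_YEARS)
print(json.dumps(lifecycle, indent=2))

# With boto3 installed and credentials configured, the dictionary could be
# passed to s3_client.put_bucket_lifecycle_configuration(
#     Bucket="<bucket-name>", LifecycleConfiguration=lifecycle)
```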
A company needs to implement a data mesh architecture for trading, risk, and compliance teams. Each team has its own data but needs to share views. They have 1,000+ tables in 50 Glue databases. All teams use Athena and Redshift, and compliance requires full auditing and PII access control.
Correct Answer: A
A data engineer needs to analyze time-sensitive sales data. The company stores the data in an Amazon S3 bucket. The data engineer uses AWS Glue Data Catalog to access the data.
When performing the analysis, the data engineer notices that some records are missing or out of date.
What is the likely cause of these issues?
Correct Answer: C
A company is creating a new data pipeline to populate a data lake. A data analyst needs to prepare and standardize the data before a data engineering team can perform advanced data transformations. The data analyst needs a solution to process the data that does not require writing new code.
Which solution will meet these requirements with the LEAST operational effort?
Correct Answer: A
A company's data processing pipeline uses AWS Glue jobs and AWS Glue Data Catalog. All AWS Glue jobs must run in a custom VPC inside a private subnet. The company uses a NAT gateway to support outbound connections.
A data engineer needs to use AWS Glue to migrate data from an on-premises PostgreSQL database to Amazon S3. There is no current network connection between AWS and the on-premises environment.
However, the data engineer has updated the on-premises database to allow traffic from the custom VPC.
Which solution will meet these requirements?
Correct Answer: D
A company needs to build an extract, transform, and load (ETL) pipeline that has separate stages for batch data ingestion, transformation, and storage. The pipeline must store the transformed data in an Amazon S3 bucket. Each stage must automatically retry failures. The pipeline must provide visibility into the success or failure of individual stages.
Which solution will meet these requirements with the LEAST operational overhead?
Correct Answer: D
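The requirements of per-stage retries and per-stage visibility can be illustrated with an AWS Step Functions state machine. This is an assumption, because the answer options are not shown here. The sketch builds an Amazon States Language definition in which each stage is a Glue job task with its own Retry block. The job names and retry settings are hypothetical.

```python
import json
from typing import Optional


def glue_task(job_name: str, next_state: Optional[str] = None) -> dict:
    """Return a Task state that runs a Glue job and retries it on any error."""
    state = {
        "Type": "Task",
        # Service integration that waits for the Glue job run to finish.
        "Resource": "arn:aws:states:::glue:startJobRun.sync",
        "Parameters": {"JobName": job_name},
        "Retry": [
            {
                "ErrorEquals": ["States.ALL"],
                "IntervalSeconds": 30,  # hypothetical values
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }
        ],
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


# Hypothetical job names for the three stages named in the question.
definition = {
    "Comment": "Sketch of a three-stage ETL pipeline with per-stage retries",
    "StartAt": "Ingest",
    "States": {
        "Ingest": glue_task("ingest-job", next_state="Transform"),
        "Transform": glue_task("transform-job", next_state="Store"),
        "Store": glue_task("store-job"),
    },
}

print(json.dumps(definition, indent=2))
```

Because each stage is its own state, the execution history reports the success or failure of that stage separately from the others.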
A company needs to set up a data catalog and metadata management for data sources that run in the AWS Cloud. The company will use the data catalog to maintain the metadata of all the objects that are in a set of data stores. The data stores include structured sources such as Amazon RDS and Amazon Redshift. The data stores also include semistructured sources such as JSON files and .xml files that are stored in Amazon S3.
The company needs a solution that will update the data catalog on a regular basis. The solution also must detect changes to the source metadata.
Which solution will meet these requirements with the LEAST operational overhead?
Correct Answer: C
A company maintains an Amazon Redshift provisioned cluster that the company uses for extract, transform, and load (ETL) operations to support critical analysis tasks. A sales team within the company maintains a Redshift cluster that the sales team uses for business intelligence (BI) tasks.
The sales team recently requested access to the data that is in the ETL Redshift cluster so the team can perform weekly summary analysis tasks. The sales team needs to join data from the ETL cluster with data that is in the sales team's BI cluster.
The company needs a solution that will share the ETL cluster data with the sales team without interrupting the critical analysis tasks. The solution must minimize usage of the computing resources of the ETL cluster.
Which solution will meet these requirements?
Correct Answer: D
A data engineer is building a data pipeline. A large data file is uploaded to an Amazon S3 bucket once each day at unpredictable times. An AWS Glue workflow uses hundreds of workers to process the file and load the data into Amazon Redshift. The company wants to process the file as quickly as possible.
Which solution will meet these requirements?
Correct Answer: B
A company runs an extract, transform, and load (ETL) job in AWS Glue. The job processes personally identifiable information (PII) data and writes logs to an Amazon CloudWatch Logs log group. A data engineer needs to mask PII data in the CloudWatch Logs log group.
Which solution will meet these requirements?
Correct Answer: C
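CloudWatch Logs can mask sensitive values through a log group data protection policy. Whether that matches the chosen option cannot be confirmed, because the options are not shown here. The sketch below assembles such a policy document as a Python dictionary. The policy name is hypothetical, only the managed EmailAddress identifier is used as an example, and the audit findings destination is left empty.

```python
import json
from typing import List

# Prefix of the AWS managed data identifier ARNs.
IDENTIFIER_PREFIX = "arn:aws:dataprotection::aws:data-identifier/"


def build_masking_policy(identifier_names: List[str]) -> dict:
    """Build a data protection policy that audits and masks the given identifiers."""
    arns = [IDENTIFIER_PREFIX + name for name in identifier_names]
    return {
        "Name": "mask-pii-sketch",  # hypothetical policy name
        "Version": "2021-06-01",
        "Statement": [
            {
                "Sid": "audit",
                "DataIdentifier": arns,
                "Operation": {"Audit": {"FindingsDestination": {}}},
            },
            {
                "Sid": "redact",
                "DataIdentifier": arns,
                "Operation": {"Deidentify": {"MaskConfig": {}}},
            },
        ],
    }


policy = build_masking_policy(["EmailAddress"])
policy_document = json.dumps(policy)
print(policy_document)

# With boto3 installed and credentials configured, the document could be
# attached with logs_client.put_data_protection_policy(
#     logGroupIdentifier="<log-group-name>", policyDocument=policy_document)
```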
A company uses AWS Step Functions to orchestrate a data pipeline. The pipeline consists of Amazon EMR jobs that ingest data from data sources and store the data in an Amazon S3 bucket. The pipeline also includes EMR jobs that load the data to Amazon Redshift.
The company's cloud infrastructure team manually built a Step Functions state machine. The cloud infrastructure team launched an EMR cluster into a VPC to support the EMR jobs. However, the deployed Step Functions state machine is not able to run the EMR jobs.
Which combination of steps should the company take to identify the reason the Step Functions state machine is not able to run the EMR jobs? (Choose two.)
Correct Answer: A,E