Databricks Certified Professional Data Scientist - Databricks-Certified-Professional-Data-Scientist Exam Practice Test

You are creating a regression model with the input income, education and current debt of a customer, what could be the possible output from this model.
Correct Answer: B
Explanation: Only visible for ExamsLabs members. You can sign-up / login (it's free).
What are the advantages of the mutual information over the Pearson correlation for text classification problems?
Correct Answer: D
Explanation: Only visible for ExamsLabs members. You can sign-up / login (it's free).
Feature Hashing approach is "SGD-based classifiers avoid the need to predetermine vector size by simply picking a reasonable size and shoehorning the training data into vectors of that size" now with large vectors or with multiple locations per feature in Feature hashing?
Correct Answer: B
Explanation: Only visible for ExamsLabs members. You can sign-up / login (it's free).
You are working on a email spam filtering assignment, while working on this you find there is new word e.g.
HadoopExam comes in email, and in your solutions you never come across this word before, hence probability of this words is coming in either email could be zero. So which of the following algorithm can help you to avoid zero probability?
Correct Answer: B
Explanation: Only visible for ExamsLabs members. You can sign-up / login (it's free).
You are using k-means clustering to classify heart patients for a hospital. You have chosen Patient Sex, Height, Weight, Age and Income as measures and have used 3 clusters. When you create a pair-wise plot of the clusters, you notice that there is significant overlap between the clusters. What should you do?
Correct Answer: C
Refer to exhibit

You are asked to write a report on how specific variables impact your client's sales using a data set provided to you by the client. The data includes 15 variables that the client views as directly related to sales, and you are restricted to these variables only. After a preliminary analysis of the data, the following findings were made: 1.
Multicollinearity is not an issue among the variables 2. Only three variables-A, B, and C-have significant correlation with sales You build a linear regression model on the dependent variable of sales with the independent variables of A, B, and C.
The results of the regression are seen in the exhibit. You cannot request additional data. what is a way that you could try to increase the R2 of the model without artificially inflating it?
Correct Answer: D
Explanation: Only visible for ExamsLabs members. You can sign-up / login (it's free).
Select the correct problems which can be solved using SVMs
Correct Answer: A,B,C,D
Explanation: Only visible for ExamsLabs members. You can sign-up / login (it's free).