Machine Learning Multiple-Choice Questions
Upasana | September 10, 2019 | 4 min read | 117,792 views
-
Which of the following is a widely used and effective machine learning algorithm based on the idea of bagging?
-
Decision Tree
-
Regression
-
Classification
-
Random Forest - answer
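The bagging idea behind Random Forest can be sketched in a few lines. This is only an illustration under stated assumptions: the 1-D toy dataset and the threshold "stump" base learner (`fit_stump`, `bagged_predict`) are hypothetical, and a real random forest uses full decision trees plus random feature selection.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Pick the threshold on x that best separates labels 0 and 1 in `sample`."""
    best_t, best_acc = None, -1.0
    for t, _ in sample:
        acc = sum((x >= t) == bool(y) for x, y in sample) / len(sample)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def bagged_predict(stumps, x):
    """Majority vote over all stumps."""
    votes = Counter(int(x >= t) for t in stumps)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
data = [(x, int(x >= 5)) for x in range(10)]  # toy data: label is 1 when x >= 5

# Bagging: train each stump on a bootstrap sample (drawn with replacement).
stumps = [fit_stump([rng.choice(data) for _ in data]) for _ in range(25)]
```

Each stump sees a slightly different resample of the data, and the vote averages out their individual errors.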
-
-
To find the minimum or the maximum of a function, we set the gradient to zero because:
-
The value of the gradient at extrema of a function is always zero - answer
-
Depends on the type of problem
-
Both A and B
-
None of the above
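A quick numerical check of the zero-gradient condition, assuming a smooth toy function `f` chosen for illustration. Note that the condition holds at interior extrema of differentiable functions, and a zero gradient can also indicate a saddle point.

```python
def f(x):
    return (x - 3.0) ** 2 + 1.0  # minimum at x = 3

def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of d(func)/dx at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

grad_at_minimum = numerical_gradient(f, 3.0)  # approximately 0
grad_elsewhere = numerical_gradient(f, 5.0)   # approximately 4, so x = 5 is not an extremum
```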
-
-
The most widely used metrics and tools to assess a classification model are:
-
Confusion matrix
-
Cost-sensitive accuracy
-
Area under the ROC curve
-
All of the above - answer
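Two of these tools can be computed by hand on a small example. The labels and scores below are made up for illustration, and the 0.5 decision threshold is an assumption; in practice a library such as scikit-learn would normally be used instead.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn

def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

matrix = confusion_matrix(y_true, y_pred)
auc = roc_auc(y_true, scores)
```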
-
-
Which of the following is a good test dataset characteristic?
-
Large enough to yield meaningful results
-
Is representative of the dataset as a whole
-
Both A and B - answer
-
None of the above
-
-
Which of the following is a disadvantage of decision trees?
-
Factor analysis
-
Decision trees are robust to outliers
-
Decision trees are prone to overfitting - answer
-
None of the above
-
-
How do you handle missing or corrupted data in a dataset?
-
Drop missing rows or columns
-
Replace missing values with mean/median/mode
-
Assign a unique category to missing values
-
All of the above - answer
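The three strategies can be sketched on a tiny table. The rows, column names, and the `"MISSING"` label are hypothetical, and which strategy is appropriate depends on the data.

```python
from statistics import mean

rows = [
    {"age": 30, "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 50, "city": None},
]

# Option 1: drop rows that contain any missing value.
dropped = [r for r in rows if all(v is not None for v in r.values())]

# Option 2: replace a missing numeric value with the column mean.
age_mean = mean(r["age"] for r in rows if r["age"] is not None)
imputed = [{**r, "age": age_mean if r["age"] is None else r["age"]} for r in rows]

# Option 3: give missing categorical values their own category.
flagged = [{**r, "city": r["city"] or "MISSING"} for r in rows]
```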
-
-
What is the purpose of performing cross-validation?
-
To assess the predictive performance of the models
-
To judge how the trained model performs outside the sample on test data
-
Both A and B - answer
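A minimal sketch of how k-fold cross-validation partitions the data, assuming contiguous, unshuffled folds (the helper `k_fold_indices` is illustrative, not a library function). Each sample is held out exactly once, so every score is measured on data the model did not train on.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
```

In practice a model would be fit on `train_idx` and scored on `test_idx` for each fold, and the scores averaged.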
-
-
Why is second order differencing in time series needed?
-
To remove non-stationarity
-
To find the maxima or minima at the local point
-
Both A and B - answer
-
None of the above
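The effect of differencing on a trend can be seen on a toy series. The quadratic series below is an assumption for illustration: one round of differencing leaves a linear trend, and a second round leaves a constant.

```python
def difference(series):
    """First-order differencing: y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]

series = [t * t + 2 * t + 1 for t in range(8)]  # quadratic trend
first = difference(series)   # still trending (linear)
second = difference(first)   # constant: the trend is gone
```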
-
-
When performing regression or classification, which of the following is the correct way to preprocess the data?
-
Normalize the data → PCA → training - answer
-
PCA → normalize PCA output → training
-
Normalize the data → PCA → normalize PCA output → training
-
None of the above
-
-
Which of the following is an example of feature extraction?
-
Constructing bag of words vector from an email
-
Applying PCA to project large, high-dimensional data
-
Removing stopwords in a sentence
-
All of the above - answer
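A minimal sketch that combines two of the options, stopword removal and a bag-of-words vector. The stopword list, vocabulary, and email text are all made up for illustration.

```python
from collections import Counter

STOPWORDS = {"the", "a", "is", "to"}  # tiny illustrative list, not a standard one

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text, ignoring stopwords."""
    tokens = [w for w in text.lower().split() if w not in STOPWORDS]
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

email = "The offer is free free to claim"
vocabulary = ["offer", "free", "claim", "meeting"]
vector = bag_of_words(email, vocabulary)
```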
-
-
What is pca.components_ in scikit-learn?
-
Set of all eigenvectors for the projection space - answer
-
Matrix of principal components
-
Result of the multiplication matrix
-
None of the above options
-
-
Which of the following is true about Naive Bayes?
-
Assumes that all the features in a dataset are equally important
-
Assumes that all the features in a dataset are independent
-
Both A and B - answer
-
None of the above options
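The independence assumption shows up directly in how a Naive Bayes score is computed: per-feature likelihoods are simply multiplied. The priors and likelihoods below are hypothetical, hand-picked numbers rather than values estimated from data.

```python
from math import prod

# Hypothetical, hand-picked probabilities for a toy spam filter.
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.5},
}

def posterior(words):
    """Multiply per-word likelihoods as if the words were independent, then normalize."""
    scores = {c: prior[c] * prod(likelihood[c][w] for w in words) for c in prior}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

result = posterior(["free", "free"])
```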
-
-
Which of the following statements about regularization is correct?
-
Using too large a value of lambda can cause your hypothesis to underfit the data. - answer
-
Using too large a value of lambda can cause your hypothesis to overfit the data.
-
Using a very large value of lambda cannot hurt the performance of your hypothesis.
-
None of the above
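The effect of lambda can be checked numerically. This sketch assumes a one-feature ridge regression without an intercept and toy data that follows y = 2x exactly; a very large lambda shrinks the weight toward zero, so the model underfits.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form ridge solution for y ~ w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(xs, ys, w):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # exactly y = 2x

w_small = ridge_weight(xs, ys, lam=0.0)     # fits the data
w_large = ridge_weight(xs, ys, lam=1000.0)  # weight shrunk toward zero
```

Comparing `mse(xs, ys, w_large)` with `mse(xs, ys, w_small)` shows the training error rising as lambda grows.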
-
-
How can you prevent a clustering algorithm from getting stuck in bad local optima?
-
Set the same seed value for each run
-
Use multiple random initializations - answer
-
Both A and B
-
None of the above
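Multiple random initializations can be sketched with a small 1-D k-means. The data, the seed, and the helper `kmeans_1d` are assumptions for illustration; the point is that each run may converge to a different local optimum, so the run with the lowest inertia is kept.

```python
import random

def kmeans_1d(points, k, rng, iters=20):
    """Lloyd's algorithm on 1-D points; returns (centroids, inertia)."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    inertia = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, inertia

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
rng = random.Random(42)

# Run several random initializations and keep the lowest-inertia result.
runs = [kmeans_1d(points, 3, rng) for _ in range(10)]
best_centroids, best_inertia = min(runs, key=lambda r: r[1])
```

Fixing a single seed, by contrast, only makes one particular outcome reproducible; it does not make that outcome better.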
-
-
Which of the following techniques can be used for normalization in text mining?
-
Stemming
-
Lemmatization
-
Stop Word Removal
-
Both A and B - answer
-
-
In which of the following cases will K-means clustering fail to give good results? 1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes
-
1 and 2
-
2 and 3
-
1, 2, and 3 - answer
-
1 and 3
-
-
Which of the following is a reasonable way to select the number of principal components "k"?
-
Choose k to be the smallest value so that at least 99% of the variance is retained. - answer
-
Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
-
Choose k to be the largest value so that 99% of the variance is retained.
-
Use the elbow method
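The retained-variance rule can be sketched as a cumulative sum over eigenvalues. The eigenvalues below are hypothetical; in practice they would come from the covariance matrix of the data.

```python
def smallest_k(eigenvalues, retain=0.99):
    """Smallest k whose top-k eigenvalues keep at least `retain` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= retain:
            return k
    return len(ordered)

eigenvalues = [6.0, 3.0, 0.95, 0.04, 0.01]  # hypothetical covariance eigenvalues
k = smallest_k(eigenvalues)                 # top 3 keep about 99.5% of the variance
```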
-
-
You run gradient descent for 15 iterations with α = 0.3 and compute J(θ) after each iteration. You find that the value of J(θ) decreases quickly and then levels off. Based on this, which of the following conclusions seems most plausible?
-
Rather than using the current value of α, use a larger value of α (say α = 1.0)
-
Rather than using the current value of α, use a smaller value of α (say α = 0.1)
-
α = 0.3 is an effective choice of learning rate - answer
-
None of the above
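The "decreases quickly and then levels off" pattern can be reproduced on a toy problem. The quadratic cost and the starting point are assumptions for illustration; only the learning rate of 0.3 and the 15 iterations come from the question.

```python
def cost(theta):
    return (theta - 2.0) ** 2  # toy cost, minimized at theta = 2

def gradient(theta):
    return 2.0 * (theta - 2.0)

learning_rate = 0.3
theta = 10.0  # arbitrary starting point
history = []
for _ in range(15):
    theta -= learning_rate * gradient(theta)
    history.append(cost(theta))
```

Here `history` drops steeply in the first few iterations and then flattens near zero, which is the behavior of a well-chosen learning rate.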
-
-
What is a sentence parser typically used for?
-
It is used to parse sentences to check if they are UTF-8 compliant.
-
It is used to parse sentences to derive their most likely syntax tree structures. - answer
-
It is used to parse sentences to assign POS tags to all tokens.
-
It is used to check if sentences can be parsed into meaningful tokens.
-
-
Suppose you have trained a logistic regression classifier, and for a new example x it outputs the prediction hθ(x) = 0.2. This means:
-
Our estimate for P(y=1 | x)
-
Our estimate for P(y=0 | x) - answer
-
Our estimate for P(y=1 | x)
-
Our estimate for P(y=0 | x)
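In logistic regression the hypothesis output is read as the estimated probability that y = 1, so an output of 0.2 implies an estimated probability of 0.8 that y = 0. A short check, assuming a score `z` chosen so that the sigmoid returns 0.2:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = math.log(0.2 / 0.8)  # a score chosen so that the hypothesis outputs 0.2
p_y1 = sigmoid(z)        # estimate of P(y=1 | x)
p_y0 = 1.0 - p_y1        # estimate of P(y=0 | x)
```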