31. What is a Decision Tree vs a Random Forest?
• Decision Tree is a single tree structure that splits data into branches using feature values to make decisions. It’s simple but prone to overfitting.
• Random Forest is an ensemble of many decision trees, each trained on a bootstrapped sample of the data and a random subset of features; their predictions are combined by majority vote (classification) or averaging (regression). It improves accuracy and reduces overfitting.
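A minimal sketch of the difference, assuming scikit-learn is available; the dataset and hyperparameters are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Toy dataset for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Single tree: simple and interpretable, but free to grow until it overfits
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Forest: many trees on bootstrapped samples with random feature subsets, predictions aggregated
forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)

print("Decision Tree accuracy:", accuracy_score(y_test, tree.predict(X_test)))
print("Random Forest accuracy:", accuracy_score(y_test, forest.predict(X_test)))
```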
32. What is Cross-Validation?
Cross-validation is a technique to evaluate model performance by dividing data into training and validation sets multiple times.
• K-Fold CV is common: the data is split into k folds, and the model is trained/validated k times, with each fold serving as the validation set exactly once.
• Helps ensure model generalizes well.
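A short sketch of 5-fold cross-validation with scikit-learn (the dataset and model are placeholders):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# 5-fold CV: each fold is used as the validation set exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```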
33. What is Bias-Variance Tradeoff?
• Bias is error due to overly simplistic models (underfitting).
• Variance is error from models that are too complex and fit noise in the training data (overfitting).
• The tradeoff is balancing both to minimize total error.
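For squared-error loss, the standard decomposition of expected test error makes the tradeoff explicit:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```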
34. What is Overfitting vs Underfitting?
• Overfitting: Model learns noise and performs well on training but poorly on test data.
• Underfitting: Model is too simple, misses patterns, and performs poorly on both.
• Prevent overfitting with regularization, pruning, or more data; address underfitting with a more expressive model or better features.
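A minimal sketch with scikit-learn showing how limiting tree depth (a simple form of pruning/regularization) narrows the train–test gap; the data and depth are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise so an unconstrained tree has something to overfit
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unconstrained tree: memorizes training noise (overfitting)
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Depth-limited tree: less flexible, generalizes better
pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

for name, model in [("deep", deep), ("pruned", pruned)]:
    print(name, "train:", model.score(X_train, y_train), "test:", model.score(X_test, y_test))
```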
35. What is ROC Curve and AUC?
• ROC (Receiver Operating Characteristic) Curve plots TPR (recall) vs FPR across classification thresholds.
• AUC (Area Under Curve) measures model’s ability to distinguish classes.
• AUC close to 1 = great classifier, 0.5 = random.
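A short sketch with scikit-learn, computing ROC points and AUC from predicted probabilities (data and model are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=1000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, proba)  # one (FPR, TPR) point per threshold
print("AUC:", roc_auc_score(y_test, proba))      # 0.5 = random, 1.0 = perfect separation
```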
36. What are Precision, Recall, and F1-Score?
• Precision: TP / (TP + FP) – How many predicted positives are correct.
• Recall (Sensitivity): TP / (TP + FN) – How many actual positives are caught.
• F1-Score: Harmonic mean of precision & recall. Good for imbalanced data.
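A worked example with hypothetical counts, just to show the arithmetic:

```python
# Hypothetical counts from a classifier's predictions
TP, FP, FN = 40, 10, 20

precision = TP / (TP + FP)                          # 40 / 50 = 0.80
recall    = TP / (TP + FN)                          # 40 / 60 ≈ 0.67
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean ≈ 0.73

print("Precision:", precision, "Recall:", recall, "F1:", f1)
```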
37. What is Confusion Matrix?
A 2×2 table (for binary classification) showing:
• TP (True Positive)
• TN (True Negative)
• FP (False Positive)
• FN (False Negative)
Used to compute accuracy, precision, recall, etc.
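A minimal sketch with scikit-learn; note how the matrix layout maps onto TP/TN/FP/FN (the labels here are made up):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels {0, 1}, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
print("Accuracy:", (tp + tn) / (tp + tn + fp + fn))
```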
38. What is Ensemble Learning?
Combining multiple models to improve accuracy. Types:
• Bagging: Reduces variance (e.g., Random Forest)
• Boosting: Reduces bias by correcting errors of previous models (e.g., XGBoost)
39. Explain Bagging vs Boosting
• Bagging (Bootstrap Aggregating): Trains models in parallel on random data subsets. Reduces overfitting.
• Boosting: Trains sequentially, each new model focuses on correcting previous mistakes. Boosts weak learners into strong ones.
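A side-by-side sketch using scikit-learn's stock implementations, with AdaBoost standing in as the boosting example (both default to decision trees as base learners; numbers are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier

X, y = make_classification(n_samples=1000, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Bagging: independent trees on bootstrap samples, trained in parallel, predictions aggregated
bagging = BaggingClassifier(n_estimators=100, random_state=7).fit(X_train, y_train)

# Boosting (AdaBoost): weak learners trained sequentially, each reweighting the previous errors
boosting = AdaBoostClassifier(n_estimators=100, random_state=7).fit(X_train, y_train)

print("Bagging :", bagging.score(X_test, y_test))
print("Boosting:", boosting.score(X_test, y_test))
```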
38. What are XGBoost and LightGBM?
• XGBoost: Efficient gradient boosting algorithm; supports regularization, handles missing data.
• LightGBM: Faster alternative, uses histogram-based techniques and leaf-wise tree growth. Great for large datasets.
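A minimal usage sketch, assuming the xgboost and lightgbm packages are installed (hyperparameters are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=5000, n_features=30, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# XGBoost: regularized gradient boosting, handles missing values natively
xgb = XGBClassifier(n_estimators=300, learning_rate=0.1, max_depth=4).fit(X_train, y_train)

# LightGBM: histogram-based splits with leaf-wise tree growth
lgbm = LGBMClassifier(n_estimators=300, learning_rate=0.1, num_leaves=31).fit(X_train, y_train)

print("XGBoost accuracy :", xgb.score(X_test, y_test))
print("LightGBM accuracy:", lgbm.score(X_test, y_test))
```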