Top Data Science Interview Questions with Answers: Part-4

  • admin
  • January 2, 2026

31. What is a Decision Tree vs a Random Forest?

• A Decision Tree is a single tree structure that splits the data into branches using feature values to make decisions. It is simple and interpretable, but prone to overfitting.

• A Random Forest is an ensemble of decision trees, each trained on a different bootstrap sample of the data and a random subset of features. Aggregating their predictions improves accuracy and reduces overfitting.
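
For illustration, here is a minimal scikit-learn sketch contrasting the two (the synthetic dataset and default hyperparameters are placeholder assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data, purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Single decision tree: fast and interpretable, but tends to overfit
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Random forest: 100 trees on bootstrap samples with random feature subsets
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

print("Tree   test accuracy:", tree.score(X_test, y_test))
print("Forest test accuracy:", forest.score(X_test, y_test))
```

The forest typically scores higher on the held-out split because averaging many decorrelated trees cancels out the noise each individual tree memorizes.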

32. What is Cross-Validation?  

Cross-validation is a technique for evaluating model performance by repeatedly splitting the data into training and validation sets.

• K-Fold CV is the most common form: the data is split into k folds, and the model is trained on k − 1 folds and validated on the remaining one, rotating through all k folds.

• It gives a more reliable estimate of how well the model generalizes than a single train/test split.
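
A minimal sketch with scikit-learn (the iris dataset and logistic regression are stand-ins for any model/data pair):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold CV: each fold serves once as the validation set
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print("Fold accuracies:", scores)
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```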

33. What is Bias-Variance Tradeoff?  

• Bias is error from overly simplistic models that miss real patterns (underfitting).

• Variance is error from overly complex models that fit noise in the training data (overfitting).

• Expected error decomposes roughly as bias² + variance + irreducible noise, so the tradeoff is balancing the first two terms to minimize the total.
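
A quick way to see the tradeoff is to vary model complexity and watch cross-validated error. In the sketch below (polynomial regression on synthetic data; the degrees chosen are arbitrary), degree 1 typically underfits, a middle degree does best, and a high degree overfits:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (60, 1))                               # 60 random points in [0, 1]
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 60)   # noisy sine wave

# Low degree -> high bias (underfit); high degree -> high variance (overfit)
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree={degree:2d}  CV MSE={mse:.3f}")
```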

34. What is Overfitting vs Underfitting?  

• Overfitting: the model learns noise in the training data, so it performs well on training data but poorly on test data.

• Underfitting: the model is too simple, misses the underlying patterns, and performs poorly on both.

• Prevention: regularization, pruning, cross-validation, early stopping, or more training data.
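
One common fix is regularization. A hedged sketch: with more features than samples, plain least squares memorizes the training set, while an L2 (ridge) penalty generalizes better (the alpha value here is an arbitrary illustration):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Few samples, many features: a setup where plain least squares overfits
X, y = make_regression(n_samples=50, n_features=40, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

plain = LinearRegression().fit(X_train, y_train)
ridge = Ridge(alpha=10.0).fit(X_train, y_train)  # L2 penalty shrinks weights

print("Plain R^2 train/test:",
      round(plain.score(X_train, y_train), 3), round(plain.score(X_test, y_test), 3))
print("Ridge R^2 train/test:",
      round(ridge.score(X_train, y_train), 3), round(ridge.score(X_test, y_test), 3))
```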

35. What is ROC Curve and AUC?

• The ROC (Receiver Operating Characteristic) curve plots the true positive rate (TPR, i.e., recall) against the false positive rate (FPR) across all classification thresholds.

• AUC (Area Under the Curve) summarizes the curve into a single number measuring the model's ability to distinguish the classes.

• AUC close to 1 = excellent classifier; 0.5 = no better than random guessing.
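
Computing both with scikit-learn (synthetic data; note that roc_curve needs probability scores, not hard labels):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # predicted P(positive class)

fpr, tpr, thresholds = roc_curve(y_test, scores)  # points along the ROC curve
print("AUC:", roc_auc_score(y_test, scores))      # 1.0 = perfect, 0.5 = random
```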

36. What are Precision, Recall, and F1-Score?  

• Precision: TP / (TP + FP) – the fraction of predicted positives that are actually correct.

• Recall (Sensitivity): TP / (TP + FN) – the fraction of actual positives that the model catches.

• F1-Score: the harmonic mean of precision and recall, 2PR / (P + R). Useful for imbalanced data, where accuracy alone is misleading.
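
A tiny worked example (the labels are made up so that TP = 3, FP = 1, FN = 1, giving 0.75 for all three metrics):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # TP=3, FN=1, FP=1, TN=5

print("Precision:", precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
print("Recall:   ", recall_score(y_true, y_pred))     # 3 / (3 + 1) = 0.75
print("F1:       ", f1_score(y_true, y_pred))         # 2*0.75*0.75 / 1.5 = 0.75
```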

37. What is Confusion Matrix?  

A 2×2 table (for binary classification) showing:  

• TP (True Positive)

• TN (True Negative)

• FP (False Positive)

• FN (False Negative)

Used to compute accuracy, precision, recall, etc.
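
Using the same toy labels as in question 36, scikit-learn lays the matrix out with actual classes as rows and predicted classes as columns:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For labels [0, 1] the layout is:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
print(cm)

tn, fp, fn, tp = cm.ravel()
print("Accuracy:", (tp + tn) / (tp + tn + fp + fn))  # (3 + 5) / 10 = 0.8
```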

38. What is Ensemble Learning?  

Ensemble learning combines multiple models so that their collective prediction is more accurate than any single one. Common types:

• Bagging: trains models independently on random subsets of the data; reduces variance (e.g., Random Forest)

• Boosting: trains models sequentially, reducing bias by correcting the errors of previous models (e.g., XGBoost)
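
The simplest illustration of "combining multiple models" is majority voting (a third ensemble style alongside bagging and boosting, shown here only as a sketch):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Three diverse base models; the ensemble takes a majority vote on the label
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
])

print("Ensemble CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```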

39. Explain Bagging vs Boosting  

• Bagging (Bootstrap Aggregating): trains models in parallel on random bootstrap samples, then averages or votes over their predictions. Mainly reduces variance and overfitting.

• Boosting: trains models sequentially; each new model focuses on correcting the previous ones' mistakes, turning weak learners into a strong one. Mainly reduces bias.
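
Side by side in scikit-learn (AdaBoost stands in for boosting here; the dataset and estimator counts are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Bagging: 100 full trees trained in parallel on bootstrap samples
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            random_state=0)

# Boosting: weak learners (decision stumps by default) trained sequentially,
# each round upweighting the examples the previous round got wrong
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)

for name, model in (("Bagging ", bagging), ("Boosting", boosting)):
    print(name, cross_val_score(model, X, y, cv=5).mean())
```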

40. What are XGBoost and LightGBM?

• XGBoost: an efficient gradient-boosting library; supports L1/L2 regularization and handles missing values natively.

• LightGBM: a faster alternative that uses histogram-based binning and leaf-wise tree growth. Great for large datasets.
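
Both expose a scikit-learn-style interface, so switching between them is mostly a one-line change (assumes pip install xgboost lightgbm; the hyperparameters below are placeholders):

```python
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same fit/predict API for both boosting libraries
xgb = XGBClassifier(n_estimators=200, learning_rate=0.1).fit(X_train, y_train)
lgbm = LGBMClassifier(n_estimators=200, learning_rate=0.1).fit(X_train, y_train)

print("XGBoost  test accuracy:", xgb.score(X_test, y_test))
print("LightGBM test accuracy:", lgbm.score(X_test, y_test))
```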
