31. What is a Decision Tree vs a Random Forest?
• Decision Tree is a single tree structure that splits data into branches using feature values to make decisions. It’s simple but prone to overfitting.
• Random Forest is an ensemble of many decision trees, each trained on a bootstrapped sample of the data and a random subset of features; their predictions are combined by majority vote (classification) or averaging (regression). It improves accuracy and reduces overfitting.
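A minimal sketch of the difference, assuming scikit-learn is available; the dataset and hyperparameters are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Toy dataset for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Single tree: simple and interpretable, but free to grow until it overfits
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Forest: many trees on bootstrapped samples with random feature subsets, predictions aggregated
forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)

print("Decision Tree accuracy:", accuracy_score(y_test, tree.predict(X_test)))
print("Random Forest accuracy:", accuracy_score(y_test, forest.predict(X_test)))
```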
32. What is Cross-Validation?
Cross-validation is a technique to evaluate model performance by dividing data into training and validation sets multiple times.
• K-Fold CV is common: the data is split into k folds, and the model is trained/validated k times, with each fold serving as the validation set exactly once.
• Helps ensure model generalizes well.
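A short sketch of 5-fold cross-validation with scikit-learn (the dataset and model are placeholders):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# 5-fold CV: each fold is used as the validation set exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```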
33. What is Bias-Variance Tradeoff?
• Bias is error due to overly simplistic models (underfitting).
• Variance is error from models that are too complex and fit noise in the training data (overfitting).
• The tradeoff is balancing both to minimize total error.
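For squared-error loss, the standard decomposition of expected test error makes the tradeoff explicit:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```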
34. What is Overfitting vs Underfitting?
• Overfitting: Model learns noise and performs well on training but poorly on test data.
• Underfitting: Model is too simple, misses patterns, and performs poorly on both.
• Prevent overfitting with regularization, pruning, or more data; address underfitting with a more expressive model or better features.
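A minimal sketch with scikit-learn showing how limiting tree depth (a simple form of pruning/regularization) narrows the train–test gap; the data and depth are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise so an unconstrained tree has something to overfit
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unconstrained tree: memorizes training noise (overfitting)
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Depth-limited tree: less flexible, generalizes better
pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

for name, model in [("deep", deep), ("pruned", pruned)]:
    print(name, "train:", model.score(X_train, y_train), "test:", model.score(X_test, y_test))
```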
35. What is ROC Curve and AUC?
• ROC (Receiver Operating Characteristic) Curve plots TPR (recall) vs FPR across classification thresholds.
• AUC (Area Under Curve) measures model’s ability to distinguish classes.
• AUC close to 1 = great classifier, 0.5 = random.
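A short sketch with scikit-learn, computing ROC points and AUC from predicted probabilities (data and model are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=1000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, proba)  # one (FPR, TPR) point per threshold
print("AUC:", roc_auc_score(y_test, proba))      # 0.5 = random, 1.0 = perfect separation
```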
36. What are Precision, Recall, and F1-Score?
• Precision: TP / (TP + FP) – How many predicted positives are correct.
• Recall (Sensitivity): TP / (TP + FN) – How many actual positives are caught.
• F1-Score: Harmonic mean of precision & recall. Good for imbalanced data.
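A worked example with hypothetical counts, just to show the arithmetic:

```python
# Hypothetical counts from a classifier's predictions
TP, FP, FN = 40, 10, 20

precision = TP / (TP + FP)                          # 40 / 50 = 0.80
recall    = TP / (TP + FN)                          # 40 / 60 ≈ 0.67
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean ≈ 0.73

print("Precision:", precision, "Recall:", recall, "F1:", f1)
```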
37. What is Confusion Matrix?
A 2×2 table (for binary classification) showing:
• TP (True Positive)
• TN (True Negative)
• FP (False Positive)
• FN (False Negative)
Used to compute accuracy, precision, recall, etc.
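A minimal sketch with scikit-learn; note how the matrix layout maps onto TP/TN/FP/FN (the labels here are made up):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels {0, 1}, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
print("Accuracy:", (tp + tn) / (tp + tn + fp + fn))
```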
38. What is Ensemble Learning?
Combining multiple models to improve accuracy. Types:
• Bagging: Reduces variance (e.g., Random Forest)
• Boosting: Reduces bias by correcting errors of previous models (e.g., XGBoost)
39. Explain Bagging vs Boosting
• Bagging (Bootstrap Aggregating): Trains models in parallel on random data subsets. Reduces overfitting.
• Boosting: Trains sequentially, each new model focuses on correcting previous mistakes. Boosts weak learners into strong ones.
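A side-by-side sketch using scikit-learn's stock implementations, with AdaBoost standing in as the boosting example (both default to decision trees as base learners; numbers are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier

X, y = make_classification(n_samples=1000, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Bagging: independent trees on bootstrap samples, trained in parallel, predictions aggregated
bagging = BaggingClassifier(n_estimators=100, random_state=7).fit(X_train, y_train)

# Boosting (AdaBoost): weak learners trained sequentially, each reweighting the previous errors
boosting = AdaBoostClassifier(n_estimators=100, random_state=7).fit(X_train, y_train)

print("Bagging :", bagging.score(X_test, y_test))
print("Boosting:", boosting.score(X_test, y_test))
```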
38. What are XGBoost and LightGBM?
• XGBoost: Efficient gradient boosting algorithm; supports regularization, handles missing data.
• LightGBM: Faster alternative, uses histogram-based techniques and leaf-wise tree growth. Great for large datasets.
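A minimal usage sketch, assuming the xgboost and lightgbm packages are installed (hyperparameters are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=5000, n_features=30, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# XGBoost: regularized gradient boosting, handles missing values natively
xgb = XGBClassifier(n_estimators=300, learning_rate=0.1, max_depth=4).fit(X_train, y_train)

# LightGBM: histogram-based splits with leaf-wise tree growth
lgbm = LGBMClassifier(n_estimators=300, learning_rate=0.1, num_leaves=31).fit(X_train, y_train)

print("XGBoost accuracy :", xgb.score(X_test, y_test))
print("LightGBM accuracy:", lgbm.score(X_test, y_test))
```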