Most Common Data Science Interview Q&A

  • admin
  • October 21, 2025

Prepare smart with these frequently asked questions covering ML, stats, Python, and more!

1. Q: Supervised vs Unsupervised Learning?
Supervised: Learns from labeled data to predict a known target (e.g., regression, classification)
Unsupervised: Finds structure in unlabeled data (e.g., clustering, dimensionality reduction such as PCA)
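
To make the contrast concrete, here is a minimal pure-Python sketch on toy 1-D data. The function names (fit_centroids, predict, two_means) and the data are made up for illustration and do not come from any library; real projects would normally use scikit-learn.

```python
# Toy 1-D data: two groups of points, roughly around 1.5 and 10.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
labels = ["low", "low", "low", "high", "high", "high"]  # only the supervised path sees these

def fit_centroids(xs, ys):
    """Supervised: use the labels to compute one mean per class."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}

def predict(centroids, x):
    """Assign x to the class whose mean is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def two_means(xs, iterations=10):
    """Unsupervised: a bare-bones 2-means clustering that never sees labels."""
    centers = [min(xs), max(xs)]  # simple deterministic initialization
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        centers = [sum(c) / len(c) for c in clusters]
    return centers

centroids = fit_centroids(points, labels)
print(predict(centroids, 3.0))  # low
print(two_means(points))        # [1.5, 10.0]
```

The supervised model returns a named class because it was given names to learn; the clustering finds the same two groups but cannot say what they mean.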

2. Q: What is overfitting? How to prevent it?
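Overfitting: Model performs well on training data but poorly on new, unseen data because it has learned noise along with the signal.
Prevent it with cross-validation, regularization (L1/L2), pruning (for decision trees), early stopping, a simpler model, or more training data.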
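
Cross-validation is the item interviewers most often ask you to explain, so here is a minimal sketch of how k-fold splitting works. The helper k_fold_indices is a hypothetical name written for this example; it does not shuffle, whereas library implementations (for instance scikit-learn's KFold) offer shuffling and stratification.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

for train_idx, val_idx in k_fold_indices(n_samples=6, k=3):
    print(train_idx, val_idx)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

Every sample is used for validation exactly once, so a large gap between training and validation scores across folds is a sign of overfitting.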

3. Q: Bias vs Variance?
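Bias: Error from overly simple or incorrect assumptions in the model (leads to underfitting)
Variance: Error from sensitivity to small fluctuations in the training data (leads to overfitting)
Reducing one usually increases the other, so the goal is the model complexity that minimizes total error.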
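
If asked to formalize the trade-off, the standard decomposition for squared error can help. The notation is an assumption added here, not from the original post: the data follow y = f(x) + ε with zero-mean noise of variance σ², and \hat{f} is the model fitted on a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The expectations are taken over training sets and noise; only the first two terms depend on the choice of model.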

4. Q: What is the difference between classification and regression?
Classification: Predict categories (spam or not)
Regression: Predict continuous values (price, temperature)

5. Q: What is precision, recall, and F1 score?
Precision: TP / (TP + FP)
Recall: TP / (TP + FN)
F1: Harmonic mean of precision and recall = 2 × Precision × Recall / (Precision + Recall)
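
A small worked example of these three metrics, as a pure-Python sketch. The function name precision_recall_f1 and the sample labels are invented for illustration; returning 0.0 when a denominator is zero is a convention chosen here, not the only one.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]   # TP=2, FP=1, FN=2

p, r, f1 = precision_recall_f1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.5 0.571
```

Note that F1 (0.571) sits below the arithmetic mean of precision and recall (0.583): the harmonic mean is pulled toward the weaker of the two.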

6. Q: What’s the purpose of ROC-AUC?
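Measures how well a classifier separates the classes across all decision thresholds. The ROC curve plots true positive rate against false positive rate; AUC is the area under it (0.5 = random guessing, 1.0 = perfect separation).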
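
One useful way to explain AUC is as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below computes it that way; roc_auc is a hypothetical helper written for this post and is quadratic in the number of samples, so it is for intuition only.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))  # 0.75
```

Three of the four positive/negative pairs are ordered correctly (only 0.35 vs 0.4 is not), which gives 0.75.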

7. Q: What is feature engineering?
Creating new input features or transforming data to improve model performance.
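
A short pandas sketch of two common feature-engineering moves: a ratio feature and date parts. The table and its column names (price, sqft, sold_on) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical raw data; the column names are made up for this example.
df = pd.DataFrame({
    "price": [300000, 450000, 200000],
    "sqft": [1500, 1800, 1000],
    "sold_on": ["2024-01-06", "2024-03-11", "2024-07-20"],
})

# Ratio feature: combines two raw columns into one comparable signal.
df["price_per_sqft"] = df["price"] / df["sqft"]

# Date parts: expose seasonality that a raw date string hides.
sold = pd.to_datetime(df["sold_on"])
df["sold_month"] = sold.dt.month
df["sold_on_weekend"] = sold.dt.dayofweek >= 5  # Monday=0, so 5 and 6 are Sat/Sun

print(df[["price_per_sqft", "sold_month", "sold_on_weekend"]])
```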

8. Q: How is NumPy different from lists?
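NumPy arrays store elements of a single data type in contiguous memory, so they use less memory than lists and support vectorized operations that run in compiled code instead of Python loops. Lists can hold mixed types but need explicit loops for element-wise math.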
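
A minimal sketch of the difference in practice; the variable names and values are illustrative.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]

# Plain list: an explicit Python-level loop over individual float objects.
with_tax_list = [p * 1.5 for p in prices]

# NumPy array: one vectorized expression, executed in compiled code.
prices_arr = np.array(prices)
with_tax_arr = prices_arr * 1.5

# Same operator, different meaning: * repeats a list but scales an array.
print(prices * 2)        # [10.0, 20.0, 30.0, 10.0, 20.0, 30.0]
print(prices_arr * 2)    # [20. 40. 60.]
print(prices_arr.dtype)  # float64
```

The speed advantage only shows on larger arrays; for a handful of elements, the overhead of creating the array can outweigh it.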

9. Q: Difference between apply() and map() in Pandas?
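map() is a Series method that transforms each element and accepts a function, dict, or Series. apply() works on a Series (element-wise) or on a DataFrame, where it passes whole rows or columns to the function depending on axis. Since pandas 2.1, DataFrame.map (formerly applymap) also exists for element-wise operations on a DataFrame.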
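
A small sketch showing each call side by side. The DataFrame and its columns (dept, base, bonus) are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "dept": ["eng", "ops", "eng"],
    "base": [100, 80, 120],
    "bonus": [10, 5, 20],
})

# Series.map: element-wise; accepts a function or a lookup dict.
dept_names = df["dept"].map({"eng": "Engineering", "ops": "Operations"})

# Series.apply: also element-wise, with an arbitrary function.
base_doubled = df["base"].apply(lambda x: x * 2)

# DataFrame.apply with axis=1: the function receives one row at a time.
total = df.apply(lambda row: row["base"] + row["bonus"], axis=1)

# DataFrame.apply with the default axis=0: the function receives one column at a time.
col_max = df[["base", "bonus"]].apply(max)

print(dept_names.tolist())  # ['Engineering', 'Operations', 'Engineering']
print(total.tolist())       # [110, 85, 140]
```

For simple arithmetic like the row total, the vectorized form df["base"] + df["bonus"] is usually faster than apply; apply is shown here only to illustrate the row-wise behavior.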

10. Q: How to handle missing data?
Drop rows/columns, fill with mean/median/mode, or use model-based imputation.
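
The first two strategies can be sketched in pandas as follows. The data and column names are hypothetical; model-based imputation is left out because it needs an additional library.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25.0, None, 40.0, 31.0],
    "city": ["Pune", "Delhi", None, "Pune"],
})

# Strategy 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Strategy 2: fill gaps with a summary statistic of the column.
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].median())      # numeric: median
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])  # categorical: mode

print(len(dropped))             # 2
print(filled["age"].tolist())   # [25.0, 31.0, 40.0, 31.0]
print(filled["city"].tolist())  # ['Pune', 'Delhi', 'Pune', 'Pune']
```

In a real pipeline, compute the fill values on the training split only and reuse them on validation and test data to avoid leakage.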

11. Q: How to get top 3 salaries from an Employee table?
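SQL (LIMIT syntax, as used in MySQL, PostgreSQL, and SQLite; DISTINCT ensures tied salaries count once):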
SELECT DISTINCT salary
FROM employee
ORDER BY salary DESC
LIMIT 3;
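
To try the query without a database server, it can be run from Python against an in-memory SQLite database. The table contents below are made-up sample rows, and the name column is an assumption added for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("a", 90), ("b", 120), ("c", 120), ("d", 75), ("e", 100)],
)

# With DISTINCT: the three highest distinct salary values.
top3 = conn.execute(
    "SELECT DISTINCT salary FROM employee ORDER BY salary DESC LIMIT 3"
).fetchall()

# Without DISTINCT: the tie at 120 takes up two of the three slots.
top3_rows = conn.execute(
    "SELECT salary FROM employee ORDER BY salary DESC LIMIT 3"
).fetchall()
conn.close()

print(top3)       # [(120,), (100,), (90,)]
print(top3_rows)  # [(120,), (120,), (100,)]
```

It is worth asking the interviewer which of the two results they mean by "top 3 salaries".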

12. Q: What is a JOIN?
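Combines rows from two or more tables based on a related column. Common types: INNER (only matching rows), LEFT/RIGHT (all rows from one side, with NULLs where there is no match), and FULL (all rows from both sides).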
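
A minimal sketch contrasting an inner join with a left join, again using Python's built-in sqlite3 module. The department and employee tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (name TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO employee VALUES ('Asha', 1), ('Ben', 2), ('Chen', NULL);
""")

# INNER JOIN: only employees whose dept_id matches a department.
inner = conn.execute("""
    SELECT e.name, d.name FROM employee e
    JOIN department d ON e.dept_id = d.id
    ORDER BY e.name
""").fetchall()

# LEFT JOIN: every employee, with NULL (None) where no department matches.
left = conn.execute("""
    SELECT e.name, d.name FROM employee e
    LEFT JOIN department d ON e.dept_id = d.id
    ORDER BY e.name
""").fetchall()
conn.close()

print(inner)  # [('Asha', 'Engineering'), ('Ben', 'Sales')]
print(left)   # [('Asha', 'Engineering'), ('Ben', 'Sales'), ('Chen', None)]
```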

13. Q: How to deploy a data science model?
Save model (using pickle/joblib), wrap with Flask/FastAPI, host on Render/Heroku/AWS.
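
The save, load, and serve steps can be sketched with the standard library alone. Everything here is a stand-in: the "model" is a plain dict of made-up linear coefficients, and handle_request is a hypothetical function representing what the body of a Flask or FastAPI route would do.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a trained model: a dict of linear-model parameters (made-up values).
model = {"weights": [2.0, -1.0], "bias": 0.5}

def predict(model, features):
    """Linear prediction: dot(weights, features) + bias."""
    return sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]

# Training side: serialize the fitted model to disk.
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Serving side: load once at startup, then reuse for every request.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

def handle_request(payload):
    """Validate the input, predict, and return JSON-serializable data."""
    features = payload["features"]
    if len(features) != len(loaded["weights"]):
        raise ValueError("unexpected number of features")
    return {"prediction": predict(loaded, features)}

print(handle_request({"features": [3.0, 1.0]}))  # {'prediction': 5.5}
```

Only unpickle files you created or trust, since loading a pickle can execute arbitrary code. It also helps to pin the library versions used at training time in the serving environment.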

14. Q: How to explain a model to non-tech stakeholders?
Use visuals and simple analogies, and focus on business impact rather than technical metrics.

Bonus: Key Libraries to Know
NumPy, Pandas, Matplotlib, Scikit-learn, Seaborn, TensorFlow/PyTorch (for DL)
