Top Data Science Interview Questions with Answers: Part-3

  • admin
  • January 2, 2026

21. Difference between PCA and LDA  

•⁠  ⁠PCA (Principal Component Analysis):  

  Unsupervised technique that reduces dimensionality by projecting the data onto the orthogonal directions of greatest variance. It doesn’t consider class labels.  

•⁠  ⁠LDA (Linear Discriminant Analysis):  

  Supervised technique that reduces dimensionality by finding projections that maximize the separation between classes relative to the scatter within each class, so it requires labeled data.
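To make the contrast concrete, here is a minimal sketch in plain Python for 2-D data with two classes. The function names and the toy points are made up for illustration, and the LDA part uses the classic two-class Fisher direction rather than any particular library's implementation.

```python
import math

def mean_2d(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter_2d(points):
    """Return (sxx, sxy, syy): sums of squared deviations from the mean."""
    mx, my = mean_2d(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxx, sxy, syy

def pca_direction(points):
    """First principal axis of 2-D data. Labels are never used."""
    sxx, sxy, syy = scatter_2d(points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # angle of the major axis
    return (math.cos(theta), math.sin(theta))

def lda_direction(class0, class1):
    """Fisher direction, proportional to Sw^-1 (m1 - m0), for two classes."""
    a0, b0, c0 = scatter_2d(class0)
    a1, b1, c1 = scatter_2d(class1)
    sxx, sxy, syy = a0 + a1, b0 + b1, c0 + c1  # within-class scatter Sw
    m0, m1 = mean_2d(class0), mean_2d(class1)
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    det = sxx * syy - sxy * sxy
    wx = (syy * dx - sxy * dy) / det           # closed-form 2x2 inverse
    wy = (-sxy * dx + sxx * dy) / det
    norm = math.hypot(wx, wy)
    return (wx / norm, wy / norm)

# Toy data: both classes are spread widely along x but differ mainly in y.
class0 = [(0.0, 0.0), (2.0, 0.2), (4.0, 0.0), (6.0, 0.2)]
class1 = [(0.0, 1.0), (2.0, 1.2), (4.0, 1.0), (6.0, 1.2)]

pca = pca_direction(class0 + class1)  # close to the x-axis (most variance)
lda = lda_direction(class0, class1)   # close to the y-axis (best separation)
```

On this toy data the two methods pick nearly perpendicular directions: PCA follows the spread along x, while LDA follows the axis that actually separates the classes.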

22. What is Logistic Regression?  

A supervised classification algorithm used to predict the probability of a binary outcome (0 or 1).  

It applies the sigmoid function to a linear combination of the input features, which maps the output to a value between 0 and 1. Commonly used in spam detection, churn prediction, etc.
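The prediction step can be sketched in a few lines of plain Python. The weights and bias below are hypothetical values chosen for illustration, not parameters fitted to real data, and the 0.5 cutoff is a common default rather than a requirement.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of class 1 for one example: sigmoid(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical parameters for a two-feature model (illustrative only).
weights = [1.5, -0.8]
bias = -0.2

p = predict_proba([2.0, 1.0], weights, bias)  # z = 2.0, so p is about 0.88
label = 1 if p >= 0.5 else 0                  # threshold at 0.5
```

In practice the weights are learned from labeled data, typically by maximizing the likelihood of the observed labels.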

23. What is Linear Regression?  

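A supervised learning method that models the relationship between a dependent variable and one or more independent variables as a linear function. With a single predictor this is a straight line, Y = a + bX + e, where a is the intercept, b is the slope, and e is the error term. It’s widely used for forecasting and trend analysis.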
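For the single-predictor case, the least-squares estimates of a and b have a closed form. The sketch below uses it on a small made-up dataset; the function name `fit_simple_linear` is just for illustration.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for Y = a + bX with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

# Made-up data that roughly follows Y = 1 + 2X.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

a, b = fit_simple_linear(xs, ys)   # a is about 1.09, b is about 1.97
forecast = a + b * 6.0             # predicted Y for a new X of 6
```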

24. What are the assumptions of Linear Regression?  

•⁠  ⁠Linearity between independent and dependent variables  

•⁠  ⁠No multicollinearity among predictors  

•⁠  ⁠Homoscedasticity (equal variance of residuals)  

•⁠  ⁠Residuals are normally distributed  

•⁠  ⁠No autocorrelation in residuals
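As one example of checking these assumptions, the Durbin-Watson statistic is a common test for first-order autocorrelation in residuals. This is a minimal sketch with made-up residual values; a real check would use the residuals from a fitted model, in time order.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals. Ranges from 0 to 4."""
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    den = sum(e * e for e in residuals)
    return num / den

# Made-up residuals that flip sign at every step (negative autocorrelation).
alternating = [0.5, -0.5, 0.5, -0.5]
dw = durbin_watson(alternating)
```

Values near 2 suggest little first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.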

25. What is R-squared and Adjusted R-squared?

•⁠  ⁠R-squared: Proportion of variance in the dependent variable explained by the model  

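•⁠  ⁠Adjusted R-squared: Adjusts R-squared for the number of predictors. Unlike R-squared, it can decrease when an added predictor does not improve the fit, which makes it better suited to comparing models with different numbers of variables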
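Both quantities are easy to compute by hand. The sketch below uses the standard formulas, where n is the number of observations and p is the number of predictors; the actual and predicted values are made up for illustration.

```python
def r_squared(actual, predicted):
    """1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Penalize R-squared for the number of predictors p (needs n > p + 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Made-up observations and model predictions.
actual = [3.0, 5.0, 7.0, 9.0, 11.0]
predicted = [2.8, 5.3, 7.1, 8.7, 11.1]

r2 = r_squared(actual, predicted)
adj_one = adjusted_r_squared(r2, n=len(actual), p=1)
adj_two = adjusted_r_squared(r2, n=len(actual), p=2)  # same fit, more predictors
```

With the same R-squared, the adjusted value drops as p grows, which is the penalty for adding predictors that do not improve the fit.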

26. What are Residuals?  

The difference between the observed value and the predicted value.  

Residual = Actual − Predicted. Small residuals indicate a close fit, and they should ideally be scattered randomly around zero with no visible pattern.
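A quick sketch of the calculation, using made-up actual and predicted values:

```python
def compute_residuals(actual, predicted):
    """Residual = Actual - Predicted, one value per observation."""
    return [a - p for a, p in zip(actual, predicted)]

# Made-up observations and model predictions.
actual = [3.0, 5.0, 7.0, 9.0, 11.0]
predicted = [2.8, 5.3, 7.1, 8.7, 11.1]

residuals = compute_residuals(actual, predicted)
mean_residual = sum(residuals) / len(residuals)  # close to zero here
```

A mean near zero is only a first check. Plotting residuals against the predicted values is the usual way to look for patterns such as curvature or changing spread.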

27. What is Regularization (L1 vs L2)?  

Regularization prevents overfitting by penalizing large coefficients:  

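•⁠  ⁠L1 (Lasso): Adds a penalty proportional to the sum of the absolute values of the coefficients; can shrink some coefficients exactly to zero, effectively removing irrelevant features  

•⁠  ⁠L2 (Ridge): Adds a penalty proportional to the sum of the squared coefficients; shrinks them toward zero but rarely exactly to zero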
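The sketch below shows the two penalty terms and, for a single coefficient, why L1 can produce exact zeros while L2 only shrinks. The one-coefficient updates assume the simplified objectives stated in the comments; `lam` is the regularization strength, and the coefficient values are made up.

```python
def l1_penalty(coefs, lam):
    """Lasso penalty: lam * sum of |coefficient|."""
    return lam * sum(abs(c) for c in coefs)

def l2_penalty(coefs, lam):
    """Ridge penalty: lam * sum of coefficient squared."""
    return lam * sum(c * c for c in coefs)

def soft_threshold(b_ols, lam):
    """Minimizer of 0.5 * (b - b_ols)**2 + lam * |b| (one-coefficient lasso)."""
    if b_ols > lam:
        return b_ols - lam
    if b_ols < -lam:
        return b_ols + lam
    return 0.0  # small coefficients are set exactly to zero

def ridge_shrink(b_ols, lam):
    """Minimizer of 0.5 * (b - b_ols)**2 + 0.5 * lam * b**2 (one-coefficient ridge)."""
    return b_ols / (1.0 + lam)  # scaled down, but zero only if b_ols is zero

coefs = [3.0, -0.4, 0.0, 1.2]  # made-up unregularized coefficients
lam = 0.5

lasso_coefs = [soft_threshold(c, lam) for c in coefs]  # [2.5, 0.0, 0.0, 0.7]
ridge_coefs = [ridge_shrink(c, lam) for c in coefs]    # -0.4 shrinks but stays nonzero
```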

28. What is k-Nearest Neighbors (KNN)?  

A lazy, non-parametric algorithm used for classification and regression. For classification, it assigns the majority label among the k closest training points, measured with a distance metric such as Euclidean distance; for regression, it averages the values of those neighbors.
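A minimal classification sketch in plain Python, using Euclidean distance and a majority vote. The training points and labels are made up, and `knn_classify` is an illustrative name.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # "Lazy" learning: there is no training step, only distance lookups at query time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up training set with two well-separated groups.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_classify(train_points, train_labels, (2, 2))  # "a"
near_b = knn_classify(train_points, train_labels, (6, 5))  # "b"
```

Using an odd k for two-class problems avoids tied votes, and features are usually scaled first so that no single feature dominates the distance.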

29. What is k-Means Clustering?  

An unsupervised algorithm that groups data into k clusters. It assigns each point to the nearest centroid, recomputes each centroid as the mean of its assigned points, and repeats until the centroids stop changing.
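The assign-and-update loop can be sketched as follows. For reproducibility this sketch takes the starting centroids as an argument instead of choosing them at random, which is an assumption of the example. The data points are made up.

```python
import math

def k_means(points, centroids, max_iters=100):
    """Basic k-means loop: assign to nearest centroid, then recompute means."""
    for _ in range(max_iters):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters

# Made-up 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
initial = [(1.0, 1.0), (8.0, 8.0)]

centroids, clusters = k_means(points, initial)
```

The result depends on the starting centroids, so practical implementations usually run several initializations and keep the best one.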

30. Difference between Classification and Regression?  

•⁠  ⁠Classification: Predicts discrete categories (e.g., Yes/No, Cat/Dog)

•⁠  ⁠Regression: Predicts continuous values (e.g., temperature, price)
