21. Difference between PCA and LDA
• PCA (Principal Component Analysis):
Unsupervised technique that reduces dimensionality by projecting data onto the directions of greatest variance. It doesn’t consider class labels.
• LDA (Linear Discriminant Analysis):
Supervised technique that reduces dimensionality by finding projections that maximize the separation between classes relative to the spread within each class, using labeled data.
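The contrast can be seen on a tiny 2-D data set. The sketch below is illustrative only: the two classes and the helper names (scatter, pca_direction, lda_direction) are made up for this example, and the closed-form 2x2 algebra stands in for what a library would do in higher dimensions. Both classes are stretched along x but separated along y, so PCA picks the x axis while LDA picks (almost exactly) the y axis.

```python
import math

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix of centered points."""
    mx, my = mean(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxx, sxy, syy

def pca_direction(points):
    """First principal component: top eigenvector of the scatter matrix.
    Labels are never used."""
    sxx, sxy, syy = scatter(points)
    # Major-axis angle of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))

def lda_direction(class0, class1):
    """Fisher discriminant direction, proportional to Sw^-1 (m1 - m0)."""
    a0, b0, c0 = scatter(class0)
    a1, b1, c1 = scatter(class1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1   # within-class scatter Sw
    det = a * c - b * b
    m0, m1 = mean(class0), mean(class1)
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Multiply the inverse of [[a, b], [b, c]] by (dx, dy).
    wx = (c * dx - b * dy) / det
    wy = (-b * dx + a * dy) / det
    norm = math.hypot(wx, wy)
    return (wx / norm, wy / norm)

# Hypothetical data: wide spread along x, classes offset by 1 along y.
class0 = [(-4, 0.2), (-2, -0.2), (0, 0.2), (2, -0.2), (4, 0.0)]
class1 = [(-4, 1.2), (-2, 0.8), (0, 1.2), (2, 0.8), (4, 1.0)]

pca_axis = pca_direction(class0 + class1)   # close to the x axis
lda_axis = lda_direction(class0, class1)    # close to the y axis
```

Projecting onto pca_axis keeps most of the variance but mixes the two classes; projecting onto lda_axis keeps little variance but separates them.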
22. What is Logistic Regression?
A classification algorithm used to predict the probability of a binary outcome (0 or 1).
It applies the sigmoid function to a linear combination of the input features, mapping the result to a probability between 0 and 1. Commonly used in spam detection, churn prediction, etc.
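A minimal sketch of the prediction step, assuming the weights and bias are already known. The parameter values below are hand-picked for illustration, not learned from data, and the 0.5 decision threshold is a common default rather than a requirement.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for large negative z.
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(features, weights, bias):
    """Probability of class 1: sigmoid of the weighted sum of features."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def predict(features, weights, bias, threshold=0.5):
    """Turn the probability into a 0/1 label."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical parameters and one example with two features.
weights = [1.5, -2.0]
bias = 0.25
p = predict_proba([2.0, 0.5], weights, bias)   # z = 2.25, p is about 0.905
label = predict([2.0, 0.5], weights, bias)     # 1
```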
23. What is Linear Regression?
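A supervised learning method that models the relationship between a dependent variable and one or more independent variables with a linear equation. With a single predictor this is a straight line, Y = a + bX + e, where a is the intercept, b the slope, and e the error term. It’s widely used for forecasting and trend analysis.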
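For the single-predictor case, the intercept a and slope b can be estimated with ordinary least squares. The sketch below uses the standard closed-form formulas; the function name and the sample data are illustrative assumptions.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (a, b) for the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_simple_linear_regression(xs, ys)   # a = 1.0, b = 2.0
forecast = a + b * 6                          # 13.0
```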
24. What are assumptions of Linear Regression?
• Linearity between independent and dependent variables
• No multicollinearity among predictors
• Homoscedasticity (equal variance of residuals)
• Residuals are normally distributed
• No autocorrelation in residuals
25. What is R-squared and Adjusted R-squared?
• R-squared: Proportion of variance in the dependent variable explained by the model
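• Adjusted R-squared: Adjusts R-squared for the number of predictors. Unlike R-squared, it can decrease when an added variable contributes little, which makes it a fairer way to compare models with different numbers of variables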
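Both quantities can be computed directly from actual and predicted values. This is a sketch using the usual formulas, R² = 1 − SS_res / SS_tot and adjusted R² = 1 − (1 − R²)(n − 1) / (n − p − 1); the sample numbers are made up.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R-squared for the number of predictors used."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical observations and model predictions.
actual = [3, 5, 7, 9, 11]
predicted = [2.8, 5.1, 7.2, 8.9, 11.0]

r2 = r_squared(actual, predicted)            # about 0.9975
adj_1 = adjusted_r_squared(r2, 5, 1)         # same fit, 1 predictor
adj_3 = adjusted_r_squared(r2, 5, 3)         # same fit, 3 predictors
```

With the fit held fixed, the adjusted value drops as the predictor count rises (adj_3 < adj_1 < r2).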
26. What are Residuals?
The difference between the observed value and the predicted value.
Residual = Actual − Predicted. Small residuals indicate a good fit; ideally they are scattered randomly around zero with no visible pattern.
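As a quick illustration of the formula, with made-up values:

```python
def residuals(actual, predicted):
    """Residual = actual - predicted, computed per observation."""
    return [y - y_hat for y, y_hat in zip(actual, predicted)]

# Hypothetical observed values and model predictions.
actual = [10.0, 12.0, 15.0]
predicted = [9.0, 12.5, 14.0]

errors = residuals(actual, predicted)   # [1.0, -0.5, 1.0]
```

A positive residual means the model under-predicted that point; a negative one means it over-predicted.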
27. What is Regularization (L1 vs L2)?
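Regularization reduces overfitting by adding a penalty on large coefficients to the loss function:
• L1 (Lasso): Penalty is the sum of the absolute values of the coefficients; can shrink some coefficients to exactly zero, effectively eliminating irrelevant features
• L2 (Ridge): Penalty is the sum of the squared coefficients; shrinks all coefficients toward zero but rarely makes them exactly zero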
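The difference in behavior shows up in a simplified setting. The sketch below assumes an orthonormal design (XᵀX = I) and the objective 0.5·‖y − Xb‖² + penalty, in which case each coefficient can be shrunk independently of the others: lasso applies soft-thresholding and ridge applies proportional scaling. The function names and coefficient values are illustrative; real solvers handle the general case iteratively.

```python
def l1_penalty(coefs, lam):
    """Lasso penalty: lam * sum of absolute values."""
    return lam * sum(abs(c) for c in coefs)

def l2_penalty(coefs, lam):
    """Ridge penalty: lam * sum of squares."""
    return lam * sum(c * c for c in coefs)

def lasso_shrink(c, lam):
    """Soft-thresholding: move c toward zero by lam, stopping at zero."""
    if c > lam:
        return c - lam
    if c < -lam:
        return c + lam
    return 0.0

def ridge_shrink(c, lam):
    """Proportional shrinkage: never exactly zero unless c is zero."""
    return c / (1 + 2 * lam)

# Hypothetical unregularized (least-squares) coefficients.
ols_coefs = [3.0, -0.4, 0.2]
lam = 0.5

lasso_coefs = [lasso_shrink(c, lam) for c in ols_coefs]   # [2.5, 0.0, 0.0]
ridge_coefs = [ridge_shrink(c, lam) for c in ols_coefs]   # [1.5, -0.2, 0.1]
```

Lasso zeroes out the two small coefficients, while ridge keeps all three but makes each smaller.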
28. What is k-Nearest Neighbors (KNN)?
A lazy, non-parametric algorithm used for classification and regression. For classification it assigns the majority label among the k closest data points; for regression it averages their target values. Closeness is measured with a distance metric such as Euclidean distance.
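A minimal sketch of both uses, assuming Euclidean distance and a small in-memory training set. The data and function names are illustrative, and math.dist requires Python 3.8 or later.

```python
import math
from collections import Counter

def nearest_neighbors(train_points, train_values, query, k):
    """Return the values attached to the k training points closest to query."""
    ranked = sorted(
        zip(train_points, train_values),
        key=lambda pair: math.dist(pair[0], query),
    )
    return [value for _, value in ranked[:k]]

def knn_classify(train_points, train_labels, query, k=3):
    """Majority vote among the k nearest labels."""
    votes = nearest_neighbors(train_points, train_labels, query, k)
    return Counter(votes).most_common(1)[0][0]

def knn_regress(train_points, train_targets, query, k=3):
    """Average of the k nearest target values."""
    values = nearest_neighbors(train_points, train_targets, query, k)
    return sum(values) / len(values)

# Hypothetical training data: two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]
targets = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

label_near_a = knn_classify(points, labels, (2, 2))    # "A"
label_near_b = knn_classify(points, labels, (6, 5))    # "B"
value_near_a = knn_regress(points, targets, (2, 2))    # 2.0
```

There is no training step: all the work happens at prediction time, which is why the algorithm is called lazy.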
29. What is k-Means Clustering?
An unsupervised algorithm that groups data into k clusters. It assigns points to the nearest centroid and recalculates centroids iteratively until convergence.
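The assign-then-recompute loop can be sketched as follows. To keep the example deterministic, the starting centroids are passed in explicitly; practical implementations usually pick them randomly or with a seeding heuristic. The data and function name are illustrative, and math.dist requires Python 3.8 or later.

```python
import math

def k_means(points, initial_centroids, max_iters=100):
    """Return (centroids, clusters) after iterating until centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
                )
            else:
                new_centroids.append(old)   # keep an empty cluster's centroid
        if new_centroids == centroids:      # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data with two obvious groups, k = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, initial_centroids=[(1, 1), (8, 8)])
```

On this data the loop converges after two passes, with centroids near (1.33, 1.33) and (8.33, 8.33).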
30. Difference between Classification and Regression?
• Classification: Predicts discrete categories (e.g., Yes/No, Cat/Dog)
• Regression: Predicts continuous values (e.g., temperature, price)