- Algorithm: A set of rules or steps used to solve a problem or perform a task, particularly in the context of data processing and analysis.
- Model: A mathematical representation of a real-world process, created by training an algorithm on data.
- Training Data: The dataset used to fit a machine learning model; in supervised learning it consists of input-output pairs.
- Test Data: A separate dataset used to evaluate the performance of a trained model, ensuring it generalizes well to unseen data.
- Overfitting: A modeling error that occurs when a model learns the training data too well, capturing noise along with the underlying pattern, leading to poor performance on new data.
- Underfitting: A situation where a model is too simple to capture the underlying trend in the data, resulting in poor performance on both training and test datasets.
- Feature: An individual measurable property or characteristic of the data used as input for a model.
- Label: The output or target variable that a model aims to predict based on the input features.
- Supervised Learning: A type of machine learning where the model is trained on labeled data, learning to map inputs to outputs.
- Unsupervised Learning: A type of machine learning where the model is trained on unlabeled data, aiming to find patterns or groupings within the data.
- Reinforcement Learning: A type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize cumulative reward.
- Hyperparameters: Configuration settings used to control the training process of a model, which are set before training begins.
- Loss Function: A mathematical function that quantifies how well a model’s predictions match the actual outcomes; used to guide the optimization process.
- Gradient Descent: An optimization algorithm that minimizes the loss function by iteratively adjusting model parameters in the direction of steepest descent.
- Cross-Validation: A technique for assessing how the results of a model will generalize by dividing the dataset into multiple subsets and training/testing across them.
- Confusion Matrix: A table used to evaluate the performance of a classification model by comparing predicted labels against actual labels.
- Precision and Recall: Metrics used to evaluate classification models; precision measures the accuracy of positive predictions, while recall measures the ability to find all relevant instances.
- ROC Curve (Receiver Operating Characteristic Curve): A graphical representation of a model’s diagnostic ability across various threshold settings, plotting true positive rates against false positive rates.
- Regularization: Techniques used to prevent overfitting by adding a penalty for complexity to the loss function (e.g., L1 and L2 regularization).
- Ensemble Learning: Combining multiple models to improve overall performance; common methods include bagging, boosting, and stacking.
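To make the gradient descent and loss function entries concrete, here is a minimal sketch that fits a one-parameter linear model by minimizing mean squared error. The data, learning rate, and step count are illustrative assumptions, not part of the glossary.

```python
def mse_loss(w, xs, ys):
    """Loss function: mean squared error between predictions and labels."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w, xs, ys):
    """Analytic gradient of the MSE with respect to the single weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, lr=0.05, steps=200):
    w = 0.0  # initial parameter value (arbitrary starting point)
    for _ in range(steps):
        w -= lr * gradient(w, xs, ys)  # step in direction of steepest descent
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated by y = 2x, so the optimum is w = 2
w = gradient_descent(xs, ys)
```

Each iteration moves `w` opposite to the gradient of the loss, so the loss shrinks toward its minimum; here `w` converges to roughly 2.0.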
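The confusion matrix, precision, and recall entries can also be sketched in a few lines. The example labels below are made up for illustration; the binary 0/1 encoding is an assumption.

```python
def confusion_matrix(actual, predicted):
    """Return (tp, fp, fn, tn) counts for binary 0/1 labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall(actual, predicted):
    tp, fp, fn, _ = confusion_matrix(actual, predicted)
    precision = tp / (tp + fp)  # accuracy of the positive predictions
    recall = tp / (tp + fn)     # share of actual positives that were found
    return precision, recall

actual    = [1, 1, 1, 0, 0, 0]  # ground-truth labels (illustrative)
predicted = [1, 1, 0, 1, 0, 0]  # model predictions (illustrative)
p, r = precision_recall(actual, predicted)
```

With these labels there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall both come out to 2/3.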
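Cross-validation's split-then-rotate idea can be shown without any library. This sketch yields k train/test index splits so that every sample lands in a test fold exactly once; the contiguous-fold indexing scheme is an illustrative assumption (real workflows usually shuffle first).

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        start = i * fold_size
        # Last fold absorbs any remainder when n_samples % k != 0.
        end = start + fold_size if i < k - 1 else n_samples
        test = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, test

# 10 samples split into 5 folds: each fold tests on 2 samples
# and trains on the other 8.
splits = list(k_fold_splits(10, 5))
```

A model would be trained and evaluated once per split, and the k test scores averaged to estimate how the model generalizes.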