Learning Paradigm | Labeled data with input-output pairs | Unlabeled data with no explicit target labels | Feedback-based learning through interactions |
Input Data | Input features (X) and corresponding target labels (Y) | Input features (X) without corresponding target labels | States or observations of the environment and its feedback (rewards and penalties) |
Goal | Make predictions or decisions on new data | Discover patterns and relationships in data | Learn a policy to make optimal decisions |
Example Applications | Image classification, sentiment analysis, regression tasks, etc. | Clustering, anomaly detection, dimensionality reduction, recommendation systems, etc. | Game playing (e.g., AlphaGo), robotic control, self-driving cars, etc. |
Training Approach | Supervised learning algorithms optimize a mapping between X and Y using labeled data | Unsupervised learning algorithms seek to find patterns or structure in the data without labels | Model learns through trial and error, balancing exploration and exploitation |
Knowledge Required | Requires labeled data for training | Does not require labeled data | Requires understanding of the environment and its feedback |
Evaluation | Performance measured with task-specific metrics (e.g., accuracy, precision, and recall for classification; mean squared error for regression) | Evaluation is more challenging and may be based on metrics like clustering quality (e.g., silhouette score) | Evaluation is based on long-term cumulative rewards and penalties |
Exploration vs Exploitation | Not applicable | Not applicable | Balancing exploration and exploitation |
Common Algorithms | Linear regression, logistic regression, support vector machines, decision trees, etc. | K-Means clustering, Gaussian Mixture Models, autoencoders, etc. | Q-learning, Deep Q Network (DQN), Policy Gradient methods, Actor-Critic, etc. |
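
To make the three columns of the table concrete, here is a minimal sketch in plain Python with one toy learner per paradigm. It is illustrative only: the function names (`fit_line`, `kmeans_1d`, `epsilon_greedy_bandit`), the toy data, and the parameter defaults are assumptions made for this example and do not come from any library. The reinforcement example is a simple multi-armed bandit with epsilon-greedy action selection, a much smaller stand-in for the algorithms listed in the table rather than Q-learning itself.

```python
import random


def fit_line(xs, ys):
    """Supervised: least-squares fit of y = w*x + b from labeled (x, y) pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def kmeans_1d(points, k=2, iters=10):
    """Unsupervised: group unlabeled 1-D points into k >= 2 clusters."""
    # Assumption: spread the initial centers evenly across the sorted data.
    ordered = sorted(points)
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centers[c]))
            groups[nearest].append(p)
        # Keep the previous center if a cluster ends up empty.
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


def epsilon_greedy_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Reinforcement: estimate action values from reward feedback alone."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)  # running average reward per action
    counts = [0] * len(reward_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action with the best current estimate.
            action = max(range(len(values)), key=lambda a: values[a])
        # The environment returns a reward of 1 or 0; no labels are given.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values
```

Calling `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` recovers the slope and intercept from labeled pairs, `kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.2, 8.8])` finds two group centers without any labels, and `epsilon_greedy_bandit([0.2, 0.8])` learns which action pays off more using only reward feedback. The `epsilon` parameter is where the exploration vs. exploitation row of the table shows up: larger values explore more, smaller values exploit the current estimates more.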