How AI Foundation Models Are Revolutionizing Complex Reasoning: From Game-Playing to Mathematical Discovery

Introduction: The Pursuit of Advanced Artificial Intelligence

Envision a computer capable of solving a geometry problem in a manner akin to a human—not merely by memorizing formulas, but by visualizing shapes, identifying patterns, and making innovative connections. Modern AI foundation models are beginning to achieve this, representing a transformative development in how machines tackle complex reasoning tasks.

The advancement of artificial intelligence has consistently been fueled by ambitious objectives: creating machines that can engage in strategic games, uncover mathematical theorems, and solve problems that demand genuine insight rather than mere computational power. Today's foundation models signify a considerable advancement in this endeavor, merging pattern recognition capabilities with adaptable problem-solving skills.

The Evolution of AI Reasoning: From Symbol Manipulation to Data-Driven Learning

The Limitations of Symbolic AI

Traditional AI systems predominantly relied on symbolic reasoning—programming computers with explicit rules and logical operations. This approach is akin to providing someone with a detailed cookbook for every conceivable cooking scenario. While effective for well-defined problems, it quickly became cumbersome when confronted with the complexities of real-world situations.

For instance, early chess programs utilized brute-force search methods combined with manually crafted evaluation functions. Programmers encoded knowledge about advantageous chess positions, resulting in systems that were inflexible and limited in adaptability.

The Shift to Data-Driven Learning

Modern foundation models adopt a fundamentally different strategy. Rather than depending on pre-defined rules, these systems learn patterns from extensive datasets. This is comparable to the distinction between memorizing a phrasebook and genuinely learning a language—the latter is significantly more versatile and powerful.

Take AlphaGo as an example; it transformed the ancient game of Go by analyzing millions of games and even competing against itself, uncovering innovative strategies that astonished even expert players. This data-centric approach unlocked possibilities that traditional rule-based systems could not achieve.

Addressing Complex Reasoning Challenges

The Geometry Problem: An Insight into AI Reasoning

Let us explore a specific example that highlights the intricacies of reasoning tasks. Proving that the base angles of an isosceles triangle are equal may seem straightforward for a human but presents considerable challenges for AI systems.

The Human Approach:

  • Visualize the triangle and its characteristics
  • Recognize the symmetry inherent in the isosceles shape
  • Make a creative leap by drawing an angle bisector from the apex
  • Utilize congruent triangles to finalize the proof

The AI Challenge: At each stage, an AI system faces numerous possibilities. It could draw any of several auxiliary lines, introduce different geometric constructions, or apply any of many theorems. The resulting search space is effectively unbounded, so exploring every option is impractical.

Foundation models excel in this regard. Instead of aimlessly navigating through all possibilities, they can learn to identify promising patterns and make informed decisions about which paths to pursue.
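As a hedged illustration (not the method of any particular system), the sketch below runs a simple beam search in which a scoring function stands in for a learned model that estimates how promising each partial proof is; expand, score_step, and is_goal are all hypothetical placeholders.

import heapq

def expand(state):
    # Placeholder: enumerate candidate next steps (auxiliary lines, theorem applications, ...)
    return [state + [f"step_{i}"] for i in range(10)]

def score_step(state):
    # Placeholder for a learned model that scores how promising a partial proof looks
    return -len(state)  # toy heuristic; a foundation model would supply this signal

def is_goal(state):
    # Placeholder goal test, e.g. "the proof is complete"
    return len(state) >= 4

def guided_search(initial_state, beam_width=3, max_depth=10):
    beam = [initial_state]
    for _ in range(max_depth):
        candidates = [s for state in beam for s in expand(state)]
        # Keep only the top-scoring candidates instead of exploring everything
        beam = heapq.nlargest(beam_width, candidates, key=score_step)
        for state in beam:
            if is_goal(state):
                return state
    return None

print(guided_search([]))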

The Universal Nature of Reasoning Problems

Interestingly, similar structures manifest across diverse domains. Consider the following parallels:

Chemical Synthesis vs. Mathematical Proofs:

  • Both involve constructing tree-like structures
  • In chemistry: combining molecules to form compounds
  • In mathematics: connecting logical steps to derive conclusions
  • Both necessitate creative insights regarding which paths to follow

Software Development vs. Theorem Proving:

  • Both begin with high-level objectives and decompose them into specific tasks
  • Both require comprehension of abstract concepts and their tangible applications
  • Both benefit from recognizing reusable patterns and methodologies

This universality implies that reasoning skills acquired in one domain can be applied to others, which is a significant advantage of foundation models.

The Three Pillars of Foundation Model Reasoning

1. Generativity: Crafting Solutions from the Ground Up

Traditional AI systems were confined to predicting from a limited set of options. In contrast, foundation models can generate entirely new solutions by modeling the probability distribution of successful approaches.

Analogy: This can be likened to the difference between answering multiple-choice questions and writing an essay. Multiple-choice questions restrict you to predefined options, while essay writing enables creative expression and original ideas. Foundation models function more like the essay approach, capable of generating unique constructions, innovative program code, or creative mathematical conjectures.
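As a rough, illustrative sketch of this generative behavior (not the workflow of any specific system), the snippet below samples several distinct continuations for one prompt with the Hugging Face transformers text-generation pipeline, using gpt2 as a small stand-in model and an arbitrary prompt:

from transformers import pipeline

# gpt2 is only a small stand-in; the point is sampling multiple candidate solutions
generator = pipeline("text-generation", model="gpt2")

prompt = "To prove that the base angles of an isosceles triangle are equal, first"
candidates = generator(
    prompt,
    max_new_tokens=40,
    num_return_sequences=3,   # draw several distinct candidate continuations
    do_sample=True,           # sample from the model's distribution instead of greedy decoding
)

for i, c in enumerate(candidates):
    print(f"Candidate {i + 1}: {c['generated_text']}\n")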

Real-World Applications:

  • Drug Discovery: Generating new molecular structures with desired characteristics
  • Software Engineering: Developing novel algorithms to address specific challenges
  • Mathematical Research: Proposing new theorems and proof strategies

2. Universality: Cross-Domain Learning

A key strength of foundation models is their ability to learn general reasoning patterns applicable across various fields. This is similar to recognizing effective argument structures—once understood in one context, it can be applied to law, science, philosophy, or any other discipline.

Transfer Learning in Practice:

  • Insights gained from analyzing code can assist in mathematical proofs
  • Geometric reasoning capabilities can aid in molecular modeling
  • Strategic thinking from game-playing can enhance optimization problem-solving

This cross-domain knowledge transfer significantly reduces the amount of training data needed for new tasks, resulting in more robust and adaptable solutions.
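As a minimal sketch of how such transfer can look in code, assuming a PyTorch-style pretrained encoder (the Encoder class below is a toy stand-in rather than a real checkpoint): the shared representation is frozen and only a small task-specific head is trained on the new domain.

import torch
import torch.nn as nn

# Toy stand-in for a pretrained "foundation" encoder (imagine weights loaded from a checkpoint)
class Encoder(nn.Module):
    def __init__(self, in_dim=32, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim))

    def forward(self, x):
        return self.net(x)

encoder = Encoder()
for p in encoder.parameters():          # freeze the shared representation
    p.requires_grad = False

task_head = nn.Linear(128, 2)           # small head for the new downstream task
model = nn.Sequential(encoder, task_head)

optimizer = torch.optim.Adam(task_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 32)                 # toy batch from the new domain
y = torch.randint(0, 2, (16,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()                         # gradients flow only into the task head
optimizer.step()
print(f"one fine-tuning step done, loss = {loss.item():.3f}")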

3. Grounding: Linking Symbols to Meaning

One of the most compelling capabilities of foundation models is their ability to connect abstract symbols to tangible understanding. When humans encounter the term "isosceles triangle," it conjures a visual image and geometric intuitions. Foundation models are beginning to form similar associations.

Multimodal Comprehension:

  • Connecting mathematical equations to their graphical representations
  • Linking programming code to its visual outcomes
  • Associating chemical formulas with molecular structures

This grounding enables AI systems to reason more like humans do—not merely manipulating symbols but understanding the real-world significance of those symbols.
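A toy sketch of the shared-embedding idea behind this grounding: text and images are mapped into one vector space (the vectors below are made up for illustration) and cosine similarity links each phrase to the visual item it describes. Real systems such as CLIP learn these encoders from data rather than using hand-picked vectors.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend outputs of a text encoder and an image encoder (purely illustrative values)
text_embedding = {"isosceles triangle": np.array([0.9, 0.1, 0.2]),
                  "benzene ring":       np.array([0.1, 0.8, 0.3])}
image_embedding = {"triangle_sketch.png":  np.array([0.85, 0.15, 0.25]),
                   "molecule_diagram.png": np.array([0.05, 0.90, 0.20])}

# Ground each phrase by retrieving the closest image in the shared space
for phrase, t in text_embedding.items():
    best = max(image_embedding, key=lambda name: cosine(t, image_embedding[name]))
    print(f"'{phrase}' -> {best}")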

Current Applications and Success Stories

Mathematical Theorem Proving

Contemporary AI systems have demonstrated the ability to discover and prove mathematical theorems, occasionally producing proofs that are more concise and elegant than those developed by humans. These systems can:

  • Generate conjectures by identifying patterns within mathematical data.
  • Construct formal proofs in a systematic manner.
  • Verify the accuracy of complex mathematical arguments.

Program Synthesis and Code Understanding

Foundation models have significantly impacted software development through:

  • Code Generation: Producing programs based on natural language descriptions.
  • Bug Detection: Identifying programming errors and recommending corrections.
  • Code Translation: Converting code between various programming languages.
  • Documentation: Automatically generating explanations for intricate code.

Scientific Discovery

In the fields of chemistry and materials science, AI systems are:

  • Predicting optimal synthetic routes for novel compounds.
  • Discovering new materials with specific desired properties.
  • Accelerating drug discovery by identifying promising molecular candidates.

Challenges and Future Directions

The Data Scarcity Problem

In contrast to images or text, high-quality reasoning data is limited and costly to produce. Mathematical proofs, verified code, and scientific discoveries necessitate expert knowledge and thorough validation, creating a bottleneck in training advanced reasoning systems.

Potential Solutions:

  • Synthetic Data Generation: Developing artificial yet realistic problems and solutions.
  • Self-Supervised Learning: Enabling models to learn from the inherent structure of problems.
  • Interactive Learning: Engaging human experts to facilitate the learning process.

The Need for High-Level Planning

Current foundation models are proficient at next-step predictions but face challenges with long-term strategic planning. Humans typically approach complex issues by:

  1. Establishing a high-level strategy.
  2. Decomposing it into manageable tasks.
  3. Implementing the detailed plan.

Equipping AI systems to operate at this architectural level remains a significant challenge.

Reliability and Verification

As AI systems enhance their autonomous reasoning abilities, ensuring their reliability becomes imperative. We require methods to:

  • Confirm the correctness of AI-generated proofs.
  • Validate that synthesized code performs as intended.
  • Assess scientific discoveries before their practical application.

The Road Ahead: Implications and Opportunities

Education and Learning

Foundation models with robust reasoning capabilities have the potential to transform education by:

  • Offering personalized tutoring that adjusts to individual learning preferences.
  • Creating interactive environments for problem-solving.
  • Generating practice problems customized to student requirements.
  • Explaining complex concepts through diverse approaches.

Scientific Acceleration

By automating routine reasoning tasks while enhancing human creativity, these systems could significantly accelerate scientific advancements:

  • Hypothesis Generation: Proposing innovative research avenues.
  • Experimental Design: Refining experimental protocols.
  • Literature Analysis: Detecting patterns across extensive research bodies.
  • Collaborative Discovery: Fostering partnerships between humans and AI in research.

Ethical Considerations

As reasoning capabilities evolve, we must thoughtfully address:

  • Transparency: Ensuring AI reasoning processes remain interpretable.
  • Reliability: Developing systems that gracefully handle failures and acknowledge uncertainty.
  • Human Agency: Preserving meaningful human involvement in critical decision-making.
  • Equity: Guaranteeing fair distribution of benefits across society.

Conclusion: The Promise of Intelligent Partnership

The advancement of foundation models with advanced reasoning capabilities signifies more than a technological milestone; it heralds a future where AI systems can act as true intellectual partners rather than merely tools.

These systems will not replace human reasoning but will enhance it, managing routine cognitive tasks and allowing humans to concentrate on creativity, judgment, and meaning-making. Future geometry students may collaborate with AI tutors capable of visualizing problems in various ways, suggesting alternative proof strategies, and providing personalized guidance.

As we continue to explore the frontiers of AI reasoning, we are not simply developing smarter machines; we are forging new avenues for collaboration between humans and artificial intelligence to address the intricate challenges that lie ahead.

The evolution from rule-based systems to today’s foundation models has been extraordinary, yet we remain in the early phases of this transformation. The coming years will likely introduce even more advanced reasoning capabilities, unlocking possibilities that are currently beyond our imagination.

It is clear that the future of problem-solving will rely on a partnership between human insight and artificial intelligence, combining the strengths of both to confront challenges that neither could overcome independently.

Citation: Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., ... & Liang, P. (2022). On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258. Available at: https://arxiv.org/abs/2108.07258

The Future of Robotics: Building Intelligent Foundation Models for General-Purpose Robots

Introduction: The Vision of Universal Robots

Envision entering your kitchen and instructing, "Robot, prepare breakfast." The robot comprehends not only your command but also your preferences—whether you desire pancakes and coffee or toast and juice—and adapts to your kitchen layout to execute the task effectively. This scenario represents the ambitious aim behind advancing robotics foundation models.

Just as GPT revolutionized language processing and vision models transformed image recognition, robotics is poised for a similar foundation model revolution. Unlike text or images, robots must navigate a complex physical environment, making real-time decisions that affect their surroundings.

Understanding Robotics Foundation Models

Foundation models can be likened to the "Swiss Army knife" of artificial intelligence. In language processing, models such as GPT manage various tasks like writing, translation, and summarization from a single trained system. Robotics foundation models aspire to provide similar versatility for physical robots.

These models will be trained on extensive, diverse datasets, including:

  • Robot interaction data: Extensive hours of robots performing diverse tasks
  • Human demonstration videos: Learning from observing people in various activities
  • Simulation environments: Virtual settings for safe practice
  • Natural language descriptions: Interpreting task instructions in straightforward language

Read more…

Foundation Models in Computer Vision: Transforming How Machines See and Understand the World

Introduction: The Vision Revolution

The goal of enabling computers to perceive the world as humans do—recognizing objects, understanding scenes, and processing complex visual information in fractions of a second—has been a long-standing pursuit in computer vision research. Foundation models are significantly advancing this objective.

Foundation models in computer vision signify a shift from traditional methodologies. Rather than developing distinct models for individual tasks, these robust systems learn from extensive visual datasets and can be applied across various domains, including medical diagnostics and autonomous vehicles.

Definition of Vision Foundation Models

Vision foundation models are large-scale AI systems trained on extensive datasets comprising images and visual data. Unlike conventional computer vision models that rely on meticulously labeled data for each task, these models utilize self-supervised learning, enabling them to identify patterns in unannotated visual data.

Key Characteristics:

  • Scale: Trained on millions to billions of images
  • Versatility: Adaptable to various vision tasks
  • Self-supervised learning: Minimizes reliance on manual annotations
  • Multimodal integration: Merges visual data with text, audio, and other inputs

Read more…

How Foundation Models Are Changing the Natural Language Processing (NLP) Landscape

Foundation Models in Natural Language Processing: A Comprehensive Overview

The Language Revolution in AI

Language is fundamental to human communication, influencing our thoughts, relationships, and knowledge acquisition. Every society develops complex spoken or signed languages, which children learn effortlessly. This complexity poses a significant challenge in artificial intelligence research.

Natural Language Processing (NLP) focuses on enabling computers to understand and generate human language. A transformative shift occurred in 2018 with the advent of foundation models, revolutionizing our approach to language technology.


Read more…

What Is the Future of Foundation Models?

The Future of AI Foundation Models: Who Will Shape Tomorrow's Technology?

The Early Days of AI: Understanding Our Current Landscape

While recent advancements like ChatGPT have made headlines, we are still in the initial phase of the foundation model revolution. Picture it like the internet in 1995—we recognize the immense potential, yet we are still navigating the necessary rules, standards, and best practices.

At this moment, these advanced AI systems function as "research prototypes" available to the public. It's akin to taking experimental vehicles for a spin on public roads—thrilling but accompanied by uncertain risks and outcomes.

A Crucial Inquiry: Who Will Guide AI's Future?

The evolution of foundation models prompts a vital question that will influence technological advancements for the next decade: Who will steer the development of AI? The answer will impact various facets of society, from job markets to democratic processes.

The Divide: Industry vs. Academia


Read more…

How Do Foundation Models Differ from Other Models Such as Machine Learning or Deep Learning Models?

Exploring Foundation Models: A Guide to the AI Revolution

The Two Forces Shaping AI: Emergence and Homogenization

Artificial Intelligence has experienced a significant transformation over the last thirty years, driven by two key forces that are redefining the development and implementation of AI systems.

Emergence refers to the natural development of capabilities during training, rather than through explicit programming. Imagine teaching a child to ride a bike; you don't dictate every movement, but through practice, the skill emerges organically.

Homogenization indicates the adoption of similar methodologies across various challenges. Instead of creating entirely distinct solutions for each problem, we now implement standardized techniques that are applicable across multiple scenarios.

The Three Stages of AI Evolution


Read more…


What Are Large Language Models (LLMs)?

The Evolution of Large Language Models: From Turing's Vision to the Reality of ChatGPT

A 70-Year Journey: From the Turing Test to Contemporary AI

The journey to develop machines that genuinely grasp human language began in the 1950s with Alan Turing’s introduction of his renowned test for machine intelligence. This monumental challenge posed the question: how can we teach computers to understand the intricacies and nuances of human language?

Language transcends mere words; it is a complex system enriched with grammar rules, cultural contexts, implied meanings, and creative expression. Imagine attempting to convey sarcasm, poetry, or humor to someone unfamiliar with human emotions. This was the challenge engineers confronted while designing machines capable of understanding language.

Three Stages of Evolution in Language AI

Stage 1: Statistical Language Models (1990s-2010s)
Early language models functioned like advanced autocomplete systems, relying on statistical patterns to predict subsequent words. For instance, if you entered "The weather is," the system would analyze millions of examples to suggest words like "nice," "cold," or "sunny" based on observed frequency patterns.
Limitations: While these models could complete sentences, they lacked true comprehension of meaning or context beyond a few words.
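A tiny count-based bigram model captures the flavor of this stage. The corpus below is invented purely for illustration; real systems used far larger n-gram tables plus smoothing.

from collections import Counter, defaultdict

# A toy corpus standing in for the millions of sentences a real system would use
corpus = [
    "the weather is nice today",
    "the weather is cold outside",
    "the weather is sunny and warm",
    "the food is cold",
]

# Count how often each word follows the previous one (a bigram model)
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigram_counts[prev][nxt] += 1

# Predict the most frequent continuations of "is", as an autocomplete would
print(bigram_counts["is"].most_common(3))  # [('cold', 2), ('nice', 1), ('sunny', 1)]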

Stage 2: Neural Language Models (2010s)
The advent of neural networks transformed language processing, allowing models to grasp context and word relationships. For example, unlike statistical models, neural networks could discern that "bank" has different meanings in "river bank" and "savings bank" by evaluating the surrounding context.
Breakthrough: Models like BERT (2018) significantly improved language comprehension by reading entire sentences at once and modeling how the words within them relate to one another.

Stage 3: Large Language Models - The Current Revolution (2020s-Present)
A remarkable breakthrough emerged when researchers discovered that enlarging language models significantly enhanced their performance and granted them new capabilities.

The Importance of Scale: Discovering the Impact of Size
Researchers identified that when language models exceeded specific size thresholds—transitioning from millions to hundreds of billions of parameters—extraordinary advancements occurred. These models not only excelled in existing tasks but also developed entirely new abilities.
Consider it this way: imagine learning to play the piano, and upon reaching a certain level, you suddenly find yourself able to compose symphonies without formal training in composition.

What Defines a "Large" Language Model?
Modern Large Language Models are characterized by:

  • Hundreds of billions of parameters, in contrast to older models with millions.
  • Training on extensive text datasets sourced from the internet.
  • Transformer architecture that enables the processing and understanding of relationships between words over lengthy passages.

For instance, GPT-3 boasts 175 billion parameters—imagine a brain with 175 billion adjustable connections, each fine-tuned through exposure to a vast array of human-written knowledge.

Emergent Abilities: Unforeseen Capabilities
One of the most astonishing features is "in-context learning," which allows models to acquire new tasks simply by observing examples within a conversation.
Example:

If you present the model with: "Dog -> Animal, Rose -> Flower, Oak -> ?"
It can respond with: "Tree"
This demonstrates its ability to recognize patterns (specific items to their categories) from the examples provided.
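In practice, this kind of in-context learning is exercised simply by packing the examples into the prompt. The sketch below builds such a few-shot prompt; the final comment describes the typical completion rather than actual model output.

examples = [("Dog", "Animal"), ("Rose", "Flower")]
query = "Oak"

# Few-shot prompt: the model must infer the item -> category mapping from the examples alone
prompt = "\n".join(f"{item} -> {category}" for item, category in examples)
prompt += f"\n{query} -> "
print(prompt)
# A sufficiently large model typically completes this with "Tree",
# without any task-specific fine-tuning.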

Additional Emergent Abilities:

  • Complex reasoning: Solving intricate multi-step math problems.
  • Creative writing: Producing poetry, stories, and scripts.
  • Code generation: Writing functional computer programs.
  • Language translation: Converting text between languages even if not specifically trained for those translations.

Summary:

Language models create and produce text by predicting the likelihood of a word or series of words appearing within a larger context. This capability is particularly beneficial for tasks such as text generation and translation.

Large language models (LLMs) are sophisticated models that utilize extensive parameters and large datasets, allowing them to handle longer text sequences and execute complex functions like summarization and answering questions.

Transformers serve as the fundamental architecture behind LLMs, employing attention mechanisms to weight the most relevant parts of the input, which lets the model capture relationships across long passages of text.
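At the heart of that attention mechanism is scaled dot-product attention; the NumPy sketch below computes it for a toy sequence so the weighting idea is visible (shapes and values are arbitrary).

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Similarity of each query with every key, scaled by the square root of the key dimension
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)   # how much each position attends to every other position
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d = 4, 8                        # toy sequence of 4 tokens with 8-dimensional representations
Q, K, V = [rng.normal(size=(seq_len, d)) for _ in range(3)]

output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))                  # each row sums to 1: the attention distribution per token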

LLMs offer a wide range of applications, including text generation, translation, sentiment analysis, and code generation. However, they also raise important considerations regarding costs, biases, and ethical implications.

Citation: Zhao, Wayne Xin, et al. "A Survey of Large Language Models." arXiv preprint arXiv:2303.18223 (2023). arXiv: https://arxiv.org/abs/2303.18223

Understanding High Bias in Machine Learning with Real-World Example

High bias in machine learning results in underfitting, characterized by the model making oversimplified assumptions about the relationships within the data. This leads to subpar performance on both training and test datasets, demonstrating that the model does not possess the necessary complexity to capture the underlying patterns.

In this example, we compare two approaches to house price prediction:

  1. Linear Regression: The simple linear model does not adequately capture the complex relationships between features and house prices.
  2. Polynomial Regression: The PolynomialFeatures step expands the dataset by generating polynomial and interaction terms from the original features. For instance, with two features x1 and x2 and degree 2, it produces x1, x2, x1², x2², and the cross-term x1·x2, as shown in the short sketch below. This lets a linear regression model capture non-linear relationships.
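For a quick, self-contained look at what PolynomialFeatures produces, here is a small sketch assuming two input features named x1 and x2:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_demo = np.array([[2.0, 3.0]])                    # one sample with two features, x1 and x2
poly = PolynomialFeatures(degree=2, include_bias=False)
X_expanded = poly.fit_transform(X_demo)

print(poly.get_feature_names_out(["x1", "x2"]))    # ['x1' 'x2' 'x1^2' 'x1 x2' 'x2^2']
print(X_expanded)                                  # [[2. 3. 4. 6. 9.]]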
In [31]:
import pandas as pd

import matplotlib.pyplot as plt

from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import sklearn.datasets

from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
In [21]:
house_price_dataset = sklearn.datasets.fetch_california_housing()
In [22]:
print(house_price_dataset.DESCR)
.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

    :Number of Instances: 20640

    :Number of Attributes: 8 numeric, predictive attributes and the target

    :Attribute Information:
        - MedInc        median income in block group
        - HouseAge      median house age in block group
        - AveRooms      average number of rooms per household
        - AveBedrms     average number of bedrooms per household
        - Population    block group population
        - AveOccup      average number of household members
        - Latitude      block group latitude
        - Longitude     block group longitude

    :Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived from the 1990 U.S. census, using one row per census
block group. A block group is the smallest geographical unit for which the U.S.
Census Bureau publishes sample data (a block group typically has a population
of 600 to 3,000 people).

A household is a group of people residing within a home. Since the average
number of rooms and bedrooms in this dataset are provided per household, these
columns may take surprisingly large values for block groups with few households
and many empty houses, such as vacation resorts.

It can be downloaded/loaded using the
:func:`sklearn.datasets.fetch_california_housing` function.

.. topic:: References

    - Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions,
      Statistics and Probability Letters, 33 (1997) 291-297

In [23]:
# Loading the dataset to a pandas dataframe
df_house_data = pd.DataFrame(house_price_dataset.data, columns = house_price_dataset.feature_names)
In [24]:
df_house_data.head()
Out[24]:
MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude Longitude
0 8.3252 41.0 6.984127 1.023810 322.0 2.555556 37.88 -122.23
1 8.3014 21.0 6.238137 0.971880 2401.0 2.109842 37.86 -122.22
2 7.2574 52.0 8.288136 1.073446 496.0 2.802260 37.85 -122.24
3 5.6431 52.0 5.817352 1.073059 558.0 2.547945 37.85 -122.25
4 3.8462 52.0 6.281853 1.081081 565.0 2.181467 37.85 -122.25
In [25]:
# add the target column
# target is median house value in block group (in $100,000s).
df_house_data['price'] = house_price_dataset.target
In [26]:
df_house_data.head()
Out[26]:
MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude Longitude price
0 8.3252 41.0 6.984127 1.023810 322.0 2.555556 37.88 -122.23 4.526
1 8.3014 21.0 6.238137 0.971880 2401.0 2.109842 37.86 -122.22 3.585
2 7.2574 52.0 8.288136 1.073446 496.0 2.802260 37.85 -122.24 3.521
3 5.6431 52.0 5.817352 1.073059 558.0 2.547945 37.85 -122.25 3.413
4 3.8462 52.0 6.281853 1.081081 565.0 2.181467 37.85 -122.25 3.422
House Price Prediction Using Linear Regression
In [34]:
# Prepare data
X = df_house_data.drop('price', axis=1)
y = df_house_data['price']

# Split and scale
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train linear model
model = LinearRegression()
model.fit(X_train_scaled, y_train)

# Evaluate
train_pred = model.predict(X_train_scaled)
test_pred = model.predict(X_test_scaled)

print("Training MSE:", mean_squared_error(y_train, train_pred))
print("Test MSE:", mean_squared_error(y_test, test_pred))

train_score = model.score(X_train_scaled, y_train)
test_score = model.score(X_test_scaled, y_test)

print(f"Training R² Score: {train_score:.4f}")
print(f"Test R² Score: {test_score:.4f}")
Training MSE: 0.5240457125963887
Test MSE: 0.5261093658365182
Training R² Score: 0.6081
Test R² Score: 0.5980
House Price Prediction Using Polynomial Regression
  • Observations (comparing the polynomial model below with the linear model above):
    • The mean squared error decreases on the test set.
    • The R² score increases on the test set.
    • R² interpretation: an R² of 0.75, for example, would mean that 75% of the variation in house prices is explained by the features included in the model. A short sketch of the underlying formula follows this list.
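As a reference for the R² numbers reported in this notebook, here is a minimal sketch that computes R² directly from its definition, reusing y_test and the linear model's test_pred from the earlier cells:

import numpy as np

# R² = 1 - SS_res / SS_tot, computed for the linear model's test predictions
ss_res = np.sum((y_test - test_pred) ** 2)        # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)    # total sum of squares around the mean
print(f"Manual R²: {1 - ss_res / ss_tot:.4f}")    # should match model.score(X_test_scaled, y_test)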

Why use both PolynomialFeatures and LinearRegression in the pipeline:

- The PolynomialFeatures step expands the input into a richer, non-linear feature space.
- LinearRegression then fits a linear model to these expanded features.
- Together, this lets an otherwise linear model approximate non-linear relationships.
In [35]:
def polynomial_regression_model(X_train, X_test, y_train, y_test, degree=2):
    # Pipeline: scale the features, expand them with polynomial terms, then fit a linear model
    model = Pipeline([
        ('scaler', StandardScaler()),
        ('poly', PolynomialFeatures(degree=degree)),
        ('linear', LinearRegression())
    ])

    # Fit on the training split
    model.fit(X_train, y_train)

    # Evaluate on both splits
    train_pred = model.predict(X_train)
    test_pred = model.predict(X_test)

    print("Training MSE:", mean_squared_error(y_train, train_pred))
    print("Test MSE:", mean_squared_error(y_test, test_pred))

    train_score = model.score(X_train, y_train)
    test_score = model.score(X_test, y_test)
    print(f"Training R² Score: {train_score:.4f}")
    print(f"Test R² Score: {test_score:.4f}")

    return model

# Example usage with the housing data.
# The pipeline scales internally, so we pass the unscaled splits.
model_poly = polynomial_regression_model(X_train, X_test, y_train, y_test)
Training MSE: 0.4219834148836872
Test MSE: 0.42363567392919027
Training R² Score: 0.6844
Test R² Score: 0.6763

How to Combat Overfitting and Underfitting in Machine Learning?

Machine Learning models learn the relationship between input (features) and output (target) through learnable parameters. The number of these parameters largely determines the complexity and flexibility of a given model.

There are two typical failure modes. When a model is not flexible enough to capture the underlying pattern in the training data, it is said to be underfitted. Conversely, when the model is so flexible that it effectively "memorizes" the training data, noise and all, it is said to be overfitted.

Consider a system that is truly described by a quadratic function (three parameters), but we fit it with a straight line (two parameters). Because the line lacks the complexity required to fit the data, we end up with a poor predictor. In this case the model has high bias, meaning we get consistent but consistently wrong answers. This is called an underfitted model.

Now imagine that the true system is a parabola, but we use a higher-order polynomial to fit it. Due to natural noise in the data used to fit (deviations from the perfect parabola), the overly complex model treats these fluctuations and noise as intrinsic properties of the system and attempts to fit them. The result is a model with high variance.
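To make this concrete, here is a small sketch on synthetic data where the true relationship is quadratic: a straight line underfits (high bias), while a very high-degree polynomial chases the noise (high variance). The degrees and noise level are arbitrary illustrative choices.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 120)).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + rng.normal(scale=0.5, size=len(X))   # noisy quadratic relationship

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 2, 15):   # 1 = underfit, 2 = about right, 15 = prone to overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(f"degree={degree:>2}  "
          f"train MSE={mean_squared_error(y_tr, model.predict(X_tr)):.3f}  "
          f"test MSE={mean_squared_error(y_te, model.predict(X_te)):.3f}")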

More details:

What is the trade-off between bias and variance in machine learning?

Short: A model with minimal parameters may exhibit high bias and low variance, while a model with numerous parameters may demonstrate high variance and low bias. Therefore, it is essential to achieve an optimal balance to avoid overfitting and underfitting the data. High bias arises from incorrect assumptions made by the learning algorithm, whereas variance arises from a model's sensitivity to minor variations in the training dataset.

Detail: During development, every algorithm exhibits some degree of bias and variance. A model can be tuned to reduce one or the other, but neither can be driven to zero without worsening its counterpart; this is the bias-variance trade-off.

Bias is the gap between our model's average prediction and the true value being predicted, and it reflects systematic error. Every algorithm carries some bias because of the simplifying assumptions it makes in order to learn the target function. High bias leads to underfitting, where the algorithm misses relevant relationships between the features and the target. Simpler algorithms tend to introduce more bias, whereas nonlinear algorithms usually have less. These errors can stem from the choice of training data, the choice of features, or the training algorithm itself.

Variance measures how much a model's predictions change when it is trained on different training sets, and it reflects over-specialization to a particular training set (overfitting). What we care about is how far our model strays from the best model attainable for the training data. The ideal model minimizes both bias and variance, being neither too simple nor too complex, and therefore yields minimal error.

Low-variance models typically have a simple structure and are less sophisticated, but they risk high bias; regression and Naive Bayes are examples. Conversely, low-bias models are generally more flexible and complex but prone to high variance; nearest neighbors and decision trees are examples. Overfitting arises when a model is so complex that it learns the noise in the data rather than the underlying signal.