Artificial Intelligence & Machine Learning Terms

Creating intelligent systems that learn.

Artificial Intelligence
Artificial Intelligence (AI) is a wide-ranging branch of computer science concerned with building smart machines capable of performing tasks that typically require human intelligence.
Machine Learning
Machine Learning (ML) is a subset of AI that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. It focuses on the development of algorithms that can access data and use it to learn for themselves.
Deep Learning
Deep Learning is a subfield of machine learning based on artificial neural networks with multiple layers (deep architectures). These networks are capable of learning complex patterns and representations from large amounts of data.
Neural Network
An Artificial Neural Network (ANN) is a computational model inspired by the structure and function of biological neural networks in the brain. It consists of interconnected nodes, or "neurons," organized in layers that process information.
Supervised Learning
Supervised learning is a type of machine learning where the model is trained on a labeled dataset. This means that each piece of training data has a corresponding "correct" output or label.
Unsupervised Learning
Unsupervised learning is a type of machine learning where the model is trained on an unlabeled dataset. The model tries to find patterns, structures, and relationships within the data on its own.
Reinforcement Learning
Reinforcement Learning (RL) is a type of machine learning where an "agent" learns to make decisions by performing actions in an environment to achieve some goal. The agent learns through trial and error, receiving "rewards" for good actions and "penalties" for bad ones.
Model
In machine learning, a model is the artifact that is created by the training process. It is a mathematical function that takes an input and produces a prediction or decision as output. It represents the "learned" patterns from the training data.
Training
Training is the process of feeding a machine learning model a large dataset, allowing the algorithm to adjust its internal parameters until it can accurately map the input data to the desired output. This is the "learning" part of machine learning.
Inference
Inference is the process of using a trained machine learning model to make a prediction on new, unseen data. It is the "application" phase, where the model puts its learning into practice.
Dataset
A dataset is a collection of data. In machine learning, it is the collection of examples used to train and evaluate a model. It is typically split into training, validation, and testing sets.
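For example, such a split might be carved out with scikit-learn's train_test_split; the ratios here (20% test, then 20% of the remainder for validation) are just illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)  # 50 examples, 2 features each
y = np.arange(50)                  # 50 labels

# First carve off 20% as the test set, then split the remainder
# into training and validation portions.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.2, random_state=42)
```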
Natural Language Processing (NLP)
Natural Language Processing (NLP) is a field of AI that gives computers the ability to read, understand, and derive meaning from human language. It combines computational linguistics with machine learning and deep learning models.
Large Language Model (LLM)
A Large Language Model (LLM) is a type of deep learning model that is trained on a massive amount of text data. It is capable of understanding and generating human-like text for a wide range of tasks.
Generative AI
Generative AI is a class of artificial intelligence models that can generate new, original content, such as text, images, music, or code, based on the data they were trained on.
Transformer
A Transformer is a deep learning model architecture introduced in the 2017 paper "Attention Is All You Need." It is based on the concept of self-attention, which allows it to weigh the importance of different words in the input text.
Attention Mechanism
An attention mechanism is a technique in neural networks that allows the model to focus on the most relevant parts of the input sequence when producing an output. It mimics cognitive attention in humans.
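As a rough sketch, the scaled dot-product attention used in Transformers can be written in a few lines of NumPy; the shapes here are arbitrary illustrations:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row sums to 1: how much to attend where
    return weights @ V                  # weighted sum of the values

Q = np.random.randn(4, 8)  # 4 positions, dimension 8
K = np.random.randn(4, 8)
V = np.random.randn(4, 8)
out = scaled_dot_product_attention(Q, K, V)  # shape (4, 8)
```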
Neuron
In an artificial neural network, a neuron (or node) is the basic computational unit. It receives one or more inputs, applies a mathematical function to them, and produces an output.
Activation Function
An activation function is a function used in an artificial neural network that defines the output of a neuron given a set of inputs. It introduces non-linearity into the network, which is crucial for learning complex patterns.
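Two of the most common activation functions, ReLU and sigmoid, are simple to write out; a minimal NumPy sketch:

```python
import numpy as np

def relu(x):
    # ReLU: passes positive values through, zeroes out negatives
    return np.maximum(0, x)

def sigmoid(x):
    # Sigmoid: squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-x))

z = np.array([-2.0, 0.0, 3.0])  # a neuron's weighted-sum outputs
print(relu(z))     # [0. 0. 3.]
print(sigmoid(z))  # [0.119... 0.5 0.952...]
```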
Backpropagation
Backpropagation is the core algorithm for computing the gradients used to train artificial neural networks. It works by calculating the "error" or "loss" of the model's prediction compared to the correct output, then propagating this error backward through the network, applying the chain rule to determine how much each weight and bias contributed to the error.
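A worked miniature, assuming a single sigmoid neuron and a squared-error loss, shows the chain rule at the heart of backpropagation:

```python
import numpy as np

# Forward pass for one sigmoid neuron: y_hat = sigmoid(w*x + b)
x, y = 2.0, 1.0   # one training example (input, target)
w, b = 0.5, 0.0   # current parameters

z = w * x + b
y_hat = 1 / (1 + np.exp(-z))
loss = 0.5 * (y_hat - y) ** 2

# Backward pass: chain the local derivatives from loss to parameters.
dloss_dyhat = y_hat - y          # d(0.5*(y_hat - y)^2) / dy_hat
dyhat_dz = y_hat * (1 - y_hat)   # derivative of the sigmoid
dz_dw, dz_db = x, 1.0

grad_w = dloss_dyhat * dyhat_dz * dz_dw  # how much w contributed to the error
grad_b = dloss_dyhat * dyhat_dz * dz_db  # how much b contributed to the error
```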
Gradient Descent
Gradient descent is an optimization algorithm used to find a local minimum of a function. In machine learning, it is used to minimize the model's "loss" or "error" by iteratively adjusting the model's parameters (weights and biases) in the direction opposite to the gradient.
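A minimal sketch on a toy one-parameter loss, L(w) = (w - 3)^2, shows the update rule; the learning rate (defined below) controls the step size:

```python
# Minimize L(w) = (w - 3)^2 by gradient descent.
# The gradient is dL/dw = 2*(w - 3); each step moves w against it.
w = 0.0
learning_rate = 0.1

for step in range(100):
    grad = 2 * (w - 3)
    w = w - learning_rate * grad

print(w)  # converges to roughly 3.0, the minimum of the loss
```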
Overfitting
Overfitting is a modeling error in machine learning that occurs when a model learns the training data too well. It learns not only the underlying patterns but also the noise and random fluctuations in the training data, to the extent that it negatively impacts the model's performance on new, unseen data.
Underfitting
Underfitting is a modeling error in machine learning that occurs when a model is too simple to capture the underlying patterns in the data. It performs poorly on both the training data and new, unseen data.
Bias-Variance Tradeoff
In machine learning, the bias-variance tradeoff is the conflict in trying to simultaneously minimize two sources of error: bias (the error from erroneous assumptions in the learning algorithm) and variance (the error from sensitivity to small fluctuations in the training set).
Regression
Regression is a type of supervised machine learning task where the goal is to predict a continuous output value. The model learns a function that maps input variables to a continuous output variable.
Classification
Classification is a type of supervised machine learning task where the goal is to predict a categorical output label. The model learns to assign a class label to an input example.
Clustering
Clustering is a type of unsupervised machine learning task where the goal is to group a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups.
Linear Regression
Linear regression is a basic and commonly used type of predictive analysis. It is a statistical approach for modeling the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the observed data.
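A minimal example using scikit-learn's LinearRegression, with synthetic data generated from y = 2x + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noisy observations of y = 2x + 1
X = np.arange(10).reshape(-1, 1).astype(float)
y = 2 * X.ravel() + 1 + np.random.randn(10) * 0.1

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # roughly [2.0] and 1.0
print(model.predict([[12.0]]))        # roughly [25.0]
```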
Logistic Regression
Despite its name, logistic regression is a supervised learning algorithm used for classification problems. It models the probability of a discrete outcome (e.g., true/false, yes/no) given one or more input variables.
Decision Tree
A decision tree is a supervised learning algorithm that is used for both classification and regression tasks. It is a flowchart-like structure in which each internal node represents a "test" on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label or a continuous value.
Random Forest
A Random Forest is an ensemble learning method used for classification and regression. It operates by constructing many decision trees at training time and outputting the class chosen by the most trees (classification) or the mean prediction of the individual trees (regression).
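A short example using scikit-learn's RandomForestClassifier on the bundled Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 decision trees; the forest predicts by majority vote
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # accuracy on held-out data
```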
Gradient Boosting
Gradient Boosting is an ensemble learning method that builds a strong predictive model by sequentially adding "weak" models (typically decision trees). Each new tree is trained to correct the errors made by the previous trees.
Support Vector Machine (SVM)
A Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression analysis. It works by finding the "hyperplane" that best separates the data points of different classes in a high-dimensional space.
k-Nearest Neighbors (k-NN)
k-Nearest Neighbors (k-NN) is a simple, supervised machine learning algorithm that can be used for both classification and regression. It makes predictions by finding the "k" most similar examples in the training data and using their outcomes to predict the outcome for the new data point.
k-Means
k-Means is a popular unsupervised learning algorithm used for clustering. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centroid).
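A small illustration with scikit-learn's KMeans on two obvious groups of 2-D points:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of 2-D points
points = np.array([[1, 1], [1.5, 2], [1, 1.5],
                   [8, 8], [8.5, 9], [9, 8]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # e.g. [0 0 0 1 1 1]: cluster assignment per point
print(kmeans.cluster_centers_)  # the two centroids
```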
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is an unsupervised learning technique used for dimensionality reduction. It works by transforming a large set of variables into a smaller one that still contains most of the information in the large set.
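A quick sketch with scikit-learn's PCA, projecting 5-dimensional data down to 2 dimensions:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # 100 samples, 5 features

# Project onto the 2 directions of greatest variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                # (100, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component
```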
Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN) is a type of deep learning model that is specifically designed for processing data with a grid-like topology, such as an image. CNNs use a special operation called a "convolution" to automatically and adaptively learn a hierarchy of features from the input.
Recurrent Neural Network (RNN)
A Recurrent Neural Network (RNN) is a type of artificial neural network designed for sequential data or time series data. RNNs have an internal "memory" that allows information from previous inputs in a sequence to influence the current output.
Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) is a special kind of Recurrent Neural Network (RNN) that is capable of learning long-term dependencies. It uses a series of "gates" to control what information is added to or removed from its internal memory state.
Feature
In machine learning, a feature is an individual measurable property or characteristic of a phenomenon being observed. Features are the inputs to your machine learning model.
Feature Engineering
Feature engineering is the process of using domain knowledge to extract features (characteristics, properties, attributes) from raw data. It involves creating new features or selecting the most relevant ones to improve the performance of a machine learning model.
Hyperparameter
A hyperparameter is a parameter whose value is used to control the learning process of a machine learning model. Its value is set before the learning process begins.
Learning Rate
The learning rate is a hyperparameter in an optimization algorithm like gradient descent that determines the step size at each iteration while moving toward a minimum of a loss function.
Loss Function
A loss function (or cost function) is a function that measures the "cost" or "error" of a model's predictions compared to the actual correct values. The goal of training a model is to find a set of parameters that minimizes this loss function.
Regularization
Regularization is a set of techniques used to prevent overfitting in machine learning models. It works by adding a penalty term to the loss function that discourages the model from becoming too complex.
Tokenization
Tokenization is the process of breaking down a piece of text into smaller units called "tokens". These tokens can be words, characters, or sub-words.
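A toy illustration of the two simplest schemes; real systems typically use sub-word tokenizers such as byte-pair encoding, which fall between these extremes:

```python
text = "Tokenization breaks text into tokens."

# Word-level tokenization: split on whitespace
word_tokens = text.split()
# ['Tokenization', 'breaks', 'text', 'into', 'tokens.']

# Character-level tokenization: every character becomes a token
char_tokens = list(text)
```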
Embedding
In machine learning, an embedding is a learned representation for discrete data, like words or user IDs, where items are mapped to a dense vector of real numbers. Items with similar meanings or properties are positioned close to each other in this high-dimensional vector space.
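As an illustration, PyTorch's nn.Embedding is exactly such a lookup table; the vocabulary size and dimension here are arbitrary:

```python
import torch
import torch.nn as nn

# A table mapping 10,000 token IDs to dense 64-dimensional vectors.
# The vectors start out random and are adjusted during training.
embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=64)

token_ids = torch.tensor([3, 14, 159])  # three token IDs
vectors = embedding(token_ids)          # shape: (3, 64)
```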
Accuracy
In classification tasks, accuracy is a metric that measures the number of correct predictions made by a model as a percentage of the total number of predictions.
Precision
In classification tasks, precision is a metric that measures the proportion of positive predictions that were actually correct. It answers the question: "Of all the times the model predicted positive, how often was it right?"
Recall
In classification tasks, recall (also known as sensitivity or true positive rate) is a metric that measures the proportion of actual positives that were correctly identified by the model. It answers the question: "Of all the actual positive cases, how many did the model find?"
F1-Score
The F1-score is a metric used in classification that combines precision and recall into a single score. It is the harmonic mean of precision and recall.
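A worked example from hypothetical counts shows how the three metrics relate:

```python
# Counts from a hypothetical binary classifier's predictions
tp, fp, fn = 40, 10, 20  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # 40/50 = 0.80
recall = tp / (tp + fn)     # 40/60 = 0.667

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)  # about 0.727
```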
Computer Vision
Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Using digital images from cameras and videos together with deep learning models, machines can accurately identify and classify objects, and then react to what they "see."
Transfer Learning
Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task. It is a popular approach in deep learning where pre-trained models are used as the starting point on computer vision and NLP tasks.
Fine-Tuning
Fine-tuning is a process in transfer learning where the weights of a pre-trained model are further trained on a new, specific dataset. This adapts the general knowledge of the pre-trained model to the nuances of the new task.
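A sketch of the usual recipe in Keras, here assuming MobileNetV2 as the pre-trained base and a hypothetical 5-class target task:

```python
import tensorflow as tf

# Load an ImageNet-pre-trained base without its classification head
base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                         input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # freeze the pre-trained weights at first

# Attach a new head for the (assumed) 5-class task
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy")
# After training the new head, one can set base.trainable = True and
# keep training with a much smaller learning rate to fine-tune the base.
```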
Generative Adversarial Network (GAN)
A Generative Adversarial Network (GAN) is a class of machine learning frameworks where two neural networks, a "Generator" and a "Discriminator," contest with each other in a zero-sum game. The Generator tries to create realistic data (like images), and the Discriminator tries to tell the difference between the real data and the fake data created by the Generator.
Ensemble Learning
Ensemble learning is a machine learning paradigm where multiple models, known as "weak learners," are trained to solve the same problem and combined to get better results. The main idea is that a diverse group of models is often better than any single model alone.
Data Augmentation
Data augmentation is a technique used to increase the size and diversity of a training dataset by creating modified copies of existing data or newly created synthetic data. It helps to reduce overfitting when training a machine learning model.
Batch Size
In machine learning, the batch size is a hyperparameter that defines the number of samples to work through before updating the model's internal parameters. The dataset is broken down into one or more batches.
Epoch
In machine learning, an epoch is one complete pass through the entire training dataset. Models are typically trained for multiple epochs.
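A schematic training loop shows how batch size and epochs interact; update_model here is a placeholder, not a real API:

```python
import numpy as np

X = np.random.randn(1000, 8)  # 1,000 training examples
batch_size = 32               # examples per parameter update
num_epochs = 5                # full passes over the dataset

def update_model(batch):
    """Placeholder for the real work: forward pass, loss, backprop, update."""
    pass

for epoch in range(num_epochs):
    indices = np.random.permutation(len(X))  # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = X[indices[start:start + batch_size]]
        update_model(batch)  # one parameter update per batch
```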
Dropout
Dropout is a regularization technique for neural networks that prevents overfitting. It works by randomly "dropping out" (ignoring) a fraction of neurons during each training step. This forces the network to learn more robust features that are not dependent on any single neuron.
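A minimal NumPy sketch of "inverted dropout," the common formulation in which surviving activations are rescaled during training:

```python
import numpy as np

def dropout(activations, p=0.5, training=True):
    # Zero out each activation with probability p and scale survivors
    # by 1/(1-p) so the expected output stays the same.
    if not training:
        return activations
    mask = (np.random.rand(*activations.shape) >= p) / (1 - p)
    return activations * mask

a = np.ones(10)
print(dropout(a, p=0.5))  # roughly half the entries are 0, the rest are 2.0
```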
One-Hot Encoding
One-hot encoding is a process by which categorical variables are converted into a binary vector representation. Each category is represented by a vector where one element is "hot" (1) and all others are "cold" (0).
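A tiny illustration in plain Python, assuming three made-up categories:

```python
categories = ["red", "green", "blue"]
index = {c: i for i, c in enumerate(categories)}

def one_hot(value):
    vec = [0] * len(categories)
    vec[index[value]] = 1  # the "hot" position marks the category
    return vec

print(one_hot("green"))  # [0, 1, 0]
```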
Confusion Matrix
A confusion matrix is a table used to describe the performance of a classification model on a set of test data for which the true values are known. It shows the number of true positives, true negatives, false positives, and false negatives.
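For a binary problem, scikit-learn's confusion_matrix lays the four counts out as a 2x2 table:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model's predictions

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))  # [[3 1] [1 3]]
```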
ROC Curve
A Receiver Operating Characteristic (ROC) curve is a graph showing the performance of a classification model at all classification thresholds. It plots the True Positive Rate (Recall) against the False Positive Rate.
AUC
AUC stands for Area Under the ROC Curve. It is a performance measure for classification problems that summarizes the entire ROC curve as a single number between 0 and 1. It indicates how well the model can distinguish between classes.
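A quick illustration with scikit-learn's roc_auc_score on four made-up predictions:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]            # actual classes
y_score = [0.1, 0.4, 0.35, 0.8]  # model's predicted probabilities

print(roc_auc_score(y_true, y_score))  # 0.75
```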
XGBoost
XGBoost (eXtreme Gradient Boosting) is an open-source software library which provides a regularizing gradient boosting framework for C++, Java, Python, R, Julia, Perl, and Scala. It is known for its speed and performance.
Data Science
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. It combines statistics, computer science, and domain expertise.
TensorFlow
TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It was developed by Google and provides a comprehensive ecosystem of tools, libraries, and community resources that lets researchers and developers build and deploy ML-powered applications.
PyTorch
PyTorch is an open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. It was primarily developed by Facebook's AI Research lab (FAIR).
Keras
Keras is an open-source software library that provides a Python interface for artificial neural networks. Keras acts as an interface for the TensorFlow library. It is designed to enable fast experimentation with deep neural networks, focusing on being user-friendly, modular, and extensible.
Scikit-learn
Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN.
Pandas
pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
NumPy
NumPy (Numerical Python) is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
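A few lines showing the vectorized style NumPy encourages:

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])  # a 2x2 matrix

print(a * 10)          # elementwise: [[10. 20.] [30. 40.]]
print(a @ a)           # matrix product: [[ 7. 10.] [15. 22.]]
print(a.mean(axis=0))  # column means: [2. 3.]
```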
Jupyter Notebook
Project Jupyter is a project and community whose goal is to "develop open-source software, open-standards, and services for interactive computing across dozens of programming languages". A Jupyter Notebook is a web-based interactive computational environment for creating documents that contain live code, equations, visualizations and narrative text.
Data Wrangling
Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.
Data Visualization
Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.
Label
In supervised machine learning, a label is the "answer" or the correct output for a given piece of data. It is the value you are trying to predict.
Pre-trained Model
A pre-trained model is a model that was trained on a large benchmark dataset to solve a problem similar to the one that we want to solve. This model, with its saved weights and parameters, can then be used as a starting point for a new task.
Prompt Engineering
Prompt engineering is the process of structuring the input text given to a generative AI model so that the model produces the desired output. A prompt is natural language text describing the task that an AI should perform.
Artificial General Intelligence (AGI)
Artificial General Intelligence (AGI) is the hypothetical intelligence of a machine that has the capacity to understand or learn any intellectual task that a human being can. It is a primary goal of some artificial intelligence research and a common topic in science fiction and futures studies.
Narrow AI
Narrow AI, also known as Weak AI, is a type of artificial intelligence that is focused on performing a single, specific task. It operates within a pre-defined range and cannot perform tasks beyond its designated field.
Bias
In the context of machine learning, bias refers to systematic errors in the model that result from incorrect assumptions in the learning algorithm. More broadly, it can also refer to the way a model reflects the societal biases present in its training data.
Neural Architecture Search (NAS)
Neural Architecture Search (NAS) is a technique for automating the design of artificial neural networks. NAS uses a machine learning algorithm to search for the best neural network architecture for a given task.
AutoML
Automated Machine Learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems. This includes tasks like feature engineering, model selection, hyperparameter tuning, and model deployment.
Semi-Supervised Learning
Semi-supervised learning is a machine learning approach that combines a small amount of labeled data with a large amount of unlabeled data during training. It falls between unsupervised learning (with no labeled data) and supervised learning (with only labeled data).
Self-Supervised Learning
Self-supervised learning is a type of machine learning where the model learns from the data itself without explicit human-provided labels. It does this by creating its own labels from the input data and solving a "pretext task," such as predicting a masked-out word in a sentence.
BERT
BERT (Bidirectional Encoder Representations from Transformers) is a language representation model introduced by Google in 2018. It is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.
GPT
GPT (Generative Pre-trained Transformer) is a family of language models developed by OpenAI. They are based on the Transformer architecture and are pre-trained on a massive corpus of text data to generate human-like text.