A comprehensive guide to deep learning and neural networks - from basic concepts to modern architectures.
This repository provides a practical introduction to deep learning, covering:
- Core Concepts: Gradient descent, backpropagation, activation functions
- Neural Architectures: CNNs, RNNs, LSTMs, Transformers
- Practical Implementation: Keras/TensorFlow examples
- Real Applications: Image classification, sequential modeling, NLP
Gradient Descent is an iterative optimization algorithm used to minimize the model’s error. A key hyperparameter in this process is the learning rate:
- If the learning rate is too high, the model may overshoot the minimum and fail to converge.
- If it is too low, training becomes slow and may get stuck in suboptimal points.
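To make the update rule concrete, here is a minimal NumPy sketch (not from the notebooks) of gradient descent on a one-weight linear model, where `lr` is the learning rate:

```python
import numpy as np

# Toy data generated by y = 2x, so the optimal weight is 2.0.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w = 0.0    # initial weight
lr = 0.05  # learning rate: too high overshoots, too low crawls

for epoch in range(100):
    y_pred = w * x                        # prediction
    grad = np.mean(2 * (y_pred - y) * x)  # gradient of MSE w.r.t. w
    w -= lr * grad                        # descend along the gradient

print(round(w, 4))  # converges toward 2.0
```

In this toy example, raising `lr` above roughly 0.13 makes the weight diverge instead of converge, illustrating the overshoot failure mode described above.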
Training a neural network involves repeatedly cycling through three main steps:
- Forward propagation – where the input is passed through the network to generate a prediction.
- Error calculation – comparing the predicted output with the true value.
- Backpropagation – updating weights and biases to reduce the error.
This loop continues over multiple iterations (called epochs) until a stopping criterion is met, such as a low enough error or a maximum number of epochs.
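A hedged sketch of that loop written out explicitly with TensorFlow's `GradientTape` (the model and data here are placeholders; in practice `model.fit()` runs this loop for you):

```python
import tensorflow as tf

# Placeholder model and data; any Keras model and dataset would do.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

x = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))

for epoch in range(10):                   # stopping criterion: max epochs
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)  # 1. forward propagation
        loss = loss_fn(y, y_pred)         # 2. error calculation
    # 3. backpropagation: compute gradients and update weights/biases
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```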
📘 Notebook: Backpropagation
A significant challenge in training deep networks is the vanishing gradient problem, especially when using activation functions like sigmoid. In deep architectures:
- Gradients in early layers can become extremely small during backpropagation.
- This results in slow or stalled learning in those layers.
- Ultimately, this affects the model's ability to learn complex representations and degrades accuracy.
To mitigate this, activation functions that avoid shrinking gradients — such as ReLU — are preferred in modern architectures, especially in hidden layers.
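A rough NumPy illustration of why this happens: the sigmoid derivative never exceeds 0.25, so even in the best case the gradient shrinks geometrically with depth, while ReLU passes a gradient of 1 for positive inputs:

```python
import numpy as np

def sigmoid_grad(x):
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)   # maximum value is 0.25, at x = 0

depth = 20
# Best case for sigmoid: every layer contributes its maximum gradient, 0.25.
print(0.25 ** depth)     # ~9.1e-13: the gradient has effectively vanished
# ReLU passes gradient 1 for positive pre-activations, so depth alone
# does not shrink it.
print(1.0 ** depth)      # 1.0
```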
Activation functions introduce non-linearity into the network, enabling it to model complex relationships. Common types include:
- Sigmoid: Historically popular but now less used due to vanishing gradients.
- Tanh: A centered and scaled version of sigmoid, also prone to gradient issues.
- ReLU (Rectified Linear Unit): The most widely used function today; it enables faster, more efficient training because it zeroes out negative inputs, so only a subset of neurons activate at once.
- Softmax: Used in the output layer for multi-class classification tasks, converting outputs into probability distributions.
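A small sketch showing these functions via `tf.keras.activations` (the input values are arbitrary):

```python
import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])

print(tf.keras.activations.sigmoid(x).numpy())  # squashes into (0, 1)
print(tf.keras.activations.tanh(x).numpy())     # zero-centered, in (-1, 1)
print(tf.keras.activations.relu(x).numpy())     # zeroes out negatives
# Softmax turns a vector of scores into a probability distribution.
print(tf.keras.activations.softmax(tf.reshape(x, (1, -1))).numpy())
```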
📘 Notebook: Activation Functions
There are several major libraries used for deep learning development:
- TensorFlow: Ideal for deploying models in production; backed by a strong community.
- PyTorch: Preferred in academic research; known for flexibility and strong GPU support.
- Both are powerful but can be challenging for beginners due to their complexity.
- Keras: A high-level API that simplifies the process of building and training deep learning models. It is especially beginner-friendly, offering:
  - Clean and readable syntax,
  - Rapid prototyping,
  - Integration with TensorFlow as its backend.
Keras enables the creation of powerful models with just a few lines of code, making it an excellent choice for those new to deep learning.
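For instance, a minimal sketch of a Keras classifier; the 784-feature input and 10-class output are illustrative assumptions, not tied to a specific dataset:

```python
import tensorflow as tf
from tensorflow.keras import layers

# A small fully connected classifier with illustrative layer sizes.
model = tf.keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),  # probabilities over 10 classes
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```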
Before training a model with Keras, your dataset must be properly structured:
- Split your data into predictors (input features) and target (labels).
- For classification tasks, the target values must be transformed into binary (one-hot) arrays using the `to_categorical()` function from Keras utilities.
Once prepared, the data can be fed into Keras models to build and train deep learning architectures efficiently.
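A sketch of that preparation step (the DataFrame and its column names are hypothetical):

```python
import pandas as pd
from tensorflow.keras.utils import to_categorical

# Hypothetical DataFrame with a 'label' column holding integer classes.
df = pd.DataFrame({
    "feature_1": [0.1, 0.4, 0.3],
    "feature_2": [1.2, 0.7, 0.9],
    "label": [0, 2, 1],
})

predictors = df.drop(columns=["label"]).values  # input features
target = to_categorical(df["label"])            # one-hot binary arrays

print(target)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```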
📘 Notebook: Intro to Keras
- Shallow Neural Networks: One hidden layer; only accept vector inputs.
- Deep Neural Networks: Multiple hidden layers; can process raw data like images and text.
- Rise of Deep Learning driven by: algorithmic advancements, large data availability, and powerful hardware (e.g., GPUs).
- Convolutional Neural Networks (CNNs) are designed specifically for image-related tasks (e.g., recognition, object detection).
- Input shapes:
  - Grayscale: `(n x m x 1)`
  - RGB/Color: `(n x m x 3)`
- Layers:
  - Convolutional Layer: Applies filters (kernels) to extract features.
  - ReLU Activation: Keeps positive values, zeroes out negatives.
  - Pooling Layer: Reduces spatial dimensions (Max Pooling, Avg Pooling).
  - Fully Connected Layer: Flattens feature maps and connects to output.
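Putting those layers together, a minimal CNN sketch in Keras; the 28x28 grayscale input and 10 output classes are assumptions for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),        # grayscale: (n x m x 1)
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),  # filters + ReLU
    layers.MaxPooling2D(pool_size=(2, 2)),  # reduce spatial dimensions
    layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),                       # flatten feature maps
    layers.Dense(10, activation="softmax"), # fully connected output
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```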
📘 Notebook: CNNs with Keras
- Recurrent Neural Networks (RNNs) handle sequential data by feeding state from previous time steps back into the current computation.
- Suitable for tasks like:
  - Text & speech processing
  - Time series prediction
  - Handwriting and genome analysis
- LSTM (Long Short-Term Memory): A type of RNN that captures long-term dependencies. Used for:
  - Image captioning
  - Handwriting/image generation
  - Video description
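A hedged sketch of an LSTM-based sequence classifier in Keras (the vocabulary size, sequence length, and binary output are illustrative assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(100,), dtype="int32"),  # 100 time steps of token ids
    layers.Embedding(input_dim=10000, output_dim=32),
    layers.LSTM(64),                        # carries long-term dependencies
    layers.Dense(1, activation="sigmoid"),  # e.g., sentiment of the sequence
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```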
- Transformers handle sequential data using self-attention rather than recurrence.
- Capture relationships between all tokens in a sequence simultaneously.
- Enable highly parallelizable and scalable training.
- Foundation of modern NLP models like BERT, GPT, and T5.
- Used in tasks such as language translation, summarization, and question answering.
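A minimal sketch of the self-attention building block using Keras's `MultiHeadAttention` layer (batch size, sequence length, and embedding width are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

# A batch of 2 sequences, 8 tokens each, 16-dim embeddings (illustrative).
x = tf.random.normal((2, 8, 16))

# Self-attention: query, key, and value all come from the same sequence,
# so every token attends to every other token in parallel.
attn = layers.MultiHeadAttention(num_heads=4, key_dim=16)
out = attn(query=x, value=x, key=x)

print(out.shape)  # (2, 8, 16): same shape, contextualized representations
```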
📘 Notebook: Transformers with Keras
- Autoencoders: Unsupervised models for data compression and reconstruction (a minimal sketch appears at the end of this section).
  - Encoder compresses input; decoder reconstructs it.
  - Applications: noise removal, dimensionality reduction, data visualization.
- Restricted Boltzmann Machines (RBMs):
  - Used for feature extraction, handling imbalanced data, estimating missing values.
- Aircraft Damage Classification: Real-world application.
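As referenced above, a minimal autoencoder sketch in Keras; the 784-dimensional input and 32-dimensional bottleneck are assumptions for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(784,))                # e.g., a flattened image
encoded = layers.Dense(32, activation="relu")(inputs)       # encoder compresses
decoded = layers.Dense(784, activation="sigmoid")(encoded)  # decoder reconstructs

autoencoder = tf.keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
# Trained to reproduce its own input: autoencoder.fit(x, x, ...)
```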