diff --git a/docs/03_advanced_python/shell_scripting.md b/docs/03_advanced_python/shell_scripting.md new file mode 100644 index 000000000..1324fe99c --- /dev/null +++ b/docs/03_advanced_python/shell_scripting.md @@ -0,0 +1,18 @@ +# Shell Scripting +Shell scripting means writing scripts for a command-line interpreter (the shell) on Unix-like systems or Windows. These scripts automate tasks such as file manipulation, program execution, and text processing. Common Unix shells include `Bash`, `Zsh`, and `Ksh`, while Windows uses Command Prompt (`cmd`) and PowerShell. + +On Unix, shell scripts usually start with a shebang line (`#!`) and conventionally use the `.sh` extension; on Windows, scripts use the `.bat` (Command Prompt) or `.ps1` (PowerShell) extension. Commands like `echo`, `grep`, and `awk` are common on Unix, while `Write-Output`, `Get-Content`, and `ForEach-Object` are used in PowerShell. Shell commands are very often combined with Python scripts and Jupyter notebook magics to automate batch analysis. + +```python +# Install a Python package with pip from a Jupyter notebook using the %pip magic +%pip install matplotlib +``` + +Windows users are strongly encouraged to run Python alongside Git Bash (https://git-scm.com/downloads), which closely mimics a *nix environment. 
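Shell commands can also be launched from a plain Python script rather than a notebook magic. The following is a minimal sketch using the standard-library `subprocess` module; the helper name `run_command` is hypothetical, and the current Python interpreter stands in for an arbitrary external program so that the example behaves the same on Linux, macOS, and Windows.

```python
import subprocess
import sys


def run_command(args):
    """Run a command without going through a shell and return its stdout as text.

    `run_command` is a hypothetical helper name used for illustration only.
    """
    completed = subprocess.run(
        args,
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


# Assumption: the running Python interpreter is used as the "external program"
# so the sketch does not depend on any particular shell being installed.
output = run_command([sys.executable, "-c", "print('hello from a subprocess')"])
print(output.strip())
```

Passing the command as a list, rather than as one string with `shell=True`, sidesteps shell-specific quoting rules, which differ between Bash, `cmd`, and PowerShell.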
+ +## Summary Table on OS and Shell Scripts +| Operating System | Terminal Emulator | Default Shell | Additional Shells | Pros | Cons | +|------------------|------------------------------|------------------|-------------------------------------|------------------------------------------------------------------------------|--------------------------------------------------------------------| +| Linux | GNOME Terminal, Konsole | Bash | Zsh, Fish, Ksh, Tcsh, Dash | Highly customizable, vast array of tools, strong community support, open-source | Fragmentation in terminal emulators, varying default configurations| +| macOS | Terminal, iTerm2 | Zsh (since 10.15) | Bash, Fish, Ksh, Tcsh | User-friendly, well-integrated with macOS, iTerm2 offers advanced features | Terminal app is less feature-rich compared to iTerm2 | +| Windows | Command Prompt, PowerShell, Windows Terminal | PowerShell | Bash (via WSL), Git Bash, Cygwin | Powerful scripting capabilities in PowerShell, WSL brings Linux compatibility | Command Prompt is limited, PowerShell syntax can be complex | diff --git a/docs/20b_deep_learning/tensor_core.ipynb b/docs/20b_deep_learning/tensor_core.ipynb new file mode 100644 index 000000000..d9d74fd39 --- /dev/null +++ b/docs/20b_deep_learning/tensor_core.ipynb @@ -0,0 +1,800 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Tensor Core\n", + "\n", + "A **Tensor Core** is a computing unit in NVIDIA GPUs that multiplies two matrices and then adds a third matrix to the result, providing hardware-accelerated **General Matrix Multiplication** (GEMM). To leverage Tensor Cores in TensorFlow and PyTorch, you need to ensure that you're using the right hardware, software versions, and configurations. Tensor Cores are specialised processing units available on NVIDIA GPUs from the Volta architecture onwards. For further details check [hardware/Neural Processing Unit](../80_hardware/gpu.md)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To use full precision (FP32) in TensorFlow and PyTorch, you typically don't need to do anything special, as FP32 is the default precision used by most deep learning frameworks. \n", + "\n", + "In AI model training, memory is often the bottleneck, so lower-precision Tensor Core modes are frequently used to reduce memory use. However, this is beyond the notebook's scope and we only demonstrate FP32 training here.\n", + "\n", + "## Prerequisites\n", + "1. **Hardware**: Ensure you have an NVIDIA GPU that supports Tensor Cores (Volta, Turing, Ampere, or a newer architecture).\n", + "2. **CUDA Toolkit**: Install the CUDA toolkit version supported by your GPU.\n", + "3. **cuDNN Library**: Install the corresponding cuDNN library version." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " __ __ __ __\n", + " / \\ / \\ / \\ / \\\n", + " / \\/ \\/ \\/ \\\n", + "███████████████/ /██/ /██/ /██/ /████████████████████████\n", + " / / \\ / \\ / \\ / \\ \\____\n", + " / / \\_/ \\_/ \\_/ \\ o \\__,\n", + " / _/ \\_____/ `\n", + " |/\n", + " ███╗ ███╗ █████╗ ███╗ ███╗██████╗ █████╗\n", + " ████╗ ████║██╔══██╗████╗ ████║██╔══██╗██╔══██╗\n", + " ██╔████╔██║███████║██╔████╔██║██████╔╝███████║\n", + " ██║╚██╔╝██║██╔══██║██║╚██╔╝██║██╔══██╗██╔══██║\n", + " ██║ ╚═╝ ██║██║ ██║██║ ╚═╝ ██║██████╔╝██║ ██║\n", + " ╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚═════╝ ╚═╝ ╚═╝\n", + "\n", + " mamba (1.1.0) supported by @QuantStack\n", + "\n", + " GitHub: https://github.com/mamba-org/mamba\n", + " Twitter: https://twitter.com/QuantStack\n", + "\n", + "█████████████████████████████████████████████████████████████\n", + "\n", + "/home/jackyko/mambaforge/lib/python3.10/site-packages/conda_package_streaming/package_streaming.py:19: UserWarning: zstandard could not be imported. Running without .conda support.\n", + " warnings.warn(\"zstandard could not be imported. 
Running without .conda support.\")\n", + "/home/jackyko/mambaforge/lib/python3.10/site-packages/conda_package_handling/api.py:29: UserWarning: Install zstandard Python bindings for .conda support\n", + " _warnings.warn(\"Install zstandard Python bindings for .conda support\")\n", + "\n", + "Looking for: ['nvidia/label/cuda-11.8.0::cuda-toolkit']\n", + "\n", + "conda-forge/linux-64 Using cache\n", + "conda-forge/noarch Using cache\n", + "\u001b[?25l\u001b[2K\u001b[0G[+] 0.0s\n", + "nvidia/label/cuda-11.8.0/linux-64 \u001b[90m━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━\u001b[0m 0.0 B @ ??.?MB/s 0.0s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.1s\n", + "nvidia/label/cuda-11.8.0/linux-64 \u001b[90m━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━\u001b[0m 0.0 B @ ??.?MB/s 0.1s\n", + "nvidia/label/cuda-11.8.0/noarch \u001b[33m━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 0.0 B @ ??.?MB/s 0.1s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.2s\n", + "nvidia/label/cuda-11.8.0/linux-64 \u001b[90m━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━\u001b[0m 0.0 B @ ??.?MB/s 0.2s\n", + "nvidia/label/cuda-11.8.0/noarch \u001b[33m━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━\u001b[0m 0.0 B @ ??.?MB/s 0.2s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.3s\n", + "nvidia/label/cuda-11.8.0/linux-64 \u001b[90m━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━\u001b[0m 0.0 B @ ??.?MB/s 0.3s\n", + "nvidia/label/cuda-11.8.0/noarch \u001b[33m━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━\u001b[0m 0.0 B @ ??.?MB/s 0.3s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.4s\n", + "nvidia/label/cuda-11.8.0/linux-64 \u001b[33m━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━\u001b[0m 0.0 B @ ??.?MB/s 0.4s\n", + "nvidia/label/cuda-11.8.0/noarch \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━\u001b[0m 0.0 B @ ??.?MB/s 0.4s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0Gnvidia/label/cuda-11.8.0/linux-64 No change\n", + "nvidia/label/cuda-11.8.0/noarch No change\n", + "[+] 0.5s\n", + 
"\u001b[2K\u001b[1A\u001b[2K\u001b[0G\u001b[?25h\n", + "Pinned packages:\n", + " - python 3.10.*\n", + "\n", + "\n", + "Transaction\n", + "\n", + " Prefix: /home/jackyko/mambaforge/envs/bioimage_ana_notebooks\n", + "\n", + " Updating specs:\n", + "\n", + " - nvidia/label/cuda-11.8.0::cuda-toolkit\n", + " - ca-certificates\n", + " - openssl\n", + "\n", + "\n", + " Package Version Build Channel Size\n", + "─────────────────────────────────────────────────────────────────────\n", + " Reinstall:\n", + "─────────────────────────────────────────────────────────────────────\n", + "\n", + " \u001b[32mo cuda-toolkit\u001b[0m 11.8.0 0 nvidia/label/cuda-11.8.0 \n", + "\n", + " Summary:\n", + "\n", + " Reinstall: 0 packages\n", + "\n", + " Total download: 0 B\n", + "\n", + "─────────────────────────────────────────────────────────────────────\n", + "\n", + "\n", + "\u001b[?25l\u001b[2K\u001b[0G\u001b[?25h" + ] + } + ], + "source": [ + "!mamba install nvidia/label/cuda-11.8.0::cuda-toolkit -y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## TensorFlow\n", + "By default, TensorFlow uses Tensor Cores whenever possible on GPUs with [compute capability >= 7.0](https://developer.nvidia.com/cuda-gpus). 
For GPU memory saving and computational speed you may manually switch to [mixed precision](https://keras.io/api/mixed_precision/)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Install TensorFlow with GPU support" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0mLooking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n", + "Requirement already satisfied: tensorflow[and-cuda]==2.14.* in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (2.14.1)\n", + "Requirement already satisfied: termcolor>=1.1.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (2.4.0)\n", + "Requirement already satisfied: six>=1.12.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (1.16.0)\n", + "Requirement already satisfied: google-pasta>=0.1.1 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (0.2.0)\n", + "Requirement already satisfied: absl-py>=1.0.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (2.1.0)\n", + "Requirement already satisfied: numpy<2.0.0,>=1.23.5 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (1.23.5)\n", + "Requirement already satisfied: typing-extensions>=3.6.6 in 
/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (4.5.0)\n", + "Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (0.4.0)\n", + "Requirement already satisfied: tensorboard<2.15,>=2.14 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (2.14.1)\n", + "Requirement already satisfied: astunparse>=1.6.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (1.6.3)\n", + "Requirement already satisfied: tensorflow-estimator<2.15,>=2.14.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (2.14.0)\n", + "Requirement already satisfied: packaging in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (24.1)\n", + "Requirement already satisfied: ml-dtypes==0.2.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (0.2.0)\n", + "Requirement already satisfied: grpcio<2.0,>=1.24.3 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (1.64.1)\n", + "Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (4.25.3)\n", + "Requirement already satisfied: h5py>=2.9.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (3.11.0)\n", + "Requirement already satisfied: libclang>=13.0.0 in 
/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (18.1.1)\n", + "Requirement already satisfied: keras<2.15,>=2.14.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (2.14.0)\n", + "Requirement already satisfied: wrapt<1.15,>=1.11.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (1.14.1)\n", + "Requirement already satisfied: setuptools in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (65.5.1)\n", + "Requirement already satisfied: flatbuffers>=23.5.26 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (24.3.25)\n", + "Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (0.37.1)\n", + "Requirement already satisfied: opt-einsum>=2.3.2 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (3.3.0)\n", + "Requirement already satisfied: nvidia-cufft-cu11==10.9.0.58 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (10.9.0.58)\n", + "Requirement already satisfied: nvidia-cusolver-cu11==11.4.1.48 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (11.4.1.48)\n", + "Requirement already satisfied: nvidia-cusparse-cu11==11.7.5.86 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (11.7.5.86)\n", + "Requirement already satisfied: nvidia-nccl-cu11==2.16.5 in 
/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (2.16.5)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu11==11.8.87 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (11.8.87)\n", + "Requirement already satisfied: tensorrt==8.5.3.1 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (8.5.3.1)\n", + "Requirement already satisfied: nvidia-cudnn-cu11==8.7.0.84 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (8.7.0.84)\n", + "Requirement already satisfied: nvidia-curand-cu11==10.3.0.86 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (10.3.0.86)\n", + "Requirement already satisfied: nvidia-cublas-cu11==11.11.3.6 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (11.11.3.6)\n", + "Requirement already satisfied: nvidia-cuda-nvcc-cu11==11.8.89 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (11.8.89)\n", + "Requirement already satisfied: nvidia-cuda-runtime-cu11==11.8.89 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorflow[and-cuda]==2.14.*) (11.8.89)\n", + "Requirement already satisfied: wheel<1.0,>=0.23.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from astunparse>=1.6.0->tensorflow[and-cuda]==2.14.*) (0.38.4)\n", + "Requirement already satisfied: werkzeug>=1.0.1 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (3.0.3)\n", + "Requirement already satisfied: google-auth<3,>=1.6.3 in 
/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (2.31.0)\n", + "Requirement already satisfied: markdown>=2.6.8 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (3.6)\n", + "Requirement already satisfied: requests<3,>=2.21.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (2.32.3)\n", + "Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (0.7.2)\n", + "Requirement already satisfied: google-auth-oauthlib<1.1,>=0.5 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (1.0.0)\n", + "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (5.3.3)\n", + "Requirement already satisfied: pyasn1-modules>=0.2.1 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (0.4.0)\n", + "Requirement already satisfied: rsa<5,>=3.1.4 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (4.9)\n", + "Requirement already satisfied: requests-oauthlib>=0.7.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from google-auth-oauthlib<1.1,>=0.5->tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (2.0.0)\n", + "Requirement 
already satisfied: charset-normalizer<4,>=2 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from requests<3,>=2.21.0->tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (3.3.2)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from requests<3,>=2.21.0->tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (2024.7.4)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from requests<3,>=2.21.0->tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (2.2.2)\n", + "Requirement already satisfied: idna<4,>=2.5 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from requests<3,>=2.21.0->tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (3.7)\n", + "Requirement already satisfied: MarkupSafe>=2.1.1 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from werkzeug>=1.0.1->tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (2.1.5)\n", + "Requirement already satisfied: pyasn1<0.7.0,>=0.4.6 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (0.6.0)\n", + "Requirement already satisfied: oauthlib>=3.0.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard<2.15,>=2.14->tensorflow[and-cuda]==2.14.*) (3.2.2)\n", + "\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -riton 
(/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0mLooking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n", + "Requirement already satisfied: nvidia-cudnn-cu11==8.7.0.84 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (8.7.0.84)\n", + "Requirement already satisfied: nvidia-cublas-cu11 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from nvidia-cudnn-cu11==8.7.0.84) (11.11.3.6)\n", + "\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0m" + ] + } + ], + "source": [ + "# for Tensorflow 
and pyTorch compatibility we need to pin the library version\n", + "# this step may take some time to download\n", + "!pip install tensorflow[and-cuda]==2.14.*\n", + "!pip install nvidia-cudnn-cu11==8.7.0.84" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Fix Environment Path for CUDA Library\n", + "If you have problems loading cuDNN from the conda environment, you may need to run the following commands in a terminal to set the environment paths properly.\n", + "\n", + "❗The conda environment file automatically chooses versions of TensorFlow and PyTorch that are compatible under the same CUDA (11.8) and cuDNN (8.7) settings. If you see CUDA or cuDNN version mismatches because the machine's system-wide CUDA libraries are loaded by mistake, use the following conda environment settings to override the system-wide paths:\n", + "\n", + "```bash\n", + "mkdir -p $CONDA_PREFIX/etc/conda/activate.d\n", + "echo 'CUDNN_PATH=$(dirname $(python -c \"import nvidia.cudnn;print(nvidia.cudnn.__file__)\"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh\n", + "echo 'export LD_LIBRARY_PATH=$CONDA_PREFIX/lib/:$CUDNN_PATH/lib:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh\n", + "source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh\n", + "```\n", + "\n", + "Restart the Jupyter kernel at this point, and continue from the cells below." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Training TF AI with FP32 precision" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1/5\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2024-07-09 09:36:52.407937: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f28b001f660 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:\n", + "2024-07-09 09:36:52.407961: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Quadro RTX 4000, Compute Capability 7.5\n", + "2024-07-09 09:36:52.412426: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.\n", + "2024-07-09 09:36:52.547271: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA! 
This line is logged at most once for the lifetime of the process.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1875/1875 [==============================] - 10s 5ms/step - loss: 0.2625 - accuracy: 0.9244 - val_loss: 0.1476 - val_accuracy: 0.9564\n", + "Epoch 2/5\n", + "1875/1875 [==============================] - 9s 5ms/step - loss: 0.1167 - accuracy: 0.9661 - val_loss: 0.1020 - val_accuracy: 0.9681\n", + "Epoch 3/5\n", + "1875/1875 [==============================] - 9s 5ms/step - loss: 0.0791 - accuracy: 0.9765 - val_loss: 0.0882 - val_accuracy: 0.9737\n", + "Epoch 4/5\n", + "1875/1875 [==============================] - 8s 5ms/step - loss: 0.0598 - accuracy: 0.9814 - val_loss: 0.0875 - val_accuracy: 0.9729\n", + "Epoch 5/5\n", + "1875/1875 [==============================] - 8s 5ms/step - loss: 0.0450 - accuracy: 0.9857 - val_loss: 0.0731 - val_accuracy: 0.9782\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Define a simple model\n", + "model = tf.keras.Sequential([\n", + "    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),\n", + "    tf.keras.layers.Dense(10)\n", + "])\n", + "\n", + "# Compile the model with the optimizer and loss function\n", + "model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),\n", + "              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", + "              metrics=['accuracy'])\n", + "\n", + "# Example dataset with MNIST\n", + "(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()\n", + "# cast to float32 so that training runs at full (FP32) precision\n", + "train_images = train_images.reshape(-1, 784).astype('float32') / 255\n", + "test_images = test_images.reshape(-1, 784).astype('float32') / 255\n", + "\n", + "# Train the model\n", + "model.fit(train_images, train_labels, epochs=5, 
validation_data=(test_images, test_labels))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "TensorFlow also allows Tensor Core usage to be controlled through environment variables. See the official documentation for the setting: https://docs.nvidia.com/deeplearning/frameworks/tensorflow-user-guide/index.html#tf_disable_tensor_op_math" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## PyTorch\n", + "\n", + "As in TensorFlow, Tensor Core mixed precision in PyTorch is enabled through [AMP (Automatic Mixed Precision)](https://pytorch.org/docs/stable/amp.html)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Install PyTorch with GPU support" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0mLooking in indexes: https://download.pytorch.org/whl/cu118, https://pypi.ngc.nvidia.com\n", + "Requirement already satisfied: torch in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (2.3.1+cu118)\n", + "Collecting torchvision\n", + " Downloading https://download.pytorch.org/whl/cu118/torchvision-0.18.1%2Bcu118-cp310-cp310-linux_x86_64.whl (6.3 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m6.3/6.3 MB\u001b[0m \u001b[31m36.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: networkx in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (3.3)\n", + "Requirement already satisfied: 
nvidia-curand-cu11==10.3.0.86 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (10.3.0.86)\n", + "Requirement already satisfied: nvidia-cublas-cu11==11.11.3.6 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (11.11.3.6)\n", + "Requirement already satisfied: nvidia-cuda-runtime-cu11==11.8.89 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (11.8.89)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu11==11.8.87 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (11.8.87)\n", + "Requirement already satisfied: nvidia-cusparse-cu11==11.7.5.86 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (11.7.5.86)\n", + "Requirement already satisfied: nvidia-nvtx-cu11==11.8.86 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (11.8.86)\n", + "Requirement already satisfied: typing-extensions>=4.8.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (4.9.0)\n", + "Requirement already satisfied: nvidia-cudnn-cu11==8.7.0.84 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (8.7.0.84)\n", + "Requirement already satisfied: sympy in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (1.12)\n", + "Requirement already satisfied: jinja2 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (3.1.3)\n", + "Requirement already satisfied: fsspec in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (2024.2.0)\n", + "Requirement already satisfied: nvidia-nccl-cu11==2.20.5 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (2.20.5)\n", 
+ "Requirement already satisfied: nvidia-cufft-cu11==10.9.0.58 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (10.9.0.58)\n", + "Requirement already satisfied: nvidia-cusolver-cu11==11.4.1.48 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (11.4.1.48)\n", + "Requirement already satisfied: triton==2.3.1 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (2.3.1)\n", + "Requirement already satisfied: filelock in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (3.13.1)\n", + "Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.8.89 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torch) (11.8.89)\n", + "Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torchvision) (10.4.0)\n", + "Requirement already satisfied: numpy in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from torchvision) (1.23.5)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from jinja2->torch) (2.1.5)\n", + "Requirement already satisfied: mpmath>=0.19 in /home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages (from sympy->torch) (1.3.0)\n", + "\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0mInstalling collected packages: torchvision\n", + "\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0mSuccessfully installed torchvision-0.18.1+cu118\n", + "\u001b[33mWARNING: Ignoring 
invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -riton (/home/jackyko/mambaforge/envs/bioimage_ana_notebooks/lib/python3.10/site-packages)\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: There was an error checking the latest version of pip.\u001b[0m\u001b[33m\n", + "\u001b[0m" + ] + } + ], + "source": [ + "# keep lower version of CUDA and pytorch for environment consistency to tensorflow\n", + "!pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "# header import\n", + "import torch\n", + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "from torch.cuda.amp import GradScaler, autocast\n", + "from torchvision import datasets, transforms" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define a simple neural network" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "class SimpleNet(nn.Module):\n", + " def __init__(self):\n", + " super(SimpleNet, self).__init__()\n", + " self.fc1 = nn.Linear(784, 128)\n", + " self.relu = nn.ReLU()\n", + " self.fc2 = nn.Linear(128, 10)\n", + "\n", + " def forward(self, x):\n", + " x = x.view(-1, 784)\n", + " x = self.fc1(x)\n", + " x = self.relu(x)\n", + " x = self.fc2(x)\n", + " return x" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "# Hyperparameters\n", + "batch_size = 64\n", + "learning_rate = 0.01\n", + "epochs = 3" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Download training data and build pyTorch loader" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + 
"output_type": "stream", + "text": [ + "Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\n", + "Failed to download (trying next):\n", + "HTTP Error 403: Forbidden\n", + "\n", + "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz\n", + "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ../../data/MNIST/raw/train-images-idx3-ubyte.gz\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100.0%\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Extracting ../../data/MNIST/raw/train-images-idx3-ubyte.gz to ../../data/MNIST/raw\n", + "\n", + "Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\n", + "Failed to download (trying next):\n", + "HTTP Error 403: Forbidden\n", + "\n", + "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz\n", + "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ../../data/MNIST/raw/train-labels-idx1-ubyte.gz\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100.0%\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Extracting ../../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../../data/MNIST/raw\n", + "\n", + "Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz\n", + "Failed to download (trying next):\n", + "HTTP Error 403: Forbidden\n", + "\n", + "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz\n", + "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ../../data/MNIST/raw/t10k-images-idx3-ubyte.gz\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100.0%\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Extracting ../../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../../data/MNIST/raw\n", + "\n", + "Downloading 
http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz\n", + "Failed to download (trying next):\n", + "HTTP Error 403: Forbidden\n", + "\n", + "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz\n", + "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ../../data/MNIST/raw/t10k-labels-idx1-ubyte.gz\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100.0%" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Extracting ../../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../../data/MNIST/raw\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "# Data loaders\n", + "transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])\n", + "train_dataset = datasets.MNIST(root='../../data', train=True, transform=transform, download=True)\n", + "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Setup NN model, loss function, and optimizer" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "model = SimpleNet().to(device)\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = optim.SGD(model.parameters(), lr=learning_rate)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Enable Automatic Mixed Precision (AMP)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "# AMP scaler\n", + "scaler = GradScaler()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Training loop" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + 
"name": "stdout", + "output_type": "stream", + "text": [ + "Epoch [1/3], Step [100/938], Loss: 1.8893\n", + "Epoch [1/3], Step [200/938], Loss: 1.1836\n", + "Epoch [1/3], Step [300/938], Loss: 0.7898\n", + "Epoch [1/3], Step [400/938], Loss: 0.6417\n", + "Epoch [1/3], Step [500/938], Loss: 0.5598\n", + "Epoch [1/3], Step [600/938], Loss: 0.4882\n", + "Epoch [1/3], Step [700/938], Loss: 0.4550\n", + "Epoch [1/3], Step [800/938], Loss: 0.4444\n", + "Epoch [1/3], Step [900/938], Loss: 0.4150\n", + "Epoch [2/3], Step [100/938], Loss: 0.4049\n", + "Epoch [2/3], Step [200/938], Loss: 0.3767\n", + "Epoch [2/3], Step [300/938], Loss: 0.3811\n", + "Epoch [2/3], Step [400/938], Loss: 0.3715\n", + "Epoch [2/3], Step [500/938], Loss: 0.3740\n", + "Epoch [2/3], Step [600/938], Loss: 0.3542\n", + "Epoch [2/3], Step [700/938], Loss: 0.3510\n", + "Epoch [2/3], Step [800/938], Loss: 0.3506\n", + "Epoch [2/3], Step [900/938], Loss: 0.3370\n", + "Epoch [3/3], Step [100/938], Loss: 0.3311\n", + "Epoch [3/3], Step [200/938], Loss: 0.3317\n", + "Epoch [3/3], Step [300/938], Loss: 0.3367\n", + "Epoch [3/3], Step [400/938], Loss: 0.3295\n", + "Epoch [3/3], Step [500/938], Loss: 0.3246\n", + "Epoch [3/3], Step [600/938], Loss: 0.2980\n", + "Epoch [3/3], Step [700/938], Loss: 0.3147\n", + "Epoch [3/3], Step [800/938], Loss: 0.3101\n", + "Epoch [3/3], Step [900/938], Loss: 0.3036\n", + "Finished Training\n" + ] + } + ], + "source": [ + "for epoch in range(epochs):\n", + " model.train()\n", + " running_loss = 0.0\n", + " for i, (images, labels) in enumerate(train_loader):\n", + " images, labels = images.to(device), labels.to(device)\n", + "\n", + " optimizer.zero_grad()\n", + "\n", + " # Forward pass with autocast\n", + " with autocast():\n", + " outputs = model(images)\n", + " loss = criterion(outputs, labels)\n", + "\n", + " # Backward pass and optimization with scaler\n", + " scaler.scale(loss).backward()\n", + " scaler.step(optimizer)\n", + " scaler.update()\n", + "\n", + " running_loss 
+= loss.item()\n", + " if (i + 1) % 100 == 0:\n", + " print(f\"Epoch [{epoch+1}/{epochs}], Step [{i+1}/{len(train_loader)}], Loss: {running_loss / 100:.4f}\")\n", + " running_loss = 0.0\n", + "\n", + "print(\"Finished Training\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Additional Tips\n", + "- **Performance Monitoring**: Use NVIDIA’s `nvprof`, `nsight`, or `nvidia-smi` tools to monitor GPU usage and ensure Tensor Cores are being utilised." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Mon Jul 8 23:34:04 2024 \n", + "+-----------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 525.60.11 Driver Version: 525.60.11 CUDA Version: 12.0 |\n", + "|-------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|===============================+======================+======================|\n", + "| 0 Quadro RTX 4000 Off | 00000000:65:00.0 Off | N/A |\n", + "| 30% 32C P8 5W / 125W | 360MiB / 8192MiB | 0% Default |\n", + "| | | N/A |\n", + "+-------------------------------+----------------------+----------------------+\n", + " \n", + "+-----------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=============================================================================|\n", + "| 0 N/A N/A 2655 G /usr/lib/xorg/Xorg 108MiB |\n", + "| 0 N/A N/A 2697 G /usr/bin/sddm-greeter 55MiB |\n", + "+-----------------------------------------------------------------------------+\n" + ] + } + ], + "source": [ + "!nvidia-smi" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- **Profile Your Code**: Use TensorFlow's or PyTorch's built-in profilers to understand where your model spends most of its time and ensure mixed precision is being applied correctly.\n", + "\n", + "By following these steps, you should be able to take advantage of Tensor Cores in both TensorFlow and PyTorch, significantly accelerating the training process for deep learning models." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### FAQ\n", + "\n", + "1. **This looks no different from normal TF/PyTorch code, so why does it need a special mention?**\n", + " By default, TF and PyTorch run in FP32 (full precision) mode. When computational resources are limited (e.g. insufficient GPU memory or limited training time), you can switch to mixed-precision mode and use non-default precisions.\n", + "\n", + "2. **Is FP32 always necessary for the best AI model?**\n", + "\n", + " No. Research shows that reduced-precision training can be significantly faster on low-end GPUs while reaching very similar model performance.\n", + "\n", + "3. 
**So why is training precision mentioned in the Tensor Core section?**\n", + "\n", + " Tensor Cores for different precisions are physical computing units within the GPU. With older GPU models and deep learning packages, choosing full precision mode may silently fall back to mixed-precision mode. This is caused by the limited number of high-precision Tensor Cores in older/low-end GPUs.\n", + " \n", + " When we upgraded from V100 to A100 GPUs, the model no longer reproduced its original performance until we explicitly switched back to FP32 on the CUDA cores. This phenomenon is known as the [precision loss problem](https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html).\n", + "\n", + "4. **When is the best time to use non-FP32 precision?**\n", + " In smart microscopy, i.e. performing bioimage analysis simultaneously with image acquisition, computation is often offloaded to weaker GPUs. To keep up with the image acquisition speed, floating-point precision is one factor that can be sacrificed to speed up automated bioimage analysis.\n", + "\n", + "5. **Can I simply use a newer version of TF/PyTorch to solve the problem?**\n", + "\n", + " - Yes, if you are training a new model from scratch.\n", + " - However, some older AI models are very version specific. Also, to fit both TF and PyTorch into the same environment, you may have to stay on older versions of the packages. One known example is running [UNet](https://github.com/lmb-freiburg/Unet-Segmentation?tab=readme-ov-file) with a pretrained cell segmentation model. 
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Further Reading\n", + "- [Understand Tensor Cores](https://blog.paperspace.com/understanding-tensor-cores/)\n", + "- [Nvidia Train with Mixed Precision](https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html)\n", + "- [Keras Mixed Precision](https://keras.io/api/mixed_precision/)\n", + "- [pyTorch Automaic Mixed Precision](https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "bioimage_ana_notebooks", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/80_hardware/SoC.png b/docs/80_hardware/SoC.png new file mode 100644 index 000000000..fea1b7907 Binary files /dev/null and b/docs/80_hardware/SoC.png differ diff --git a/docs/80_hardware/arch.md b/docs/80_hardware/arch.md new file mode 100644 index 000000000..befd90662 --- /dev/null +++ b/docs/80_hardware/arch.md @@ -0,0 +1,18 @@ +# Processor Chipset Architecture +Modern days computer processors often integrate various computing units including CPU, GPU, NPU and RAM. These components often cause significant performance difference in bioimage analysis. + +The processors may still classified by the CPUs instruction sets, mainly x86 and ARM. Python libraries natively built on one of the architecture may not be directly runnable on the other, unless with OS layer translation or code compilation from source. i.e. Legacy x86 Python libraries may not be runnable on ARM computers. 
For power-efficiency reasons, computer manufacturers are releasing new laptops based on the ARM architecture, yet most existing bioimage analysis software is pre-compiled for x86. Apple Rosetta 2 eases the issue, though compatibility is not 100%. So bear this in mind when choosing an adequate CPU for your analysis work. + +When necessary, consult the code developer about support for your CPU platform. The following is a summary of CPU architectures: + +| Feature | Apple Silicon | Intel | AMD | Qualcomm | NVIDIA | +|----------------------------------|--------------------|---------------------|-------------------|--------------------|--------------------| +| **Architecture** | ARM-based | x86/x86-64 | x86/x86-64 | ARM-based | ARM-based (Grace CPU) | +| **Notable Series** | M1, M2 | Core, Xeon | Ryzen, EPYC | Snapdragon X Elite | Grace CPU | +| **Big-Small Cores** | Yes | Yes | Not typical | Yes | Not typical | +| **Integrated Graphics** | Apple GPU | Intel Iris, UHD, Arc | Radeon Graphics | Adreno GPU | NVIDIA GPU | +| **Thermal Design Power (TDP)** | Low to moderate | Moderate to high | Moderate to high | Low | Moderate to high | +| **Primary Use Cases** | Laptops, Desktops | Laptops, Desktops, Servers | Laptops, Desktops, Servers | Laptops, Mobile Devices | HPC, AI, Data Centers | +| **OS** | macOS | Windows, Linux | Windows, Linux | Windows, Linux | Linux | +| **OpenCL Support** | Yes | Yes | Yes | Not Mentioned | Yes | +| **AI Support** | Yes | Yes | Yes | Yes | Best compatibility with CUDA | \ No newline at end of file diff --git a/docs/80_hardware/gpu.md b/docs/80_hardware/gpu.md new file mode 100644 index 000000000..94d21ca64 --- /dev/null +++ b/docs/80_hardware/gpu.md @@ -0,0 +1,23 @@ +# GPU Support +## AI Training +Though all processor manufacturers embed a GPU in the chipset, AI-based analysis largely relies on NVIDIA CUDA as the base software stack. 
Common neural network libraries in Python (PyTorch and TensorFlow) are the foundation of popular models like UNet, Cellpose and StarDist. Although PyTorch has recently gained support for AMD ROCm and Intel oneAPI acceleration, community support is fairly limited compared to CUDA. Considering training scalability and infrastructure support across major GPU farms and research clusters, NVIDIA remains the only practical choice for training new models. + +## AI Inference +Machine learning consists of two parts: model training and inference. Applying a fixed AI model to new data requires far fewer computational resources than training from scratch. For smaller AI tasks, non-CUDA chipsets offer more options for bioimage analysis. Inference of neural-network-based AI can be accelerated in hardware with purpose-built circuits. Such designs are often referred to as neural processing units (NPUs). NVIDIA specifically added Tensor Cores, bundled with optimised packages like cuDNN and Transformer Engine, to its later GPU products. A quick guide to Tensor Core based acceleration is available [here](../20b_deep_learning/tensor_core.ipynb). +
+ Placeholder Image +

General Matrix Multiplication (GEMM) is the fundamental building block of neural network (NN) operations. NNs and image manipulation both rest on similar, embarrassingly parallel matrix operations, which is why GPUs are widely used in machine learning tasks.

+
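
As a minimal sketch of what a GEMM computes, the pure-Python function below evaluates D = A·B + C for small matrices stored as lists of rows. The function name `gemm` and the example matrices are illustrative assumptions, not part of any library; real workloads would rely on optimised libraries rather than Python loops.

```python
def gemm(a, b, c):
    """Return D = A*B + C for matrices given as lists of rows (illustrative helper)."""
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    # Each output element d[i][j] depends only on row i of A and column j of B,
    # so all n_rows * n_cols elements could be computed independently in parallel.
    return [
        [
            sum(a[i][k] * b[k][j] for k in range(n_inner)) + c[i][j]
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
d = gemm(a, b, c)
print(d)  # [[20.0, 23.0], [44.0, 51.0]]
```

Because no output element depends on another, the work maps naturally onto the many parallel units of a GPU.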
+ +## GPGPU Acceleration +Apart from AI applications, bioimage analysis tasks like single plane illumination fluorescent correlation spectroscopy (SPIM-FCS) performs [pixelwise fitting of the autocorrelation function](https://github.com/bpi-oxford/Gpufit/blob/master/Gpufit/models/spim_acfN.cuh). Such image analysis can utilise the parallelisation power of GPU to accelerate the research. + +One high level analysis package [py-clesperanto](https://github.com/clEsperanto/pyclesperanto_prototype) attempts GPU acceleration based on OpenCL. Such computing process allows bioimage analysis not bound to graphic processing, but to more generic calculations. From this the GPU is often referred as general purpose GPU (GPGPU). Vendors like AMD and Intel are alternatives to NVidia in this sense. + +| | **NVIDIA** | **AMD** | **Intel** | **Apple** | +|---------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------| +| **OpenCL Support** | Strong support but secondary to CUDA. Performance is robust. | Major proponent with excellent support and performance. | Improved support, especially with newer architectures. Performance historically lagged but is improving. | Limited support, focus shifting to Metal. | +| **OpenGL Support** | Excellent performance and highly optimized drivers. | Strong support, competitive performance, though historically lagged in specific cases. | Solid support, especially for integrated graphics. Significant improvements with newer discrete GPUs. 
| Historically good, but deprecated in favor of Metal. | +| **Vendor-Specific Toolkit** | **CUDA**: Most mature and widely-used. Extensive libraries and tools. Highly optimized hardware. | **ROCm**: Comprehensive platform, open-source, flexible. Primarily supports newer GPUs. | **oneAPI**: Unified model supporting multiple architectures. Provides DPC++, integrates OpenCL. Relatively new, still gaining traction. | **Metal**: Modern API designed specifically for Apple hardware. Provides low-level access to GPU features. | +| **Performance** | Typically leads in raw performance, especially for CUDA-based applications. Highly optimized. | Excels in OpenCL applications. High memory bandwidth and compute capabilities. | Emerging player with promising solutions. Strong integration with CPUs. | Excellent performance on Apple hardware, particularly optimized for Apple Silicon (M1, M2, etc.). | \ No newline at end of file diff --git a/docs/80_hardware/npu.md b/docs/80_hardware/npu.md new file mode 100644 index 000000000..96fd880fa --- /dev/null +++ b/docs/80_hardware/npu.md @@ -0,0 +1,26 @@ + +# Neural Processing Unit (NPU) +
+ Placeholder Image +

Schematic depiction of the outer matrix product AB of two matrices A and B. NPUs implement general matrix multiplications (GEMMs) in hard-wired circuits to accelerate AI calculations.

+
+ +A Neural Processing Unit (NPU) is a specialized hardware accelerator designed to efficiently handle the computational demands of AI and machine learning tasks, particularly neural network inference and training. NPUs are optimized for the deep learning based bioimage analysis, such as Stardist, Cellpose, PlantSeg and CSBDeep CARE denoising. + +In mid-2024 the NPUs are embedded in various laptops, allowing a wider choice in AI applications. + +| Feature | Google TPU (USB/M.2) | Apple Silicon | AMD | Intel (after Meteor Lake) | NVIDIA (Grace Hopper) | NVIDIA (Jetson) | Qualcomm Snapdragon X Elite | +|----------------------------------|---------------------------|--------------------|--------------------------|---------------------------|---------------------------|---------------------------|---------------------------| +| **Product Name** | Edge TPU | Apple Neural Engine| 3rd Gen Ryzen AI| VPU, GNA, AI Engine | TensorRT, DLA, Grace Hopper| Jetson Xavier, Nano, TX2 | Qualcomm AI Engine | +| **Primary Use Case** | Low Power Devices AI| Mobile, Desktop | GPUs with AI Capabilities, large data inference| Mobile, Desktop | Data Center, HPC, Embedded | Embedded AI | Laptop | +| **Performance** | Moderate | High | Moderate to High | Moderate to High | Very High | Moderate to High | Moderate | +| **Efficiency** | High | High | Moderate | High | Moderate to High | High | High | +| **Special Features** | Google Cloud Compatible, Tensor Operations| Unified Memory, Tight OS Integration | APUs, ROCm | Low Power, Vision Processing, Integrated AI | CUDA Integration, Tensor Cores | Low Power, Integrated AI | ARM based Windows Laptop | +| **Flexibility** | Specialized for TensorFlow| General Purpose | AI with General Compute | Specialized for AI and Vision| Highly Specialized | General Purpose | General Purpose | +| **Compatibility** | TensorFlow Lite | macOS | Windows, Linux | Windows, Linux | Windows, Linux | Linux | Windows, Linux | +| **Scalability** | High | Moderate | 
Moderate | Moderate | High | Moderate | Moderate | +| **Availability** | USB, M.2 Modules | Built-in (M-series)| Radeon Instinct GPUs, APUs | Integrated in Meteor Lake CPUs | Available in GPUs, Servers | Available in Embedded Modules | Snapdragon System on Chips (SoCs) | + +## External Reading: +- [Get started with tensorflow-metal (AI on Apple Neural Engine)](https://developer.apple.com/metal/tensorflow-plugin/) +- [PluggableDevice: Device Plugin for Tensorflow](https://blog.tensorflow.org/2021/06/pluggabledevice-device-plugins-for-TensorFlow.html) \ No newline at end of file diff --git a/docs/80_hardware/npu.png b/docs/80_hardware/npu.png new file mode 100644 index 000000000..8f34fe684 Binary files /dev/null and b/docs/80_hardware/npu.png differ diff --git a/docs/80_hardware/npu_2.png b/docs/80_hardware/npu_2.png new file mode 100644 index 000000000..ad59c831d Binary files /dev/null and b/docs/80_hardware/npu_2.png differ diff --git a/docs/80_hardware/readme.md b/docs/80_hardware/readme.md new file mode 100644 index 000000000..bd4d1e982 --- /dev/null +++ b/docs/80_hardware/readme.md @@ -0,0 +1,45 @@ +# Choosing the Optimal Computer + +Though Python runs on most modern operating systems (OS), including Windows, macOS and Linux, it is beneficial to keep scripting in a *nix environment. To understand the differences between OS shell environments, see the [shell scripting page](../03_advanced_python/shell_scripting.md). Here we provide a guide for beginners on choosing computing hardware. + +This guide is written for programming beginners who code locally. + +## General Guide + +When choosing a computer for bioimage analysis, it's essential to consider hardware performance, memory size, OS, portability, and application scenarios. 
Here’s a summary comparing different computing modalities: + +| Feature | Laptops | Desktops | Workstations | Servers | +|---------------------------|------------------------------------------|------------------------------------------|------------------------------------------|------------------------------------------| +| **Hardware Performance** | Mid-range to high-end CPUs and GPUs | High-end CPUs and GPUs | Top-tier CPUs and multiple GPUs | Multiple high-end CPUs and GPUs | +| **Memory Size** | Up to 64GB (most have 16GB-32GB) | Up to 128GB or more | 128GB to 512GB or more | Terabytes of RAM | +| **GPU** | Integrated or dedicated GPUs | High-end dedicated GPUs (e.g., NVIDIA RTX) | Professional GPUs (e.g., NVIDIA Quadro/RTX A-series) | Multiple professional GPUs (e.g., NVIDIA Tesla/Quadro) | +| **NPU** | SoC Dependent | Limited NPU support | Available in some high-end models | Available, especially in AI-optimized servers | +| **OS** | Windows, macOS, Linux | Windows, macOS, Linux | Windows, macOS, Linux | Linux, Windows Server | +| **Portability** | Highly portable | Not portable | Not portable | Not portable | +| **Application Scenarios** | Mobile work, basic to moderate tasks | Stationary use, moderate to intensive tasks | Intensive tasks, advanced analysis | Large-scale projects, remote access, collaborative research | +| **ARM vs x86** | Mostly x86 (some ARM options like Apple Silicon and Snapdragon X Elite) | Mostly x86 except for Apple | Mostly x86 except for Apple | Mostly x86 (ARM servers available, e.g., AWS Graviton) | +| **ARM Performance** | Energy-efficient, good for battery life | Limited use, lower performance than x86, suitable for task-specific AI like smart microscopy | Rare, used in specific scenarios | High efficiency, used in cloud services | +| **x86 Performance** | High performance, widely supported. Support on macOS is being phased out. 
| Higher performance, widely supported | Highest performance, widely supported | Highest performance, widely supported | + + +## Key Considerations + +| **Criteria** | **Description** | +|---------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| **OS** | Choose the OS based on software compatibility and personal preference. Windows and Linux are common across all device types, with macOS being exclusive to laptops and desktops. More concern is about [OS terminal and shell scripting](../03_advanced_python/shell_scripting.md). | +| **Processor Architecture**| ARM processors are known for energy efficiency and are increasingly used in laptops (e.g., Apple M1/M2) and servers (e.g., AWS Graviton). x86 processors dominate in performance and are widely supported across all device types, making them the standard choice for high-performance bioimage analysis tasks. [For details check here](./arch.md). | +| **GPU** | Essential for handling complex image processing and analysis. Laptops typically have consumer-grade GPUs, while desktops and workstations offer higher-end consumer or professional-grade GPUs. For independent GPU on laptop, the power consumption is very high and limits portability and a sustainable coding environment. Servers can have multiple high-end GPUs optimized for parallel processing and large-scale computations. Considering the domination of CUDA in the AI domain, Nvidia is the only recommended vendor. [For details of GPU applications check here](./gpu.md). | +| **NPU** | Neural Processing Units are becoming more relevant for AI and machine learning tasks. The performance across vendors is yet to be benchmarked. [To understand more about NPUs read the section here](./npu.md). | + +## Should I take ARM chipset for Bioimage Analysis? +- **MacOS**: Yes. 
The community is growing fast and libraries are largely compatible. Performance efficiency is high. +- **Windows**: No. The lack of a Miniforge build for Windows on ARM is reason enough. +- **Linux**: Good for task-oriented applications. Certain high-performance 3D rendering may be limited. Good power efficiency for thin-client coding. + +## Summary +- **Laptops**: Best for portability and moderate analysis tasks, with some ARM options for better energy efficiency. +- **Desktops**: Offer higher performance and memory capacity, suitable for stationary use with high-end GPU options. +- **Workstations**: Provide top-tier performance with advanced GPU and NPU options, ideal for demanding bioimage analysis tasks. +- **Servers**: Unmatched in performance and memory, perfect for large-scale, collaborative, and remote-access analysis tasks, with ARM options for energy-efficient cloud computing. + +Choose the appropriate device based on your specific needs, considering the balance between portability, performance, and the nature of your bioimage analysis tasks. 
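
As a quick check before installing pre-compiled packages, the sketch below reports which OS and processor family the running Python interpreter sees, using only the standard-library `platform` module. The helper name `describe_machine` and the mapping of machine strings to families are assumptions for illustration and may not cover every platform.

```python
import platform


def describe_machine():
    """Summarise the OS and CPU family seen by this interpreter (illustrative helper)."""
    machine = platform.machine().lower()
    # Assumed mapping of common machine strings; extend it for other platforms.
    if machine in ("arm64", "aarch64"):
        family = "ARM"
    elif machine in ("x86_64", "amd64", "i386", "i686", "x86"):
        family = "x86"
    else:
        family = "unknown"
    return {"os": platform.system(), "machine": machine, "family": family}


info = describe_machine()
print(info)
```

Note that an interpreter running under a translation layer such as Rosetta 2 typically reports the architecture it was built for, so the result describes the Python build rather than the physical chip.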
\ No newline at end of file diff --git a/docs/_toc.yml b/docs/_toc.yml index 39485163f..b76847223 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -155,6 +155,7 @@ parts: sections: - file: 20b_deep_learning/cellpose - file: 20b_deep_learning/stardist + - file: 20b_deep_learning/tensor_core - file: 20c_vision_models/readme sections: @@ -277,6 +278,7 @@ parts: - file: 03_advanced_python/partial - file: 03_advanced_python/parallelization - file: 03_advanced_python/numba + - file: 03_advanced_python/shell_scripting - file: 15_gpu_acceleration/readme sections: @@ -284,7 +286,7 @@ parts: - file: 15_gpu_acceleration/why_GPU_acceleration - file: 15_gpu_acceleration/memory_management - file: 15_gpu_acceleration/further_reading - + - file: 31_graphical_user_interfaces/readme sections: - file: 31_graphical_user_interfaces/napari @@ -427,5 +429,10 @@ parts: - caption: Appendix chapters: - file: 01_introduction/glossary - - file: imprint + - file: 80_hardware/readme + sections: + - file: 15_gpu_acceleration/hardware + - file: 15_gpu_acceleration/arch + - file: 15_gpu_acceleration/gpu + - file: 15_gpu_acceleration/npu \ No newline at end of file