haesleinhuepf · jackyko1991 · Jul 5, 2024 · Jul 5, 2024 · Jul 5, 2024 · Jul 8, 2024
diff --git a/docs/03_advanced_python/shell_scripting.md b/docs/03_advanced_python/shell_scripting.md
@@ -0,0 +1,18 @@
+# Shell Scripting
+Shell scripting involves writing scripts for the command-line interpreter, known as the shell, on Unix-based systems and Windows. These scripts automate tasks such as file manipulation, program execution, and text processing. Common Unix shells include `Bash`, `Zsh`, and `Ksh`, while Windows uses Command Prompt (`cmd`) and PowerShell. 
+
+On Unix, shell scripts typically start with a shebang line (`#!`, for example `#!/bin/bash`) and conventionally use the `.sh` extension. On Windows, batch files use `.bat` and PowerShell scripts use `.ps1`; neither needs a shebang. Commands like `echo`, `grep`, and `awk` are common on Unix, while `Write-Output`, `Get-Content`, and `ForEach-Object` are used in PowerShell. Shell commands are often combined with Python scripts and Jupyter notebook magics to automate batch analysis.
+
+```python
+# Install a package from within a Jupyter notebook using the %pip magic (runs pip for the current kernel's environment)
+%pip install matplotlib
+```
+
+Windows users are strongly encouraged to run Python alongside Git Bash (https://git-scm.com/downloads), which closely mimics a *nix shell environment.
+
+## Summary Table on OS and Shell Scripts
+| Operating System | Terminal Emulator            | Default Shell    | Additional Shells                   | Pros                                                                         | Cons                                                               |
+|------------------|------------------------------|------------------|-------------------------------------|------------------------------------------------------------------------------|--------------------------------------------------------------------|
+| Linux            | GNOME Terminal, Konsole      | Bash             | Zsh, Fish, Ksh, Tcsh, Dash          | Highly customizable, vast array of tools, strong community support, open-source | Fragmentation in terminal emulators, varying default configurations|
+| macOS            | Terminal, iTerm2             | Zsh (since 10.15) | Bash, Fish, Ksh, Tcsh               | User-friendly, well-integrated with macOS, iTerm2 offers advanced features    | Terminal app is less feature-rich compared to iTerm2              |
+| Windows          | Command Prompt, PowerShell, Windows Terminal | PowerShell       | Bash (via WSL), Git Bash, Cygwin    | Powerful scripting capabilities in PowerShell, WSL brings Linux compatibility | Command Prompt is limited, PowerShell syntax can be complex       |
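The `shell_scripting.md` page above mentions combining shell commands with Python scripts to automate batch analysis. As a minimal sketch of that idea, the standard-library `subprocess` module can launch one command per input file. The file names and the helper `run_command` are hypothetical and only serve as illustration; the command being run here is simply the current Python interpreter so that the example works on any OS.

```python
import subprocess
import sys


def run_command(args):
    """Run a command without a shell and return its exit code and stripped stdout."""
    completed = subprocess.run(args, capture_output=True, text=True, check=False)
    return completed.returncode, completed.stdout.strip()


# Hypothetical batch job: call the current Python interpreter once per "file".
# In practice the command could be any command-line tool available on the system.
file_names = ["image_01.tif", "image_02.tif"]  # made-up names for illustration
outputs = []
for name in file_names:
    code, out = run_command([sys.executable, "-c", f"print('processing {name}')"])
    outputs.append((code, out))

print(outputs)
```

Passing the command as a list of arguments (rather than a single string with `shell=True`) avoids differences in quoting rules between Bash, `cmd` and PowerShell.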
diff --git a/docs/20b_deep_learning/tensor_core.ipynb b/docs/20b_deep_learning/tensor_core.ipynb
diff --git a/docs/80_hardware/SoC.png b/docs/80_hardware/SoC.png
diff --git a/docs/80_hardware/arch.md b/docs/80_hardware/arch.md
@@ -0,0 +1,18 @@
+# Processor Chipset Architecture
+Modern computer processors often integrate several computing units, including CPU, GPU, NPU and RAM. Differences in these components can lead to significant performance differences in bioimage analysis.
+
+Processors can still be classified by the CPU's instruction set, mainly x86 and ARM. Python libraries compiled natively for one architecture may not run directly on the other without an OS-level translation layer or recompilation from source. For example, legacy x86 Python libraries may not run on ARM computers. For power-efficiency reasons, manufacturers are releasing new laptops based on the ARM architecture, yet most existing bioimage analysis software is pre-compiled for x86. Translation layers such as Apple's Rosetta 2 ease the issue, but compatibility is not 100%. Bear this in mind when choosing a CPU for your analysis work.
+
+When necessary, ask the software developers which CPU platforms they support. The following table summarizes common CPU architectures:
+
+| Feature                          | Apple Silicon      | Intel               | AMD               | Qualcomm   | NVIDIA            |
+|----------------------------------|--------------------|---------------------|-------------------|--------------------|--------------------|
+| **Architecture**                 | ARM-based          | x86/x86-64          | x86/x86-64        | ARM-based          | ARM-based (Grace CPU) |
+| **Notable Series**               | M1, M2             | Core, Xeon          | Ryzen, EPYC       | Snapdragon X Elite     | Grace CPU          |
+| **Big-Small Cores**             | Yes    | Yes        | Not typical       | Yes    | Not typical        |
+| **Integrated Graphics**          | Apple GPU   | Intel Iris, UHD, Arc | Radeon Graphics | Adreno GPU   | NVIDIA GPU   |
+| **Thermal Design Power (TDP)**   | Low to moderate    | Moderate to high    | Moderate to high  | Low                | Moderate to high   |
+| **Primary Use Cases**            | Laptops, Desktops  | Laptops, Desktops, Servers | Laptops, Desktops, Servers | Laptops, Mobile Devices | HPC, AI, Data Centers |
+| **OS**                | macOS              | Windows, Linux | Windows, Linux   | Windows, Linux   | Linux              |
+| **OpenCL Support**                | Yes              | Yes | Yes   | Not Mentioned  | Yes              |
+| **AI Support**                | Yes              | Yes | Yes   | Yes  | Best compatibility with CUDA |
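The `arch.md` page above distinguishes x86 and ARM instruction sets. As a rough sketch, the architecture reported to the running Python interpreter can be queried with the standard-library `platform.machine()`. The helper `classify_architecture` and its list of known strings are assumptions made for illustration, not an exhaustive mapping.

```python
import platform


def classify_architecture(machine):
    """Map a platform.machine() string to a coarse instruction-set family."""
    name = machine.lower()
    # Common spellings reported by different operating systems (not exhaustive)
    if name in ("x86_64", "amd64", "i386", "i686", "x86"):
        return "x86"
    if name.startswith("arm") or name == "aarch64":
        return "ARM"
    return "other"


machine = platform.machine()
family = classify_architecture(machine)
print(f"Reported machine type: {machine!r} -> {family}")
```

Note that an interpreter running through a translation layer may report the translated architecture rather than the physical CPU, so the result describes the Python environment, not necessarily the hardware.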
diff --git a/docs/80_hardware/gpu.md b/docs/80_hardware/gpu.md
@@ -0,0 +1,23 @@
+# GPU Support
+## AI Training
+Although most processor manufacturers embed a GPU in the chipset, AI-based analysis largely relies on NVIDIA CUDA as the base software stack. Common Python neural network libraries (PyTorch and TensorFlow) are the foundation of popular models such as U-Net, Cellpose and StarDist. Although PyTorch has recently gained support for AMD ROCm and Intel oneAPI acceleration, community support is still fairly limited compared to CUDA. Considering training scalability and infrastructure support across major GPU farms and research clusters, NVIDIA remains the front runner for training new models.
+
+## AI Inference
+Machine learning workflows consist of two parts: model training and inference. Applying a trained model to new data requires far fewer computational resources than training from scratch. For smaller AI tasks, non-CUDA chipsets therefore offer more options for bioimage analysis. Neural network inference can be accelerated in hardware with purpose-built circuits, often referred to as neural processing units (NPUs). NVIDIA, for example, added Tensor Cores to its more recent GPU products, bundled with optimised packages such as cuDNN and Transformer Engine. A quick guide to Tensor Core based acceleration is available [here](../20b_deep_learning/tensor_core.ipynb).
+
+<div style="text-align: center;">
+  <img src="./npu.png" alt="Placeholder Image" style="width:50%;">
+  <p><em>General Matrix Multiplication (GEMM) is the fundamental building block of neural network (NN) operations. The mathematics behind NNs and image manipulation are similar: both are embarrassingly parallel tasks involving matrices, which is why GPUs are widely used in many machine learning tasks.</em></p>
+</div>
+
+## GPGPU Acceleration
+Apart from AI applications, bioimage analysis tasks such as single plane illumination fluorescence correlation spectroscopy (SPIM-FCS) perform [pixelwise fitting of the autocorrelation function](https://github.com/bpi-oxford/Gpufit/blob/master/Gpufit/models/spim_acfN.cuh). Such analyses can exploit the parallel processing power of GPUs to accelerate research.
+
+One high-level analysis package, [py-clesperanto](https://github.com/clEsperanto/pyclesperanto_prototype), provides GPU acceleration based on OpenCL. This approach uses the GPU not only for graphics processing but also for more generic calculations, which is why it is often referred to as general-purpose computing on GPUs (GPGPU). In this sense, vendors like AMD and Intel are alternatives to NVIDIA.
+
+|                 | **NVIDIA**                                                                                                                                                 | **AMD**                                                                                                                               | **Intel**                                                                                                                            | **Apple**                                                                                                               |
+|---------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
+| **OpenCL Support**        | Strong support but secondary to CUDA. Performance is robust.                                                                                               | Major proponent with excellent support and performance.                                                                               | Improved support, especially with newer architectures. Performance historically lagged but is improving.                             | Limited support, focus shifting to Metal.                                                                 |
+| **OpenGL Support**        | Excellent performance and highly optimized drivers.                                                                                                        | Strong support, competitive performance, though historically lagged in specific cases.                                               | Solid support, especially for integrated graphics. Significant improvements with newer discrete GPUs.                                | Historically good, but deprecated in favor of Metal.                                                    |
+| **Vendor-Specific Toolkit** | **CUDA**: Most mature and widely-used. Extensive libraries and tools. Highly optimized hardware.                                                           | **ROCm**: Comprehensive platform, open-source, flexible. Primarily supports newer GPUs.                                               | **oneAPI**: Unified model supporting multiple architectures. Provides DPC++, integrates OpenCL. Relatively new, still gaining traction. | **Metal**: Modern API designed specifically for Apple hardware. Provides low-level access to GPU features.               |
+| **Performance**           | Typically leads in raw performance, especially for CUDA-based applications. Highly optimized.                                                               | Excels in OpenCL applications. High memory bandwidth and compute capabilities.                                                       | Emerging player with promising solutions. Strong integration with CPUs.                                                               | Excellent performance on Apple hardware, particularly optimized for Apple Silicon (M1, M2, etc.).       |
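The `gpu.md` page above names several GPU software stacks (CUDA via PyTorch or TensorFlow, OpenCL via py-clesperanto). Before choosing one, it can help to see which of these Python packages are present in the current environment. The following is a minimal sketch using only the standard library; the package list and the helper `find_installed` are assumptions for illustration, and the optional PyTorch query only runs if `torch` is installed.

```python
import importlib.util

# Usual import names of some GPU-capable packages; none is required for this sketch to run.
CANDIDATE_PACKAGES = {
    "torch": "PyTorch",
    "tensorflow": "TensorFlow",
    "pyopencl": "OpenCL bindings",
    "pyclesperanto_prototype": "OpenCL-based image processing",
}


def find_installed(packages):
    """Return the subset of packages that can be found in this environment."""
    return {
        name: description
        for name, description in packages.items()
        if importlib.util.find_spec(name) is not None
    }


installed = find_installed(CANDIDATE_PACKAGES)
print("Installed GPU-capable packages:", sorted(installed))

if "torch" in installed:
    import torch

    # Reports whether this PyTorch build can see a usable CUDA device
    print("CUDA available to PyTorch:", torch.cuda.is_available())
```

Finding a package only shows that it is installed; whether it can actually use the GPU still depends on the drivers and on how the package was built.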
diff --git a/docs/80_hardware/npu.md b/docs/80_hardware/npu.md
@@ -0,0 +1,26 @@
+
+# Neural Processing Unit (NPU)
+<div style="text-align: center;">
+  <img src="./npu_2.png" alt="Placeholder Image" style="width:50%;">
+  <p><em>Schematic depiction of the outer matrix product AB of two matrices A and B. NPUs implement general matrix multiplications (GEMMs) in dedicated hardware circuits to accelerate AI calculations.</em></p>
+</div>
+
+A Neural Processing Unit (NPU) is a specialized hardware accelerator designed to efficiently handle the computational demands of AI and machine learning tasks, particularly neural network inference and training. NPUs target the kind of deep learning workloads used in bioimage analysis, such as StarDist, Cellpose, PlantSeg and CSBDeep CARE denoising.
+
+As of mid-2024, NPUs are embedded in various laptops, allowing a wider choice of hardware for AI applications.
+
+| Feature                          | Google TPU (USB/M.2)      | Apple Silicon      | AMD                      | Intel (after Meteor Lake)       | NVIDIA (Grace Hopper)     | NVIDIA (Jetson)           | Qualcomm Snapdragon X Elite          |
+|----------------------------------|---------------------------|--------------------|--------------------------|---------------------------|---------------------------|---------------------------|---------------------------|
+| **Product Name**                 | Edge TPU                  | Apple Neural Engine| 3rd Gen Ryzen AI| VPU, GNA, AI Engine       | TensorRT, DLA, Grace Hopper| Jetson Xavier, Nano, TX2  | Qualcomm AI Engine        |
+| **Primary Use Case**             | Low Power Devices AI| Mobile, Desktop    | GPUs with AI Capabilities, large data inference| Mobile, Desktop | Data Center, HPC, Embedded | Embedded AI     | Laptop    |
+| **Performance**                  | Moderate                  | High               | Moderate to High         | Moderate to High          | Very High                 | Moderate to High          | Moderate                  |
+| **Efficiency**                   | High                      | High               | Moderate                 | High                      | Moderate to High          | High                      | High                      |
+| **Special Features**             | Google Cloud Compatible, Tensor Operations| Unified Memory, Tight OS Integration | APUs, ROCm | Low Power, Vision Processing, Integrated AI | CUDA Integration, Tensor Cores | Low Power, Integrated AI | ARM based Windows Laptop |
+| **Flexibility**                  | Specialized for TensorFlow| General Purpose    | AI with General Compute  | Specialized for AI and Vision| Highly Specialized        | General Purpose           | General Purpose           |
+| **Compatibility**                | TensorFlow Lite           | macOS              | Windows, Linux           | Windows, Linux            | Windows, Linux            | Linux                     | Windows, Linux          |
+| **Scalability**                  | High                      | Moderate           | Moderate                 | Moderate                  | High                      | Moderate                  | Moderate                  |
+| **Availability**                 | USB, M.2 Modules          | Built-in (M-series)| Radeon Instinct GPUs, APUs | Integrated in Meteor Lake CPUs | Available in GPUs, Servers | Available in Embedded Modules | Snapdragon System on Chips (SoCs)           |
+
+## External Reading
+- [Get started with tensorflow-metal (TensorFlow GPU acceleration on Apple hardware via Metal)](https://developer.apple.com/metal/tensorflow-plugin/)
+- [PluggableDevice: Device Plugin for Tensorflow](https://blog.tensorflow.org/2021/06/pluggabledevice-device-plugins-for-TensorFlow.html)
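The figure caption in `npu.md` above describes the matrix product AB in terms of outer products. One way to read this, offered here as an interpretation rather than a description of any specific NPU, is that AB equals the sum over k of the outer product of the k-th column of A with the k-th row of B. The pure-Python sketch below illustrates that arithmetic; the function names are hypothetical.

```python
def outer(u, v):
    """Outer product of two vectors, returned as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def matmul_by_outer_products(a, b):
    """Compute a @ b by accumulating one outer product per inner index k."""
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    result = [[0] * n_cols for _ in range(n_rows)]
    for k in range(n_inner):
        column_k = [a[i][k] for i in range(n_rows)]  # k-th column of a
        row_k = b[k]  # k-th row of b
        update = outer(column_k, row_k)
        # Every entry of the update is independent of the others,
        # which is what makes this step easy to parallelise in hardware.
        for i in range(n_rows):
            for j in range(n_cols):
                result[i][j] += update[i][j]
    return result


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product = matmul_by_outer_products(a, b)
print(product)  # [[19, 22], [43, 50]]
```

Each accumulation step consists only of independent multiply-add operations, the kind of work that dedicated matrix hardware is built to perform many times in parallel.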
diff --git a/docs/80_hardware/npu.png b/docs/80_hardware/npu.png
diff --git a/docs/80_hardware/npu_2.png b/docs/80_hardware/npu_2.png