Intel® Extension for MLIR (IMEX) is a collection of MLIR dialects and passes from Intel for supporting MLIR lowering to Intel silicon (CPU, GPU, …). The goal of this project is to support development of MLIR enhancements for upstream contribution and to provide a sandbox for validation independent of front-end frameworks. The current project scope includes:
- Dialects and passes needed to lower and execute MLIR entry dialects (linalg, CFG, etc.) on Intel GPU.
- Wrapper libraries to interface with the Level Zero and SYCL runtimes supporting Intel GPU.
- Other experimental dialects: NDArray, Dist
Requirements for building and development:
- CMake >= 3.20.0
- Ninja
- doxygen (Optional for building docs)
- pre-commit
- clang-format
- lit (If building with option 2 below. https://pypi.org/project/lit/)
Install the Intel GPU software by following the installation guides here:
https://dgpu-docs.intel.com/installation-guides/index.html
Install the DPC++ compiler by following the instructions here:
https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#dpcpp-cpp
Once DPC++ is installed, source the compiler vars:
source /PATH_TO/intel/oneapi/compiler/latest/env/vars.sh
Get the Level Zero loader in one of two ways:
- Build from source for a non-system-wide (local) install:
git clone https://github.com/oneapi-src/level-zero.git
cd level-zero
cmake -G Ninja -B build -S . \
   -DCMAKE_BUILD_TYPE=Release \
   -DCMAKE_INSTALL_PREFIX=../level-zero-install
cmake --build build --target install
- Binary package for system-wide install: https://github.com/oneapi-src/level-zero/releases
conda create -n imex-dev -c intel -c defaults -c conda-forge pip">=21.2.4" pre-commit cmake clang-format lit doxygen
conda activate imex-dev
pre-commit install -f -c .pre-commit-config.yaml
IMEX supports three different ways of building, depending on how LLVM is set up. Option 1 is in-tree (built as part of LLVM); options 2 and 3 are out-of-tree (built outside of LLVM).
For option 1, IMEX can be treated like a sub-project of LLVM and built as part of LLVM by using the LLVM config option LLVM_EXTERNAL_PROJECTS.
git clone https://github.com/intel/mlir-extensions.git
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout `cat ../mlir-extensions/build_tools/llvm_version.txt`
git apply ../mlir-extensions/build_tools/patches/*
cmake -G Ninja -B build -S llvm \
   -DLLVM_ENABLE_PROJECTS=mlir \
   -DLLVM_BUILD_EXAMPLES=ON \
   -DLLVM_TARGETS_TO_BUILD="X86" \
   -DCMAKE_BUILD_TYPE=Release \
   -DLLVM_ENABLE_ASSERTIONS=ON \
   -DLLVM_EXTERNAL_PROJECTS="Imex" \
   -DLLVM_EXTERNAL_IMEX_SOURCE_DIR=../mlir-extensions
# For GPU support, pass these CMake variables to enable the required runtime libraries
#  -DIMEX_ENABLE_L0_RUNTIME=1
#  -DIMEX_ENABLE_SYCL_RUNTIME=1
# Additionally, if using a non-system-wide Level Zero loader built from source
#  -DLEVEL_ZERO_DIR=/PATH_TO/level-zero-install
cmake --build build --target check-imex
Note: -DLLVM_INSTALL_UTILS=ON is not needed for this build, since all tests
run using the FileCheck utility that is available in the build tree.
An external lit is not needed either, since all tests run using llvm-lit
in the build tree.
Note for option 2 (building against an installed LLVM): make sure to pass -DLLVM_INSTALL_UTILS=ON when building LLVM with
CMake so that it installs FileCheck to the chosen installation prefix.
Additionally, lit has to be installed separately, as it is not installed with
the rest of LLVM.
Make sure the installed LLVM is built from the git commit SHA stated in
build_tools/llvm_version.txt
and has all LLVM patches in build_tools/patches applied.
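One way to check the commit requirement above is to compare the pinned commit with the commit checked out in the LLVM source tree used for the build. The following is a minimal sketch and not part of IMEX: the helper names are hypothetical, and it assumes build_tools/llvm_version.txt contains a single commit SHA.

```python
import subprocess
from pathlib import Path


def pinned_llvm_commit(version_file: Path) -> str:
    """Read the pinned commit, assuming the file holds a single SHA."""
    return version_file.read_text().strip()


def current_head(repo: Path) -> str:
    """Ask git which commit is checked out in the given source tree."""
    result = subprocess.run(
        ["git", "-C", str(repo), "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def commit_matches(pinned: str, head: str) -> bool:
    """True if one SHA is a prefix of the other (tolerates abbreviated SHAs)."""
    pinned = pinned.strip().lower()
    head = head.strip().lower()
    if not pinned or not head:
        return False
    return head.startswith(pinned) or pinned.startswith(head)
```

For example, with llvm-project and mlir-extensions cloned side by side, `commit_matches(pinned_llvm_commit(Path("mlir-extensions/build_tools/llvm_version.txt")), current_head(Path("llvm-project")))` should be true before configuring the build. The sketch does not check whether the patches in build_tools/patches have been applied.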
cmake -G Ninja -B build -S . \
   -DMLIR_DIR=<PATH_TO_DIRECTORY_WITH_MLIRConfig.cmake> \
   -DLLVM_EXTERNAL_LIT=<PATH_TO_LIT> \
   -DCMAKE_BUILD_TYPE=Release
# For GPU support, pass these CMake variables to enable the required runtime libraries
#  -DIMEX_ENABLE_L0_RUNTIME=1
#  -DIMEX_ENABLE_SYCL_RUNTIME=1
# Additionally, if using a non-system-wide Level Zero loader built from source
#  -DLEVEL_ZERO_DIR=/PATH_TO/level-zero-install
cmake --build build --target check-imex
Option 3 is similar to option 2, but the LLVM build tree is used instead of an installed LLVM.
Before building LLVM, make sure to check out the git commit SHA stated in
build_tools/llvm_version.txt
and apply all LLVM patches in build_tools/patches.
cmake -G Ninja -B build -S . \
   -DMLIR_DIR=<PATH_TO_DIRECTORY_WITH_MLIRConfig.cmake> \
   -DCMAKE_BUILD_TYPE=Release
# For GPU support, pass these CMake variables to enable the required runtime libraries
#  -DIMEX_ENABLE_L0_RUNTIME=1
#  -DIMEX_ENABLE_SYCL_RUNTIME=1
# Additionally, if using a non-system-wide Level Zero loader built from source
#  -DLEVEL_ZERO_DIR=/PATH_TO/level-zero-install
cmake --build build --target check-imex
To build the user documentation, run
cmake --build build --target mlir-doc
It will render the docs to the 'doc' directory.
To build the code documentation, pass '-DIMEX_INCLUDE_DOCS' when configuring with CMake and run
cd build
cmake --build . --target doc_doxygen
To add a new dialect, run
# enter root directory of mlir-extensions
cd mlir-extensions
python scripts/add_dialect.py <name-of-new-dialect>
This will
- Generate directories IR and Transforms in the directories (include/mlir/Dialect and lib/dialect)
- Extend/Create cmake infrastructure with defaults
- Create stub source files for IR and transforms
  - include/imex/Dialect/<name>/IR/<name>Ops.h
  - include/imex/Dialect/<name>/IR/<name>Ops.td
  - lib/Dialect/IR/<name>Ops.cpp
  - include/imex/Dialect/<name>/Transforms/Passes.h
  - include/imex/Dialect/<name>/Transforms/Passes.td
  - lib/Dialect/Transforms/PassDetail.h
 
Now, it's your turn to
- Add your dialect and its transforms/passes to the appropriate places in
  - include/imex/InitIMEXDialects.h
  - include/imex/InitIMEXPasses.h
  - lib/Conversion/IMEXPassDetail.h
 
- Fill in what's marked with FIXME
- The documentation of the dialect should go into the description fields in <name>Ops.td. At build time the description will be extracted and a file doc/<name>.md will be generated automatically. It will include descriptions of the dialect and operations in a standardized way.
To add a new conversion, run
# enter root directory of mlir-extensions
cd mlir-extensions
python scripts/add_conversion.py $name-of-source-dialect $name-of-target-dialect
This will
- Let $conversion-name be "$name-of-source-dialectTo$name-of-target-dialect" (the source dialect name, then "To", then the target dialect name)
- Add directories include/mlir/Conversion/<conversion-name> and lib/Conversion/<conversion-name>
- Extend/Create cmake infrastructure with defaults
- Add declarations to header include/mlir/Conversion/<conversion-name>/<conversion-name>.h
- Put cpp definition stubs to lib/Conversion/<conversion-name>/<conversion-name>.cpp
- Add conversion to include/imex/Conversion/IMEXPasses.td and include/imex/Conversion/IMEXPasses.h
- Add a pass def stub to include/imex/Conversion/IMEXPasses.td and include/imex/Conversion/Passes.td
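The naming rule in the list above can be sketched as a small helper. This is only an illustration of how the conversion name and the per-conversion paths listed above are derived from the two dialect names; it is not the logic of scripts/add_conversion.py, and the function name and the dialect names "Foo" and "Bar" are hypothetical.

```python
def conversion_paths(source: str, target: str) -> dict:
    """Derive the conversion name and the per-conversion paths listed above."""
    # The conversion name is the source dialect, then "To", then the target dialect.
    name = f"{source}To{target}"
    return {
        "name": name,
        "include_dir": f"include/mlir/Conversion/{name}",
        "lib_dir": f"lib/Conversion/{name}",
        "header": f"include/mlir/Conversion/{name}/{name}.h",
        "source": f"lib/Conversion/{name}/{name}.cpp",
    }


# Hypothetical dialect names, used only for illustration.
paths = conversion_paths("Foo", "Bar")
print(paths["name"])
print(paths["header"])
```

Such a helper can be handy for checking that the expected directories and stubs exist after running the script.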
You will now have to
- Fill in what's marked with FIXME in the above files
- The documentation of the pass should go into the description field in Passes.td. At build time the description will be extracted and a file doc/Conversions.md will be generated automatically.
- Write your Pattern rewriters
To run the FileCheck-based tests, run:
cmake --build build --target check-imex
Add '-v' to the above command line to get verbose output.
IMEX provides an initial set of benchmarks for studying its performance. To build these benchmarks, users need
to manually add the -DIMEX_ENABLE_BENCHMARK=ON option when building IMEX. The benchmark test cases and the
script for running them will be generated under the build/benchmarks folder.
Currently, IMEX provides benchmarks for the following 4 categories of operations:
| Operation | CPU | GPU | 
|---|---|---|
| elementwise (relu and silu) | Yes | Yes | 
| reduction (softmax) | Yes | Yes | 
| transpose (transpose) | Yes | Yes | 
| fusion (kInputFusion and kLoopFusion) | No | Yes | 
These test cases are mainly implemented using the linalg dialect; spirv test cases for
relu are also provided. Each test case is named following the pattern opname_shape_dtype.mlir.
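To illustrate the naming pattern, here is a small sketch that splits a test case file name into its operation, shape, and data type. It is not part of IMEX; the names `BenchCase` and `parse_case_name` are hypothetical, and it assumes the shape is written as integers joined by "x".

```python
from typing import NamedTuple, Tuple


class BenchCase(NamedTuple):
    op: str
    shape: Tuple[int, ...]
    dtype: str


def parse_case_name(filename: str) -> BenchCase:
    """Split 'opname_shape_dtype.mlir' into its three components."""
    base = filename.rsplit("/", 1)[-1]  # drop any leading directories
    if not base.endswith(".mlir"):
        raise ValueError(f"not an .mlir test case: {filename}")
    stem = base[: -len(".mlir")]
    parts = stem.rsplit("_", 2)  # split from the right: op, shape, dtype
    if len(parts) != 3:
        raise ValueError(f"name does not match opname_shape_dtype: {filename}")
    op, shape, dtype = parts
    return BenchCase(op, tuple(int(dim) for dim in shape.split("x")), dtype)


print(parse_case_name("relu_1x160x160x120_f16.mlir"))
```

Splitting from the right keeps an operation name intact even if it contains underscores, which is an assumption about how names might be formed rather than something the pattern guarantees.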
For simplicity, the bench_imex script is provided to run the benchmarks. It can take an MLIR file or a folder as input.
In the latter case, it simply runs all test cases inside the folder. In addition, a runtime has to be chosen
with an option. The script accepts one of the following three options:
- -c for the CPU runtime
- -l for the Level Zero runtime (for Intel GPU)
- -s for the SYCL runtime (for Intel GPU)
# run a specific test case on CPU
 ./bench_imex -c relu/cpu/relu_1x160x160x120_f16.mlir
# run a set of test cases on GPU using sycl runtime
 ./bench_imex -s relu/gpu/
NOTE: if you are using -c, please use test cases under the cpu subfolder; similarly, if you are using -s or -l, please use test cases under the gpu subfolder. Otherwise, errors or unspecified behavior may occur.
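The rule in the note can be checked before launching a run. The sketch below is a hypothetical pre-flight helper, not something bench_imex provides; it only assumes the option-to-subfolder mapping stated above (-c with cpu, -l and -s with gpu).

```python
from pathlib import PurePosixPath

# Mapping taken from the note above: which subfolder each runtime option expects.
RUNTIME_SUBFOLDER = {"-c": "cpu", "-l": "gpu", "-s": "gpu"}


def check_runtime_path(flag: str, path: str) -> bool:
    """Return True if the path lies under the subfolder that matches the option."""
    if flag not in RUNTIME_SUBFOLDER:
        raise ValueError(f"unknown runtime option: {flag}")
    return RUNTIME_SUBFOLDER[flag] in PurePosixPath(path).parts


print(check_runtime_path("-c", "relu/cpu/relu_1x160x160x120_f16.mlir"))
print(check_runtime_path("-c", "relu/gpu/"))
```

The check works on path components, so it accepts both a single test case file and a folder.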
The IMEX benchmark suite is implemented using CMake templates and initially provides a limited set of shapes extracted from production models, e.g., BERT and AlexNet.
- ReLU: 1x160x160x120, 50x640x20x15, 512x640x20x15
- SiLU: 1x1024x40x30, 50x20x3072, 512x640x20x15
- Softmax: 1x2000, 16x2000, 64x2000, 256x2000, 1024x2000
- Transpose: 128x136, 1024x1024, 16x96x96, 96x7x96
- Reduce: 32x16x512x512
Users can extend it to evaluate more shapes by editing the shapes file in each subfolder (e.g., relu.shapes.in) and then
rebuilding IMEX. Users can also add new data types, but this is currently limited to basic data types, including
fp32, fp16, int32, etc.
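To enable profiling, set the following environment variable and then run the test:
export IMEX_ENABLE_PROFILING=ON
To profile with ze_tracer, build the test into a standalone executable and run it under the tracer:
python {your_path}/imex_runner.py xxx -o test.mlir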
mlir-translate test.mlir -mlir-to-llvmir -o test.ll
llc test.ll -filetype=obj -o test.o
clang++ test.o {path}/libmlir_runner_utils.so {path}/libmlir_c_runner_utils.so {path}/libsycl-runtime.so -no-pie -o test
ze_tracer ./test
Notes on the Dist and NDArray dialects:
- Not using LoadOp. Instead, everything is a SubviewOp. Any size-1 dim must be annotated with static size 1.
  - Right now, we can only broadcast size-1 dims if their extent is statically known (to be 1).
 
- Generally, rank reduction of SubviewOp needs an overhaul.
  - Right now, no rank reduction is happening, and appropriate affine maps are generated accordingly.
  - Without dist-coalesce, repartitioning of 0d arrays coming from a subview does not work correctly. Only the owning process will have the right data.
  - Even if SubviewOp resulted in rank-reduced arrays, we cannot view into our local data since the element might be remote.
  - To follow existing mechanisms (e.g. target parts) we'd basically need to duplicate the entire array.
  - We probably need some special feature to hold duplicates of slices with only one element on the distributed axis.
 
- NDArray/dist tests can be run (without GPU tests etc.) using cmake --build . --target check-ndarray
This code is made available under the Apache License 2.0 with LLVM Exceptions.
See the LICENSE.txt file for more details.