In this project, you will use a gaze estimation model to control the mouse pointer of your computer. The Gaze Estimation model estimates the direction of the user's gaze, and the mouse pointer position changes accordingly. This project will demonstrate your ability to run multiple models on the same machine and coordinate the flow of data between those models.
You will be using the Inference Engine API from Intel's OpenVINO Toolkit to build the project. The gaze estimation model requires three inputs:
- The head pose
- The left eye image
- The right eye image.
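To get these inputs, you will have to use three other OpenVINO models:
- Face Detection (face-detection-adas-binary-0001)
- Head Pose Estimation (head-pose-estimation-adas-0001)
- Facial Landmarks Detection (landmarks-regression-retail-0009)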
You will have to coordinate the flow of data from the input, and then amongst the different models and finally to the mouse controller. The flow of data will look like this:
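As a rough sketch of that flow for a single frame, the function below wires the four models and the mouse controller together. Every callable passed in (`detect_face`, `find_eyes`, `estimate_head_pose`, `estimate_gaze`, `move_mouse`) is a hypothetical stand-in for the corresponding class in this project, not its actual API.

```python
from typing import Callable, Optional, Sequence, Tuple

# Frames and crops are treated as opaque objects in this sketch.
Image = object
Gaze = Tuple[float, float, float]


def run_pipeline_on_frame(
    frame: Image,
    detect_face: Callable[[Image], Optional[Image]],
    find_eyes: Callable[[Image], Tuple[Image, Image]],
    estimate_head_pose: Callable[[Image], Sequence[float]],
    estimate_gaze: Callable[[Image, Image, Sequence[float]], Gaze],
    move_mouse: Callable[[float, float], None],
) -> Optional[Gaze]:
    """Push one frame through the models and on to the mouse controller."""
    # 1. Face detection on the raw frame; no face means nothing to do.
    face = detect_face(frame)
    if face is None:
        return None
    # 2. Landmarks and head pose both work on the cropped face.
    left_eye, right_eye = find_eyes(face)
    head_pose = estimate_head_pose(face)
    # 3. Gaze estimation consumes the three inputs listed above.
    gaze = estimate_gaze(left_eye, right_eye, head_pose)
    # 4. Only the x and y components are forwarded to the mouse (an assumption).
    move_mouse(gaze[0], gaze[1])
    return gaze
```

Passing the stages in as callables keeps the sketch independent of any particular inference library and makes each stage easy to replace with a stub while testing.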
Step1: Download and install the following three prerequisites:
- Microsoft Visual Studio* with C++ 2019, 2017, or 2015 with MSBuild
- CMake 3.4 or higher, 64-bit. NOTE: If you want to use Microsoft Visual Studio 2019, you are required to install CMake 3.14.
- Python 3.6.5 64-bit
Step2. Download OpenVINO Toolkit 2020.1 with all the prerequisites by following this installation guide
Step3: Set up the OpenVINO Toolkit using the commands below in the command prompt
cd C:\Program Files (x86)\IntelSWTools\openvino\bin\
setupvars.bat
Step4: Configure the Model Optimizer using the commands below in the command prompt
cd C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\install_prerequisites
install_prerequisites.bat
Step5: Verify the installation
cd C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\demo\
demo_squeezenet_download_convert_run.bat
The above command should give output like this image
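As an additional quick check from Python, the sketch below tests whether a top-level `openvino` package is importable in the current interpreter. It assumes that running setupvars.bat in the same command prompt is what puts the Python bindings on the path; the function name `openvino_available` is made up for this example.

```python
import importlib.util
import sys


def openvino_available() -> bool:
    """Return True if a top-level `openvino` package can be found."""
    return importlib.util.find_spec("openvino") is not None


if __name__ == "__main__":
    if openvino_available():
        print("openvino package found for", sys.executable)
    else:
        # Assumption: the environment script was not run in this prompt.
        print("openvino not found; run setupvars.bat in this command prompt first")
```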

Step1. Clone the Repository using git clone https://github.com/bhadreshpsavani/Computer-Pointer-Controller.git
Step2. Create a virtual environment using the command python -m venv base in the command prompt, then activate the environment using the commands below:
cd base/Scripts/
activate
Step3. Install all the dependencies using pip install -r requirements.txt.
Step4. Initialize the OpenVINO environment. On Windows, use the commands below:
cd C:\Program Files (x86)\IntelSWTools\openvino\bin\
setupvars.bat
Step5. Go back to the src folder of the project directory
cd path_of_project_directory
cd src
Step6. Run the command below to execute the project
python main.py -fd ../intel/face-detection-adas-binary-0001/FP32-INT1/face-detection-adas-binary-0001.xml \
-lr ../intel/landmarks-regression-retail-0009/FP32-INT8/landmarks-regression-retail-0009.xml \
-hp ../intel/head-pose-estimation-adas-0001/FP32-INT8/head-pose-estimation-adas-0001.xml \
-ge ../intel/gaze-estimation-adas-0002/FP32-INT8/gaze-estimation-adas-0002.xml \
-i ../bin/demo.mp4 -flags ff fl fh fg
Command Line Argument Information:
- fd : Specify path of xml file of face detection model
- lr : Specify path of xml file of landmark regression model
- hp : Specify path of xml file of Head Pose Estimation model
- ge : Specify path of xml file of Gaze Estimation model
- i : Specify the path of the input video file, or cam for webcam
- flags (Optional): To see the preview video in a separate window, specify one or more of ff, fl, fh, fg, space-separated if multiple (for example -flags ff fl). ff is for faceDetectionModel, fl for landmarkRegressionModel, fh for headPoseEstimationModel, fg for gazeEstimationModel
- probs (Optional): Confidence threshold for face detection, a value in the range 0 to 1, default=0.6
- d (Optional): Specify the device for inference; the device can be CPU, GPU, FPGA, or MYRIAD
- o : Specify the path of the output folder where the results will be stored
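The arguments above could be declared with `argparse` roughly as follows. This is a sketch, not the project's actual main.py: the flag names come from the list above, while the help strings and the defaults for -d and -o are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the command line arguments described above."""
    parser = argparse.ArgumentParser(description="Computer pointer controller")
    parser.add_argument("-fd", required=True, help="xml file of the face detection model")
    parser.add_argument("-lr", required=True, help="xml file of the landmark regression model")
    parser.add_argument("-hp", required=True, help="xml file of the head pose estimation model")
    parser.add_argument("-ge", required=True, help="xml file of the gaze estimation model")
    parser.add_argument("-i", required=True, help="input video file, or 'cam' for webcam")
    # Zero or more preview flags, space separated.
    parser.add_argument("-flags", nargs="+", default=[],
                        choices=["ff", "fl", "fh", "fg"],
                        help="models whose output is shown in the preview window")
    parser.add_argument("-probs", type=float, default=0.6,
                        help="confidence threshold for face detection")
    # Assumed defaults: the list above does not state them.
    parser.add_argument("-d", default="CPU", help="inference device")
    parser.add_argument("-o", default="results/", help="output folder")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```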
intel: This folder contains models in IR format downloaded from the OpenVINO Model Zoo
src: This folder contains the model files, the pipeline file (main.py), and utilities
- model.py is the model class file, which holds the properties common to all the other model files. It is inherited by all the other model files.
- This folder has 4 model class files. These class files have methods to load the model and perform inference.
  - face_detection_model.py
  - gaze_estimation_model.py
  - landmark_detection_model.py
  - head_pose_estimation_model.py
- main.py is used to run the complete pipeline of the project. It holds an object of each of the other class files in the folder.
- mouse_controller.py is a utility to move the mouse cursor based on the mouse coordinates received from the gaze_estimation_model class predict method.
- input_feeder.py is a utility to load a local video or webcam feed.
- banchmark.ipynb, computer_controller_job.sh, and PrecisionComparsion.ipynb are for benchmarking result generation for different hardware and for model comparison.
bin: This folder has the demo.mp4 file, which can be used to test the model
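To illustrate the inheritance described for model.py, here is a minimal sketch of a shared base class and one subclass. The attribute names, constructor signature, and `GazeEstimationModel` class name are assumptions made for this example; only the `predict` method name and the .xml/.bin pairing of the IR format are taken as given.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Model(ABC):
    """State shared by the four model wrappers (hypothetical layout)."""

    def __init__(self, model_path: str, device: str = "CPU", threshold: float = 0.6):
        # An IR model is a pair of files: <name>.xml and <name>.bin.
        self.model_structure = Path(model_path)
        self.model_weights = self.model_structure.with_suffix(".bin")
        self.device = device
        self.threshold = threshold

    @abstractmethod
    def predict(self, *inputs):
        """Run inference; each model file supplies its own version."""


class GazeEstimationModel(Model):
    def predict(self, left_eye, right_eye, head_pose):
        # The actual inference call is deliberately left out of this sketch.
        raise NotImplementedError("inference call omitted in this sketch")
```

Keeping the path handling and device selection in the base class means each of the four model files only has to describe its own inputs and outputs.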
I have checked Inference Time, Model Loading Time, and Frames Per Second for the FP16, FP32, and FP32-INT8 precisions of all the models except the Face Detection Model. The Face Detection Model was only available in FP32-INT1 precision.
You can use the commands below to get results for the respective precisions:
FP16:
python main.py -fd ../intel/face-detection-adas-binary-0001/FP32-INT1/face-detection-adas-binary-0001.xml -lr ../intel/landmarks-regression-retail-0009/FP16/landmarks-regression-retail-0009.xml -hp ../intel/head-pose-estimation-adas-0001/FP16/head-pose-estimation-adas-0001.xml -ge ../intel/gaze-estimation-adas-0002/FP16/gaze-estimation-adas-0002.xml -d CPU -i ../bin/demo.mp4 -o results/FP16/ -flags ff fl fh fg
FP32:
python main.py -fd ../intel/face-detection-adas-binary-0001/FP32-INT1/face-detection-adas-binary-0001.xml -lr ../intel/landmarks-regression-retail-0009/FP32/landmarks-regression-retail-0009.xml -hp ../intel/head-pose-estimation-adas-0001/FP32/head-pose-estimation-adas-0001.xml -ge ../intel/gaze-estimation-adas-0002/FP32/gaze-estimation-adas-0002.xml -d CPU -i ../bin/demo.mp4 -o results/FP32/ -flags ff fl fh fg
FP32-INT8:
python main.py -fd ../intel/face-detection-adas-binary-0001/FP32-INT1/face-detection-adas-binary-0001.xml -lr ../intel/landmarks-regression-retail-0009/FP32-INT8/landmarks-regression-retail-0009.xml -hp ../intel/head-pose-estimation-adas-0001/FP32-INT8/head-pose-estimation-adas-0001.xml -ge ../intel/gaze-estimation-adas-0002/FP32-INT8/gaze-estimation-adas-0002.xml -d CPU -i ../bin/demo.mp4 -o results/FP32-INT8/ -flags ff fl fh fg
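The three commands differ only in the precision folder and the output folder, so they can be generated instead of typed. The helper below is a small sketch of that idea; `build_command` is a name invented here, and the paths are copied from the commands above.

```python
from typing import List

FACE_MODEL = "face-detection-adas-binary-0001"  # only available as FP32-INT1
OTHER_MODELS = {
    "-lr": "landmarks-regression-retail-0009",
    "-hp": "head-pose-estimation-adas-0001",
    "-ge": "gaze-estimation-adas-0002",
}


def build_command(precision: str, device: str = "CPU") -> List[str]:
    """Return the main.py argument list for one precision."""
    cmd = ["python", "main.py",
           "-fd", f"../intel/{FACE_MODEL}/FP32-INT1/{FACE_MODEL}.xml"]
    for flag, name in OTHER_MODELS.items():
        cmd += [flag, f"../intel/{name}/{precision}/{name}.xml"]
    cmd += ["-d", device, "-i", "../bin/demo.mp4",
            "-o", f"results/{precision}/",
            "-flags", "ff", "fl", "fh", "fg"]
    return cmd


if __name__ == "__main__":
    for precision in ("FP16", "FP32", "FP32-INT8"):
        print(" ".join(build_command(precision)))
```

The returned list can be passed directly to `subprocess.run` from inside the src folder.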
Synchronous Inference
precisions = ['FP16', 'FP32', 'FP32-INT8']
Inference Time : [26.6, 26.4, 26.9]
fps : [2.218045112781955, 2.234848484848485, 2.193308550185874]
Model Load Time : [1.6771371364593506, 1.6517729759216309, 5.205628395080566]
Asynchronous Inference
precisions = ['FP16', 'FP32', 'FP32-INT8']
Inference Time : [23.9, 24.7, 24.0]
fps : [2.468619246861925, 2.388663967611336, 2.4583333333333335]
Model Load Time : [0.7770581245422363, 0.7230548858642578, 2.766681432723999]
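The fps values above appear to be a frame count divided by the total inference time: multiplying each reported inference time by its fps gives about 59 in every run. That frame count is inferred from the numbers and is not stated anywhere in the results, so treat the sketch below as a consistency check only.

```python
from typing import Dict, List

PRECISIONS = ["FP16", "FP32", "FP32-INT8"]

# Values copied from the results above.
RESULTS: Dict[str, Dict[str, List[float]]] = {
    "sync": {
        "inference_time": [26.6, 26.4, 26.9],
        "fps": [2.218045112781955, 2.234848484848485, 2.193308550185874],
    },
    "async": {
        "inference_time": [23.9, 24.7, 24.0],
        "fps": [2.468619246861925, 2.388663967611336, 2.4583333333333335],
    },
}


def implied_frame_counts(results: Dict[str, Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Multiply inference time by fps to recover the implied frame count."""
    return {
        mode: [round(t * f, 6) for t, f in zip(data["inference_time"], data["fps"])]
        for mode, data in results.items()
    }


if __name__ == "__main__":
    for mode, counts in implied_frame_counts(RESULTS).items():
        print(mode, dict(zip(PRECISIONS, counts)))
```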
From the above observations we can say that:
- FP16 and FP32 have nearly identical model loading times (FP32 is marginally the lowest in both runs), and FP32-INT8 has the highest model loading time. A possible reason for the higher loading time is that the combination of precisions leads to a heavier model for FP32-INT8.
- For Inference Time and FPS, FP32 gives slightly better results in the synchronous runs and FP16 in the asynchronous runs. There is not much difference between the three precisions for these two parameters.
- I have tested the model with both Asynchronous Inference and Synchronous Inference. Asynchronous Inference gives better results, with a slight improvement in inference time and FPS.
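To put numbers on the synchronous versus asynchronous comparison, the sketch below computes the percentage reduction in inference time and in model load time for each precision, using only the figures reported above. For inference time, the reduction works out to roughly 6 to 11 percent depending on the precision. The helper name `percent_reduction` is made up for this example.

```python
from typing import List

PRECISIONS = ["FP16", "FP32", "FP32-INT8"]

# Values copied from the results above.
SYNC_INFERENCE = [26.6, 26.4, 26.9]
ASYNC_INFERENCE = [23.9, 24.7, 24.0]
SYNC_LOAD = [1.6771371364593506, 1.6517729759216309, 5.205628395080566]
ASYNC_LOAD = [0.7770581245422363, 0.7230548858642578, 2.766681432723999]


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


def reductions(before: List[float], after: List[float]) -> List[float]:
    """Apply percent_reduction pairwise across the three precisions."""
    return [percent_reduction(b, a) for b, a in zip(before, after)]


if __name__ == "__main__":
    inference = reductions(SYNC_INFERENCE, ASYNC_INFERENCE)
    load = reductions(SYNC_LOAD, ASYNC_LOAD)
    for name, inf, ld in zip(PRECISIONS, inference, load):
        print(f"{name}: inference time -{inf:.1f}%, load time -{ld:.1f}%")
    # Precision with the longest synchronous load time.
    print("slowest to load:", PRECISIONS[SYNC_LOAD.index(max(SYNC_LOAD))])
```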
- Multiple People Scenario: If multiple people appear in the video frame, the pipeline always uses and gives results for a single face, even though multiple people are detected.
- No Head Detection: If no head is detected, the pipeline skips the frame and informs the user.
- Lighting condition: HSV-based pre-processing steps might be used to minimize errors due to different lighting conditions.
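The first two cases can be handled by one small selection step, sketched below. It assumes each detection is a (confidence, box) pair and that the single face used is the first detection above the threshold; the text above only says that one face is used, so that choice and the name `select_face` are assumptions.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]   # (xmin, ymin, xmax, ymax)
Detection = Tuple[float, Box]     # (confidence, box)


def select_face(detections: List[Detection], threshold: float = 0.6) -> Optional[Box]:
    """Pick one face box, or None when no detection passes the threshold."""
    confident = [box for confidence, box in detections if confidence >= threshold]
    if not confident:
        return None
    # Assumption: with several people in view, keep the first detection.
    return confident[0]


if __name__ == "__main__":
    frame_detections = [(0.9, (0, 0, 10, 10)), (0.8, (20, 20, 30, 30))]
    box = select_face(frame_detections)
    if box is None:
        print("No face detected, skipping frame")
    else:
        print("Using face at", box)
```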