First of all, congrats and thanks for the crate!
I'm currently using fasteval in a computer graphics project, and in that world `f64` (double-precision) numbers are very rarely used: not only is the extra precision usually unnecessary, it also makes GPUs much slower (often by 2x), since GPUs are big SIMD/SIMT machines that can pack twice as many `f32` operations as `f64` operations per cycle. I suppose the same holds for SIMD on CPUs.
Supporting `f32` could also be an easier first step towards full SIMD use within the crate, though I haven't implemented a PoC of that.
Currently I'm converting the `f64` results to `f32` before sending them to the GPU, so removing this inefficiency would be a nice optimization, and a good step towards the milestone of supporting arbitrary-precision numbers.