fas is floating point arithmetic for arbitrary mantissa and exponent types in modern header-only C++. It lets you construct various different float types using template parameters for the mantissa, exponent and base.
The constructed float-types look and fell like a native float/double for arithmetic operations. Furthermore all methods are performed on the stack and do not require any heap space.
fas is header-only.
fas::Float<int16_t, int8_t> f;This will result in a float using a signed 16 bit mantissa and a signed 8 bit exponent.
The arithmetical operations are supported using their corresponding operators
+ - * / += -= *= /= ++ --
fas::Float<int16_t, int8_t> f1 = 10;
auto f2 = f1;
f2 = -f1; // => -10
f2 = f1 + 10; // => 20
f2 = f1 - f1; // => 0
f2 = f1 * f1; // => 100
f2 = f1 / 20; // => 0.5
f2 += 1; // => 1.5
f2 -= 1; // => 0.5
f2 *= 2; // => 1
f2 /= 2; // => 0.5
f1++; // => 11
++f1; // => 12
f1--; // => 11
--f1; // => 10Each type knows its boundaries:
MAX()returns the largest valueMIN()returns the smallest positive valueLOWEST()returns the smallest value The naming is the same as with internal float/double.
There are overloadings for std::numeric_limits available:
fas::Float<int8_t, int8_t>::MAX() returns the same as
std::numeric_limits<fas::Float<std::int16_t, std::int8_t>>::MAX()
which is approx 2.16079e+40.
fas supports type traits:
is_fundamental<int16_t, int16_t>::value // => true
is_floating_point<fas::Float<int16_t, int16_t>::value // => true
is_arithmetic<int16_t, int16_t>::value // => true
is_scalar<int16_t, int16_t>::value // => true
is_object<int16_t, int16_t>::value // => trueconstexpr c = fas::Float<std::int16_t, std::int8_t>(1);Even the operations' results:
constexpr c1 = fas::Float<std::int16_t, std::int8_t>(1);
constexpr c2 = fas::Float<std::int16_t, std::int8_t>(2);
constexpr c3 = c1 + c2; // => 3fas offers an output stream overload, which can be used for std::cout:
#include "fas/stream.hpp"
...
fas::Float<int8_t, int8_t> fas_float(1);
fas_float /= 3;
std::cout << fas_float << "\n"; // => 70 * 7 ^ fffffffd ≈ 0.32653061224489793313Typical floats such as float or double are to the base of
2. fas allows to construct floats to any base:
fas::Float<int8_t, int8_t, 7> fas_float(0);
double cpp_float = 0;
for(auto i=0; i<7; ++i) {
fas_float += fas::Float<int8_t, int8_t, 7>(1)/7;
cpp_float += 1.0/7;
}
std::cout << std::setprecision(20);
std::cout << fas_float << "\n"; // => exactly 1
std::cout << cpp_float << "\n"; // => 0.99999999999999977796The setting of different bases allows to represent specific fractions exaclty.
In this case the base is 7, so any fraction by 7 is represented exactly.
Compare to the native double which is always to the base of 2, thus can not
represent 1/7 exactly.
Download the float.hpp and include it in your source. Don't forget to add it in your include path.
To build and run unit tests type:
mkdir build; cd build; cmake ..; make && ./tests/tests- The Stl's
std::numeric_limitsis required the limits of the specified types for mantissa and exponent. - Catch2 is required to build the unit tests.
- (configurable) rounding support
exp,sqrt,powand other math.h functions- string constructor
- string representation
- value constructor for all ints and other types
- a version without stl