Wide-decimal

Wide-decimal implements a generic C++ template for extended width decimal float types.

This C++ template header-only library implements drop-in big decimal float types, such as dec51_t, dec101_t, dec1001_t, dec10001_t, dec1000001_t, etc., that can be used essentially like regular built-in floating-point types. Wide-decimal supports decimal float types having digit counts ranging roughly from about ${\sim}~10 {\ldots} 10^{7}$.

Wide-decimal implements both elementary algebraic operations as well as a few common <cmath>-like functions such as fabs, sqrt and log. It also includes full support for std::numeric_limits.

Wide-decimal is written in header-only C++14, and compatible through C++14, 17, 20, 23 and beyond.

Implementation goals

Instances of the decwide_t type should behave as closely as possible to the behavior of built-in floating-point types.
Relatively wide precision range from about ${\sim}~10 {\ldots} 10^{7}$ decimal digits.
Moderately good efficiency over the entire wide precision range.
Clean header-only C++14 design.
Seamless portability to any modern C++14, 17, 20, 23 compiler and beyond.
Scalability with small memory footprint and efficiency suitable for both PC/workstation systems as well as bare-metal embedded systems.

Quick start

Easy application follows via a traditional C-style typedef or C++14 alias. The defined type can be used very much like a built-in floating-point type.

The following sample, for instance, defines and uses a decimal type called dec101_t having $101$ decimal digits of precision. In this example, the subroutine do_something() initializes the variable d of type dec101_t with $\frac{1}{3}$. The $101$ digit value of d is subsequently printed to the console. The console precision is set with std::setprecision in combination with dec101_t's specialization of std::numeric_limits.

In particular,

#include <iomanip>
#include <iostream>

#include <math/wide_decimal/decwide_t.h>

void do_something()
{
  using dec101_t = ::math::wide_decimal::decwide_t<INT32_C(101), std::uint32_t, void>;

  dec101_t d = dec101_t(1) / 3;

  // 0.33333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333
  std::cout << std::setprecision(std::numeric_limits<dec101_t>::digits) << d << std::endl;
}

The local data type dec101_t is defined with a C++14 alias. The first template parameter INT32_C(101) sets the decimal digit count while the second optional template parameter std::uint32_t sets the internal limb type. If the second template parameter is left blank, the default limb type is thirty-two bits in width and unsigned.

The template signature of the decwide_t class is shown below.

// Forward declaration of the decwide_t template class.
template<const std::int32_t ParamDigitsBaseTen,
         typename LimbType          = std::uint32_t,
         typename AllocatorType     = std::allocator<void>,
         typename InternalFloatType = double,
         typename ExponentType      = std::int64_t,
         typename FftFloatType      = double>
class decwide_t;

decwide_t also has a third (and a few more) optional template paramter(s). The third template parameter AllocatorType can be used to set the allocator type for internal storage of the wide decimal type. The default allocator type is std::allocator<limb_type> which helps to reduce stack consumption, especially when using higher digit counts. If low digit counts are used, the allocator type can be explicitly set to void and stack allocation is used with an array-like internal representation.

Examples

Various interesting examples, some of which are quite algorithmically challenging, have been implemented. It is hoped that the examples provide inspiration and guidance on how to use wide-decimal.

The examples include the following.

verifies schoolbook multiplication using a decwide-t type having 8-bit limbs in a small digit range.
performs a hard-coded multiplication check resulting in $\pi^2$.
computes a square root.
computes a seventh root.
uses roots and algebraic operations to compute $5,001$ decimal digits of a fascinating Pisot number that is almost integer.
computes a square root with a wide decimal representation having 8-bit limbs.
verifies a list of values $2^n$ with $-128 {\le} n {\le} 127$.
calculates $1,000,001$ decimal digits of $\pi$ using a Gauss AGM iteration.
calculates $1,000,001$ decimal digits of $\pi$ using a 16-bit internal limb type.
calculates $100,001$ decimal digits of $\pi$.
calculates $1,000,001$ decimal digits of $\pi$ using a Borwein quintic iteration.
calculates yet again $1,000,001$ decimal digits of $\pi$ using an 8-bit internal limb type and float internal floating-point type.
(TBD) verifies some basic C++20 constexpr algebraic operations.
computes a Riemann zeta function value.
implements cylindrical Bessel functions of integral order via downward recursion with a Neumann sum.
performs a small-argument polylogarithm series calculation.
calculates the value of a logarithm (internally using a Gauss AGM method).
computes $1,001$ decimal digits of Catalan's constant using an accelerated series.
implements tgamma(x) using Stirling's asymptotic expansion of the logarithm of the Gamma function with Bernoulli numbers and subsequently calculates $1,001$ decimal digits of $\Gamma(n/2)$ for small integer $n$.
checks basic compatibility of standalone decwide_t with Boost.Math by testing a cube root value obtained from boost::math::cbrt.
also checks standalone decwide_t with significantly more of Boost.Math by testing a $1,001$ digit generalized Legendre function value (using boost::math::tgamma and more to do so).
checks yet again standalone decwide_t with Boost.Math's available boost::math::tgamma function for small-ish decimal floats having ${\lesssim}~100$ decimal digits.
calculates a $1,001$ decimal digit hypergeometric function value using an iterative rational approximation scheme.
calculates another $1,001$ decimal digit hypergeometric function in a similar fashion.
uses trapezoid integration with an integral representation involving locally-written trigonometric sine and cosine functions to compute several cylindrical Bessel function values.
verifies the proper representation of a wide selection of small-valued, pure integral rational quotients.
and exercise calculations that also run on tiny bare-metal embedded systems. These two straightforward embedded-ready examples feature a ${\sim}100$ digit square root calculation and a ${\sim}50$ digit AGM iteration for $\pi$, respectively.

Testing, CI and Quality Checks

Testing

Testing is definitely a big issue. A growing, supported test suite improves confidence in the library. It provides for tested, efficient functionality on the PC and workstation. The GitHub code is, as mentioned above, delivered with an affiliated MSVC project or a variety of other build/make options that use easy-to-understand subroutines called from main(). These exercise the various examples and the full suite of test cases.

If an issue is reported, reproduced and verified, an attempt is made to correct it without breaking any other code. Upon successful correction, specific test cases exercising the reported issue are usually added as part of the issue resolution process.

CI and Quality checks

CI runs on both push-to-branch as well as pull request using GitHub Actions. Various compilers, operating systems, and C++ standards ranging from C++14, 17, 20, 23 are included in CI.

In CI, we use both elevated GCC/clang compiler warnings as well as MSVC level 4 warnings active on the correspondoing platforms. For additional in-depth syntax checking, clang-tidy is used both in CI as well as in offline checks to improve static code quality.

Both GCC's run-time sanitizers as well as valgrind (see also [1] and [2] in the References below) are used in CI in order to help assure dynamic quality.

Additional quality checks are performed on pull-request and merge to master using modern third party open-source services. These include CodeQL, Synopsis Coverity, and CodeSonar. At the moment, the Coverity check is run with manual report submission. Automation of this is, however, planned.

Code coverage uses GCC/gcov/lcov and has a quality-gate with comparison/baseline-check provided by Codecov.

Quality badges are displayed at the top of this repository's readme page.

Build status

Additional details

Wide-decimal has been tested with numerous compilers for target systems ranging from 8 to 64 bits. The library is specifically designed for modest efficiency (not the world's fastest) over the entire range of small to large digit counts. How efficient is the code? Based on a very general comparison, a million digit AGM calculation of $\pi$ is about five times slower than an equivalent calculation performed with a big float data type based on GMP.

Portability of the code is another key point of focus. Special care has been taken to test in certain high-performance embedded real-time programming environments.

When working with even the most tiny microcontroller systems, various heavy-wieght features such as I/O streaming, dynamic memory allocation, construction from character string and caching constant values such as $\pi$ and $\log(2)$ can optionally be disabled with the compiler switches:

#define WIDE_DECIMAL_DISABLE_IOSTREAM
#define WIDE_DECIMAL_DISABLE_DYNAMIC_MEMORY_ALLOCATION
#define WIDE_DECIMAL_DISABLE_CONSTRUCT_FROM_STRING
#define WIDE_DECIMAL_DISABLE_CACHED_CONSTANTS
#define WIDE_DECIMAL_DISABLE_USE_STD_FUNCTION
#define WIDE_DECIMAL_NAMESPACE

Each one of these compiler switches has an intuitive name intended to represent its meaning.

Note: Activating the option WIDE_DECIMAL_DISABLE_DYNAMIC_MEMORY_ALLOCATION simultaneously disallows using decwide_t in a multithreaded application. So if PC-based or other kinds of multithreading are used, then dynamic memory allocation is needed and can not be disabled. In other words,

// Activate/Deactivate the disable of dynamic memory.
// For most multithreaded PC work, comment out or remove
// this line entirely (i.e., thereby enable dynamic memory).

#define WIDE_DECIMAL_DISABLE_DYNAMIC_MEMORY_ALLOCATION

Let's consider also the macro WIDE_DECIMAL_NAMESPACE in greater detail.

#define WIDE_DECIMAL_NAMESPACE something_unique

This is an advanced macro intended to be used in strict, exacting applications for which using the unqualified, global namespace math (i.e., namespace ::math) is undesired or inacceptable. We recall that all parts of the wide-decimal implementation, such as the decwide_t class and its associated implementation details reside within namespace ::math::wide_decimal

Consider defining the macro WIDE_DECIMAL_NAMESPACE to be a self-chosen, unique namespace name, for instance something like

-DWIDE_DECIMAL_NAMESPACE=something_unique

This places all parts of the wide-decimal's uintwide_t template class implementation and its associated details within the prepended outer namespace something_unique - as in

namespace something_unique::math::wide_decimal
{
  template<const std::int32_t ParamDigitsBaseTen,
           typename LimbType          = std::uint32_t,
           typename AllocatorType     = std::allocator<void>,
           typename InternalFloatType = double,
           typename ExponentType      = std::int64_t,
           typename FftFloatType      = double>
  class decwide_t;

  // And also all its related types and functions.
}

When utilizing the WIDE_DECIMAL_NAMESPACE option, it is easy to vary the actual name or nesting depth of the desired prepended outer namespace if/as needed for your particular project.

By default the macro WIDE_DECIMAL_NAMESPACE is not defined. In this default state, namespace ::math::wide_decimal is used and the decwide_t class and its associated implementation details reside therein.

Examples

Basic square root

The example below performs a basic square root calculation, given by approximately

$$\sqrt{\frac{123456}{100}} \approx 35.1363060095963986639333846404180557597515{\ldots}$$

#include <cstdint>
#include <iomanip>
#include <iostream>

#include <math/wide_decimal/decwide_t.h>

auto main() -> int
{
  using local_limb_type = std::uint16_t;

  using dec101_t = ::math::wide_decimal::decwide_t<INT32_C(101), local_limb_type, void>;

  const dec101_t s = sqrt(dec101_t(123456U) / 100);

  // N[Sqrt[123456/100], 101]
  const dec101_t control
  {
    "35.136306009596398663933384640418055759751518287169314528165976164717710895452890928635031219132220978"
  };

  const dec101_t closeness = fabs(1 - (s / control));

  const auto result_is_ok = (closeness < (std::numeric_limits<dec101_t>::epsilon() * 10));

  std::cout << "result_is_ok: " << std::boolalpha << result_is_ok << std::endl;
}

1,000,001 digits of pi

In the following code, we compute $1,000,001$ (one million and one) decimal digits of the fundamental constant $\pi$. The truncated (non-rounded) expected result is $3.14159265{\ldots}79458151$ In this particular example, all heavyweight components are deactivated and this particular calculation is, in fact, suitable for a bare-metal mega-digit pi calculation.

In this example, note how a specialized custom allocator called util::n_slot_array_allocator is utilized for exact, efficient memory management of a certain number of temporary storages of mega-digit numbers (tuned to $18$ in this particular example).

#include <iomanip>
#include <iostream>

#define WIDE_DECIMAL_DISABLE_IOSTREAM
#define WIDE_DECIMAL_DISABLE_DYNAMIC_MEMORY_ALLOCATION
#define WIDE_DECIMAL_DISABLE_CONSTRUCT_FROM_STRING
#define WIDE_DECIMAL_DISABLE_CACHED_CONSTANTS

#include <math/wide_decimal/decwide_t.h>
#include <util/memory/util_n_slot_array_allocator.h>

auto main() -> int
{
  using local_limb_type = std::uint32_t;

  constexpr std::int32_t wide_decimal_digits10 = INT32_C(1000001);

  constexpr std::int32_t local_elem_number =
    ::math::wide_decimal::detail::decwide_t_helper<wide_decimal_digits10, local_limb_type>::elem_number;

  constexpr std::int32_t local_elem_digits10 =
    ::math::wide_decimal::detail::decwide_t_helper<wide_decimal_digits10, local_limb_type>::elem_digits10;

  using local_allocator_type = util::n_slot_array_allocator<void, local_elem_number, 18U>;

  using local_wide_decimal_type =
    ::math::wide_decimal::decwide_t<wide_decimal_digits10, local_limb_type, local_allocator_type, double>;

  const auto start = std::clock();

  const local_wide_decimal_type my_pi =
    ::math::wide_decimal::pi<wide_decimal_digits10, local_limb_type, local_allocator_type, double>(nullptr);

  const auto stop = std::clock();

  std::cout << "Time example002_pi(): "
            << static_cast<float>(stop - start) / static_cast<float>(CLOCKS_PER_SEC)
            << std::endl;

  const auto head_is_ok = std::equal(my_pi.crepresentation().cbegin(),
                                     my_pi.crepresentation().cbegin() + ::math::constants::const_pi_control_head_32.size(),
                                     ::math::constants::const_pi_control_head_32.begin());

  using const_iterator_type = typename local_wide_decimal_type::array_type::const_iterator;

  const_iterator_type
    fi
    (
        my_pi.crepresentation().cbegin()
      + static_cast<std::uint32_t>
        (
            static_cast<std::uint32_t>(1UL + ((wide_decimal_digits10 - 1UL) / local_elem_digits10))
          - static_cast<std::uint32_t>(::math::constants::const_pi_control_tail_32_1000001.size())
        )
    );

  const auto tail_is_ok = std::equal(fi,
                                     fi + ::math::constants::const_pi_control_tail_32_1000001.size(),
                                          ::math::constants::const_pi_control_tail_32_1000001.begin());

  const auto result_is_ok = (head_is_ok && tail_is_ok);

  std::cout << "result_is_ok: " << std::boolalpha << result_is_ok << std::endl;
}

The million digit run is comparatively slow and requires approximately ten seconds on a modern PC. Considering, however, the design goals of header-only and capable of running on bare-metal, this is a very nice calculational result.

1,000,001 digits of pi on a bare metal microcontroller

The wide-decimal float back end is used to compute $1,000,001$ decimal digits of the mathematical constant $\pi$ on selected bare-metal OS-less microcontroller systems in pi-crunch-metal

Further details

Mixing wide-decimal with wide-integer

Wide-decimal can be used simultaneously with wide-integer. Although this was not primarily foreseen in the designs of these libraries, harmonized mixing can be done in the same project, and even in the same file. This can, however, lead to a conflicting multiple definition of the container class util::dynamic_array<>, as has been shown in issue 166.

In order to use the uintwide_t.h header together with the decwide_t.h header in the same translation unit, use the definition shown below.

#define WIDE_INTEGER_DISABLE_IMPLEMENT_UTIL_DYNAMIC_ARRAY

Ensure that this definition appears above the line that #includes the uintwide_t.h header. Do this for all files including both big-number headers. Alternatively, this preprocessor switch can be defined on the command line of the compiler call(s) for the project.

The rational for this is as follows. The helper-container template class util::dynamic_array<> is used in both the decwide_t_detail.h header (secondarily included in decwide_t.h) as well as the uintwide_t.h header. Defining WIDE_INTEGER_DISABLE_IMPLEMENT_UTIL_DYNAMIC_ARRAY upstream of the inclusion of the uintwide_t.h header disables that file's definition of util::dynamic_array<>, and ensures that only one definition is visivble in the file containing both big-number headers.

A use-case that uses both wide-decimal as well as wide-integer simultaneously has been implemented in a test file. This file is called test_mixed_wide_decimal_wide_integer_b2n.cpp. It is included in the test folder and also included in continuous integration. This test file involves computations and comparisons of Bernoulli numbers.

(TBD) C++20 `constexpr` support

(TBD) See also issue 291. When using C++20 decwide_t supports compile-time constexpr construction and evaluation of results of binary arithmetic, comparison operators and various elementary functions.

Alternatives and limitations

Alternative libraries for big float types include, among others, most notably GMP and Boost.Multiprecision.

At the moment, wide-decimal has a rather large number of so-called guard digits. In addition, wide-decimal does not make use of rounding when performing multiplication or other common algebraic operations. The combined result is that exact calculations such as those potentially needed in some areas requiring pure, exact decimal calculations may not work well with this library. In this sense, wide-decimal suffers from the same rounding limitations as the cpp_dec_float class known from Boost.Multiprecision.

References

A original publications on valgrind and its relation to memory errors can be found in [1] and [2].

[1] Nicholas Nethercote and Julian Seward, Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation, Proceedings of ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI 2007), San Diego, California, USA, June 2007.

[2] Nicholas Nethercote and Julian Seward, How to Shadow Every Byte of Memory Used by a Program. Proceedings of the Third International ACM SIGPLAN/SIGOPS Conference on Virtual Execution Environments (VEE 2007), San Diego, California, USA, June 2007.