Taipei Forcing Club

Computer science and contract bridge

How to program mathematical functions

It is achievable to write fast and precise mathematical functions. There are cutting-edge implementations in computers, phones, calculators, and game consoles. Some of them are open source, like glibc and musl, from which we can learn. I have also been working on mathematics in metallic, as a shameless plug.

It may seem that mathematical functions are hardware instructions. We usually code them in software instead. The trend is to have the hardware deal with only basic operations after decades of evolution. We can perform mathematics with only operations required in IEEE 754 and integer operations via type punning.

Target system

Instruction set

Which instructions are on the target system? C operators are probably supported. Other operations are not as available even if they are required by IEEE 754. For example, fmod rarely compiles to a single instruction. It is usually done by long division, which then translates to a series of integer operations or partial remainder instructions. This C function is also an example where operators in other programming languages can be more expensive than expected, like % in JavaScript or Python.

Programming language

I suggest programming mathematical functions in C. It is fast and precise to evaluate complicated expressions with floating-point contraction. On supported platforms, a * b + c can compile to a fused multiply–add.

// Nearest integer for a nonnegative argument
float nearest(float x)
{
    const float rectifier = 0x1p23f;
    float y = x + rectifier;
    return y - rectifier;

    // Wrong: can be optimized to (x + 0)
    return x + rectifier - rectifier;
}

Rounding

Round half to even is the default rounding in IEEE 754. The roundoff error of a series of n operations by this unbiased rounding is only in O(√n). This rounding will be the implied rounding method throughout this article unless otherwise specified.

Double rounding

The default rounding is non-associative. If we round 1.49 to the nearest tenth and then to the nearest integer, it becomes 2. Rounding to midpoints must be avoided in intermediate roundings.

Round to odd is a helpful intermediate rounding for binary numbers. It rounds irrepresentable numbers to the nearest representation with an odd significand. Only numbers with even significands can be midpoints or exact in a coarser precision. Therefore, round to odd does not interfere with subsequent roundings. Round to odd is also associative like directed roundings as it rounds all values between representations to either of them.

Exact addition

We can obtain the exact error of addition with the default rounding. This technique is useful for storing precise intermediates to stop the propagation of error. The idea is to find s + e = a + b, where s is the rounded sum, and e is the error term. The error term is defined when s does not overflow.

Compensated summation produces precise results with preconditions. The base of the floating-point system (FLT_RADIX) can be at most 3, and logb(a) ≥ logb(b).

s = a + b;
e = a - s + b;

We can generalize this algorithm by comparison, as |a| ≥ |b| implies logb(a) ≥ logb(b). Branching is nevertheless inefficient. There is another unbranched algorithm working most of the time. This algorithm overflows only if an operand is the largest finite representation or its negative.

s = a + b;
x = a - s;
y = b - s;
e = (a + x) + (b + y);

Exact multiplication

First, a double is large enough to store the exact product of any two floats. Therefore, it is preferred to cast and multiply. This method is straightforward and fast, and double-precision multiplication is widely supported.

double multiply(float a, float b)
{
    return (double)a * b;
}

Then, we try to find s + e = ab. If the FMA instructions are available, use them. Probe this feature with FP_FAST_FMA[FL]? macros.

s = a * b;
e = fma(a, b, -s);

Next, without all the hardware witchcraft, we can still count on schoolbook multiplication. With an even number of significant digits, equally split them and obtain the higher part with a bitwise AND.

Even if the number of significant bits is odd, we can split a binary significand with the default rounding. Take IEEE 754 double-precision as an example, which has 53 significant bits. Its magic multiplier is 227 + 1, where 27 = (53 + 1) / 2.

double split(double x)
{
    double s = (0x1p27 + 1) * x;
    double c = x - s;

    return s + c;
}

The above function returns the higher half of the argument. We can extract the lower half by subtracting the higher half. Each half is guaranteed to have at most 26 significant bits. The possibility that the halves can have opposite signs recovers the seemingly lost bit.

Table maker’s dilemma

The cost of a correctly rounded transcendental function is unknown unless probed with brute force. Faithfully rounded versions are much faster and generally preferable, though they may round up or down. No matter how precise intermediate results are, they can be close to a turning point of the given rounding. For example, x is a mathematical result equal to x* + 0.49 ulp, where x* is an exact floating-point. An exquisite approximation gives x* + 0.51 ulp, which is only 0.02 ulp off. Nevertheless, it becomes x * + 1 ulp after default rounding, which is 1 ulp off from the correctly rounded x*.

We can correctly round an algebraic function by solving its polynomial equation at the turning point and compare the results. However, this extra cost is unwelcome if faithful rounding is enough. It is unlikely that a correctly rounded program solves a real-world problem that a faithfully rounded one does not. IEEE 754 does not require correct rounding for the cubic root. Therefore, I made cbrt faithfully rounded in metallic. Its error can be even larger in glibc and other C libraries.

Approximation

Eventually, we break down mathematical functions to basic arithmetics. This section covers how to turn mathematics into source code.

Argument reduction

Sometimes we can shrink the domain to a short interval with an identity. For example, to compute exp for a binary format, we can divide its argument x by ln 2.

\begin{align*} x &= n \ln 2 + r \\ \exp x &= 2^n \exp r \end{align*}

Where n is an integer, bit twiddling takes care of multiplication by 2n. If we pick n as an integer nearest to x, we simultaneously restrict r into [-0.5 ln 2, 0.5 ln 2].

Approximation to exp r is fast because [-0.5 ln 2, 0.5 ln 2] is a short interval. We approximate exp r with few terms to achieve the desired precision.

It is also precise because r is small. Computations involving small numbers are accurate. Floating-points are dense near the origin since they are essentially scientific notation. In IEEE 754 binary formats, there is the same number of representations in (0, 1.5) and in (1.5, ∞). Therefore, it is wise to shift the domain close to 0.

Transformations

Most mathematical functions we compute are well-behaved. We can save computations by taking advantage of continuity, differentiability, or symmetry.

When a function f passes through and is monotone at the origin, divide it by the identity function and approximate the quotient g instead.

\begin{align*} f(0) &= 0 \\ f(x) &= x g(x) \end{align*}

This transformation explicitly omits the constant term and reduces the overall relative error. The value of g(0) can be any finite number. We define g as a continuous extension for rigor.

\[ g(x) = \begin{cases} \displaystyle \frac{f(x)}{x} & \mbox{if } x \ne 0 \\ \displaystyle \lim_{t \to 0} \frac{f(t)}{t} & \mbox{if } x = 0 \end{cases} \]

Given an approximant ĝ of g, the overall absolute error x |ĝg| tends to 0 when x also approaches 0. This transformation enables approximating g without a weight function and simplifies calculation.

When f is an even function, view it as another function g of a squared variable.

\begin{align*} f(x) &= f (-x) \\ f(x) &= g \left( x^2 \right) \\ g(x) &= f \left( \sqrt x \right) \end{align*}

This transformation explicitly omits odd terms and halves the degree of the approximant.

An odd function is a combination of the above two. It is a product of the identity function and an even function.

\begin{align*} f(x) &= -f (-x) \\ f(x) &= x g \left( x^2 \right) \\ g(x) &= \frac{f \left( \sqrt x \right)}{\sqrt x} \end{align*}

The value of g(0) does not affect the approximation of g as it creates no hole in the domain. In practice, set the lower bound to a tiny positive number like 2-200, and everything is fine.

Remez algorithm

Remez exchange algorithm is an iterative minimax algorithm. It minimizes the error of a rational approximation of a function. The best explanation of this algorithm I found is from the Boost libraries. I recommend Remez.jl, a public module in the Julia language. It works out of the box after installation.

For example, the following snippet finds a cubic approximation of cos(√·) in [0, (π / 4)2] with minimax absolute errors. The last argument is 0 because we want a polynomial, whose denominator is a constant (of degree 0) if regarded as a rational function.

import Remez

N, D, E, X = Remez.ratfn_minimax(x -> cos(√x), (0, (big(π) / 4)^2), 3, 0)

The variables N, D, E, X are the numerator, the denominator, the maximum error, and coordinates of the extrema, respectively. In this case, we are interested in N and E only. If we run the snippet in the REPL, it is straightforward to inspect variables.

julia> N
4-element Array{BigFloat,1}:
  0.9999999724233229210670051040057597041917874465747537951681676248240168483719746
 -0.4999985669584884771720232450657038606385147149244782395789475085368551172067715
  0.04165502688425152443762347668780274316867072837392713367475023020736799395672903
 -0.001358590851011329858521158876238716265345398772374942259275377959127201806930143

julia> E
2.757667707893299489599424029580821255342524620485822487536621483700643103529491e-08

The resulting coefficients are in ascending order. For example, the first element of N is the constant term.

Polynomial evaluation

The best polynomial evaluation method depends on the system. For example, pipelining influences execution time. Luckily, there are well-known evaluation schemes that provide decent performance and reasonable errors.

Horner’s scheme produces the fewest operations. It is the fastest if its argument is already a vector. It is also usually the most accurate method for a polynomial approximant of a well-conditioned function. However, its dependency chain is also the longest. It underuses the pipeline because all operations except one depend on another. Hence, it is less than ideal on single-threaded systems.

On the other hand, Estrin’s scheme tries to be as parallel as possible. It groups terms in a binary fashion to achieve the shallowest dependency tree at the expense of O(log(n)) extra squaring ops.

There are also other evaluation schemes with different pros and cons. Benchmark to find the most suited one if their difference is critical.

Responder's direct cuebid

Responder’s direct cuebid is a disputed and under-discussed topic. There are two popular usages of this bid:

  • Limit raise or better
  • Generic game force

I suggest different approaches to major and minor openings.

After a major opening

The cuebid is a limit raise or better. From the pigeonhole principle, you have either of these:

  • 3-card support
  • 5-card unbid suit
  • 4-4 unbid suits
  • 4-card adverse suit

The promise of a fit clears the way for finding a game. Other calls better describe game-forcing hands without 3-card support.

  • 5-card unbid suit: free bid
  • 4-4 unbid: negative double
  • 4-card adverse suit: 3NT

After a minor opening

We make the cuebid a versatile tool to combine the advantages of popular treatments.

According to the previous section, you can have an embarrassing strong hand with no 4-card support, no biddable side suit, and no stopper in the adverse suit. Imagine holding the following hand at 1♣-(1♠)-?

♠ xxx
Axx
Axxx
♣ Axx

You could have bid 3NT if RHO did not overcall, but you cannot now because there is no spade stopper. You are wary of passing because 3NT is still playable if your partner has a spade stopper. Ask for one with 2♠.

Besides, we can keep the limit raise. Your partner is eager to show a stopper as notrump games score more than minor games. Including the limit raise in the cuebid does not affect the bidding structure.

Don't implement CMPLX with +

For brevity, let’s forget that CMPLX is a macro. The following code seems to be a cross-platform solution.

inline double _Complex CMPLX(double x, double y)
{
	return x + I * y;
}

It is not. Whether I is complex or imaginary, I * y evaluates to (+0, y) when added with a real number. If x happens to be -0, its sign is not preserved because -0 + +0 = +0.

I think we can only stick with compiler-specific constructs for now. :(

How to set charset of all text responses on nginx

All text files on a site usually share the same character encoding. Especially UTF-8 is the modern de facto standard. However, the default charset_types does not contain text/css, let alone other non-plain text types like text/markdown.

The default charset_types should be text/* because most of them are parsed in ASCII (us-ascii) by default for backward compatibility. A text/xml response is parsed in ASCII even if BOM and XML declaration tells otherwise. Therefore, we should use application/xml for XML responses now.

Nevertheless, the charset_types setting checks complete matches only. Luckily, the map directive knows regex, and charset_types accepts a variable.

map $sent_http_content_type $charset {
    ~^text/   utf-8;
}

charset       $charset;
charset_types *;

This setting would make nginx specify UTF-8 for all text responses, such as text/css; charset=utf-8.

Inverted minors considered harmful with strong notrump

I have been researching on Wbridge5, a prominent bridge program. I used to be confused that it disables inverted minors by default. Recently I came up with a conclusion.

Wbridge5 opens strong notrump by default, so this treatment is disabled. Wbridge5 still includes inverted minors because weak notrump is a choice.

Inverted minors originated from Kaplan–Sheinwold. It is popular in East and Southeast Asia because of Precision Club, a bidding system based on K-S with the strong club that inherits the weak notrump opening.

Nowadays, many players open strong notrump according to something American. However, some of them still employ inverted minors. It has pros easily found by searching “inverted minors.” Hence, I list its cons as a balance report.

Garbage 1NT response

The weakness of inverted minors is not on itself but the 1NT response adjusted by the inverted minors. The 1NT response shows either of the following:

Constructive 1NT
Expected 6+ tricks if both minimum.
Garbage 1NT
Expected only 5 tricks if both minimum.

When the partner opens 1♠, 1, or 1, not to miss a probable game, the garbage 1NT is on. Overcalls invalidate inverted minors, so their counteractions fall out of the topic.

Without inverted minors, a 1NT response to 1♣ is always constructive. Respond 2♣ with a weak 3-3-3-4 because the opener often has 4+ clubs.

If 1♣ ensures 3+ clubs
With minimum strength, the probability of mere 3 clubs is 21.5%.
If 1♣ can be 4-4-3-2
With minimum strength, the probability of mere 3 clubs is 20.4%, 4-4-3-2 5.19%.

Weak 4-card support dumped as garbage

Express 5-card support as 3 level preempts. Nevertheless, 1NT with weak 4-card support is much less preemptive. Is there so much difference between 2 of a minor and 1NT, as 1NT is just one or two bids lower? Let’s consider the following.

W N E S
  1♣ - 1NT
X1 - -2 ?

The point is not whether to escape, but the positive pass. Notrump is awful for the declarer, with 6.06 tricks taken on average. 1NTxS−3 is more tragic than 3NTE= without favorable vulnerability. Besides, the total notrump tricks may be less than 13.

If we responded 2♣ instead, east must have clubs to pass, and the lowest positive advance becomes 2NT. Preemption is force opponents to bid strong hands high. Although 2♣ is only one bid higher than 1NT, it pushes pass and cuebid onto 2NT.

  1. Takeout double 

  2. Convert to business double 

AppArmor configuration for nginx and php-fpm

AppArmor is the default MAC module on Ubuntu. Unlike DAC in Un*x, an AppArmor config lists what a process can access. An enforced process can only access listed paths; a complaining process emits warnings when accessing unlisted files.

However, there is no default config for nginx and php-fpm. To prevent the webserver from being hacked, causing systemic infection, let’s write configs on our own! The useful tool aa-genprof gets most of the jobs done, but some paths, especially sockets, are still missing. Therefore, I publish my settings as a reference.

The following is my config for nginx.

#include <tunables/global>

/usr/sbin/nginx {
	#include <abstractions/apache2-common>
	#include <abstractions/base>
	#include <abstractions/nis>

	capability dac_override,
	capability net_bind_service,
	capability setgid,
	capability setuid,

	/etc/nginx/** r,
	/etc/ssl/openssl.cnf r,
	/proc/*/auxv r,
	/run/nginx.pid rw,
	/run/nginx.pid.oldbin w,
	/run/php5-fpm.sock rw,
	/srv/www/** r,
	/usr/sbin/nginx mr,
	/var/log/nginx/* w,
}

The following is my config for php-fpm.

#include <tunables/global>

/usr/sbin/php5-fpm {
	#include <abstractions/base>
	#include <abstractions/nameservice>
	#include <abstractions/php5>

	capability kill,
	capability setgid,
	capability setuid,

	/etc/php5/** r,
	/proc/*/auxv r,
	/proc/sys/kernel/ngroups_max r,
	/run/mysqld/mysqld.sock rw,
	/run/php5-fpm.pid rw,
	/run/php5-fpm.sock w,
	/srv/www/** r,
	/srv/www/html/wp-content/** rw,
	/srv/www/html/wp-content/cache/** rwk,
	/srv/www/magento/media/** rw,
	/srv/www/magento/var/** rwk,
	/tmp/ r,
	/tmp/** rwk,
	/usr/sbin/php5-fpm mrix,
	/var/log/php5-fpm.log* w,
}