N64 Trigonometry: The Folded Polynomial

I've acquired some amount of internet fame for my work in assembly programming and low-level optimization.

Trig functions, like sine and cosine, are used widely in games programming for important physics and graphics calculations. Computing approximations to these functions that are both

  1. accurate, and
  2. performant

is a mostly solved problem on modern hardware, but for some older CPUs with extreme performance constraints, it poses a major challenge.

After learning this was an open problem, I invented a new algorithm for computing sine and cosine approximations that achieved over 90 times improved accuracy compared to the state of the art on the VR4300, without sacrificing any performance. It takes advantage of several unique features of this CPU and its instruction set architecture (e.g. branch delay slots and no branch predictor), as well as clever use of modular arithmetic.

I then collaborated with famed N64 modder Kaze Emanuar to implement this algorithm in his new engine. His video covers the saga in detail. If you're interested, I'd encourage you to check it out! I'm quite passionate about quality educational content, and he does a great job explaining an otherwise impenetrable topic (low-level programming) to a broad audience.

After this initial collaboration, I later invented an improved arctangent approximation algorithm and implemented it as well. Video forthcoming.