Early in the video, it claims to be triple the speed of Sqrt with a maximum error of 1%.
As encouraging as that might seem, unless you're ignoring support for the Vector Unit (SSE / VMX), you're not going to see such a dramatic improvement in performance.
This is because the Vector Unit (CU, SSE, VMX) has (since 2005) included a "Special Instruction" Path for Sqrt or Exponent Instructions.
I think this is showcased well by Scraggle's example code... in DBP (which doesn't use SSE for its Maths Functions, for understandable reasons), his approach would provide up to 6.0 - 7.0x Faster Execution.
Compare this to AppGameKit Script, where we're instead seeing it perform at 0.6 - 0.7x the speed of Sqrt.
That's the difference Vector Accelerated Maths makes.
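To give an idea of what the hardware path looks like, here's a minimal sketch in C using the SSE scalar intrinsics from `<xmmintrin.h>`. The names `hw_sqrt` and `hw_rsqrt` are just ones I've picked for illustration, and the `sqrtf` fallback for non-SSE targets is my own assumption so it still compiles elsewhere; this isn't how DBP or AppGameKit implement their Sqrt.

```c
#include <math.h>

/* Detect SSE on GCC/Clang (__SSE__) and MSVC (_M_X64 / _M_IX86_FP). */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  /* SSE intrinsics */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Full-precision square root; maps to the sqrtss instruction when SSE is available. */
float hw_sqrt(float x)
{
#if HAVE_SSE
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
#else
    return sqrtf(x);  /* assumed fallback for targets without SSE */
#endif
}

/* Approximate 1/sqrt(x); maps to rsqrtss when SSE is available.
   rsqrtss is an approximation, so expect a small relative error. */
float hw_rsqrt(float x)
{
#if HAVE_SSE
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    return 1.0f / sqrtf(x);  /* assumed fallback, exact rather than approximate */
#endif
}
```

In practice a plain `sqrtf(x)` call will normally be compiled down to the same instruction on an SSE target, so you rarely need to write the intrinsics by hand.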
And as noted in your original post, the Fast Inverse Square Root (approx.) was, when Quake 3 released, just an incredible performance uplift (up to 3.0x Faster).
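For anyone who hasn't seen it, the trick can be sketched roughly like this in C. `fast_rsqrt` is my own name for it, I'm assuming 32-bit IEEE 754 floats and a positive, finite input, and I've used `memcpy` in place of the pointer cast from the widely circulated version so the bit reinterpretation is well defined.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, finite x (assumes 32-bit IEEE 754 floats). */
float fast_rsqrt(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);    /* read the float's raw bit pattern */
    bits = 0x5f3759dfu - (bits >> 1);  /* magic constant gives the initial guess */
    memcpy(&y, &bits, sizeof y);       /* reinterpret the bits as a float again */

    /* One Newton-Raphson step to refine the guess. */
    y = y * (1.5f - 0.5f * x * y * y);
    return y;
}
```

With the single refinement step the relative error stays at roughly 0.2% or less, which is comfortably inside the 1% figure mentioned above.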
But again, keep in mind that most Maths Libraries at the time didn't support SSE, which had only been introduced in 1999 with the Pentium III (the 1997 Pentium II only had MMX, which is integer-only). On top of that, AMD Processors had 3DNow! (their own version of Maths Extensions) and Cyrix had MX (their MMX-compatible implementation; Cyrix chips were also manufactured by IBM, who produced PowerPC).
As such, the Maths Libraries at the time only supported the x86 FPU Instructions, which were slow; thus a Fast Approx. was useful for supporting all Processor Architectures.
Today however... SSE is a Standard Feature, or at the very least is Translated into a Native Vector Instruction Set (cough:Ryzen:cough).
Because these have dedicated and accelerated Instruction Paths for things like Log, Sqrt, Mod, Pow, etc., it's generally going to be better to just use the built-in Maths Functions, as they're going to be the fastest execution path.