Posted: 7th Oct 2021 4:32
can someone code THIS as a function for us?

early in this vid, it offers that it's triple the speed of SQRT with max 1% inaccuracy.

otherwise, i don't expect we'll get the same (any?) benefit as a function compared to it being a native AppGameKit command? if not, i'll post a request on the github.

and, if you're not familiar with the "original" code (in both links), don't click if certain "language" might offend.

meanwhile, it doesn't save much (~5%), but if i have tons of distance() calls per loop, this:
+ Code Snippet
ThatDistance# = 5.0

do
    If GetRawKeyState(27) then Exit    

	StartTime# = Timer()
		For x = 1 to 1000000
			ThisDistance# = SQRT( (10.0-5.0)^2 + (10.0-5.0)^2 )
			If ThatDistance# = ThisDistance# then Whatever()
		Next x
	D# = Timer() - StartTime#

	StartTime# = Timer()
		For x = 1 to 1000000
			ThisDistance# =  (10.0-5.0)^2 + (10.0-5.0)^2
			If ThatDistance#^2 = ThisDistance# then Whatever()
		Next x	
	F# = Timer() - StartTime#

    Print( D# )
    Print( F# )
    Print( F#/D# )
    Sync()
loop

Function Whatever()

EndFunction


any better solutions vs SQRT out there (other than the title of this thread)?
Posted: 7th Oct 2021 5:02
The idea for all those methods is to redefine a float as an integer. As far as i can tell that can only be done using a memblock which might mitigate your speed increase.
One thing we used to do is compare distance (without sqrt) to the distance^2
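
To illustrate the memblock route mentioned above, here is a minimal sketch in AGK Tier 1. FloatBits is a hypothetical helper name, not a built-in command, and creating a memblock on every call is only for clarity; in a hot loop you would presumably keep one memblock around, and the extra command calls may well eat any gain.

```agk
// Hypothetical helper: returns the raw 32-bit pattern of a float.
// Writes the float into a 4-byte memblock, then reads the same bytes back as an integer.
Function FloatBits(value#)
    mem = CreateMemblock(4)
    SetMemblockFloat(mem, 0, value#)
    bits = GetMemblockInt(mem, 0)
    DeleteMemblock(mem)
EndFunction bits
```

For example, FloatBits(1.0) should give 1065353216 (0x3F800000), the IEEE-754 single-precision pattern for 1.0.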
Posted: 7th Oct 2021 5:35
One thing we used to do is compare distance (without sqrt) to the distance^2

already doing that (2nd for/next in the code above).

i wouldnt mind a win plugin, then. but, that should be just as easy (read "no more difficult") to implement natively, right?
Posted: 7th Oct 2021 10:07
As Blink mentioned, the real problem here is the conversion of the binary representation of a float into an integer.
But if you are just looking for the fastest native way of performing a distance check then you should lose the ^ operator; it is almost as bad as SQRT.

Try this:
+ Code Snippet
ThatDistance# = 5.0
 
do
    If GetRawKeyState(27) then Exit   
 
    StartTime# = Timer()
        For x = 1 to 1000000
            ThisDistance# = SQRT( (10.0-5.0)^2 + (10.0-5.0)^2 )
            If ThatDistance# = ThisDistance# then Whatever()
        Next x
    D# = Timer() - StartTime#
 
    StartTime# = Timer()
        For x = 1 to 1000000
            ThisDistance# =  (10.0-5.0)^2 + (10.0-5.0)^2
            If ThatDistance#^2 = ThisDistance# then Whatever()
        Next x  
    F# = Timer() - StartTime#
    
    `******   Start of Scraggles code   ******
    StartTime# = Timer()
        For x = 1 to 1000000
            ThisDistance# =  (10.0-5.0)*(10.0-5.0) + (10.0-5.0)*(10.0-5.0)
            If ThatDistance# * ThatDistance# = ThisDistance# then Whatever()
        Next x  
    X# = Timer() - StartTime#
	`******   End of Scraggles code   ******
	
    Print( D# )
    Print( F# )
    Print( X# )
    Print( F#/D# )
   
    Sync()
loop
 
Function Whatever()
 
EndFunction
Posted: 7th Oct 2021 19:13
Squaring the distance for comparison will be faster than using the inverse sqrt. So unless you need the actual distance for other calculations, I don't see a need for it.
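
A minimal sketch of that squared comparison in AGK Tier 1, in case it helps. WithinRange and its parameter names are made up for illustration; the point is only that both sides of the test are squared, so no Sqrt (and no ^) is needed.

```agk
// Hypothetical helper: returns 1 if (x2#, y2#) is within maxDist# of (x1#, y1#), else 0.
// Compares the squared distance against the squared range, so no Sqrt call is made.
Function WithinRange(x1#, y1#, x2#, y2#, maxDist#)
    dx# = x2# - x1#
    dy# = y2# - y1#
    result = 0
    If dx#*dx# + dy#*dy# <= maxDist#*maxDist# Then result = 1
EndFunction result
```

If the range is the same for many checks in one loop, maxDist#*maxDist# could also be computed once before the loop and passed in already squared.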
Posted: 7th Oct 2021 19:57
early in this vid, it offers that it's triple the speed of SQRT with max 1% inaccuracy.


As encouraging as that might seem, unless you're ignoring support for the Vector Unit (SSE / VMX) you're not going to see such a dramatic improvement in performance.
This is because the Vector Units (CU, SSE, VMX) have (since 2005) included a "Special Instruction" Path for Sqrt or Exponent Instructions.

I think this is showcased well by Scraggle's example code... in DBP (which doesn't use SSE for its Maths Functions, for understandable reasons), his approach would provide up to 6.0 - 7.0x Faster Execution.
Compare this to AppGameKit Script, where we're instead seeing it perform at 0.6 - 0.7x what Sqrt does.
That's the difference of Vector Accelerated Maths.

And as noted in your original post, when Quake 3 released the Fast Inverse Square Root (approx.) was just an incredible performance uplift (up to 3.0x Faster).
But again keep in mind that most Maths Libraries at the time didn't support SSE, which had only just been introduced in 1999 with the Pentium III (the 1997 Pentium II only had MMX, which is integer-only); on top of that AMD Processors had 3DNow! (their own version of Maths Extensions), and the Cyrix Processors of the day supported neither SSE nor 3DNow!.

As such the Maths Libraries at the time only supported the x87 FPU Instructions, which were slow; and thus a Fast Approx. was useful to support all Processor Architectures.
Today however... SSE is a Standard Feature, or at the very least is Translated into a Native Vector Instruction Set (cough:Ryzen:cough)
Because SSE has dedicated and accelerated Instructions for Sqrt (and for an approximate reciprocal Sqrt), it's generally going to be better to just use the built-in Maths Functions; as they're going to be the fastest execution path.
Posted: 7th Oct 2021 21:11
The algorithm accepts a 32-bit floating-point number as the input and stores a halved value for later use. Then, treating the bits representing the floating-point number as a 32-bit integer, a logical shift right by one bit is performed and the result subtracted from the number 0x5F3759DF, which is a floating point representation of an approximation of $\sqrt{2^{127}}$. This results in the first approximation of the inverse square root of the input. Treating the bits again as a floating-point number, it runs one iteration of Newton's method, yielding a more precise approximation.
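
For reference, the Newton step in that description can be written out. This is a sketch of the standard derivation, where $x$ is the input, $y_n$ is the current guess at $1/\sqrt{x}$, and $f$ is a helper function introduced here only for illustration (its root is the inverse square root):

```latex
\[
f(y) = \frac{1}{y^{2}} - x, \qquad f'(y) = -\frac{2}{y^{3}}
\]
\[
y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)}
        = y_n + \frac{y_n - x\,y_n^{3}}{2}
        = y_n\left(\frac{3}{2} - \frac{x}{2}\,y_n^{2}\right)
\]
```

The "halved value stored for later use" is the $x/2$ term, and the 1.5 constant that shows up in implementations is the $3/2$.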
Posted: 7th Oct 2021 22:10
If you populate a table of all the locations of your enemies you might be able to implement it in a shader
Posted: 8th Oct 2021 13:09
lose the ^

thanks. will do.

If you populate a table

doing that already for some kinda culling.

implement it in a shader

ah. i'll have a look at this. might be ideal?
Posted: 9th Oct 2021 6:40
This is more of a side note. One approach that has helped me squeeze every bit of performance out of AppGameKit has been minimizing the number of operations inside loops, rather than just worrying about which operators are used. If you can do that wherever possible, you might gain a few extra ticks here and there, and across a whole codebase that can make a big overall difference. Just to illustrate with your example above, if we move the fixed operations outside of the loop, but leave the core critical operations inside, we can gain nearly 200% performance:

+ Code Snippet
    StartTime# = Timer()
        ` fixed squared differences, computed once outside the loop
        posa#=(10.0-5.0)*(10.0-5.0)
        posb#=(10.0-5.0)*(10.0-5.0)
        For x = 1 to 1000000
            ThisDistance# =  sqrt(posa# + posb#)
            If ThatDistance# = ThisDistance# then Whatever()
        Next x
    Y# = Timer() - StartTime#


While this doesn't necessarily solve the issue of operations that have to be performed within a loop, if you move any fixed calculations outside of every nested loop you can, such an approach can achieve very solid gains in performance as a whole.
Posted: 19th Nov 2021 12:36
Fast inverse sqrt was only faster a long time ago.

I tested it in x86 asm;
+ Code Snippet
Function _SQRT_(_F_ As _REAL_) As _REAL_
    Dim As _REAL_ X2 = Any, Y = Any, ThreeHalves = Any
    ThreeHalves = 1.5
    X2 = 0.5
    Asm
        mov eax, [ebp + 8]
        shr eax, 1
        mov ebx, &H5F3759DF
        sub ebx, eax
        mov [Y], ebx
        
        fld DWORD PTR [X2]
        fmul DWORD PTR [ebp + 8]
        fstp DWORD PTR [X2]
        fld1
        fld DWORD PTR [Y]
        fld DWORD PTR [ThreeHalves]
        fld DWORD PTR [Y]
        fmul ST(0)
        fmul DWORD PTR[X2]
        fsubp ST(1), ST(0)
        fmulp ST(1), ST(0)
        fdivp ST(1), ST(0)
        fstp DWORD PTR [ebp - 4]
    End Asm
End Function


versus;
+ Code Snippet
    Asm
        fld DWORD PTR [A]
        fsqrt
        fstp DWORD PTR [B]
    End Asm




Of course it gets worse in AGK;
+ Code Snippet
// Project: Inverse Sqrt 
// Created: 2021-11-19

// show all errors
SetErrorMode(2)

// set window properties
SetWindowTitle( "Inverse Sqrt" )
SetWindowSize( 1024, 768, 0 )
SetWindowAllowResize( 1 ) // allow the user to resize the window

// set display properties
SetVirtualResolution( 1024, 768 ) // doesn't have to match the window
SetOrientationAllowed( 1, 1, 1, 1 ) // allow both portrait and landscape on mobile devices
SetSyncRate( 30, 0 ) // 30fps instead of 60 to save battery
SetScissor( 0,0,0,0 ) // use the maximum available screen space, no black borders
UseNewDefaultFonts( 1 ) // since version 2.0.22 we can use nicer default fonts

Global Mem As Integer
Global _WEIRDO_ As Integer
Mem = Creatememblock(4)
_WEIRDO_ = 0x5F3759DF




Local q As Float, s As Float
Local A As Float, B As Float
Local l As Integer

Local Time0 As Integer, Time1 As Integer

q = Random()
s = Random()
q = q/s

Time0 = Timer() * 1000
For l = 0 To 999999
	s = Sqrt(q)
Next
Time0 = Abs(Time0 - Timer() * 1000)

Time1 = Timer() * 1000
For l = 0 To 999999
	s = _SQRT_(q)
Next
Time1 = Abs(Time1 - Timer() * 1000)

q = 0
do
    If GetRawKeyState(65)
		If l
			q = Random()
			s = Random()
			q = q/s
			A = Sqrt(q)
			B = _SQRT_(q)
		EndIf
		l = 0
	Else
		l = 1
	EndIf

    Print("AGK Sqrt took...: " + Str(Time0) + " milliseconds...")
    Print("_SQRT_ took.....: " + Str(Time1) + " milliseconds...")
    Print("")
    Print("The number........: " + Str(q))
    Print("Square root.......: " + Str(A))
    Print("_SQRT_............: " + Str(B))
    Sync()
loop
Deletememblock(Mem)


Function _SQRT_(_F_ As Float)
	Local h As Float
	SetMemblockFloat(Mem, 0, _F_)
	j = GetMemblockInt(Mem, 0)
	j = _WEIRDO_ - (j >> 1)
	SetMemblockInt(Mem, 0, j)
	h = GetMemblockFloat(Mem, 0)
	h = h * (1.5 - (_F_ * h * h * 0.5))
	h = 1/h
EndFunction h


maybe you should try making your distance comparisons in 'squared_length's so you will get rid of sqrt?