Converting a number to a string using the FPU – InformTFB

Everyone who is interested in programming ends up writing their own solution to this problem sooner or later. I decided not to be an exception.

In accordance with the x64 software conventions, we assume that the number to be converted is in XMM0.

We will use 64-bit code with 32-bit addressing. This way of addressing lets us take advantage of both dialects.

We save the stack pointer and set up a data area aligned to a paragraph (16 bytes) to improve performance:

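	; start
	mov    r9d, esp              ; save the original stack pointer
	lea    r8d,[r9d - 70h]       ; reserve 70h bytes below it
	and    r8d, 0FFFFFFF0h       ; round down to a 16-byte boundary
	mov    esp, r8d              ; switch to the aligned work area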
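The address arithmetic of this prologue can be modelled in a few lines of C. This is only a sketch: `reserve_aligned` is a hypothetical helper name, not something from the listing, and it assumes a 32-bit stack pointer value as the listing does.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the listing): reserve `size` bytes
   below `sp` and round the result down to a 16-byte boundary, which is
   what the lea/and pair computes. */
static uint32_t reserve_aligned(uint32_t sp, uint32_t size)
{
    return (sp - size) & 0xFFFFFFF0u;
}
```

For a stack pointer of 0012FF7Ch and a size of 70h the result is 0012FF00h, so the work area is never smaller than the requested 70h bytes.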

We prepare the FPU by saving and clearing its state, then set extended precision and rounding toward zero:

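	fsave [esp]                              ; save the FPU state and reinitialize the FPU
	mov  dword ptr[esp - dword], 037F0F7Fh   ; two control words: 0F7Fh and 037Fh
	fldcw         [esp - dword]              ; load 0F7Fh: 64-bit precision, round toward zero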
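The constant 037F0F7Fh packs two x87 control words into one doubleword. The C sketch below decodes them, assuming the standard control word layout (precision control in bits 8-9, rounding control in bits 10-11) and little-endian storage. The function names are illustrative only.

```c
#include <stdint.h>

/* Rounding-control field of the x87 control word, bits 10-11:
   0 = nearest, 1 = down, 2 = up, 3 = toward zero. */
static unsigned x87_rounding(uint16_t cw)  { return (cw >> 10) & 3u; }

/* Precision-control field, bits 8-9:
   0 = 24-bit, 2 = 53-bit, 3 = 64-bit significand. */
static unsigned x87_precision(uint16_t cw) { return (cw >> 8) & 3u; }

/* Little-endian split of the stored doubleword: the low word sits at
   the lower address ([esp - dword]), the high word two bytes above it
   ([esp - word]). */
static uint16_t low_word(uint32_t d)  { return (uint16_t)(d & 0xFFFFu); }
static uint16_t high_word(uint32_t d) { return (uint16_t)(d >> 16); }
```

The low word 0F7Fh selects 64-bit precision with rounding toward zero, and the high word 037Fh keeps the same precision with rounding to nearest, which is why a later `fldcw [esp - word]` is enough to switch the rounding mode.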

Moving the number from XMM0 to the FPU:

	movq qword ptr[esp - xmmword], xmm0
	fld  qword ptr[esp - xmmword]

Finding the decimal order (exponent) of the Number:

	fld     st(0)
	fst     st(1)
	fdivr   st(0),st(2)

Setting rounding to the nearest number:

	fldcw	[esp - word]

We save the order of the Number and find the decimal order of the Multiplier that moves the significant digits of the Number into the integer part:

	fist      dword ptr[esp - dword]
	movzx edx, word ptr[esp - dword]
	mov       dword ptr[esp - dword], 10h
	fisubr    dword ptr[esp - dword]

Find the decimal Multiplier and multiply the Number by it:

	fmulp   st(1),st(0)
	fst     st(1)
	fsub    st(1),st(0)
	fstp    st(1)
	fmulp   st(2),st(0)
	faddp   st(1),st(0)
	fmulp   st(1),st(0)

We move the resulting number from the FPU into the AX and XMM0 registers: the 2 most significant bytes go to AX and the remaining 8 bytes to XMM0. While loading the 8 bytes into XMM0 we also change their order. The shuffle reads its memory operand from a paragraph-aligned address, which is why the stack pointer was aligned beforehand:

	fbstp           tbyte ptr[esp - xmmword]
	mov       ax,    word ptr[esp -   qword]
	pshuflw xmm0, xmmword ptr[esp - xmmword], 00011011b
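To make the byte layout concrete, here is a small C model of the 10-byte packed BCD image that FBSTP stores. It is a sketch under stated assumptions: `to_packed_bcd` and `top_word` are hypothetical names, and the model takes an integer magnitude directly instead of an FPU value.

```c
#include <stdint.h>

/* Model of the 10-byte packed BCD image written by FBSTP: bytes 0..8
   hold 18 decimal digits, two per byte, least significant pair first;
   bit 7 of byte 9 is the sign. `magnitude` must be below 10^18. */
static void to_packed_bcd(uint64_t magnitude, int negative, uint8_t out[10])
{
    for (int i = 0; i < 9; ++i) {
        uint8_t lo = (uint8_t)(magnitude % 10); magnitude /= 10;
        uint8_t hi = (uint8_t)(magnitude % 10); magnitude /= 10;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    out[9] = negative ? 0x80 : 0x00;
}

/* The word the listing loads into AX: bytes 8 and 9 of the image,
   that is, the top digit pair plus the sign byte. */
static uint16_t top_word(const uint8_t bcd[10])
{
    return (uint16_t)(bcd[8] | (bcd[9] << 8));
}
```

For the 17-digit value -31415926535897932 the image starts with 32h and has 14h in byte 7, 03h in byte 8 and 80h in byte 9, so AX receives 8003h and the low 8 bytes carry the remaining 16 digits.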

Restoring the FPU state:

	frstor [esp]

Rearrange the bytes of XMM0 so that they are fully reversed and each byte is doubled:

	punpcklbw xmm0, xmm0
	pshuflw   xmm0, xmm0, 10110001b
	pshufhw   xmm0, xmm0, 10110001b

Loading the mask and separating the decimal nibbles (tetrads):

	mov            dword ptr[esp], 0FF00FF0h
	pshufd xmm1, xmmword ptr[esp], 0
	pand   xmm0, xmm1
	psrlw  xmm1, 4
	movdqa xmm2, xmm1
	pand   xmm1, xmm0
	psrlw  xmm1, 4
	pandn  xmm2, xmm0
	paddb  xmm1, xmm2
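The net effect of the byte reversal, the doubling and the nibble masks can be written as a scalar C loop. This is a model of the result as I read the listing, not a translation of the SIMD instructions; `unpack_digits` is a hypothetical name.

```c
#include <stdint.h>

/* Scalar model: turn 8 packed BCD bytes (least significant digit pair
   first) into 16 digit values, most significant digit first. */
static void unpack_digits(const uint8_t bcd[8], uint8_t digits[16])
{
    for (int i = 0; i < 8; ++i) {
        uint8_t b = bcd[7 - i];                      /* byte reversal */
        digits[2 * i]     = (uint8_t)(b >> 4);       /* high nibble   */
        digits[2 * i + 1] = (uint8_t)(b & 0x0F);     /* low nibble    */
    }
}
```

The SIMD version reaches the same layout without a loop: every byte is duplicated, one copy keeps the high nibble shifted down by four bits, and the other copy keeps the low nibble.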

Create a mask that marks the zero bytes, which separates them from the bytes containing significant digits:

	pxor    xmm0, xmm0
	pcmpeqb xmm0, xmm1

Converting the digits to their corresponding characters:

	mov            dword ptr[esp], 30303030h
	pshufd xmm2, xmmword ptr[esp], 0
	paddb  xmm1, xmm2

Convert the first two bytes of the number to characters and store them in memory:

	mov  byte ptr[esp],'-'
	btr             ax, 0Fh
	adc            esp, 0
	add             ax,'.0'
	mov  word ptr[esp], ax
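The sign trick in this fragment is easy to miss: the minus sign is always written, and the pointer only moves past it when the sign bit is set. The C sketch below models that. `write_head` is a hypothetical name, and it assumes that the assembler encodes the constant '.0' as 2E30h, so that the low byte becomes the digit character and the high byte the decimal point.

```c
#include <stdint.h>
#include <stddef.h>

/* Model of the head of the string: optional '-', leading digit, '.'.
   `ax` is the word taken from bytes 8-9 of the packed BCD image.
   Returns the number of characters written. */
static size_t write_head(uint16_t ax, char *out)
{
    size_t n = 0;
    out[0] = '-';                        /* written unconditionally         */
    n += (ax >> 15) & 1u;                /* btr + adc: step past '-' only   */
                                         /* when the sign bit was set       */
    ax &= 0x7FFFu;                       /* btr also clears the sign bit    */
    ax = (uint16_t)(ax + 0x2E30u);       /* assumed value of '.0'           */
    out[n]     = (char)(ax & 0xFF);      /* '0' + leading digit             */
    out[n + 1] = (char)(ax >> 8);        /* '.'                             */
    return n + 2;
}
```

For a positive number the digit simply overwrites the minus sign, so no branch is needed.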

Finding the length of the significant part of the number from the mask in the XMM0 register:

	movdqu	      xmmword ptr[esp + word], xmm1
	pmovmskb ecx, xmm0
	bsf      ecx, ecx
	add      esp, ecx

Checking the order of the Number for zero and negative values:

	mov    ecx,(word + dword)
	mov    eax, edx
	neg     dx
	jnc     @f
	cmovns eax, edx
	setns   dh

Convert the order of the Number to characters and store them in memory:

	cmp   ax, 0Ah
	sbb  ecx, ecx
	mov   dl, 0Ah
	div   dl
	cmp   al, 0Ah
	sbb  ecx, 0
	shl  eax, 8
	shr   ax, 8
	div   dl
	add eax, 303030h
	lea edx,[edx * 2 + 2B51h]
	mov dword ptr[esp + word + ecx + word], eax
	mov  word ptr[esp + word], dx
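Two pieces of arithmetic in this fragment can be checked by hand. The C sketch below reproduces them under my reading of the listing: DL holds 0Ah, DH holds the flag left by SETNS (1 when the order was negative), and the order magnitude is below 1000. The function names are hypothetical.

```c
#include <stdint.h>

/* The lea trick: EDX*2 + 2B51h yields the two characters "e+" or "e-".
   `negative` must be 0 or 1. */
static uint16_t exp_prefix(unsigned negative)
{
    uint32_t edx = (negative << 8) | 0x0Au;   /* DH = flag, DL = 10 */
    return (uint16_t)(edx * 2u + 0x2B51u);
}

/* Two divisions by 10 give hundreds, tens and units; adding 303030h
   turns them into ASCII. Byte 0 of the result is the hundreds digit. */
static uint32_t exp_digits(unsigned magnitude)
{
    unsigned units    = magnitude % 10;
    unsigned q        = magnitude / 10;
    unsigned tens     = q % 10;
    unsigned hundreds = q / 10;
    return (hundreds | (tens << 8) | (units << 16)) + 0x303030u;
}
```

With the flag clear the result is 2B65h, the bytes 'e' and '+'; with the flag set it is 2D65h, the bytes 'e' and '-'. The negative ECX offset in the listing then shifts the digit store to the left so that the "e+" or "e-" word overwrites any leading zeros.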

Calculate the length of the resulting string and store it in the EAX and ECX registers:

@@:	lea ecx,[esp + ecx + qword]
	sub ecx, r8d
	mov eax, ecx

We store the string of characters in the register pair XMM1 and XMM2:

	movdqa xmm1, xmmword ptr[r8d]
	movdqa xmm2, xmmword ptr[r8d + xmmword]

Restoring the stack pointer:

	mov esp, r9d

Exiting the procedure.

In my code I use an undocumented convention for passing and returning multiple parameters from a function. It is exactly the same as the x64 software conventions, except that it also describes how parameters are placed when a procedure exits.

Why write this code when ready-made solutions already exist? Because my solution is better.

What makes it better than the others is that the code is straight-line, with no loops and only one forward branch, and it makes a minimal number of memory accesses.

Why write it in assembler when there are other, more convenient languages? Because assembler is better.

What makes assembler better in this case is full access to the SIMD and FPU instructions.

Only part of this code is vectorized. The numeric part has to be computed in scalar form, since only one number is passed to the procedure per call.

Valery Radokhleb
Web developer, designer