Every person who is interested in programming must write their own version of the solution to this problem. I decided not to be an exception.
In accordance with the x64 software conventions, we assume that the number to be converted is located in the XMM0 register.
We will use 64-bit code with 32-bit addressing. This method of addressing lets you take advantage of both dialects.
We save the stack pointer and create a data area aligned to a paragraph (16 bytes) to improve performance:
; start
mov r9d, esp
lea r8d, [r9d - 70h]
and r8d, 0FFFFFFF0h
mov esp, r8d
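The address arithmetic of this step can be checked outside assembler. Below is a minimal Python sketch; `reserve_aligned` is a hypothetical helper name, and the stack pointer is treated as a plain 32-bit integer.

```python
def reserve_aligned(esp: int, size: int = 0x70, align: int = 16) -> int:
    """Model of `lea r8d,[r9d - 70h]` followed by `and r8d, 0FFFFFFF0h`."""
    lowered = (esp - size) & 0xFFFFFFFF          # 32-bit wrap-around, as in r8d
    return lowered & ~(align - 1) & 0xFFFFFFFF   # clear the low four bits


new_esp = reserve_aligned(0x0012FF4C)
print(hex(new_esp))
```

Clearing the low four bits can only move the pointer further down, so the reserved area is never smaller than 70h bytes.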
Prepare the FPU by freeing it from data and setting extended precision and rounding toward zero:
fsave [esp]
finit
mov dword ptr[esp - dword], 037F0F7Fh
fldcw [esp - dword]
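The constant 037F0F7Fh packs two control words into one doubleword. The following Python sketch decodes the precision and rounding fields of each half; the function and table names are illustrative, while the field positions (bits 8-9 and 10-11) follow the x87 control word layout.

```python
PRECISION = {0: "single (24-bit)", 2: "double (53-bit)", 3: "extended (64-bit)"}
ROUNDING = {0: "nearest", 1: "down", 2: "up", 3: "toward zero"}


def decode_control_word(cw: int):
    """Return the (precision, rounding) settings held in an x87 control word."""
    pc = (cw >> 8) & 0b11    # precision control, bits 8-9
    rc = (cw >> 10) & 0b11   # rounding control, bits 10-11
    return PRECISION.get(pc, "reserved"), ROUNDING[rc]


packed = 0x037F0F7F
low_word = packed & 0xFFFF    # stored at [esp - dword], loaded first
high_word = packed >> 16      # stored at [esp - word], loaded later
print(decode_control_word(low_word))
print(decode_control_word(high_word))
```

The low word selects rounding toward zero and the high word selects rounding to nearest, so one store provides both settings used in the procedure.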
Transferring the number from XMM0 to the FPU:
movd qword ptr[esp - xmmword], xmm0
fld qword ptr[esp - xmmword]
Finding the decimal order of the Number:
fld st(0)
fxtract
fldl2t
fst st(1)
fdivr st(0), st(2)
frndint
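As I read these instructions, the decimal order is estimated from the binary exponent alone: the exponent produced by fxtract is divided by log2(10) and truncated toward zero. A Python sketch of that reading follows; `estimate_decimal_order` is a hypothetical name, and `math.frexp` stands in for fxtract.

```python
import math


def estimate_decimal_order(x: float) -> int:
    """Estimate the decimal order of a non-zero x from its binary exponent."""
    _, exp = math.frexp(abs(x))    # abs(x) = m * 2**exp with 0.5 <= m < 1
    binary_exponent = exp - 1      # fxtract keeps the significand in [1, 2)
    # fdivr by log2(10), then frndint with rounding toward zero
    return math.trunc(binary_exponent / math.log2(10))


for value in (1234.5, 0.001, 8.0):
    print(value, estimate_decimal_order(value))
```

Because the significand is ignored, this is an estimate: it can differ by one from the true decimal order of the value.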
Setting rounding to the nearest number:
fldcw [esp - word]
We save the order of the Number and find the decimal order of the Multiplier that moves the significant digits of the Number into the integer part:
fist dword ptr[esp - dword]
movzx edx, word ptr[esp - dword]
mov dword ptr[esp - dword], 10h
fisubr dword ptr[esp - dword]
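The effect of the Multiplier order, 16 minus the order of the Number, can be illustrated with exact arithmetic. This Python sketch uses `fractions.Fraction` instead of the FPU's extended precision, so it shows the idea rather than the exact bit-level result; `scale_to_integer` is a hypothetical name.

```python
from fractions import Fraction


def scale_to_integer(x: float, order: int) -> int:
    """Shift the significant digits of x into the integer part."""
    power = 0x10 - order                          # fisubr: 16 - order
    scaled = Fraction(x) * Fraction(10) ** power  # exact, unlike the FPU
    return round(scaled)                          # ties go to even, as in nearest mode


print(scale_to_integer(1234.5, 3))
```

With a correct order the result has 17 decimal digits: one before the decimal point and sixteen after it.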
Find the decimal Multiplier and multiply the Number by it:
fmulp st(1), st(0)
fst st(1)
frndint
fsub st(1), st(0)
fld1
fscale
fstp st(1)
fmulp st(2), st(0)
f2xm1
fld1
faddp st(1), st(0)
fmulp st(1), st(0)
frndint
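The FPU has no direct power-of-ten instruction, so the Multiplier is built as a power of two whose exponent is split into an integer part (for fscale) and a fraction (for f2xm1). A Python sketch of this decomposition follows, in ordinary double precision; `pow10_via_pow2` is a hypothetical name, and the order of the multiplications is simplified compared with the assembler.

```python
import math


def pow10_via_pow2(k: int) -> float:
    """Compute 10**k as 2**n * 2**f, mirroring the fscale / f2xm1 pair."""
    y = k * math.log2(10)            # 10**k == 2**y
    n = round(y)                     # integer part (frndint, nearest mode)
    f = y - n                        # fraction in [-0.5, 0.5], valid for f2xm1
    two_f = (2.0 ** f - 1.0) + 1.0   # f2xm1 gives 2**f - 1; fld1 and faddp add 1 back
    return math.ldexp(two_f, n)      # fscale multiplies by 2**n


print(pow10_via_pow2(13))
```

Rounding to nearest matters here: it keeps the fraction within the input range that f2xm1 accepts.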
We transfer the resulting number from the FPU to the AX and XMM0 registers: the first 2 bytes go to AX and the following 8 bytes go to XMM0. When loading the 8 bytes into XMM0, we simultaneously change the byte order, which is possible because the stack pointer was previously aligned to a paragraph:
fbstp tbyte ptr[esp - xmmword]
mov ax, word ptr[esp - qword]
pshuflw xmm0, xmmword ptr[esp - xmmword], 00011011b
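For reference, fbstp stores an 80-bit packed BCD value: nine bytes of two digits each, least significant pair first, followed by a sign byte. A Python sketch of that layout follows; `pack_bcd` is a hypothetical name.

```python
def pack_bcd(value: int) -> bytes:
    """Encode an integer as 10-byte x87 packed BCD (18 digits plus a sign byte)."""
    if abs(value) >= 10 ** 18:
        raise ValueError("packed BCD holds at most 18 digits")
    digits = f"{abs(value):018d}"               # most significant digit first
    body = bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(16, -1, -2)              # least significant pair first
    )
    sign = b"\x80" if value < 0 else b"\x00"
    return body + sign


print(pack_bcd(12345000000000000).hex())
```

For a 17-digit value the leading digit lands in byte 8 and the sign in byte 9, which are the two bytes read into AX.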
Restoring the state of the FPU:
Rearrange the bytes of the XMM0 register into fully reversed order, duplicating each byte at the same time:
punpcklbw xmm0, xmm0
pshuflw xmm0, xmm0, 10110001b
pshufhw xmm0, xmm0, 10110001b
Loading the mask and separating numeric tetrads:
mov dword ptr[esp], 0FF00FF0h
pshufd xmm1, xmmword ptr[esp], 0
pand xmm0, xmm1
psrlw xmm1, 4
movdqa xmm2, xmm1
pand xmm1, xmm0
psrlw xmm1, 4
pandn xmm2, xmm0
paddb xmm1, xmm2
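The net result of the reversal and the masking is one decimal digit per byte, most significant digit first. The Python sketch below reproduces that result with a plain loop rather than the register-level shuffles; `split_nibbles` is a hypothetical name.

```python
def split_nibbles(packed: bytes) -> bytes:
    """Expand little-endian packed BCD bytes into one digit per byte."""
    out = bytearray()
    for byte in reversed(packed):        # most significant BCD byte first
        out.append((byte & 0xF0) >> 4)   # high tetrad
        out.append(byte & 0x0F)          # low tetrad
    return bytes(out)


print(list(split_nibbles(bytes([0x45, 0x23, 0x01]))))
```

In the assembler version the same two masks, F0h and 0Fh, are applied to duplicated bytes, so both tetrads of all eight bytes are separated in a handful of instructions.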
Create a mask and find bytes containing significant digits:
pxor xmm0, xmm0
pcmpeqb xmm0, xmm1
Converting numbers to their corresponding symbols:
mov dword ptr[esp], 30303030h
pshufd xmm2, xmmword ptr[esp], 0
paddb xmm1, xmm2
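Adding 30h to a byte holding a digit from 0 to 9 yields its ASCII code, and paddb does this for all sixteen bytes at once. A one-line Python equivalent, with `digits_to_ascii` as a hypothetical name:

```python
def digits_to_ascii(digits: bytes) -> bytes:
    """Add 30h ('0') to every digit byte, as paddb does across the register."""
    return bytes(d + 0x30 for d in digits)


print(digits_to_ascii(bytes([0, 1, 2, 3, 4, 5])))
```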
Convert the first two bytes of the number to characters and store them in memory:
mov byte ptr[esp], '-'
btr ax, 0Fh
adc esp, 0
add ax, '.0'
mov word ptr[esp], ax
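This step handles the sign without a branch: the minus sign is always written, and the pointer advances past it only when the sign bit was set. The Python sketch below models the resulting bytes. `format_head` is a hypothetical name, and two things are assumed: MASM encodes the constant '.0' as 2E30h, and the low byte of AX holds a single leading digit.

```python
def format_head(word: int) -> bytes:
    """Model the sign, leading digit and decimal point written by this step."""
    negative = bool(word & 0x8000)           # btr ax, 0Fh copies bit 15 into CF
    word &= 0x7FFF                           # and clears it
    head = word + 0x2E30                     # add ax, '.0' (assumed to be 2E30h)
    text = bytes([head & 0xFF, head >> 8])   # little-endian store: digit, then '.'
    # adc esp, 0 keeps the '-' only when CF was set
    return (b"-" if negative else b"") + text


print(format_head(0x8001), format_head(0x0003))
```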
Finding the length of the significant part of the number in the ECX register:
movdqu xmmword ptr[esp + word], xmm1
pmovmskb ecx, xmm0
bsf ecx, ecx
add esp, ecx
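The pcmpeqb, pmovmskb and bsf instructions together locate the first zero byte in the digit vector. Here is a Python model of what they compute, with `first_zero_index` as a hypothetical name:

```python
def first_zero_index(digits: bytes) -> int:
    """Return the index of the first zero byte, as pcmpeqb + pmovmskb + bsf do."""
    mask = 0
    for i, d in enumerate(digits):
        if d == 0:             # pcmpeqb against a zeroed register
            mask |= 1 << i     # pmovmskb gathers one bit per byte
    if mask == 0:
        raise ValueError("no zero byte: bsf leaves its destination undefined")
    return (mask & -mask).bit_length() - 1   # bsf: index of the lowest set bit


print(first_zero_index(bytes([1, 2, 3, 0, 0])))
```

In this model the scan stops at the first zero byte wherever it occurs.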
Checking the order of a Number for a zero value and a negative value:
mov ecx, (word + dword)
mov eax, edx
neg dx
jnc @f
cmovns eax, edx
setns dh
Convert the order value from a number to characters and store them in memory:
cmp ax, 0Ah
sbb ecx, ecx
mov dl, 0Ah
div dl
cmp al, 0Ah
sbb ecx, 0
shl eax, 8
shr ax, 8
div dl
add eax, 303030h
lea edx, [edx * 2 + 2B51h]
mov dword ptr[esp + word + ecx + word], eax
mov word ptr[esp + word], dx
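The lea instruction builds the two characters "e+" or "e-" from values already sitting in EDX: DL holds the divisor 0Ah and DH holds the sign flag. The Python sketch below reproduces that arithmetic and the overall shape of the exponent suffix. The names are hypothetical, and the absence of leading zeros in the exponent is my reading of the code rather than something stated in the text.

```python
def exponent_prefix(negative: bool) -> bytes:
    """Model `lea edx,[edx * 2 + 2B51h]` with dl = 0Ah and dh = sign flag."""
    edx = (int(negative) << 8) | 0x0A
    value = edx * 2 + 0x2B51        # 14h + 51h = 'e'; 2Bh is '+', 2Dh is '-'
    return bytes([value & 0xFF, (value >> 8) & 0xFF])


def format_exponent(order: int) -> str:
    """Return the exponent suffix; a zero order produces no suffix at all."""
    if order == 0:
        return ""                   # jnc @f skips the whole block
    return exponent_prefix(order < 0).decode("ascii") + str(abs(order))


print(format_exponent(-12), format_exponent(308))
```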
Calculate the length of the number and store it in the ECX and EAX registers:
@@:
lea ecx, [esp + ecx + qword]
sub ecx, r8d
mov eax, ecx
We store the string of characters in the register pair XMM1 and XMM2:
movdqa xmm1, xmmword ptr[r8d]
movdqa xmm2, xmmword ptr[r8d + xmmword]
Restoring the stack pointer:
mov esp, r9d
Exiting the procedure.
In my code, I apply an undocumented convention for returning multiple values from a function. It is exactly the same as the x64 software conventions, except that it describes the rules for placing parameters when exiting a procedure.
Why write this code when ready-made solutions already exist? Because my solution is better.
What makes it better than others is that my code is direct, has no loops or branches, and makes a minimal number of memory accesses.
Why write it in assembler when there are other, more convenient languages? Because assembler is better.
What makes assembler better in this case is full access to
Only part of this code is vectorized. The number itself has to be calculated in scalar form, since only one number is passed to the procedure when it is called.