A bug in the Visual C++ compiler option /QIfist - dwing吧 (dwing forum)

level 13

dwing (original poster)

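Enabling the /QIfist option in Visual C++ speeds up floating-point-to-integer conversion, possibly at some cost in precision, but today I stumbled on what looks like a very serious problem.
Both of the following lines should print -1.0. With /QIfist enabled, the first is correct but the second prints -2.0. I don't think this is a precision issue; it looks like a bug.
printf("%f\n", (float)(int)(-1.5));
printf("%f\n", (float)(int)atof("-1.5"));
Tested with VC6, VC2003 and VC2010; all of them reproduce it (remember to add "/QIfist" to the compiler options).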

2009-12-11 10:12 #1

dwing (original poster)

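/QIfist really is best avoided whenever possible. The problem is not rounding error; it has to do with the floating-point state. VC2010 already emits a warning for this option.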

2009-12-11 11:12 #2

dwing (original poster)

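In fact, changing the 1.5 in post 1 to 1.6, 1.99, ... gives the same result, and the disassembly shows no inlined fistp instruction at all. It really is a bug; I expect the only answer Microsoft can give is "don't use /QIfist".
It is a pity that memcpy and memset are no longer inlined from VC2005 onward. You have to write your own inline + assembly version, and the compiler cannot optimize that well.
I am not sure whether SSE/SSE2 float-to-integer conversion can really be inlined.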

2009-12-11 13:12 #4

dwing (original poster)

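Truncation (rounding toward zero) is very useful; in many cases the conversion to an integer is not done in order to get the most accurate value.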

2009-12-11 13:12 #5

dwing (original poster)

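Correction to post 4: the fistp instruction is inlined after all, and its default rounding mode is indeed not truncation. Calling one function beforehand solves the problem:
_controlfp(_RC_CHOP, _MCW_RC);  /* new value first, then the mask */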

2009-12-11 14:12 #7

dwing (original poster)

I compiled with VS2005 and disassembled the result, and found no inlined conversion. Even with /fp:fast and /arch:XXX, a call to the ftol function is emitted.

2009-12-11 14:12 #8

dwing (original poster)

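Hmm... it looks like the problem in post 1 is not really a bug after all. The floating-point rounding modes at compile time and at run time simply differ: the compiler truncates when it folds the constant, and the run-time ftol also truncates, but fist uses the FPU's current mode, which defaults to round-to-nearest. So if you want to use /QIfist, you should at least call _controlfp at the very start of the entry point to set the mode.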
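The difference between the two modes can be illustrated in plain C. The sketch below is an emulation under stated assumptions, not what the compiler emits: `convert_truncate` models the cast/ftol path, and `convert_nearest_even` is a hypothetical helper that imitates round-to-nearest with ties going to the even integer, which is the x87 default that fistp follows.

```c
#include <stdio.h>

/* Truncate toward zero: what a C cast and ftol() do. */
int convert_truncate(double x)
{
    return (int)x;
}

/* Round to nearest, ties to even: a plain-C emulation of fistp under the
   default FPU rounding mode. Valid only while x fits in an int. */
int convert_nearest_even(double x)
{
    int t = (int)x;                 /* integer part, truncated toward zero */
    double frac = x - (double)t;    /* leftover fraction, same sign as x */

    if (frac > 0.5 || (frac == 0.5 && t % 2 != 0))
        return t + 1;               /* above the midpoint, or a tie with odd t */
    if (frac < -0.5 || (frac == -0.5 && t % 2 != 0))
        return t - 1;               /* below the midpoint, or a tie with odd t */
    return t;
}

/* Print both conversions side by side for a few sample inputs. */
void print_comparison(void)
{
    static const double samples[] = { -1.4, -1.5, -1.6, -1.9, -1.99, -2.5 };
    size_t i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        printf("%6.2f  truncate=%d  nearest=%d\n", samples[i],
               convert_truncate(samples[i]), convert_nearest_even(samples[i]));
}
```

For -1.5, -1.6, -1.9 and -1.99 the truncating version returns -1 while the nearest version returns -2, which is why replacing 1.5 with 1.6 or 1.99 did not change the output. The two versions agree only for inputs such as -1.4 or the tie -2.5.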

2009-12-11 14:12 #9

dwing (original poster)

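Enabling /Oi /Ob2 /arch:SSE2 makes no difference either. As long as /QIfist is not given, floating-point-to-integer conversion calls the ftol function in the CRT, and the argument is passed in ST(0).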

2009-12-11 15:12 #11

dwing (original poster)

VC2005 command line:
/O2 /Ob2 /Oi /Ot /Oy /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /GF /FD /MD /GS- /arch:SSE2 /fp:fast /Fo"Release\\" /Fd"Release\vc80.pdb" /W3 /nologo /c /Wp64 /Zi /TP /errorReport:prompt
printf("%f\n", (float)(int)(-1.9));
printf("%f\n", (float)(int)atof("-1.9"));
00401000 fld     qword ptr [402100]
00401006 push    esi
00401007 mov     esi,[<&MSVCR80.printf>]  ;  MSVCR80.printf
0040100D sub     esp,8                    ; /<%f>
00401010 fstp    qword ptr [esp]          ; |
00401013 push    004020F4                 ; |format = "%f",LF,""
00401018 call    esi                      ; \printf
0040101A push    004020F8                 ; /s = "-1.9"
0040101F call    [<&MSVCR80.atof>]        ; \atof
00401025 call    00401820                 ; this is the ftol function
0040102A cvtsi2ss xmm0,eax
0040102E add     esp,8
00401031 prefix rep:                      ; lines 00401031-00401037 are two SSE2 instructions
00401032 db      0F                       ; that the debugger failed to decode; from the
00401033 pop     edx                      ; opcodes they appear to be cvtss2sd xmm0,xmm0
00401034 sal     dl,0F                    ; followed by movsd qword ptr [esp],xmm0
00401037 adc     [esp],eax
0040103A push    004020F4                 ;  ASCII "%f",LF
0040103F call    esi
00401041 add     esp,0C

2009-12-12 03:12 #13

dwing (original poster)

Hmm... the float-to-int conversion is indeed inlined, but double-to-int goes through ftol().

2009-12-12 04:12 #14

dwing (original poster)

With "Multi-threaded (static library)" (/MT) it is still not inlined.
/O2 /Ob2 /Oi /Ot /Oy /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /GF /FD /MT /GS- /arch:SSE2 /fp:fast /Fo"Release\\" /Fd"Release\vc80.pdb" /W3 /nologo /c /Wp64 /Zi /TP /errorReport:prompt
00401000 fld     qword ptr [40F3B0]
00401006 sub     esp,8
00401009 fstp    qword ptr [esp]
0040100C push    0040F3A0                 ;  ASCII "%f",LF
00401011 call    0040104A                 ;  printf()
00401016 push    0040F3A4                 ;  ASCII "-1.9"
0040101B call    0040123A                 ;  atof()
00401020 call    0040D3D0                 ;  ftol()
00401025 cvtsi2ss xmm0,eax
00401029 add     esp,8
0040102C prefix rep:                      ; lines 0040102C-00401032 are two SSE2 instructions
0040102D db      0F                       ; that the debugger failed to decode; from the
0040102E pop     edx                      ; opcodes they appear to be cvtss2sd xmm0,xmm0
0040102F sal     dl,0F                    ; followed by movsd qword ptr [esp],xmm0
00401032 adc     [esp],eax
00401035 push    0040F3A0                 ;  ASCII "%f",LF
0040103A call    0040104A                 ;  printf()
0040103F add     esp,0C

2009-12-12 06:12 #16

dwing (original poster)

This is VC2005's ftol():
0040D3D0 cmp     dword ptr [411F0C],0
0040D3D7 je      short 0040D406           ;  if SSE2 is not supported, jump to the plain ftol()
0040D3D9 push    ebp
0040D3DA mov     ebp,esp
0040D3DC sub     esp,8
0040D3DF and     esp,FFFFFFF8
0040D3E2 fstp    qword ptr [esp]
0040D3E5 prefix repne:                    ; these two lines are one misdecoded SSE2 instruction,
0040D3E6 cvttps2pi mm0,[esp]              ; apparently cvttsd2si eax,qword ptr [esp]
0040D3EA leave
0040D3EB retn
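The truncating behaviour of this SSE2 path can also be reached from C through intrinsics. What follows is a rough sketch, assuming an x86 target with SSE2 and the `<emmintrin.h>` header; the function names are made up for illustration, and the non-SSE2 fallback uses a C cast and `lrint()` instead. `_mm_cvttsd_si32` corresponds to cvttsd2si, which truncates regardless of the rounding mode, whereas `_mm_cvtsd_si32` corresponds to cvtsd2si, which follows the current MXCSR rounding mode.

```c
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   /* SSE2 intrinsics */

/* cvttsd2si: always truncates toward zero, whatever MXCSR says. */
int truncate_to_int(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}

/* cvtsd2si: rounds according to MXCSR (round-to-nearest-even by default). */
int round_to_int_current_mode(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

#else
#include <math.h>        /* fallback for targets without SSE2 */

/* A C cast truncates toward zero by definition. */
int truncate_to_int(double x)
{
    return (int)x;
}

/* lrint() rounds according to the current rounding mode. */
int round_to_int_current_mode(double x)
{
    return (int)lrint(x);
}
#endif
```

With the default rounding mode, `truncate_to_int(-1.9)` gives -1 and `round_to_int_current_mode(-1.9)` gives -2, so the truncating form matches what a C cast is required to produce without any change to the control word.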

2009-12-12 06:12 #17

dwing (original poster)

OD (OllyDbg) still has quite a few problems displaying SSE2 instructions.

2009-12-12 06:12 #18

dwing (original poster)

It really is odd: converting the return value of atof() to an integer ends up calling _ftol2_sse.

2009-12-12 07:12 #20

dwing (original poster)

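Using /O2 instead of /Ox is still well worth it.
/TC forces the source to be compiled as C; it is usually unnecessary.
/GR- does not mean much either: if you do not use run-time type information, it does not affect the output.
/Zp4 is too extreme; the default is fine.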

2009-12-12 10:12 #22

dwing (original poster)

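I misremembered: /Zp4 means 4-byte packing, which is not extreme, but the default is still the better choice. By default, __int64 and double members are aligned on 8 bytes.
I tried compiling with the options from post 21, and it still calls _ftol2_sse.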
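The effect of the /Zp4 remark above can be shown with a small layout comparison. This is only a sketch: it uses `#pragma pack(push, 4)` as a per-struct stand-in for compiling the whole file with /Zp4, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Default packing: the compiler picks the natural alignment of `d`. */
struct default_layout {
    char   c;
    double d;
};

/* 4-byte packing: members are aligned to at most 4 bytes, as with /Zp4. */
#pragma pack(push, 4)
struct packed4_layout {
    char   c;
    double d;
};
#pragma pack(pop)

/* Print the offset of `d` and the total size for both layouts. */
void print_layouts(void)
{
    printf("default: offsetof(d)=%u sizeof=%u\n",
           (unsigned)offsetof(struct default_layout, d),
           (unsigned)sizeof(struct default_layout));
    printf("pack(4): offsetof(d)=%u sizeof=%u\n",
           (unsigned)offsetof(struct packed4_layout, d),
           (unsigned)sizeof(struct packed4_layout));
}
```

With 4-byte packing the double starts at offset 4 and the struct occupies 12 bytes. With MSVC's default packing the double is expected at offset 8, which keeps it naturally aligned at the cost of a larger struct.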

2009-12-12 10:12 #23