Update: actually, I was pointed at this article:

https://blogs.msdn.microsoft.com/vcblog/2015/10/19/do-you-prefer-fast-or-pre...

It is all about the Microsoft compiler, but the underlying problem should be the same for all compilers. It even says that _sometimes_ auto-vectorization can produce more correct results!

-------------------- quote --------------------
Counter Example

The explanations so far would lead you to expect that /fp:fast will sometimes (maybe always?) produce a result that is less accurate than /fp:precise. As a simple example, let's consider the sum of the first million reciprocals, or Sum(1/n) for n = 1..1000000. I calculated the approximate result using floats, and the correct result using Boost's cpp_dec_float (to a precision of 100 decimal digits). With /O2 level of optimization, the results are:

float /fp:precise      14.3574
float /fp:fast         14.3929
cpp_dec_float<100>     14.39272672286

So the /fp:fast result is nearer the correct answer than the /fp:precise! How can this be? With /fp:fast the auto-vectorizer emits the SIMD RCPPS machine instruction, which is both faster and more accurate than the DIVSS emitted for /fp:precise.

This is just one specific case. But the point is that even a complete error analysis won't tell you whether /fp:fast is acceptable in your App – there's more going on. The only way to be sure is to test your App under each regime and compare answers.
-------------------- quote end --------------------

On Tuesday 12 February 2019 12:17:13 Andrea paz wrote:
Thank you, Adrian. You have provided a lot of news, all of it interesting. I am not competent in compiling and coding, but it seems that we have almost reached the intrinsic limits that the Lumiera group spoke of (IMO):

https://www.lumiera.org/project/background/history/CinelerraWoes.html

CFLAGS: the Arch wiki advises against the -O3 option because it brings instability and is sometimes slower than -O2. But your results aren't bad. Can I apply these CFLAGS options on my Arch? Are they general, or do they only apply to Slackware? Can I try vectorization with -ffast-math, or are there reasons to advise against it for an incompetent like me?
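
For anyone who wants to see the effect without rebuilding anything: the Sum(1/n) example from the article quoted above can be roughly imitated in a few lines of Python. This is only a sketch under my own assumptions, not what any compiler actually emits. The helper names (to_f32, harmonic_f32, harmonic_f64) are made up for illustration, and single precision is emulated by rounding to a 32-bit float after every operation. It shows only that changing the order of float additions, which is the kind of reordering -ffast-math allows, changes the result.

```python
import struct

_F32 = struct.Struct("f")  # IEEE 754 single precision on common platforms


def to_f32(x):
    """Round a Python float (double precision) to the nearest 32-bit float."""
    return _F32.unpack(_F32.pack(x))[0]


def harmonic_f32(n, reverse=False):
    """Sum 1/k for k = 1..n, rounding to float32 after every operation.

    reverse=False adds the largest terms first (1, 1/2, 1/3, ...);
    reverse=True adds the smallest terms first (1/n, ..., 1/2, 1).
    """
    ks = range(n, 0, -1) if reverse else range(1, n + 1)
    total = 0.0
    for k in ks:
        total = to_f32(total + to_f32(1.0 / k))
    return total


def harmonic_f64(n):
    """Same sum in double precision, used here as the reference value."""
    total = 0.0
    for k in range(1, n + 1):
        total += 1.0 / k
    return total


if __name__ == "__main__":
    n = 1_000_000
    print("float32, forward :", harmonic_f32(n))
    print("float32, backward:", harmonic_f32(n, reverse=True))
    print("float64 reference:", harmonic_f64(n))
```

The forward float32 sum drifts away from the reference because late, tiny terms are added to an already large total and mostly rounded away; the backward sum keeps more of them. So "different result" does not automatically mean "worse result", and the article's advice stands: test your own application under each set of flags and compare.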