As MrKlorox wrote: we had to go back to an older compiler, because even with all the AVX instructions enabled, the new one performed worse than the old one with SSE2 only!
We're currently waiting for feedback about the performance (it should be back to how it was before we made the switch). If that's confirmed, we will enable AVX512 in some of the libraries that we're using. We've already seen a performance improvement from that. In fact, that was the reason why we initially didn't realize that the new compiler had issues: the library was so much faster that the total performance was still slightly better than before.
Aside from this, we have also discovered - oops - that the compiler we're using emits AVX instructions, but THEY ARE NEVER EXECUTED on AMD CPUs - those have always used only the SSE2 code path. So we might need to create separate binaries for AMD. But that's for later. We're still in the process of migrating everything to the new compiler and build structure, and making sure that everything still works.
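For illustration only (this is not our actual code): one alternative to separate binaries is to choose the code path at run time from the CPU's feature bits rather than from its vendor. The sketch below assumes GCC or Clang on x86, where `__builtin_cpu_supports` is available; on any other toolchain it simply falls back to the baseline path. The names `SimdPath`, `detect_simd_path` and `path_name` are hypothetical.

```cpp
#include <cstddef>

// Hypothetical set of code paths a program might ship.
enum class SimdPath { Baseline, Avx, Avx512 };

// Chooses a path from the feature bits the CPU reports, not from its vendor
// string, so any CPU that reports AVX gets the AVX path.
inline SimdPath detect_simd_path() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // must run before __builtin_cpu_supports
    if (__builtin_cpu_supports("avx512f")) return SimdPath::Avx512;
    if (__builtin_cpu_supports("avx"))     return SimdPath::Avx;
#endif
    // Assumption: the baseline build targets SSE2 (or plain scalar code).
    return SimdPath::Baseline;
}

// Human-readable name, e.g. for a startup log line.
inline const char* path_name(SimdPath path) {
    switch (path) {
        case SimdPath::Avx512: return "AVX512";
        case SimdPath::Avx:    return "AVX";
        default:               return "baseline (SSE2)";
    }
}
```

Logging `path_name(detect_simd_path())` once at startup would also make it visible which path a given machine actually ends up on.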
By the way, enabling AVX512 for everything might be an option as well. But that will require going through a lot of calculations, because it means that a lot of numbers suddenly need to be multiples of 16 instead of 8, which could negatively impact performance for certain things. So it's not really trivial. And we have a lot of hand-written intrinsics code, and I don't really want to spend too much time manually creating separate code paths for AVX512... Maybe for some. But again, later.
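To make the "multiples of 16 instead of 8" point concrete, here is a minimal sketch. It assumes the data is 32-bit floats (8 per 256-bit AVX register, 16 per 512-bit AVX512 register); the helper names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 32-bit float elements.
constexpr std::size_t kAvxFloatLanes    = 8;   // 256 bits / 32 bits
constexpr std::size_t kAvx512FloatLanes = 16;  // 512 bits / 32 bits

// Rounds count up to the next multiple of lanes (lanes must be non-zero).
constexpr std::size_t round_up_to_lanes(std::size_t count, std::size_t lanes) {
    return (count + lanes - 1) / lanes * lanes;
}

// Number of unused padding elements added by the rounding.
constexpr std::size_t padding_for(std::size_t count, std::size_t lanes) {
    return round_up_to_lanes(count, lanes) - count;
}

// 24 elements fit exactly into three AVX registers...
static_assert(padding_for(24, kAvxFloatLanes) == 0, "no padding with 8 lanes");
// ...but need two AVX512 registers, with 8 of the 32 slots wasted.
static_assert(padding_for(24, kAvx512FloatLanes) == 8, "8 padded with 16 lanes");
```

A size that was a perfect fit for 8 lanes can end up carrying padding with 16 lanes, which is the kind of case that would have to be checked one by one.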