Comments:
<0> is there any gcc 4.x.x win32 binaries ?
<1> anybody around?
<2> Hi, I try to create a cross toolchain for ppc64, but it fails with the message "crti.o" not found (configured with --target=ppc64-unknown-linux-gnu --prefix=$PWD/bla --without-headers --enable-languages=c --with-sysroot=/home/ness/tmp/os/xenv/sysroot-ppc64 --with-newlib --disable-threads)
<3> hi
<4> Hello. I have recently transitioned to a system using GCC 4.0.2. Some image processing code in my program accounts for a high percentage of the runtime and is mostly written out in SSE intrinsics. On the 64 bit cluster nodes (Opteron 285) gcc4 has improved performance more than 10%. On my 32b development workstation (3.2 GHz pre-Nocona Xeon) processing takes more than twice as long.
<4> I am wondering if this is to be expected, and if there is anything anyone would suggest I could do to improve the performance on the 32b Xeon. Previously, I was using gcc3.4.
<4> I have read often of gcc4 performing worse than gcc3.4, but this seems to be a very extreme case. The 32b gcc4 version is terrible (e.g., 22.7 seconds versus 10.7 seconds for the same task) and the 64b version is good.
<4> If this is the wrong channel for such issues please direct me elsewhere, and sorry for the trouble.
<4> If anyone is interested, the same parameters were used for both compilers. fast math, O3, march={opteron or pentium4}.
<4> The 32b code also performs even more horribly when I tested on a 3.6 GHz Nocona machine. It went from processing faster than my old 3.2GHz with the old binary to being slower than mine with the horrible ia32 gcc4 version.
<4> Given that so large a percentage of the program is spent in a tight loop written in intrinsics I could pretty easily look at the assembly generated to see what it's doing that's so horrible. I don't think that would give me any hints as to how to fix it though.
<4> In gcc3.4 the assembler output was basically an exact translation of what I had written in intrinsics.
<4> There is still a straight C code path, I might try compiling with that and doing a comparison. If gcc4 isn't horrible in that comparison then I guess it's just craptastic code generation for SSE/2 intrinsics. I might as well also look at the assembler, quite a minor change for the worse could easily double runtime.
<4> This is quite a talkative channel.
<4> Maybe I should just write in assembler.
<3> smcmillan: lol :)
<3> but all this sounds interesting, though
<3> could you put all together into some website ?
<3> also, more detail (your code, your CFLAGS, etc) is necessary
<3> and: have you performed some profiling on your code ?
<4> Hello. Yes, I have profiled the program extensively. It spends about 90% of its time in one function which is written out mostly in SSE1/2 intrinsics.
<4> This fraction is higher with gcc4, because it does a worse job..
<4> (i.e., the rest of the program doesn't perform worse, or at least not as much worse).
<4> It is looking like the SSE register allocation is poor, resulting in a situation where a value that was previously kept inside a register during a loop is now incurring many loads/stores. I haven't looked, but I suppose the fact that there are twice as many SSE registers available in x86-64 might make this a non-issue (until you had some code that was almost spilling out of registers on x86-64).
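[Editor's sketch] To make the register-allocation point concrete, here is a minimal C example of the kind of loop being described. It is not the poster's code: the function name dot_sse and the dot-product kernel are hypothetical, chosen only because they have a single loop-carried SSE value (acc) that a compiler would ideally keep in one XMM register for the whole loop.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Hypothetical kernel (an assumption for illustration, not the code
   discussed in the log): returns the sum of a[i] * b[i].
   n must be a multiple of 4; unaligned loads are used, so a and b
   need no particular alignment. */
float dot_sse(const float *a, const float *b, size_t n)
{
    /* Loop-carried accumulator. Ideally this lives in one XMM register
       for the whole loop; a spill would show up as a store/load of it
       to the stack on every iteration. */
    __m128 acc = _mm_setzero_ps();

    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }

    /* Horizontal sum of the four lanes, done in scalar code for clarity. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

One way to check for the problem described above would be to compile such a function with -S in addition to the flags mentioned in the log (-O3 -march=pentium4) and look for stack loads and stores of the accumulator inside the loop body. A kernel this small is unlikely to spill; the issue reported here would only be expected once more values are live than there are XMM registers (eight in 32-bit mode).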
<4> I would need to now go and look at the code generated for 64b mode. I will try to change the way my intrinsics code is written to get gcc4 to do a better translation to assembly.
<4> I did have this written in assembler, but I wanted to use intrinsics because it is easier to read and maintain, it WAS just as fast and producing assembler basically identical to what I had written, and it also allows the compiler to potentially take advantage of extra registers when compiled for x86-64.
<4> I would appreciate any tips anyone could offer on changes to the compiler optimisation parameters for gcc4 that might get it to work better. At the moment I am just going to try getting the intrinsics to convert better (somehow).
<4> Current relevant parameters are -ffast-math -fomit-frame-pointer -O3 -march={pentium4,opteron}
<5> hey, i managed to write few lines of bad code that crashes gcc (4.1.1). should i, i dunno, post it somewhere or send it here or what?
<5> it's incorrect in so many levels that it most likely just doesn't matter in any way :P
<6> crashes are always not nice. I think there is a number of them currently fixed, someone tries some funny replaces to generate invalid code that crashes often.
<6> if there is any chance that it is not yet known, it is always a good idea to report it.
<5> ah, i cut it as small as i could: www.ut.ee/~a51081/crash.cpp
<6> Jaak: I do not know if any gcc developers are listening here. perhaps it is best to report it to gcc's bugzilla.
<5> i hate bugzillas -.-
<5> wtf are "host triplet", "target triplet" and "build triplet"?
<5> o_O
<6> Jaak: host target and build are the names for (in some permutation I can never remember) the system you build on, the system you build for, and the system the compiler you build will build for
<6> and I guess triplet refers to the three items you give when referring to a system
<5> oh, thanks
<5> ooh, even smaller: namespace Test { class Test; } Test::~Test() {}
<5> anyhoo, there goes the report :P
<5> hah, it's a dupe :)