Thread View: gmane.comp.gcc.patches
6 messages
6 total messages
Started by Tejas Belagod
Fri, 21 Feb 2014 16:30
[Patch, AArch64] Fix shuffle for big-endian.
Author: Tejas Belagod
Date: Fri, 21 Feb 2014 16:30
Date: Fri, 21 Feb 2014 16:30
104 lines
3563 bytes
3563 bytes
This is a multi-part message in MIME format. --------------020100030405060301050609 Content-Type: text/plain; charset=WINDOWS-1252; format=flowed Content-Transfer-Encoding: quoted-printable Hi, When a shuffle of more than one input happens, on NEON we end up with a 'mixed-endian' format in the register list which TBL operates on. We don't make this correction in RTL and therefore the shuffle operation gets it incorrect. Here is a patch that fixes-up the index table in the selector rtx in RTL to also be mixed-endian to reflect what's happening on NEON. As trunk stands, this patch will not be exercised as constant vector permute for Big-endian is disabled. I've tested this by locally enabling const vec_perm and it fixes the some regressions we have on big-endian: aarch64_be-none-elf: FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer -funroll-loops FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -g FAIL->PASS: gcc.dg/torture/vector-shuffle1.c -O0 execution test FAIL->PASS: gcc.dg/torture/vshuf-v16qi.c -O2 execution test FAIL->PASS: gcc.dg/torture/vshuf-v2df.c -O2 execution test FAIL->PASS: gcc.dg/torture/vshuf-v2di.c -O2 execution test FAIL->PASS: gcc.dg/torture/vshuf-v2sf.c -O2 execution test FAIL->PASS: gcc.dg/torture/vshuf-v2si.c -O2 execution test FAIL->PASS: gcc.dg/torture/vshuf-v4sf.c -O2 execution test FAIL->PASS: gcc.dg/torture/vshuf-v4si.c -O2 execution test FAIL->PASS: gcc.dg/torture/vshuf-v8hi.c -O2 execution test FAIL->PASS: gcc.dg/torture/vshuf-v8qi.c -O2 execution test FAIL->PASS: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects execution test FAIL->PASS: gcc.dg/vect/vect-114.c execution test FAIL->PASS: gcc.dg/vect/vect-15.c -flto -ffat-lto-objects execution test FAIL->PASS: gcc.dg/vect/vect-15.c execution test Also regressed on aarch64-none-elf. OK for stage-1? Thanks, Tejas. 2014-02-21 Tejas Belagod <tejas.belagod@arm.com> gcc/ * config/aarch64/aarch64.c (aarch64_evpc_tbl): Fix index vector for big-endian when dealing with more than one input shuffle vector. --------------020100030405060301050609 Content-Type: text/plain; name=tbl-1.txt Content-Transfer-Encoding: quoted-printable Content-Disposition: inline; filename="tbl-1.txt" diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ea90311..fd473a3 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -8128,7 +8128,28 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d) return false; for (i = 0; i < nelt; ++i) - rperm[i] = GEN_INT (d->perm[i]); + { + int nunits = GET_MODE_NUNITS (vmode); + int elt = d->perm[i]; + + /* If two vectors, we end up with a wierd mixed-endian mode on NEON. */ + if (BYTES_BIG_ENDIAN) + { + if (!d->one_vector_p && d->perm[i] & nunits) + { + /* Extract the offset. */ + elt = d->perm[i] & (nunits - 1); + /* Reverse the top half. */ + elt = nunits - 1 - elt; + /* Offset it by the bottom half. */ + elt += nunits; + } + else + elt = nunits - 1 - d->perm[i]; + } + + rperm[i] = GEN_INT (elt); + } sel = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rperm)); sel = force_reg (vmode, sel); --------------020100030405060301050609--
Re: [Patch, AArch64] Fix shuffle for big-endian.
Author: Alan Lawrence
Date: Wed, 12 Mar 2014 14:52
Date: Wed, 12 Mar 2014 14:52
99 lines
3620 bytes
3620 bytes
I've been doing some local testing using this patch as a basis for some of my own work on NEON intrinsics, and it seems good to me. A couple of points: (1) Re. the comment that "If two vectors, we end up with a wierd mixed-endian mode on NEON": firstly "wierd" should be spelt "weird"; secondly, if I understand right, this comment belongs with the next "if (!d->one_vector_p...)" rather than the "if (BYTES_BIG_ENDIAN)" before which it's written. (2) as you say, this code is not exercised, unless you do something to remove the 'if (BYTES_BIG_ENDIAN) return false;' earlier in that same function. Can I politely suggest you do that here in this patch? (3) In my own regression testing, with const_vec_perm enabled on big_endian, I see 2*PASS->FAIL, namely gcc.dg/vect/vect-114.c scan-tree-dump-times vect "vectorized 0 loops" 1 gcc.dg/vect/vect-114.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 0 loops" 1 These are essentially noise, but the noise is removed and I see no other problems, if (after this patch) I re-enable the testsuite's "vect_perm" target selector for aarch64 big-endian (testsuite/lib/target-supports.exp). Would you like a separate patch for that or roll it in here? Cheers, Alan Tejas Belagod wrote: > > Hi, > > > > When a shuffle of more than one input happens, on NEON we end up with a > > 'mixed-endian' format in the register list which TBL operates on. We don't make > > this correction in RTL and therefore the shuffle operation gets it incorrect. > > Here is a patch that fixes-up the index table in the selector rtx in RTL to also > > be mixed-endian to reflect what's happening on NEON. > > > > As trunk stands, this patch will not be exercised as constant vector permute for > > Big-endian is disabled. I've tested this by locally enabling const vec_perm and > > it fixes the some regressions we have on big-endian: > > > > aarch64_be-none-elf: > > FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer > > FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer > > -funroll-all-loops -finline-functions > > FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer > > -funroll-loops > > FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -g > > FAIL->PASS: gcc.dg/torture/vector-shuffle1.c -O0 execution test > > FAIL->PASS: gcc.dg/torture/vshuf-v16qi.c -O2 execution test > > FAIL->PASS: gcc.dg/torture/vshuf-v2df.c -O2 execution test > > FAIL->PASS: gcc.dg/torture/vshuf-v2di.c -O2 execution test > > FAIL->PASS: gcc.dg/torture/vshuf-v2sf.c -O2 execution test > > FAIL->PASS: gcc.dg/torture/vshuf-v2si.c -O2 execution test > > FAIL->PASS: gcc.dg/torture/vshuf-v4sf.c -O2 execution test > > FAIL->PASS: gcc.dg/torture/vshuf-v4si.c -O2 execution test > > FAIL->PASS: gcc.dg/torture/vshuf-v8hi.c -O2 execution test > > FAIL->PASS: gcc.dg/torture/vshuf-v8qi.c -O2 execution test > > FAIL->PASS: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects execution test > > FAIL->PASS: gcc.dg/vect/vect-114.c execution test > > FAIL->PASS: gcc.dg/vect/vect-15.c -flto -ffat-lto-objects execution test > > FAIL->PASS: gcc.dg/vect/vect-15.c execution test > > > > Also regressed on aarch64-none-elf. > > > > OK for stage-1? > > > > Thanks, > > Tejas. > > > > 2014-02-21 Tejas Belagod <tejas.belagod@arm.com> > > > > gcc/ > > * config/aarch64/aarch64.c (aarch64_evpc_tbl): Fix index vector for > > big-endian when dealing with more than one input shuffle vector. > >
Re: [Patch, AArch64] Fix shuffle for big-endian.
Author: Richard Henderso
Date: Mon, 24 Mar 2014 11:25
Date: Mon, 24 Mar 2014 11:25
26 lines
674 bytes
674 bytes
On 02/21/2014 08:30 AM, Tejas Belagod wrote: > + /* If two vectors, we end up with a wierd mixed-endian mode on NEON. */ > + if (BYTES_BIG_ENDIAN) > + { > + if (!d->one_vector_p && d->perm[i] & nunits) > + { > + /* Extract the offset. */ > + elt = d->perm[i] & (nunits - 1); > + /* Reverse the top half. */ > + elt = nunits - 1 - elt; > + /* Offset it by the bottom half. */ > + elt += nunits; > + } > + else > + elt = nunits - 1 - d->perm[i]; > + } Isn't this just elt = d->perm[i] ^ (nunits - 1); all the time? I.e. invert the index within the word, but leave the word index (nunits) unchanged. r~
Re: [Patch, AArch64] Fix shuffle for big-endian.
Author: Alan Lawrence
Date: Mon, 24 Mar 2014 17:42
Date: Mon, 24 Mar 2014 17:42
109 lines
4021 bytes
4021 bytes
Further to that - all looks good after one-liner http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01142.html has gone in. (Without that, enabling the code in Tejas' patch causes a regression in gcc.dg/torture/vshuf-v4hi.c as loading a vector constant goes wrong). I'll send a patch to enable this and fix up the testsuite shortly... Cheers, Alan Alan Lawrence wrote: > I've been doing some local testing using this patch as a basis for some of my > own work on NEON intrinsics, and it seems good to me. A couple of points: > > (1) Re. the comment that "If two vectors, we end up with a wierd mixed-endian > mode on NEON": firstly "wierd" should be spelt "weird"; > secondly, if I understand right, this comment belongs with the next "if > (!d->one_vector_p...)" rather than the "if (BYTES_BIG_ENDIAN)" before which it's > written. > > (2) as you say, this code is not exercised, unless you do something to remove > the 'if (BYTES_BIG_ENDIAN) return false;' earlier in that same function. Can I > politely suggest you do that here in this patch? > > (3) In my own regression testing, with const_vec_perm enabled on big_endian, I > see 2*PASS->FAIL, namely > > gcc.dg/vect/vect-114.c scan-tree-dump-times vect "vectorized 0 loops" 1 > > gcc.dg/vect/vect-114.c -flto -ffat-lto-objects scan-tree-dump-times > vect "vectorized 0 loops" 1 > > These are essentially noise, but the noise is removed and I see no other > problems, if (after this patch) I re-enable the testsuite's "vect_perm" target > selector for aarch64 big-endian (testsuite/lib/target-supports.exp). Would you > like a separate patch for that or roll it in here? > > Cheers, Alan > > Tejas Belagod wrote: >> Hi, >> >> When a shuffle of more than one input happens, on NEON we end up with a >> 'mixed-endian' format in the register list which TBL operates on. We don't make >> this correction in RTL and therefore the shuffle operation gets it incorrect. >> Here is a patch that fixes-up the index table in the selector rtx in RTL to also >> be mixed-endian to reflect what's happening on NEON. >> >> As trunk stands, this patch will not be exercised as constant vector permute for >> Big-endian is disabled. I've tested this by locally enabling const vec_perm and >> it fixes the some regressions we have on big-endian: >> >> aarch64_be-none-elf: >> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer >> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer >> -funroll-all-loops -finline-functions >> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer >> -funroll-loops >> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -g >> FAIL->PASS: gcc.dg/torture/vector-shuffle1.c -O0 execution test >> FAIL->PASS: gcc.dg/torture/vshuf-v16qi.c -O2 execution test >> FAIL->PASS: gcc.dg/torture/vshuf-v2df.c -O2 execution test >> FAIL->PASS: gcc.dg/torture/vshuf-v2di.c -O2 execution test >> FAIL->PASS: gcc.dg/torture/vshuf-v2sf.c -O2 execution test >> FAIL->PASS: gcc.dg/torture/vshuf-v2si.c -O2 execution test >> FAIL->PASS: gcc.dg/torture/vshuf-v4sf.c -O2 execution test >> FAIL->PASS: gcc.dg/torture/vshuf-v4si.c -O2 execution test >> FAIL->PASS: gcc.dg/torture/vshuf-v8hi.c -O2 execution test >> FAIL->PASS: gcc.dg/torture/vshuf-v8qi.c -O2 execution test >> FAIL->PASS: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects execution test >> FAIL->PASS: gcc.dg/vect/vect-114.c execution test >> FAIL->PASS: gcc.dg/vect/vect-15.c -flto -ffat-lto-objects execution test >> FAIL->PASS: gcc.dg/vect/vect-15.c execution test >> >> Also regressed on aarch64-none-elf. >> >> OK for stage-1? >> >> Thanks, >> Tejas. >> >> 2014-02-21 Tejas Belagod <tejas.belagod@arm.com> >> >> gcc/ >> * config/aarch64/aarch64.c (aarch64_evpc_tbl): Fix index vector for >> big-endian when dealing with more than one input shuffle vector. >> > >
Re: [Patch, AArch64] Fix shuffle for big-endian.
Author: Tejas Belagod
Date: Tue, 25 Mar 2014 17:11
Date: Tue, 25 Mar 2014 17:11
116 lines
4151 bytes
4151 bytes
Alan Lawrence wrote: > Further to that - all looks good after one-liner > http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01142.html has gone in. (Without > that, enabling the code in Tejas' patch causes a regression in > gcc.dg/torture/vshuf-v4hi.c as loading a vector constant goes wrong). > Thanks for your fix. Tejas. > I'll send a patch to enable this and fix up the testsuite shortly... > > Cheers, Alan > > Alan Lawrence wrote: >> I've been doing some local testing using this patch as a basis for some of my >> own work on NEON intrinsics, and it seems good to me. A couple of points: >> >> (1) Re. the comment that "If two vectors, we end up with a wierd mixed-endian >> mode on NEON": firstly "wierd" should be spelt "weird"; >> secondly, if I understand right, this comment belongs with the next "if >> (!d->one_vector_p...)" rather than the "if (BYTES_BIG_ENDIAN)" before which it's >> written. >> >> (2) as you say, this code is not exercised, unless you do something to remove >> the 'if (BYTES_BIG_ENDIAN) return false;' earlier in that same function. Can I >> politely suggest you do that here in this patch? >> >> (3) In my own regression testing, with const_vec_perm enabled on big_endian, I >> see 2*PASS->FAIL, namely >> >> gcc.dg/vect/vect-114.c scan-tree-dump-times vect "vectorized 0 loops" 1 >> >> gcc.dg/vect/vect-114.c -flto -ffat-lto-objects scan-tree-dump-times >> vect "vectorized 0 loops" 1 >> >> These are essentially noise, but the noise is removed and I see no other >> problems, if (after this patch) I re-enable the testsuite's "vect_perm" target >> selector for aarch64 big-endian (testsuite/lib/target-supports.exp). Would you >> like a separate patch for that or roll it in here? >> >> Cheers, Alan >> >> Tejas Belagod wrote: >>> Hi, >>> >>> When a shuffle of more than one input happens, on NEON we end up with a >>> 'mixed-endian' format in the register list which TBL operates on. We don't make >>> this correction in RTL and therefore the shuffle operation gets it incorrect. >>> Here is a patch that fixes-up the index table in the selector rtx in RTL to also >>> be mixed-endian to reflect what's happening on NEON. >>> >>> As trunk stands, this patch will not be exercised as constant vector permute for >>> Big-endian is disabled. I've tested this by locally enabling const vec_perm and >>> it fixes the some regressions we have on big-endian: >>> >>> aarch64_be-none-elf: >>> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer >>> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer >>> -funroll-all-loops -finline-functions >>> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer >>> -funroll-loops >>> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -g >>> FAIL->PASS: gcc.dg/torture/vector-shuffle1.c -O0 execution test >>> FAIL->PASS: gcc.dg/torture/vshuf-v16qi.c -O2 execution test >>> FAIL->PASS: gcc.dg/torture/vshuf-v2df.c -O2 execution test >>> FAIL->PASS: gcc.dg/torture/vshuf-v2di.c -O2 execution test >>> FAIL->PASS: gcc.dg/torture/vshuf-v2sf.c -O2 execution test >>> FAIL->PASS: gcc.dg/torture/vshuf-v2si.c -O2 execution test >>> FAIL->PASS: gcc.dg/torture/vshuf-v4sf.c -O2 execution test >>> FAIL->PASS: gcc.dg/torture/vshuf-v4si.c -O2 execution test >>> FAIL->PASS: gcc.dg/torture/vshuf-v8hi.c -O2 execution test >>> FAIL->PASS: gcc.dg/torture/vshuf-v8qi.c -O2 execution test >>> FAIL->PASS: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects execution test >>> FAIL->PASS: gcc.dg/vect/vect-114.c execution test >>> FAIL->PASS: gcc.dg/vect/vect-15.c -flto -ffat-lto-objects execution test >>> FAIL->PASS: gcc.dg/vect/vect-15.c execution test >>> >>> Also regressed on aarch64-none-elf. >>> >>> OK for stage-1? >>> >>> Thanks, >>> Tejas. >>> >>> 2014-02-21 Tejas Belagod <tejas.belagod@arm.com> >>> >>> gcc/ >>> * config/aarch64/aarch64.c (aarch64_evpc_tbl): Fix index vector for >>> big-endian when dealing with more than one input shuffle vector. >>> >> >
Re: [Patch, AArch64] Fix shuffle for big-endian.
Author: Tejas Belagod
Date: Tue, 25 Mar 2014 17:12
Date: Tue, 25 Mar 2014 17:12
31 lines
791 bytes
791 bytes
Richard Henderson wrote: > On 02/21/2014 08:30 AM, Tejas Belagod wrote: >> + /* If two vectors, we end up with a wierd mixed-endian mode on NEON. */ >> + if (BYTES_BIG_ENDIAN) >> + { >> + if (!d->one_vector_p && d->perm[i] & nunits) >> + { >> + /* Extract the offset. */ >> + elt = d->perm[i] & (nunits - 1); >> + /* Reverse the top half. */ >> + elt = nunits - 1 - elt; >> + /* Offset it by the bottom half. */ >> + elt += nunits; >> + } >> + else >> + elt = nunits - 1 - d->perm[i]; >> + } > > Isn't this just > > elt = d->perm[i] ^ (nunits - 1); > > all the time? I.e. invert the index within the word, > but leave the word index (nunits) unchanged. > Yes, I think that works. Thanks! Tejas.
Thread Navigation
This is a paginated view of messages in the thread with full content displayed inline.
Messages are displayed in chronological order, with the original post highlighted in green.
Use pagination controls to navigate through all messages in large threads.
Back to All Threads