🚀 go-pugleaf

RetroBBS NetNews Server

Inspired by RockSolid Light RIP Retro Guy

Thread View: gmane.comp.gcc.patches
6 messages
6 total messages Started by Tejas Belagod Fri, 21 Feb 2014 16:30
[Patch, AArch64] Fix shuffle for big-endian.
#306265
Author: Tejas Belagod
Date: Fri, 21 Feb 2014 16:30
104 lines
3563 bytes
This is a multi-part message in MIME format.
--------------020100030405060301050609
Content-Type: text/plain; charset=WINDOWS-1252; format=flowed
Content-Transfer-Encoding: quoted-printable


Hi,

When a shuffle of more than one input happens, on NEON we end up with a 
'mixed-endian' format in the register list which TBL operates on. We don't make 
this correction in RTL and therefore the shuffle operation gets it incorrect. 
Here is a patch that fixes-up the index table in the selector rtx in RTL to also 
be mixed-endian to reflect what's happening on NEON.

As trunk stands, this patch will not be exercised as constant vector permute for 
Big-endian is disabled. I've tested this by locally enabling const vec_perm and 
it fixes the some regressions we have on big-endian:

aarch64_be-none-elf:
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer 
-funroll-all-loops -finline-functions
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer 
-funroll-loops
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -g
FAIL->PASS: gcc.dg/torture/vector-shuffle1.c  -O0  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v16qi.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2df.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2di.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2sf.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2si.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v4sf.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v4si.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v8hi.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v8qi.c  -O2  execution test
FAIL->PASS: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects execution test
FAIL->PASS: gcc.dg/vect/vect-114.c execution test
FAIL->PASS: gcc.dg/vect/vect-15.c -flto -ffat-lto-objects execution test
FAIL->PASS: gcc.dg/vect/vect-15.c execution test

Also regressed on aarch64-none-elf.

OK for stage-1?

Thanks,
Tejas.

2014-02-21  Tejas Belagod  <tejas.belagod@arm.com>

gcc/
	* config/aarch64/aarch64.c (aarch64_evpc_tbl): Fix index vector for
	big-endian when dealing with more than one input shuffle vector.
--------------020100030405060301050609
Content-Type: text/plain; name=tbl-1.txt
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline;
 filename="tbl-1.txt"

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ea90311..fd473a3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8128,7 +8128,28 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d)
     return false;

   for (i = 0; i < nelt; ++i)
-    rperm[i] = GEN_INT (d->perm[i]);
+    {
+      int nunits = GET_MODE_NUNITS (vmode);
+      int elt = d->perm[i];
+
+      /* If two vectors, we end up with a wierd mixed-endian mode on NEON.  */
+      if (BYTES_BIG_ENDIAN)
+	{
+	  if (!d->one_vector_p && d->perm[i] & nunits)
+	    {
+	      /* Extract the offset.  */
+	      elt = d->perm[i] & (nunits - 1);
+	      /* Reverse the top half.  */
+	      elt = nunits - 1 - elt;
+	      /* Offset it by the bottom half.  */
+	      elt += nunits;
+	    }
+	  else
+	    elt = nunits - 1 - d->perm[i];
+	}
+
+      rperm[i] = GEN_INT (elt);
+    }
   sel = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rperm));
   sel = force_reg (vmode, sel);

--------------020100030405060301050609--

Re: [Patch, AArch64] Fix shuffle for big-endian.
#307210
Author: Alan Lawrence
Date: Wed, 12 Mar 2014 14:52
99 lines
3620 bytes
I've been doing some local testing using this patch as a basis for some of my
own work on NEON intrinsics, and it seems good to me. A couple of points:

(1) Re. the comment that "If two vectors, we end up with a wierd mixed-endian
mode on NEON": firstly "wierd" should be spelt "weird";
secondly, if I understand right, this comment belongs with the next "if
(!d->one_vector_p...)" rather than the "if (BYTES_BIG_ENDIAN)" before which it's
written.

(2) as you say, this code is not exercised, unless you do something to remove
the 'if (BYTES_BIG_ENDIAN) return false;' earlier in that same function. Can I
politely suggest you do that here in this patch?

(3) In my own regression testing, with const_vec_perm enabled on big_endian, I
see 2*PASS->FAIL, namely

gcc.dg/vect/vect-114.c scan-tree-dump-times vect "vectorized 0 loops" 1

gcc.dg/vect/vect-114.c -flto -ffat-lto-objects  scan-tree-dump-times
vect "vectorized 0 loops" 1

These are essentially noise, but the noise is removed and I see no other
problems, if (after this patch) I re-enable the testsuite's "vect_perm" target
selector for aarch64 big-endian (testsuite/lib/target-supports.exp). Would you
like a separate patch for that or roll it in here?

Cheers, Alan

Tejas Belagod wrote:
 > > Hi,
 > >
 > > When a shuffle of more than one input happens, on NEON we end up with a
 > > 'mixed-endian' format in the register list which TBL operates on. We don't 
make
 > > this correction in RTL and therefore the shuffle operation gets it incorrect.
 > > Here is a patch that fixes-up the index table in the selector rtx in RTL to 
also
 > > be mixed-endian to reflect what's happening on NEON.
 > >
 > > As trunk stands, this patch will not be exercised as constant vector 
permute for
 > > Big-endian is disabled. I've tested this by locally enabling const vec_perm 
and
 > > it fixes the some regressions we have on big-endian:
 > >
 > > aarch64_be-none-elf:
 > > FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 
-fomit-frame-pointer
 > > FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 
-fomit-frame-pointer
 > > -funroll-all-loops -finline-functions
 > > FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 
-fomit-frame-pointer
 > > -funroll-loops
 > > FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -g
 > > FAIL->PASS: gcc.dg/torture/vector-shuffle1.c  -O0  execution test
 > > FAIL->PASS: gcc.dg/torture/vshuf-v16qi.c  -O2  execution test
 > > FAIL->PASS: gcc.dg/torture/vshuf-v2df.c  -O2  execution test
 > > FAIL->PASS: gcc.dg/torture/vshuf-v2di.c  -O2  execution test
 > > FAIL->PASS: gcc.dg/torture/vshuf-v2sf.c  -O2  execution test
 > > FAIL->PASS: gcc.dg/torture/vshuf-v2si.c  -O2  execution test
 > > FAIL->PASS: gcc.dg/torture/vshuf-v4sf.c  -O2  execution test
 > > FAIL->PASS: gcc.dg/torture/vshuf-v4si.c  -O2  execution test
 > > FAIL->PASS: gcc.dg/torture/vshuf-v8hi.c  -O2  execution test
 > > FAIL->PASS: gcc.dg/torture/vshuf-v8qi.c  -O2  execution test
 > > FAIL->PASS: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects execution test
 > > FAIL->PASS: gcc.dg/vect/vect-114.c execution test
 > > FAIL->PASS: gcc.dg/vect/vect-15.c -flto -ffat-lto-objects execution test
 > > FAIL->PASS: gcc.dg/vect/vect-15.c execution test
 > >
 > > Also regressed on aarch64-none-elf.
 > >
 > > OK for stage-1?
 > >
 > > Thanks,
 > > Tejas.
 > >
 > > 2014-02-21  Tejas Belagod  <tejas.belagod@arm.com>
 > >
 > > gcc/
 > > 	* config/aarch64/aarch64.c (aarch64_evpc_tbl): Fix index vector for
 > > 	big-endian when dealing with more than one input shuffle vector.
 > >

Re: [Patch, AArch64] Fix shuffle for big-endian.
#307912
Author: Richard Henderso
Date: Mon, 24 Mar 2014 11:25
26 lines
674 bytes
On 02/21/2014 08:30 AM, Tejas Belagod wrote:
> +      /* If two vectors, we end up with a wierd mixed-endian mode on NEON.  */
> +      if (BYTES_BIG_ENDIAN)
> +	{
> +	  if (!d->one_vector_p && d->perm[i] & nunits)
> +	    {
> +	      /* Extract the offset.  */
> +	      elt = d->perm[i] & (nunits - 1);
> +	      /* Reverse the top half.  */
> +	      elt = nunits - 1 - elt;
> +	      /* Offset it by the bottom half.  */
> +	      elt += nunits;
> +	    }
> +	  else
> +	    elt = nunits - 1 - d->perm[i];
> +	}

Isn't this just

  elt = d->perm[i] ^ (nunits - 1);

all the time?  I.e. invert the index within the word,
but leave the word index (nunits) unchanged.


r~
Re: [Patch, AArch64] Fix shuffle for big-endian.
#307908
Author: Alan Lawrence
Date: Mon, 24 Mar 2014 17:42
109 lines
4021 bytes
Further to that - all looks good after one-liner 
http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01142.html has gone in. (Without 
that, enabling the code in Tejas' patch causes a regression in 
gcc.dg/torture/vshuf-v4hi.c as loading a vector constant goes wrong).

I'll send a patch to enable this and fix up the testsuite shortly...

Cheers, Alan

Alan Lawrence wrote:
> I've been doing some local testing using this patch as a basis for some of my 
> own work on NEON intrinsics, and it seems good to me. A couple of points:
> 
> (1) Re. the comment that "If two vectors, we end up with a wierd mixed-endian 
> mode on NEON": firstly "wierd" should be spelt "weird";
> secondly, if I understand right, this comment belongs with the next "if
> (!d->one_vector_p...)" rather than the "if (BYTES_BIG_ENDIAN)" before which it's
> written.
> 
> (2) as you say, this code is not exercised, unless you do something to remove 
> the 'if (BYTES_BIG_ENDIAN) return false;' earlier in that same function. Can I 
> politely suggest you do that here in this patch?
> 
> (3) In my own regression testing, with const_vec_perm enabled on big_endian, I 
> see 2*PASS->FAIL, namely
> 
> gcc.dg/vect/vect-114.c scan-tree-dump-times vect "vectorized 0 loops" 1
> 
> gcc.dg/vect/vect-114.c -flto -ffat-lto-objects  scan-tree-dump-times
> vect "vectorized 0 loops" 1
> 
> These are essentially noise, but the noise is removed and I see no other 
> problems, if (after this patch) I re-enable the testsuite's "vect_perm" target 
> selector for aarch64 big-endian (testsuite/lib/target-supports.exp). Would you 
> like a separate patch for that or roll it in here?
> 
> Cheers, Alan
> 
> Tejas Belagod wrote:
>> Hi,
>>
>> When a shuffle of more than one input happens, on NEON we end up with a 
>> 'mixed-endian' format in the register list which TBL operates on. We don't make 
>> this correction in RTL and therefore the shuffle operation gets it incorrect. 
>> Here is a patch that fixes-up the index table in the selector rtx in RTL to also 
>> be mixed-endian to reflect what's happening on NEON.
>>
>> As trunk stands, this patch will not be exercised as constant vector permute for 
>> Big-endian is disabled. I've tested this by locally enabling const vec_perm and 
>> it fixes the some regressions we have on big-endian:
>>
>> aarch64_be-none-elf:
>> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer
>> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer 
>> -funroll-all-loops -finline-functions
>> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer 
>> -funroll-loops
>> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -g
>> FAIL->PASS: gcc.dg/torture/vector-shuffle1.c  -O0  execution test
>> FAIL->PASS: gcc.dg/torture/vshuf-v16qi.c  -O2  execution test
>> FAIL->PASS: gcc.dg/torture/vshuf-v2df.c  -O2  execution test
>> FAIL->PASS: gcc.dg/torture/vshuf-v2di.c  -O2  execution test
>> FAIL->PASS: gcc.dg/torture/vshuf-v2sf.c  -O2  execution test
>> FAIL->PASS: gcc.dg/torture/vshuf-v2si.c  -O2  execution test
>> FAIL->PASS: gcc.dg/torture/vshuf-v4sf.c  -O2  execution test
>> FAIL->PASS: gcc.dg/torture/vshuf-v4si.c  -O2  execution test
>> FAIL->PASS: gcc.dg/torture/vshuf-v8hi.c  -O2  execution test
>> FAIL->PASS: gcc.dg/torture/vshuf-v8qi.c  -O2  execution test
>> FAIL->PASS: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects execution test
>> FAIL->PASS: gcc.dg/vect/vect-114.c execution test
>> FAIL->PASS: gcc.dg/vect/vect-15.c -flto -ffat-lto-objects execution test
>> FAIL->PASS: gcc.dg/vect/vect-15.c execution test
>>
>> Also regressed on aarch64-none-elf.
>>
>> OK for stage-1?
>>
>> Thanks,
>> Tejas.
>>
>> 2014-02-21  Tejas Belagod  <tejas.belagod@arm.com>
>>
>> gcc/
>> 	* config/aarch64/aarch64.c (aarch64_evpc_tbl): Fix index vector for
>> 	big-endian when dealing with more than one input shuffle vector.
>>
> 
> 


Re: [Patch, AArch64] Fix shuffle for big-endian.
#307988
Author: Tejas Belagod
Date: Tue, 25 Mar 2014 17:11
116 lines
4151 bytes
Alan Lawrence wrote:
> Further to that - all looks good after one-liner 
> http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01142.html has gone in. (Without 
> that, enabling the code in Tejas' patch causes a regression in 
> gcc.dg/torture/vshuf-v4hi.c as loading a vector constant goes wrong).
> 

Thanks for your fix.

Tejas.

> I'll send a patch to enable this and fix up the testsuite shortly...
> 
> Cheers, Alan
> 
> Alan Lawrence wrote:
>> I've been doing some local testing using this patch as a basis for some of my 
>> own work on NEON intrinsics, and it seems good to me. A couple of points:
>>
>> (1) Re. the comment that "If two vectors, we end up with a wierd mixed-endian 
>> mode on NEON": firstly "wierd" should be spelt "weird";
>> secondly, if I understand right, this comment belongs with the next "if
>> (!d->one_vector_p...)" rather than the "if (BYTES_BIG_ENDIAN)" before which it's
>> written.
>>
>> (2) as you say, this code is not exercised, unless you do something to remove 
>> the 'if (BYTES_BIG_ENDIAN) return false;' earlier in that same function. Can I 
>> politely suggest you do that here in this patch?
>>
>> (3) In my own regression testing, with const_vec_perm enabled on big_endian, I 
>> see 2*PASS->FAIL, namely
>>
>> gcc.dg/vect/vect-114.c scan-tree-dump-times vect "vectorized 0 loops" 1
>>
>> gcc.dg/vect/vect-114.c -flto -ffat-lto-objects  scan-tree-dump-times
>> vect "vectorized 0 loops" 1
>>
>> These are essentially noise, but the noise is removed and I see no other 
>> problems, if (after this patch) I re-enable the testsuite's "vect_perm" target 
>> selector for aarch64 big-endian (testsuite/lib/target-supports.exp). Would you 
>> like a separate patch for that or roll it in here?
>>
>> Cheers, Alan
>>
>> Tejas Belagod wrote:
>>> Hi,
>>>
>>> When a shuffle of more than one input happens, on NEON we end up with a 
>>> 'mixed-endian' format in the register list which TBL operates on. We don't make 
>>> this correction in RTL and therefore the shuffle operation gets it incorrect. 
>>> Here is a patch that fixes-up the index table in the selector rtx in RTL to also 
>>> be mixed-endian to reflect what's happening on NEON.
>>>
>>> As trunk stands, this patch will not be exercised as constant vector permute for 
>>> Big-endian is disabled. I've tested this by locally enabling const vec_perm and 
>>> it fixes the some regressions we have on big-endian:
>>>
>>> aarch64_be-none-elf:
>>> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer
>>> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer 
>>> -funroll-all-loops -finline-functions
>>> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer 
>>> -funroll-loops
>>> FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -g
>>> FAIL->PASS: gcc.dg/torture/vector-shuffle1.c  -O0  execution test
>>> FAIL->PASS: gcc.dg/torture/vshuf-v16qi.c  -O2  execution test
>>> FAIL->PASS: gcc.dg/torture/vshuf-v2df.c  -O2  execution test
>>> FAIL->PASS: gcc.dg/torture/vshuf-v2di.c  -O2  execution test
>>> FAIL->PASS: gcc.dg/torture/vshuf-v2sf.c  -O2  execution test
>>> FAIL->PASS: gcc.dg/torture/vshuf-v2si.c  -O2  execution test
>>> FAIL->PASS: gcc.dg/torture/vshuf-v4sf.c  -O2  execution test
>>> FAIL->PASS: gcc.dg/torture/vshuf-v4si.c  -O2  execution test
>>> FAIL->PASS: gcc.dg/torture/vshuf-v8hi.c  -O2  execution test
>>> FAIL->PASS: gcc.dg/torture/vshuf-v8qi.c  -O2  execution test
>>> FAIL->PASS: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects execution test
>>> FAIL->PASS: gcc.dg/vect/vect-114.c execution test
>>> FAIL->PASS: gcc.dg/vect/vect-15.c -flto -ffat-lto-objects execution test
>>> FAIL->PASS: gcc.dg/vect/vect-15.c execution test
>>>
>>> Also regressed on aarch64-none-elf.
>>>
>>> OK for stage-1?
>>>
>>> Thanks,
>>> Tejas.
>>>
>>> 2014-02-21  Tejas Belagod  <tejas.belagod@arm.com>
>>>
>>> gcc/
>>> 	* config/aarch64/aarch64.c (aarch64_evpc_tbl): Fix index vector for
>>> 	big-endian when dealing with more than one input shuffle vector.
>>>
>>
> 


Re: [Patch, AArch64] Fix shuffle for big-endian.
#307989
Author: Tejas Belagod
Date: Tue, 25 Mar 2014 17:12
31 lines
791 bytes
Richard Henderson wrote:
> On 02/21/2014 08:30 AM, Tejas Belagod wrote:
>> +      /* If two vectors, we end up with a wierd mixed-endian mode on NEON.  */
>> +      if (BYTES_BIG_ENDIAN)
>> +	{
>> +	  if (!d->one_vector_p && d->perm[i] & nunits)
>> +	    {
>> +	      /* Extract the offset.  */
>> +	      elt = d->perm[i] & (nunits - 1);
>> +	      /* Reverse the top half.  */
>> +	      elt = nunits - 1 - elt;
>> +	      /* Offset it by the bottom half.  */
>> +	      elt += nunits;
>> +	    }
>> +	  else
>> +	    elt = nunits - 1 - d->perm[i];
>> +	}
> 
> Isn't this just
> 
>   elt = d->perm[i] ^ (nunits - 1);
> 
> all the time?  I.e. invert the index within the word,
> but leave the word index (nunits) unchanged.
> 

Yes, I think that works. Thanks!

Tejas.

Thread Navigation

This is a paginated view of messages in the thread with full content displayed inline.

Messages are displayed in chronological order, with the original post highlighted in green.

Use pagination controls to navigate through all messages in large threads.

Back to All Threads