Power Developer
https://powerdeveloper.org/forums/

Eigen port to ARM NEON!
https://powerdeveloper.org/forums/viewtopic.php?f=60&t=1776
Page 1 of 2

Author:  markos [ Wed Mar 03, 2010 1:30 pm ]
Post subject:  Eigen port to ARM NEON!

http://bitbucket.org/eigen/eigen/change ... af7abc0af/

Here are some results from a matrix addition/multiplication benchmark (sizes 512x512) on the Efika MX:

Scalar:
$ ./bench_gemm.gcc4.4.1cs
eigen cpu 3.84s 0.0699051 GFLOPS (19.27s)
eigen real 3.8469s 0.0697796 GFLOPS (19.2648s)


NEON:
$ ./bench_gemm.gcc4.4.1cs.neon
eigen cpu 0.81s 0.331402 GFLOPS (4.07s)
eigen real 0.813919s 0.329806 GFLOPS (4.07218s)

~4.6x faster...


No comments, apart from one: if NEON is that good -and I think it is-, I don't think I'll miss AltiVec and PowerPC.

UPDATE: The results have been fixed. Apparently the scalar build was compiled without the -mfpu=vfp option, which is needed to actually use the FPU on ARM. ~4.5x faster is more logical, but still very, very nice :) Sorry for the misunderstanding.
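
As a sanity check on the figures above: they are consistent with the benchmark counting 2*n^3 floating-point operations for one n x n matrix product (n = 512) and dividing by the elapsed time. That operation count is an assumption inferred from the posted numbers, not something stated in the thread, and the function name below is purely illustrative. A minimal sketch:

```python
# Assumption: one n x n matrix product is counted as 2*n^3 floating-point
# operations (n^3 multiplies plus n^3 adds). This is inferred from the
# posted figures, not taken from the benchmark source.
def gemm_gflops(n, seconds):
    """GFLOPS for one n x n matrix product that took `seconds`."""
    return 2 * n**3 / seconds / 1e9

scalar = gemm_gflops(512, 3.84)  # "eigen cpu" time of the scalar build
neon = gemm_gflops(512, 0.81)    # "eigen cpu" time of the NEON build

print(f"scalar:  {scalar:.4f} GFLOPS")
print(f"NEON:    {neon:.4f} GFLOPS")
print(f"speedup: {neon / scalar:.2f}x")
```

With the rounded cpu times this reproduces the reported 0.0699 and 0.3314 GFLOPS and gives a ratio of about 4.7x, in the same ballpark as the ~4.5x quoted in the update.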

Author:  PurpleAlien [ Wed Mar 03, 2010 1:52 pm ]
Post subject: 

Nice :-)

Great work!


Johan.

Author:  Jerzy Guc (Drako) [ Wed Mar 03, 2010 2:00 pm ]
Post subject: 

Wow, I did not expect this result.
So NEON is not so bad :)

Author:  takemehomegrandma [ Wed Mar 03, 2010 3:07 pm ]
Post subject:  Re: Eigen2 port to ARM NEON!

It would be really interesting to see a broad spectrum benchmark comparison between a 7447 G4 PPC CPU (as used in Pegasos 2) running @ 800MHz (or results recalculated from 1GHz to 800MHz accordingly) and an i.MX515 CPU.

I have no sense whatsoever of the ARM chip's performance (I have never seen or experienced one in action), but maybe it isn't too badly off? I think it would be interesting for more people than me here on Powerdeveloper.org to see a comparison with the Pegasos 2 G4 hardware, which most of us have experience with and can relate to!

Not that raw performance is the key goal of the chip, rather power efficiency, but anyway...

Author:  markos [ Wed Mar 03, 2010 4:21 pm ]
Post subject:  Re: Eigen2 port to ARM NEON!

Quote:
It would be really interesting to see a broad spectrum benchmark comparison between a 7447 G4 PPC CPU (as used in Pegasos 2) running @ 800MHz (or results recalculated from 1GHz to 800MHz accordingly) and an i.MX515 CPU.

I have no sense whatsoever of the ARM chip's performance (I have never seen or experienced one in action), but maybe it isn't too badly off? I think it would be interesting for more people than me here on Powerdeveloper.org to see a comparison with the Pegasos 2 G4 hardware, which most of us have experience with and can relate to!

Not that raw performance is the key goal of the chip, rather power efficiency, but anyway...

Tomorrow I will also provide Eigen results from the G4 for comparison. One thing is certain though: NEON has some really good tricks up its sleeve that are not available in either SSE or AltiVec. For that alone it beats both, IMHO.

Author:  blu [ Wed Mar 03, 2010 8:55 pm ]
Post subject: 

impressive results, Markos. what are your impressions from this simd isa so far?

ps: don't you miss the permute? ; )

Author:  markos [ Thu Mar 04, 2010 3:44 am ]
Post subject: 

Quote:
impressive results, Markos. what are your impressions from this simd isa so far?

ps: don't you miss the permute? ; )

The ISA is a very complete and orthogonal SIMD approach. It can do many more things than AltiVec or SSE (I especially like the fact that I can split a 128-bit vector into two 64-bit vectors, perform an operation and then combine them back into a 128-bit vector). It can also load/store 4x128-bit vectors at once.

PS. It has vtbl and vtbx, which do the same kind of thing as the permute. I haven't played around with them yet though :)
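
For readers who have not met these two instructions, here is a plain-Python model of what the byte table lookups do. It is a sketch of the documented semantics, not NEON code: the function names are illustrative, and the table is modelled as a simple list of bytes (the real instructions take one to four 64-bit registers as the table).

```python
# Plain-Python model of NEON byte table lookups. Illustrative names only;
# this models the semantics and does not use any NEON intrinsics.
def vtbl(table, indices):
    """VTBL: each index selects a table byte; out-of-range indices give 0."""
    return [table[i] if i < len(table) else 0 for i in indices]

def vtbx(dest, table, indices):
    """VTBX: like VTBL, but out-of-range indices keep the destination byte."""
    return [table[i] if i < len(table) else d for d, i in zip(dest, indices)]

table = [10, 11, 12, 13, 14, 15, 16, 17]   # one 8-byte table register
reverse = [7, 6, 5, 4, 3, 2, 1, 0]         # permute: reverse the bytes
mixed = [0, 1, 200, 3, 200, 5, 6, 7]       # 200 is out of range
dest = [90, 91, 92, 93, 94, 95, 96, 97]

reversed_bytes = vtbl(table, reverse)
zero_filled = vtbl(table, mixed)
merged = vtbx(dest, table, mixed)
print(reversed_bytes)
print(zero_filled)
print(merged)
```

The only difference between the two is the handling of out-of-range indices: vtbl writes zero, while vtbx leaves the destination byte untouched, which allows several lookups to be chained over a larger table.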

Author:  corto [ Thu Mar 04, 2010 2:21 pm ]
Post subject:  Re: Eigen port to ARM NEON!

Quote:
~4.6x faster...

No comments, apart from one: if NEON is that good -and I think it is-, I don't think I'll miss AltiVec and PowerPC.

Results are impressive but the comment makes me sad ...

I didn't expect NEON to be so good ...

Author:  markos [ Thu Mar 04, 2010 4:31 pm ]
Post subject:  Re: Eigen port to ARM NEON!

Quote:
Quote:
~4.6x faster...

No comments, apart from one: if NEON is that good -and I think it is-, I don't think I'll miss AltiVec and PowerPC.

Results are impressive but the comment makes me sad ...

I didn't expect NEON to be so good ...

It's better: new benchmarks after some fine-tuning:

$ ./bench_gemm.gcc4.4.1cs.neon
eigen cpu 2.44s 0.880116 GFLOPS (12.29s)
eigen real 2.44403s 0.878666 GFLOPS (12.2967s)

(compiled with gcc 4.4.1 CodeSourcery)

$ ./bench_gemm.gcc4.5.neon
eigen cpu 2.36s 0.909951 GFLOPS (11.85s)
eigen real 2.36316s 0.908733 GFLOPS (11.8516s)

(compiled with gcc 4.5 experimental)

~12.9x faster. Yes, this time it's real. According to the Eigen developers, we have a theoretical limit of 1.6 GFLOPS on the Efika MX, so we still have a bit of work to do :)
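
Note that the speedup here has to be read from the GFLOPS column rather than from the times, because the tuned runs appear to time more work than the 512x512 runs in the first post. The sketch below recomputes both from the posted figures; the helper names are illustrative, and the "implied work" is simply time multiplied by the reported rate.

```python
# Figures copied from the posts above: (cpu seconds, reported GFLOPS).
SCALAR_GFLOPS = 0.0699051  # scalar build, first post
tuned_runs = {
    "gcc 4.4.1 CodeSourcery": (2.44, 0.880116),
    "gcc 4.5 experimental": (2.36, 0.909951),
}

def speedup(gflops, baseline=SCALAR_GFLOPS):
    """Ratio of a reported GFLOPS figure to the scalar baseline."""
    return gflops / baseline

def implied_work_gflop(seconds, gflops):
    """Total work (in GFLOP) implied by a time and a GFLOPS rate."""
    return seconds * gflops

for name, (seconds, gflops) in tuned_runs.items():
    work = implied_work_gflop(seconds, gflops)
    print(f"{name}: {speedup(gflops):.1f}x over scalar, {work:.3f} GFLOP timed")
```

This gives about 12.6x and 13.0x, bracketing the ~12.9x quoted. The implied work is about 2.147 GFLOP in both runs, which would match 2*1024^3 operations (for example one 1024x1024 product, or eight 512x512 ones); the thread does not say which.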

Author:  jcmarcos [ Fri Mar 05, 2010 6:30 am ]
Post subject:  Re: Eigen port to ARM NEON!

Quote:
Quote:
~4.6x faster...

No comments, apart from one: if NEON is that good -and I think it is-, I don't think I'll miss AltiVec and PowerPC.

I didn't expect NEON to be so good ...

Well, that's progress. You can't always be the best. Modern machines have to be better than older ones. By the way, great to see this progress, Konstantinos!

Author:  markos [ Fri Mar 05, 2010 7:53 am ]
Post subject:  Re: Eigen port to ARM NEON!

Quote:
Quote:
Quote:
~4.6x faster...

No comments, apart from one: if NEON is that good -and I think it is-, I don't think I'll miss AltiVec and PowerPC.

I didn't expect NEON to be so good ...

Well, that's progress. You can't always be the best. Modern machines have to be better than older ones. By the way, great to see this progress, Konstantinos!

AltiVec is still king though; check these results on the G4:

Scalar:
$ ./bench_gemm
eigen cpu 2.65264s 0.809565 GFLOPS (13.283s)
eigen real 2.6532s 0.809394 GFLOPS (13.2863s)

AltiVec:
$ ./bench_gemm
eigen cpu 1.17936s 1.82088 GFLOPS (5.90097s)
eigen real 1.17959s 1.82054 GFLOPS (5.90304s)

But keep in mind that PowerPC support is much better and more mature than ARM support (esp. wrt NEON), and that the PowerPC is slightly faster at 1 GHz. Theoretically the G4 can do 4 GFLOPS of FP math and the i.MX515 can do 1.6 GFLOPS.
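
To put the two SIMD results side by side, the sketch below relates each measured figure to the theoretical peak quoted in this post. The measured numbers are the "eigen cpu" GFLOPS values from the thread (the NEON one is the gcc 4.5 run); the peaks are the 4 and 1.6 GFLOPS stated above, taken at face value. The helper name is illustrative.

```python
# Measured GFLOPS from the thread and the theoretical peaks quoted above.
G4_SCALAR = 0.809565
G4_ALTIVEC = 1.82088
IMX515_NEON = 0.909951   # gcc 4.5 experimental run
G4_PEAK = 4.0
IMX515_PEAK = 1.6

def fraction_of_peak(measured, peak):
    """Share of the theoretical peak actually reached."""
    return measured / peak

print(f"AltiVec over G4 scalar: {G4_ALTIVEC / G4_SCALAR:.2f}x")
print(f"AltiVec over NEON:      {G4_ALTIVEC / IMX515_NEON:.2f}x")
print(f"AltiVec: {fraction_of_peak(G4_ALTIVEC, G4_PEAK):.0%} of quoted peak")
print(f"NEON:    {fraction_of_peak(IMX515_NEON, IMX515_PEAK):.0%} of quoted peak")
```

On these numbers AltiVec is about 2.25x faster than scalar code on the G4 and about 2x faster than the NEON build in absolute terms, while the NEON build reaches a larger share of its quoted peak (roughly 57% against 46%).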

Author:  jcmarcos [ Fri Mar 05, 2010 10:00 am ]
Post subject:  Re: Eigen port to ARM NEON!

Quote:
PowerPC is slightly faster at 1 GHz

Yes, but ARM is smarter, because it always sucks fewer electrons. Or am I wrong?

Have you seen that initiative to build new high-power ARM CPUs that are NOT targeted at mobile computers? What will happen when they free these processors ("take off the handcuffs") from the power restrictions they've always had?

Author:  markos [ Fri Mar 05, 2010 10:29 am ]
Post subject:  Re: Eigen port to ARM NEON!

Quote:
Yes, but ARM is smarter, because it always sucks fewer electrons. Or am I wrong?

Have you seen that initiative to build new high-power ARM CPUs that are NOT targeted at mobile computers? What will happen when they free these processors ("take off the handcuffs") from the power restrictions they've always had?

I have remote access to a prototype quad-core ARM Cortex-A9 :-P

Author:  corto [ Sun Mar 07, 2010 3:46 pm ]
Post subject:  Re: Eigen port to ARM NEON!

Quote:
Quote:
PowerPC is slightly faster at 1 GHz

Yes, but ARM is smarter, because it always sucks fewer electrons. Or am I wrong?

Have you seen that initiative to build new high-power ARM CPUs that are NOT targeted at mobile computers? What will happen when they free these processors ("take off the handcuffs") from the power restrictions they've always had?

Interesting link, thanks. Markos, you are lucky, because not many people can see or use a Cortex-A9 these days, even though it was announced years ago (at least 2 years).

jcmarcos: I liked ARM very much because it was small, efficient, easy to play with ... But over the years they have added many things that were not planned, and it is sometimes ugly in my opinion. I am afraid to see it go the same way x86 did. But some features are great and it works well.

I work on ARM every day and I sometimes play with low level things.

Author:  slyd [ Tue Mar 09, 2010 11:35 am ]
Post subject: 

> the iMX515 can do 1.6GFLOPS

The NEON Pipeline has 4 Single Precision FP Multiply Units and 4 Accumulators ... it could handle 4 Floats/Cycle.

So shouldn't this be 3.2 GFLOPS or am I missing something?
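
The two figures differ only in how many single-precision operations per cycle are assumed. The sketch below makes that arithmetic explicit. It assumes an 800 MHz clock for the i.MX515 (the clock is not stated in this post) and 1 GHz for the G4, and the function names are illustrative; it does not settle which per-cycle figure is right for the NEON unit.

```python
# Peak throughput = clock rate x floating-point operations per cycle.
# Assumption: i.MX515 at 0.8 GHz, G4 at 1.0 GHz.
IMX515_GHZ = 0.8
G4_GHZ = 1.0

def peak_gflops(clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS for a given clock and per-cycle rate."""
    return clock_ghz * flops_per_cycle

def flops_per_cycle(peak, clock_ghz):
    """Per-cycle rate implied by a quoted peak."""
    return peak / clock_ghz

print(flops_per_cycle(1.6, IMX515_GHZ))  # rate implied by the 1.6 GFLOPS figure
print(peak_gflops(IMX515_GHZ, 4))        # peak if 4 floats complete per cycle
print(flops_per_cycle(4.0, G4_GHZ))      # rate implied by the G4's 4 GFLOPS
```

Under these assumptions the 1.6 GFLOPS figure corresponds to 2 operations per cycle, and 4 floats per cycle would indeed give 3.2 GFLOPS, so the question comes down to what the pipeline can sustain each cycle.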

All times are UTC-06:00