All times are UTC - 6 hours




 Post subject: Eigen port to ARM NEON!
PostPosted: Wed Mar 03, 2010 1:30 pm 
http://bitbucket.org/eigen/eigen/change ... af7abc0af/

Here are some results from a matrix addition/multiplication benchmark (sizes 512x512) on the Efika MX:

Scalar:
$ ./bench_gemm.gcc4.4.1cs
eigen cpu 3.84s 0.0699051 GFLOPS (19.27s)
eigen real 3.8469s 0.0697796 GFLOPS (19.2648s)


NEON:
$ ./bench_gemm.gcc4.4.1cs.neon
eigen cpu 0.81s 0.331402 GFLOPS (4.07s)
eigen real 0.813919s 0.329806 GFLOPS (4.07218s)

~4.6x faster...


No comments, apart from one: if NEON is that good -and I think it is-, I don't think I'll miss AltiVec and PowerPC.

UPDATE: Results have been fixed. Apparently the scalar build was compiled without the -mfpu=vfp option, which is needed to actually use the FPU on ARM. ~4.5x faster is more logical, but still very, very nice :) Sorry for the misunderstanding.
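
For readers wondering where the GFLOPS column comes from: the figures above are consistent with counting 2·N³ floating-point operations for one N×N matrix product and dividing by the reported time. Below is a minimal sketch of that arithmetic, assuming N = 512 as stated above. The function name `gemm_gflops` is made up for illustration and is not part of Eigen or its benchmark, and that the benchmark counts operations this way is an inference from the numbers.

```cpp
#include <cassert>

// Hypothetical helper (not part of Eigen): GFLOPS for one dense N x N
// matrix product, counting one multiply and one add per inner-loop step,
// i.e. 2 * N^3 floating-point operations in total.
double gemm_gflops(int n, double seconds) {
    assert(n > 0 && seconds > 0.0);
    const double flops = 2.0 * n * n * n;  // evaluated in double, no int overflow
    return flops / seconds / 1e9;
}
```

With the times listed above, `gemm_gflops(512, 3.84)` comes out at about 0.0699 and `gemm_gflops(512, 0.81)` at about 0.3314, which matches the scalar and NEON "cpu" lines.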


Last edited by markos on Wed Mar 03, 2010 5:49 pm, edited 2 times in total.

PostPosted: Wed Mar 03, 2010 1:52 pm
Nice :-)

Great work!


Johan.
Johan Dams, Genesi USA Inc.
Director, Software Engineering

Yep, I have a blog... PurpleAlienPlanet


PostPosted: Wed Mar 03, 2010 2:00 pm
Wow, I did not expect this result.
So NEON is not so bad :)
Past: Pegasos II G4 & Efika
Now : Mac Mini G4 1.5 Ghz & MorphOS 3.1
BlaBla Team Member -> http://blabla.ppa.pl


PostPosted: Wed Mar 03, 2010 3:07 pm
It would be really interesting to see a broad spectrum benchmark comparison between a 7447 G4 PPC CPU (as used in Pegasos 2) running @ 800MHz (or results recalculated from 1GHz to 800MHz accordingly) and an i.MX515 CPU.

I have no feel whatsoever for the ARM chip's performance (I have never seen or experienced one in action), but maybe it won't be too badly off? I think a comparison with the Pegasos 2 G4 hardware, which most of us have experience with and can relate to, would be interesting for more people here on Powerdeveloper.org than just me!

Not that raw performance is the key goal of the chip, rather power efficiency, but anyway...


PostPosted: Wed Mar 03, 2010 4:21 pm
takemehomegrandma wrote:
It would be really interesting to see a broad spectrum benchmark comparison between a 7447 G4 PPC CPU (as used in Pegasos 2) running @ 800MHz (or results recalculated from 1GHz to 800MHz accordingly) and an i.MX515 CPU.

I have no feel whatsoever for the ARM chip's performance (I have never seen or experienced one in action), but maybe it won't be too badly off? I think a comparison with the Pegasos 2 G4 hardware, which most of us have experience with and can relate to, would be interesting for more people here on Powerdeveloper.org than just me!

Not that raw performance is the key goal of the chip, rather power efficiency, but anyway...


Tomorrow I will also provide Eigen results from the G4 for comparison. One thing is certain, though: NEON has some really good tricks up its sleeve that are not available in either SSE or AltiVec. For that alone it beats both, IMHO.


PostPosted: Wed Mar 03, 2010 8:55 pm
Impressive results, Markos. What are your impressions of this SIMD ISA so far?

ps: don't you miss the permute? ; )


PostPosted: Thu Mar 04, 2010 3:44 am
blu wrote:
Impressive results, Markos. What are your impressions of this SIMD ISA so far?

ps: don't you miss the permute? ; )


The ISA is a very complete and orthogonal SIMD approach. It can do many more things than AltiVec or SSE. I especially like the fact that I can split a 128-bit vector into two 64-bit vectors, perform an operation, and then combine them back into a 128-bit vector. It can also load/store 4x128-bit vectors at once.

PS. It has vtbl and vtbx, which do the same thing as the permute. I haven't played around with them yet, though :)
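
The split/operate/combine pattern and the table lookup mentioned above can be sketched with the standard `arm_neon.h` intrinsics. This is only an illustration under stated assumptions: the function names `fold_halves` and `permute_bytes` are made up here, the code is not taken from the Eigen port, and a scalar fallback is included so the sketch also builds on non-ARM machines.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#else
#define SKETCH_HAVE_NEON 0
#endif

// Hypothetical example: split a 4-float vector into two 2-float halves,
// add the halves, and recombine them:
// out = {in0+in2, in1+in3, in0+in2, in1+in3}.
void fold_halves(const float in[4], float out[4]) {
    assert(in != nullptr && out != nullptr);
#if SKETCH_HAVE_NEON
    float32x4_t v  = vld1q_f32(in);      // one 128-bit Q register
    float32x2_t lo = vget_low_f32(v);    // lower 64-bit half
    float32x2_t hi = vget_high_f32(v);   // upper 64-bit half
    float32x2_t s  = vadd_f32(lo, hi);   // operation on 64-bit vectors
    vst1q_f32(out, vcombine_f32(s, s));  // back to 128 bits
#else
    // Scalar fallback with the same result.
    const float s0 = in[0] + in[2];
    const float s1 = in[1] + in[3];
    out[0] = s0; out[1] = s1; out[2] = s0; out[3] = s1;
#endif
}

// Hypothetical example: byte permute through a table lookup.
// out[i] = table[idx[i]], or 0 when idx[i] is outside the 8-entry table.
void permute_bytes(const std::uint8_t table[8], const std::uint8_t idx[8],
                   std::uint8_t out[8]) {
    assert(table != nullptr && idx != nullptr && out != nullptr);
#if SKETCH_HAVE_NEON
    vst1_u8(out, vtbl1_u8(vld1_u8(table), vld1_u8(idx)));
#else
    for (int i = 0; i < 8; ++i) {
        out[i] = (idx[i] < 8) ? table[idx[i]] : 0;
    }
#endif
}
```

The difference between the two lookup instructions is in the out-of-range case: vtbl writes zero to that lane, while vtbx leaves the destination lane unchanged.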


PostPosted: Thu Mar 04, 2010 2:21 pm
markos wrote:
~4.6x faster...

No comments, apart from one: if NEON is that good -and I think it is-, I don't think I'll miss AltiVec and PowerPC.


Results are impressive but the comment makes me sad ...

I didn't expect NEON to be so good ...


PostPosted: Thu Mar 04, 2010 4:31 pm

corto wrote:
markos wrote:
~4.6x faster...

No comments, apart from one: if NEON is that good -and I think it is-, I don't think I'll miss AltiVec and PowerPC.


Results are impressive but the comment makes me sad ...

I didn't expect NEON to be so good ...


It gets better: new benchmarks after some fine-tuning:

$ ./bench_gemm.gcc4.4.1cs.neon
eigen cpu 2.44s 0.880116 GFLOPS (12.29s)
eigen real 2.44403s 0.878666 GFLOPS (12.2967s)

(compiled with gcc 4.4.1 CodeSourcery)

$ ./bench_gemm.gcc4.5.neon
eigen cpu 2.36s 0.909951 GFLOPS (11.85s)
eigen real 2.36316s 0.908733 GFLOPS (11.8516s)

(compiled with gcc 4.5 experimental)

~12.9x faster. Yes, this time it's real. According to the Eigen developers, we have a theoretical limit of 1.6 GFLOPS on the Efika MX, so we still have a bit of work to do :)


PostPosted: Fri Mar 05, 2010 6:30 am
corto wrote:
markos wrote:
~4.6x faster...

No comments, apart from one: if NEON is that good -and I think it is-, I don't think I'll miss AltiVec and PowerPC.


I didn't expect NEON to be so good ...


Well, that's progress. You can't always be the best. Modern machines have to be better than older ones. By the way, great to see this progress, Konstantinos!


PostPosted: Fri Mar 05, 2010 7:53 am
jcmarcos wrote:
corto wrote:
markos wrote:
~4.6x faster...

No comments, apart from one: if NEON is that good -and I think it is-, I don't think I'll miss AltiVec and PowerPC.


I didn't expect NEON to be so good ...


Well, that's progress. You can't always be the best. Modern machines have to be better than older ones. By the way, great to see this progress, Konstantinos!


Altivec is still king though, check these results on the G4:

Scalar:
$ ./bench_gemm
eigen cpu 2.65264s 0.809565 GFLOPS (13.283s)
eigen real 2.6532s 0.809394 GFLOPS (13.2863s)

Altivec:
$ ./bench_gemm
eigen cpu 1.17936s 1.82088 GFLOPS (5.90097s)
eigen real 1.17959s 1.82054 GFLOPS (5.90304s)

But keep in mind that PowerPC support is much better and more mature than ARM support (esp. wrt NEON), and that the PowerPC is slightly faster at 1 GHz. Theoretically, the G4 can do 4 GFLOPS in floating-point math and the i.MX515 can do 1.6 GFLOPS.
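
To put the measured numbers next to those theoretical limits, one can divide measured GFLOPS by peak GFLOPS. A minimal sketch, using only figures quoted in this thread; the helper name `fraction_of_peak` is invented for illustration:

```cpp
#include <cassert>

// Hypothetical helper: measured throughput as a fraction of the
// theoretical peak (both in GFLOPS).
double fraction_of_peak(double measured_gflops, double peak_gflops) {
    assert(peak_gflops > 0.0);
    return measured_gflops / peak_gflops;
}
```

With the AltiVec result above, `fraction_of_peak(1.82088, 4.0)` is roughly 0.46, and with the earlier gcc 4.5 NEON result, `fraction_of_peak(0.909951, 1.6)` is roughly 0.57. This is only a rough reading, since it takes both peak figures at face value.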


PostPosted: Fri Mar 05, 2010 10:00 am
markos wrote:
PowerPC is slightly faster at 1Ghz


Yes, but ARM is smarter, because it always sucks less electrons. Or am I wrong?

Have you seen that initiative to build new high-power ARM CPUs that are NOT targeted at mobile computers? What will happen when they free these processors ("take off the handcuffs") from the power restrictions they've always had?


PostPosted: Fri Mar 05, 2010 10:29 am
jcmarcos wrote:
Yes, but ARM is smarter, because it always sucks less electrons. Or am I wrong?

Have you seen that initiative to build new high-power ARM CPUs that are NOT targeted at mobile computers? What will happen when they free these processors ("take off the handcuffs") from the power restrictions they've always had?


I have remote access to a prototype quad-core ARM Cortex A9 :-P


PostPosted: Sun Mar 07, 2010 3:46 pm
jcmarcos wrote:
markos wrote:
PowerPC is slightly faster at 1Ghz


Yes, but ARM is smarter, because it always sucks less electrons. Or am I wrong?

Have you seen that initiative to build new high-power ARM CPUs that are NOT targeted at mobile computers? What will happen when they free these processors ("take off the handcuffs") from the power restrictions they've always had?


Interesting link, thanks. Markos, you are lucky, because not many people can see or use a Cortex-A9 these days, even though it was announced years ago (at least 2 years).

jcmarcos: I liked ARM very much because it was small, efficient, and easy to play with... But over the years they have added many things that were not planned, and it is sometimes ugly, in my opinion. I am afraid it will go the same way x86 did. But some features are great and it works well.

I work on ARM every day and I sometimes play with low level things.


PostPosted: Tue Mar 09, 2010 11:35 am
> the iMX515 can do 1.6GFLOPS

The NEON Pipeline has 4 Single Precision FP Multiply Units and 4 Accumulators ... it could handle 4 Floats/Cycle.

So shouldn't this be 3.2 GFLOPS or am I missing something?
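
The arithmetic behind the question is peak = operations per cycle × clock frequency. A minimal sketch, assuming an 800 MHz clock for the i.MX515; the thread only implies that figure, by proposing a comparison against a G4 running at 800 MHz. The helper name `peak_gflops` is made up for illustration.

```cpp
#include <cassert>

// Hypothetical helper: theoretical peak in GFLOPS from per-cycle
// throughput and clock frequency in GHz.
// Assumption: the i.MX515 runs at 0.8 GHz (implied, not stated, above).
double peak_gflops(double flops_per_cycle, double clock_ghz) {
    assert(flops_per_cycle > 0.0 && clock_ghz > 0.0);
    return flops_per_cycle * clock_ghz;
}
```

Under that assumption, `peak_gflops(4.0, 0.8)` gives 3.2, while the 1.6 GFLOPS figure quoted earlier corresponds to `peak_gflops(2.0, 0.8)`, i.e. two operations per cycle. This only restates the arithmetic and does not settle which per-cycle figure is right for this chip.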

