All times are UTC-06:00




 Post subject: sse3 vs altvec
PostPosted: Fri Jun 16, 2006 2:23 am 
Joined: Sat Feb 18, 2006 4:43 am
Posts: 8
I have seen some reports stating that SSE3 has finally allowed Intel to catch up with the AltiVec instruction set in terms of speed. Would anyone knowledgeable care to comment on this?


 Post subject: Re: sse3 vs altvec
PostPosted: Fri Jun 16, 2006 8:37 am
Joined: Tue Jan 31, 2006 1:18 am
Posts: 49
Location: Bialystok, Poland
Quote:
I have seen some reports stating that SSE3 has finally allowed Intel to catch up with the AltiVec instruction set in terms of speed. Would anyone knowledgeable care to comment on this?
Don't think so. Look at the comparison of CPU performance in the distributed.net contest. I think it is a good comparison because the dnetc cores are heavily optimized for each architecture, and of course all available SIMD units are used. Then look at the results:

RC5-72

The fastest x86 (single core) is an Athlon64 at about 2.6 GHz, with 11.7 Mkeys/s. My poor 7447 at 1.0 GHz (Pegasos II) does 10.7 Mkeys/s. A more reasonably clocked PPC, like a 7447 at 1.7 GHz, does 17.5 Mkeys/s.

OGR-25

The fastest x86 (single core) is again an Athlon64, with 30 Mnodes/s at 2.6 GHz. It is significantly better than my Pegasos (22 Mnodes/s). But then again, a 7447 at 1.7 GHz does 39.5 Mnodes/s...

My conclusion is that they are still trying to catch up with PowerPC, but they haven't done it yet. And I do not even want to guess the dnetc throughput of the CELL processor...
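
To make the clock difference easier to see, the figures above can be normalized per GHz. This is only a rough sketch in Python using the numbers quoted in this post; the names `RESULTS`, `per_ghz` and `normalized` are made up for illustration, and per-clock throughput says nothing about price or power draw.

```python
# Figures quoted in the post above:
# (CPU, clock in GHz, RC5-72 in Mkeys/s, OGR-25 in Mnodes/s)
RESULTS = [
    ("Athlon64", 2.6, 11.7, 30.0),
    ("PPC 7447 (Pegasos II)", 1.0, 10.7, 22.0),
    ("PPC 7447", 1.7, 17.5, 39.5),
]


def per_ghz(rate, clock_ghz):
    """Throughput divided by clock speed, i.e. work done per GHz."""
    return rate / clock_ghz


def normalized(results):
    """Return (name, RC5-72 per GHz, OGR-25 per GHz), rounded to 2 decimals."""
    return [
        (name, round(per_ghz(rc5, ghz), 2), round(per_ghz(ogr, ghz), 2))
        for name, ghz, rc5, ogr in results
    ]


if __name__ == "__main__":
    for name, rc5, ogr in normalized(RESULTS):
        print(f"{name:24s} RC5-72: {rc5:6.2f} Mkeys/s/GHz   "
              f"OGR-25: {ogr:6.2f} Mnodes/s/GHz")
```

On these numbers the 7447 does a bit more than twice the RC5-72 work per GHz of the Athlon64, and roughly twice the OGR-25 work.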

_________________
http://krashan.ppa.pl


 Post subject:
PostPosted: Sat Jun 17, 2006 12:15 am
Joined: Sat Feb 18, 2006 4:43 am
Posts: 8
Wow, after looking at that, it makes me wonder what the OSW will get.

What I see seems to confirm what you say. However, I am not sure what types of optimisations the program uses, but I do know that the Power architecture is "cleaner". In my opinion, it seems that x86 does some things in weird ways.


 Post subject:
PostPosted: Sat Jun 17, 2006 7:42 am
Joined: Thu Nov 18, 2004 11:48 am
Posts: 110
The main issue with AltiVec is that too few people work on it and too few applications are getting optimized for it.

You may have fun with programs like John the Ripper and see the difference between the AltiVec-optimized pieces and the non-optimized ones.

AltiVec is an impressive tool, much easier to use than any other SIMD, and it gives quite nice results.

Please consider that the G4 isn't exactly the "latest technology", and things still run on it with reasonable performance.

(That said, I should go back to profiling h264 in ffmpeg to improve Romain's code...)


 Post subject:
PostPosted: Sat Jun 17, 2006 12:19 pm
Joined: Fri Feb 17, 2006 12:31 pm
Posts: 10
Intel's most obvious problem is that they use only 64-bit buses for SSE3; this limits their throughput to half of AltiVec's. But this is only the most obvious problem. To mention just a few others:
- the lack of a permutation unit.
- high register pressure.
- the 2-operand instruction format destroys the first source.

SSE3's only advantage is the support of 64 bit floats.
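
To illustrate the permutation point from the list above: AltiVec's vperm picks each result byte from the 32-byte concatenation of two source registers, using the low 5 bits of the matching control byte, and writes the result to a separate destination register. Below is a scalar Python model of that behaviour, not real vector code; the function name `vec_perm` mirrors the C intrinsic, everything else is made up for the example.

```python
def vec_perm(a, b, control):
    """Scalar model of AltiVec vperm on 16-byte vectors.

    Each control byte's low 5 bits index into the 32-byte table a || b.
    The sources are not modified, as with the real instruction, which
    writes to a separate destination register.
    """
    assert len(a) == len(b) == len(control) == 16
    table = list(a) + list(b)
    return [table[c & 0x1F] for c in control]


a = list(range(0x00, 0x10))   # bytes 0..15
b = list(range(0x10, 0x20))   # bytes 16..31

# Reverse the bytes of a.
reverse = vec_perm(a, b, list(range(15, -1, -1)))

# Interleave the first 8 bytes of a and b (the byte form of vec_mergeh).
merge_ctl = [i for pair in zip(range(0, 8), range(16, 24)) for i in pair]
merged = vec_perm(a, b, merge_ctl)

# Unaligned extract: 16 consecutive bytes starting at offset 3 of a || b.
shifted = vec_perm(a, b, [3 + i for i in range(16)])
```

One instruction covers byte reversal, interleaving and unaligned extraction just by changing the control vector, which is why the lack of an equivalent general permute in SSE3 is counted as a drawback here.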

Nevertheless, AltiVec will be beaten by SSE3 in the near future, because Freescale is the only major CPU developer using AltiVec. AltiVec has been unchanged for over 7 years now. SSE has been updated 2 times since its introduction. In order to secure AltiVec's speed advantage, I would recommend the following steps:
- increasing the register width to 512 bits.
- improving the workflow for non-DSP or non-multimedia algorithms by adding instructions like loads and stores that take the offset from an AltiVec register, or by removing the operand size limitations of multiplications.
- support for 64-bit datatypes (ints and floats).
- for FFTs: support for complex numbers.
- 3 vFPU modes: DAZ, Java, IEEE 754r (with sticky bits in status registers instead of exceptions).
- like in some TI DSPs, one bit in each AltiVec instruction indicating whether it has to be processed in a new cycle or in the current cycle.
- optionally: constant registers for bitmasks, rotation counts, etc., which could only be used as the second source operand (selectable by an SPR like VRSAVE).
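
The "loads and stores taking the offset from an AltiVec register" item in the list above is what is usually called gather/scatter. As a rough sketch of the semantics such instructions would have (this is a hypothetical extension, AltiVec has no such instruction, and the names `gather_load` and `scatter_store` are invented here):

```python
def gather_load(memory, base, offsets):
    """Load one element per lane from memory[base + offset].

    'offsets' plays the role of the vector register holding per-lane offsets.
    """
    return [memory[base + off] for off in offsets]


def scatter_store(memory, base, offsets, values):
    """Store one element per lane to memory[base + offset]."""
    for off, val in zip(offsets, values):
        memory[base + off] = val


# Example: a 4-lane table lookup, the kind of access that today needs
# four separate scalar loads plus repacking into a vector.
table = list(range(100, 132))
lanes = gather_load(table, 4, [0, 3, 1, 7])

scatter_store(table, 0, [2, 5], [-1, -2])
```

Table lookups and indirect indexing are the typical non-DSP workloads that would benefit from this.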

[Sorry posted in wrong language]

Intel's problem is that they only have 64-bit buses inside the CPU, which already cuts their throughput in half. But this is only the most noticeable problem. To name just a few of the others:
- no permutation unit.
- too few registers.
- the 2-operand format destroys the first source.

Their only advantage, in my eyes, is that SSE has handled 64-bit doubles since SSE2.

Nevertheless, in the long run AltiVec will be beaten by SSE, because only Motorola still relies on AltiVec. SSE, however, is already on the market in its 3rd version. Nothing has changed in AltiVec's instruction set for at least 7 years. To secure AltiVec's performance lead for the next few years, I would suggest the following:
- increasing the register width to 512 bits.
- improvements for processing non-multimedia and non-DSP algorithms, e.g. memory accesses with AltiVec registers as offsets and multiplications wider than 16 bit x 16 bit -> 32 bit.
- support for 64-bit data types.
- for FFTs: support for complex numbers.
- 3 vFPU modes: DAZ, Java, IEEE 754r (with sticky bits in status registers, no exceptions).
- as in some TI DSPs, a bit in the instruction indicating whether it must be executed in a new cycle or still in parallel with the preceding ones.
- maybe constant registers for masks, rotation counts, etc. that appear only as the second operand (selectable via an SPR similar to VRSAVE).

What other suggestions do you have for developing AltiVec further?


 Post subject:
PostPosted: Sat Jun 17, 2006 6:38 pm
Joined: Thu Nov 18, 2004 11:48 am
Posts: 110
Quote:
Nevertheless, AltiVec will be beaten by SSE3 in the near future, because Freescale is the only major CPU developer using AltiVec. AltiVec has been unchanged for over 7 years now. SSE has been updated 2 times since its introduction. In order to secure AltiVec's speed advantage, I would recommend the following steps:
- increasing the register width to 512 bits.
Too much ad int IMHO, but you are probably already thinking about 4 long doubles.
Quote:
- improving the workflow for non-DSP or non-multimedia algorithms by adding instructions like loads and stores that take the offset from an AltiVec register, or by removing the operand size limitations of multiplications.
A bit hard to achieve without introducing other limitations, again IMHO.
Quote:
- support for 64 bit datatypes ( ints and floats )
4 64bit ints could be interesting...
Quote:
- for fft's: support of complex numbers
NO, PLEASE NO, too "complex"
Quote:
- 3 vFPU modes: DAZ, Java, IEEE 754r (with sticky bits in status registers instead of exceptions).
=_= Doesn't look sane. OK, I'm not so fond of floats and Java.
Quote:
- like in some TI DSPs, one bit in each AltiVec instruction indicating whether it has to be processed in a new cycle or in the current cycle.
- optionally: constant registers for bitmasks, rotation counts, etc., which could only be used as the second source operand (selectable by an SPR like VRSAVE).
Having too much means high prices for high complexity.

AltiVec is also nice because it is easy to code with.

Surely adding a MIMD extension with 32x512-bit registers could be interesting.

Still, it's a matter of tradeoffs. x86 has had loads of exotic features: few use them, so there is lots of silicon fat in your CPU, good just for wasting energy.

An AltiVec II extension should be simple and to the point like the previous one. Having support for wider vectors and just a few more operators would be more useful and won't turn our CPUs into pink hydrogen-propelled elephants...

Still, just a few applications are enjoying AltiVec. I'd try to get more from what we have before looking for something else.


 Post subject:
PostPosted: Sat Jun 17, 2006 10:00 pm
Joined: Thu Feb 16, 2006 8:10 pm
Posts: 98
Quote:
The main issue with AltiVec is that too few people work on it and too few applications are getting optimized for it.

You may have fun with programs like John the Ripper and see the difference between the AltiVec-optimized pieces and the non-optimized ones.

AltiVec is an impressive tool, much easier to use than any other SIMD, and it gives quite nice results.

Please consider that the G4 isn't exactly the "latest technology", and things still run on it with reasonable performance.

(That said, I should go back to profiling h264 in ffmpeg to improve Romain's code...)
I note that you place H.264 and ffmpeg as an afterthought, and that's a shame.

I really hope that you, and indeed all the AltiVec people here (are there that many these days?), would really go to town on the whole open AVC/H.264 code base, so that we can use our G4/G5-based machines for DVB encoding/decoding at a reasonable rate at the very least.

There seems to be a pure lack of will in improving all the open audio/video code, and that's a shame.

Perhaps one day soon that will change. I hope so.....

For instance, nothing in the open code base comes anywhere close to the CoreAVC decoder (optimised open source demo writers of old, gone commercial for AVC), and that's a big problem even for x86 end users. The PPC users could benefit massively if the PPC AVC codebase were reworked to maximise its potential, and in turn that might push the x86 codebase to take notice and enable improvements all round :)


 Post subject:
PostPosted: Sun Jun 18, 2006 7:02 am
Joined: Sat Feb 18, 2006 4:43 am
Posts: 8
Some interesting points there.

I do feel, however, that x86 is becoming a bit too popular. All they have to do is say they are the best and everyone believes them. Personally I am an ARM fan, but I am really starting to like these PPC (pun intended) chips due to the huge multimedia performance they have.

Increasing the register width sounds great, but I think that's the point where you are turning it into a graphics/CPU hybrid. I think that's great, but I can see that keeping the AltiVec processor fed would get harder and harder (I guess that's why the bus width to RAM is so huge). Can't wait to see these OSW in action; if they are the chip I think they are, then there should be some crazy memory bus transfer capability.

Anyone got some good AltiVec tutorials they could point me to?


 Post subject:
PostPosted: Sun Jun 18, 2006 7:05 am
Joined: Sat Feb 18, 2006 4:43 am
Posts: 8
Just remembered that there is a retargetable library for mathematics that has optimisations for everything (AltiVec, SSE, MMX) and for many math and media libraries, to provide a generic acceleration framework. It won't give you the best performance, but it will give you a good boost. I will see if I can track it down again, as it could help people integrate SIMD into their code and make AltiVec-orising multimedia apps easier.


 Post subject:
PostPosted: Sun Jun 18, 2006 8:50 am
Joined: Fri Feb 17, 2006 12:31 pm
Posts: 10
Quote:
Some interesting points there.
Increasing the register width sounds great, but I think that's the point where you are turning it into a graphics/CPU hybrid. I think that's great, but I can see that keeping the AltiVec processor fed would get harder and harder (I guess that's why the bus width to RAM is so huge). Can't wait to see these OSW in action; if they are the chip I think they are, then there should be some crazy memory bus transfer capability.
Feeding AltiVec on a G4 can get really hard. The best way to solve this is, IMHO, local memory like in the Cell SPEs.


 Post subject:
PostPosted: Sun Jun 18, 2006 11:26 am
Joined: Thu Nov 18, 2004 11:48 am
Posts: 110
Quote:
There seems to be a pure lack of will in improving all the open audio/video code, and that's a shame.
You should first check the source before making such complaints...

h264 is one of the codecs quite well covered by AltiVec optimizations. I'm still not happy with it, but it isn't unoptimized at all!

It isn't lack of will but lack of time.

Keep in mind that people working on ffmpeg are using their free time.

Thank you for spitting in my face.


 Post subject:
PostPosted: Sun Jun 18, 2006 11:28 am
Joined: Thu Nov 18, 2004 11:48 am
Posts: 110
Quote:
Anyone got some good AltiVec tutorials they could point me to?
http://www.simdtech.org/altivec/documents/

Could be a good start.


 Post subject:
PostPosted: Sun Jun 18, 2006 12:12 pm
Joined: Thu Feb 16, 2006 8:10 pm
Posts: 98
Quote:
Quote:
There seems to be a pure lack of will in improving all the open audio/video code, and that's a shame.
You should first check the source before making such complaints...

h264 is one of the codecs quite well covered by AltiVec optimizations. I'm still not happy with it, but it isn't unoptimized at all!

It isn't lack of will but lack of time.

Keep in mind that people working on ffmpeg are using their free time.

Thank you for spitting in my face.
lu_zero, that was NOT my intent, and I'm sorry that you believe it was. Thanks again for your good work. I merely wanted to draw attention to these matters so that they become far better known outside a few people, and perhaps have far more people look and help where they are able. You have now clarified that, and I'm hopeful now that in time all things will be great.

When I say 'lack of will', it does not mean a slagging off or a slap in the face, if you will, NOT by ANY stretch of the imagination. It means there's far too much other important stuff going on to expand on the work already done.

You know, like the garden needs the grass and hedges cutting, but work and family take too much of your will to get around to doing it just yet, nothing more.


 Post subject:
PostPosted: Sun Jun 18, 2006 12:30 pm
Joined: Fri Sep 24, 2004 1:39 am
Posts: 111
Quote:
Nevertheless, AltiVec will be beaten by SSE3 in the near future, because Freescale is the only major CPU developer using AltiVec.
Absolutely not!

-IBM PPC970
-IBM/MS Waternoose Xbox 360 CPU
-STI Cell
-IBM POWER6
-P.A. Semi PWRficient


 Post subject:
PostPosted: Sun Jun 18, 2006 1:47 pm
Joined: Thu Mar 16, 2006 10:02 am
Posts: 32
Quote:
Intel's most obvious problem is that they use only 64-bit buses for SSE3; this limits their throughput to half of AltiVec's...
That's right, and I know the headline is about SSE3... but I just wanted to say that it seems like Intel will introduce SSE4 with their new Core2 processor line, widening their bus to 128 bits and adding some instructions. This will surely increase (maybe in some cases double) throughput. So they seem to have learned a bit from PPC...
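
The "maybe in some cases double" estimate follows from a very simple model: a 128-bit vector operation pushed through a 64-bit datapath needs two passes, through a 128-bit datapath only one. A back-of-the-envelope sketch in Python, assuming one pass per cycle and ignoring latency, issue width and memory bandwidth; the function name and parameters are made up for illustration:

```python
import math


def cycles_for_vector_ops(n_ops, vector_bits=128, datapath_bits=64):
    """Cycles to execute n_ops vector operations on a datapath of given width.

    Idealized model: each operation is split into
    ceil(vector_bits / datapath_bits) passes, one pass per cycle.
    """
    passes = math.ceil(vector_bits / datapath_bits)
    return n_ops * passes


narrow = cycles_for_vector_ops(1000, datapath_bits=64)    # two passes per op
wide = cycles_for_vector_ops(1000, datapath_bits=128)     # one pass per op
speedup = narrow / wide
```

In this model the widening gives exactly a 2x speedup; real code would only approach that where execution width, not memory or instruction decode, is the bottleneck.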

