Nevertheless, AltiVec will be beaten by SSE3 in the near future, because Freescale is the only major CPU developer using AltiVec. AltiVec has been unchanged for over 7 years now. SSE has been updated twice since its introduction. In order to secure AltiVec's speed advantage I would recommend the following steps:
- increasing the register width to 512 bits.
Too much for ints IMHO, but you are probably already thinking about 4 long doubles.
- improving the workflow for non-DSP or non-multimedia algorithms by adding instructions such as loads and stores that take the offset from an AltiVec register, or by removing the operand size limitations of multiplications.
A bit hard to achieve without introducing other limitations, again IMHO.
- support for 64-bit datatypes (ints and floats)
Four 64-bit ints could be interesting...
- for FFTs: support for complex numbers
NO, PLEASE NO, too "complex"
- 3 vFPU modes: DAZ, Java, IEEE 754r (with sticky bits in status registers instead of exceptions).
=_= Doesn't look sane. OK, I'm not so fond of floats and Java.
- as in some TI DSPs, one bit in each AltiVec instruction indicating whether it has to be processed in a new cycle or in the current cycle.
- optionally: constant registers for bitmasks, rotation counts etc., which could only be used as the second source operand (selectable by an SPR like VRSAVE).
Having too much means paying high prices for high complexity.
AltiVec is also nice because it is easy to code with.
Surely adding a MIMD extension with 32 x 512-bit registers could be interesting.
Still, it's a matter of tradeoffs: x86 has loads of exotic features that few people use, which means lots of silicon fat in your CPU that is good only for wasting energy.
An AltiVec II extension should be as simple and to the point as the previous one; support for wider vectors and just a few more operators would be more useful and won't turn our CPUs into pink hydrogen-propelled elephants...
Still, only a few applications are taking advantage of AltiVec; I'd try to get more out of what we have before looking for something else.