The -(an) addressing mode of course has to work that way.
Well, it doesn't. Why does it matter what the register order is in a stack dump of registers? Why can't you load registers in any order?
Of course the order in which the registers are put on the stack matters.
Please note that there are three ways of putting variables on the stack.
A) Single moves.
B) C-style with link and framepointer.
C) Without framepointer
The CPU is designed to produce the correct result independent of which of the three ways you choose.
The 68k ensures that the stack access is always working correctly.
Pushing the registers with single moves, with link plus movem, or with a plain movem all produce the same stack layout.
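The three ways can be sketched like this (a hedged illustration; the register choice d0/d1 and the a6 frame pointer are just examples, not taken from the original post):

```m68k
; A) Single moves (note: d1 is pushed first so that d0 ends up
;    at the lower address, matching the movem.l layout below)
        move.l  d1,-(sp)
        move.l  d0,-(sp)
        ...
        move.l  (sp)+,d0
        move.l  (sp)+,d1

; B) C-style with link and frame pointer
        link    a6,#0           ; set up the frame pointer
        movem.l d0/d1,-(sp)     ; predecrement stores d1 first, then d0
        ...
        movem.l (sp)+,d0/d1
        unlk    a6

; C) Without frame pointer
        movem.l d0/d1,-(sp)
        ...
        movem.l (sp)+,d0/d1
```

In all three cases d0 ends up at the lower address and d1 above it. This is exactly why movem.l with -(sp) stores the register list in reversed order: so that the matching movem.l (sp)+ restore works.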
It's not exactly intuitive and it certainly does not act the same as multiple move.l instructions - sometimes you cannot just replace 4 move.l with a movem.l without reordering your data in the first place.
The CPU behaves 100% correctly.
This is the concept of how the stack works.
I can understand that if you just look at movem.l without considering the concept of the stack, this might look unintuitive to you at first.
On MorphOS, even the 1000 MHz G4 on the Peg II barely got over 200 MB/sec memory read performance on a 64-bit bus.
You yourself have benchmarked higher.
No, I did not.
On MorphOS the G4-Peg2 has a maximum memory read performance
of about 220-260 MB/sec depending on your memory and firmware versions.
On Linux and using ASM Cache prefetch tricks you could get the memory read performance to close to 700 MB/sec.
As the ASM cache instructions do NOT work on MorphOS, you cannot reach this value there.
If you write C code (without ASM cache instructions), your performance is limited to about 260 MB/sec on both Linux and MorphOS.
I don't think memory bandwidth is the be-all and end-all of system optimization.
Memory performance is a very good indicator for system speed.
One of the limiting factors for running "bigger" applications is memory performance. It's not necessarily the copy performance, but the speed at which your CPU can read or write data is very important.
Especially for running Linux applications (which are often bigger) the memory performance is very important.
Remember, one of the benefits of the original Amiga was DMA access independent of the 68000 CPU. Since the 68000 only got to access the bus every other clock, it was important to let the Blitter, Paula audio etc. be able to do their work without having to be in constant coordination with the CPU.
It would be much better to hand off the CPU to do some other work like calculating a fractal, and also preserve the performance of I/O peripherals by not having them wait on the CPU which is busy doing memory copies.
I fully agree with you.
The major strength of the AMIGA design was the possibility to off-load jobs to other engines. The key was that you could do this off-loading on the AMIGA quickly and with very low overhead.
The SuperAGA chipset revives this concept again.
A powerful chipset which can be fully used with very low overhead.
Even the slow 266 MHz V4e with its 16-bit bus is a lot faster than UAE (without JIT) on my dual 2.2 GHz DuoCore.
That's not too surprising considering UAE intends to be cycle-accurate in chipset terms.
No no, I was not referring to chipset emulation but to 68k CPU emulation, which was set to "run as fast as possible" mode. :-)
However memory access for chip-ram might still be restricted; the entire emulation is focussed on keeping a cycle-accurate Amiga chipset in there, regardless of the CPU emulation.
Don't worry, there are no chip ram accesses involved in this test.
I believe the test is very good for showing the maximum JIT performance of the systems used.
Perhaps it would be better to run a more proficient emulator which can run a similar OS. Unless you are running Linux under UAE (that is a scary thought) as well as on the ColdFire board, it may not be a fair test (it may favour the Amiga emulation, if Linux really is as bad on ColdFire as you say :)
The test runs fully independently of the OS, and there are no OS calls during the test.
The performance of the underlying OS is not important for the test results, with one exception only.
The exception is the memory latency benchmark.
Enabling the MMU has a noticeable negative impact on the memory latency results of course.
The test gives a very good indication of the performance of the different emulations (JIT on x86, JIT on PPC, CF native). I'll summarize the results for you, and I'll post the test source later.