usage of internal "realtime" and M4 cores

So I am pretty amazed at how big a punch the AM5728 packs: 2 fast CPU cores, 2 very versatile KeyStone cores (of which I'll be a happy user very soon, thanks to you), 2 "realtime" cores (whichever they are), and 2 Cortex-M4 cores. I have no idea what purpose the M4s might serve. That limited ISA won't be helpful to me, unless they can be used to drive some servos, but that is something you don't need the M4's FPU for.

Can somebody enlighten me in my slight confusion? What purpose do the M4 cores serve, and what are those “realtime processing cores”?

One of those M4 modules (2 cores), as I recall, is dedicated to GPU
processing. The other will be similar to the PRU cores, albeit not as
efficient. It's my understanding that these FPU / M4 cores have caching
pipelines, whereas the PRU / M3 cores are made to have many single-cycle
instructions, i.e. no pipeline cache.

The point of this type of processor is that one can compute various "things"
without bogging down the main system processor. Think of it as similar to
how a GPU works in tandem with a CPU. Another PC analogy would be a ToE, or
TCP/IP Offload Engine, where the hardware offloads work from the main
processor, which runs an OS.

It's actually a bit more complex than that, but you can think of the FPU /
PRUs as peripherals that can interact with various external devices
without slowing down the main processors. These "peripherals" are also
programmable, which makes them very flexible: you could potentially dream
up an idea, program it, then see it work first hand.

Real-time, in this case, means predictable, or deterministic. You can
determine how fast your application will be by counting instructions. Linux,
on the other hand, can be made deterministic only up to a point, and is not
as deterministic as dedicated hardware.
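
As a rough sketch of what "counting instructions" buys you, assuming one cycle per instruction and the 200 MHz PRU clock quoted later in this thread (the function names are made up for illustration, and memory accesses on real hardware can take more than one cycle):

```python
PRU_CLOCK_HZ = 200_000_000  # assumed: 200 MHz PRU clock, as quoted in this thread


def cycles_to_ns(cycles, clock_hz=PRU_CLOCK_HZ):
    """Convert a cycle count into nanoseconds at the given clock."""
    return cycles * 1_000_000_000 / clock_hz


def loop_time_ns(instructions_per_pass, passes, clock_hz=PRU_CLOCK_HZ):
    """Time for a loop, assuming every instruction takes exactly one cycle."""
    return cycles_to_ns(instructions_per_pass * passes, clock_hz)
```

Under those assumptions one instruction takes 5 ns, and a 4-instruction loop body run 1000 times takes 20 microseconds, every time.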

One last thing that gives this type of system an advantage over just adding
an external MCU is that these "peripherals" are tied to the main processor
cores through a fast interconnect, whereas an external MCU would require an
interface such as SPI or I2C, which would be much slower by comparison.

The realtime processing cores are the PRUs, which don’t have a pipeline, so each instruction executes in a single cycle. There are 4 PRUs on the AM5728, but they have limited memory, so your application has to be quite small. The Cortex-M4 processors are similar to the Cortex-M4 microcontrollers available from Texas Instruments. They are mostly used for video compression/decompression, but you can run them bare metal using StarterWare, or use an RTOS like TI-RTOS for real-time applications. The Cortex-M4s support larger applications than the PRUs. The Cortex-M4 does have a 3-stage pipeline, which means an instruction takes at least 3 cycles to pass through it. From what I recall, the Cortex-A15 runs at 1.5GHz, the DSPs run at 750MHz, the Cortex-M4s run at 213MHz and the PRUs run at 200MHz.

To develop code for the Cortex-M4, use Code Composer Studio V7, which is available free from TI. Use a JTAG debug probe such as the USB200 from Blackhawk.

Regards,
John

On Mon, 27 Feb 2017 09:34:00 -0800 (PST), MDX
<speedy1024@gmail.com> declaimed the
following:

So I am pretty amazed at how big a punch the AM5728 packs: 2 fast CPU
cores, 2 very versatile KeyStone cores (of which I'll be a happy user very
soon, thanks to you), 2 "realtime" cores (whichever they are), and 2
Cortex-M4 cores. I have no idea what purpose the M4s might serve. That
limited ISA won't be helpful to me, unless they can be used to drive some
servos, but that is something you don't need the M4's FPU for.

Can somebody enlighten me in my slight confusion? What purpose do the M4
cores serve, and what are those "realtime processing cores"?

  I suspect the "realtime processing cores" are the PRUs.

  The main purpose for the PRUs and the M4s (M->microcontroller, where
A->application processor) is to run stuff with hard timing requirements --
whereas access via the main processor is under the constraints of the OS
(especially if one is using the file-system approach to control GPIOs [open
GPIO pseudo-file, write/read a value, close pseudo-file]). OS access can
delay operations due to task swaps etc.*
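
  A minimal sketch of that file-system approach, assuming the legacy sysfs GPIO interface under /sys/class/gpio and a pin that has already been exported and set as an output (the helper names and the `base` parameter are made up for illustration):

```python
import os

SYSFS_GPIO = "/sys/class/gpio"  # legacy sysfs GPIO interface


def write_gpio(pin, value, base=SYSFS_GPIO):
    """Open the pin's pseudo-file, write 0 or 1, and close it again."""
    path = os.path.join(base, f"gpio{pin}", "value")
    with open(path, "w") as f:           # open pseudo-file
        f.write("1" if value else "0")   # write a value
    # leaving the with-block closes the pseudo-file


def read_gpio(pin, base=SYSFS_GPIO):
    """Open the pin's pseudo-file, read its value, and close it again."""
    path = os.path.join(base, f"gpio{pin}", "value")
    with open(path) as f:
        return int(f.read().strip())
```

  Every call goes through the OS (path lookup, system calls, scheduling), which is where the unpredictable delays come from.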

  The PRU-ICSS (Programmable Realtime Unit - Industrial Communication
SubSystem), as the full name implies, is partly targeted at creating
realtime communication protocols -- though they won't let you emulate
100Mbps Ethernet <G> (if the X-15 is like the BBB, the PRUs run on a 200MHz
clock, so divide down by how many instructions it takes to handle one bit
of your protocol to determine the rate).
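
  That division can be sketched as follows, assuming a 200 MHz PRU clock and one cycle per instruction (the function names are made up for illustration):

```python
PRU_CLOCK_HZ = 200_000_000  # assumed: 200 MHz, if the X-15 PRUs match the BBB


def max_bit_rate(instructions_per_bit, clock_hz=PRU_CLOCK_HZ):
    """Upper bound on bits per second for a bit-banged protocol."""
    if instructions_per_bit <= 0:
        raise ValueError("instructions_per_bit must be positive")
    return clock_hz / instructions_per_bit


def instruction_budget(bit_rate, clock_hz=PRU_CLOCK_HZ):
    """How many single-cycle instructions fit into one bit time."""
    return clock_hz / bit_rate
```

  With 10 instructions per bit the ceiling is 20 Mbps, and a 100 Mbps link would leave only 2 instructions per bit.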

  The M4s likely run at a slower rate, but have a more common instruction
set (TI Tiva-C, Arduino Due/Zero, STM boards...), i.e. compiler support.

* where were all these cheap cards when I was tasked with trying to make a
GFE Win98 laptop behave as a satellite command formatter (even with the
program running at the highest Windows priority the OS still kept doing
something every 200-300msec; which killed any chance at reliably sending
bits out the parallel port in response to an external clock on the same
port). An Arduino class board would have allowed a serial transfer of the
command data to the board, and the board could then handle the clock
response...

The Cortex-M4 processors are similar to the Cortex-M4 microcontrollers
available from Texas Instruments. They are mostly used for video
compression/decompression, but you can run them bare metal

Interesting.

What about interrupt handling? I'd expect the Cortex-A15 to have
outrageous interrupt latencies, can the M4s be used for fast interrupt
handlers?

Is RAM coherent between the A15 and the M4? What's the M4's bandwidth
to the RAM?

-- Juliusz

Yeah, so it looks like I was almost right in my confusion.
So, if I can prepare each instruction beforehand, can the PRUs be used for servo control? Or do I have to learn Thumb to use the M4/M3 for that?

The Cortex-M4 processors are similar to the Cortex-M4 microcontrollers
available from Texas Instruments. They are mostly used for video
compression/decompression, but you can run them bare metal

Interesting.

What about interrupt handling? I'd expect the Cortex-A15 to have
outrageous interrupt latencies, can the M4s be used for fast interrupt
handlers?

This has nothing to do with the Cortex-A15. The interrupt latency occurs because Linux disables interrupts during critical sections. If you were running bare-metal code on the Cortex-A15, you would have fast interrupts. The same goes for the Cortex-M4: interrupts are fast as long as you handle them quickly and re-enable them. Using TI-RTOS, you will also have fast interrupt handling.

Is RAM coherent between the A15 and the M4? What's the M4's bandwidth
to the RAM?

You need to read Chapter 7 of the AM5728 Technical Reference Manual, which shows that you can use IPU1 for general-purpose application development, whereas IPU2 is dedicated to IVA, which handles the video encoders/decoders (Chapter 6). There is 32KB of L1 cache memory, which is fast.

Regards,
John

Yeah, so it looks like I was almost right in my confusion.
So, if I can prepare each instruction beforehand, can the PRUs be used for servo control? Or do I have to learn Thumb to use the M4/M3 for that?

Any of the 4 PRUs or the Cortex-M4 can be used for servo control. They are all more than fast enough for this purpose.
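
To put numbers on "more than fast enough", here is a sketch that converts a servo angle into cycle counts. It assumes a typical hobby-servo signal (50 Hz frame, 1 ms to 2 ms pulse over 0 to 180 degrees) and the 200 MHz PRU clock; those timing constants and the function names are assumptions for illustration, so check your servo's datasheet.

```python
PRU_CLOCK_HZ = 200_000_000   # assumed PRU clock (200 MHz)
SERVO_PERIOD_US = 20_000     # assumed 50 Hz frame, typical for hobby servos
MIN_PULSE_US = 1_000         # assumed pulse width at 0 degrees
MAX_PULSE_US = 2_000         # assumed pulse width at 180 degrees


def us_to_cycles(microseconds, clock_hz=PRU_CLOCK_HZ):
    """Convert microseconds into clock cycles, rounded to the nearest cycle."""
    return round(microseconds * clock_hz / 1_000_000)


def servo_pulse_cycles(angle_deg, clock_hz=PRU_CLOCK_HZ):
    """Return (high_cycles, low_cycles) for one servo frame."""
    if not 0 <= angle_deg <= 180:
        raise ValueError("angle_deg must be between 0 and 180")
    pulse_us = MIN_PULSE_US + (MAX_PULSE_US - MIN_PULSE_US) * angle_deg / 180
    high = us_to_cycles(pulse_us, clock_hz)
    low = us_to_cycles(SERVO_PERIOD_US, clock_hz) - high
    return high, low
```

At 200 MHz the full 0 to 180 degree range spans 200,000 cycles of pulse width, so the resolution is far finer than a hobby servo can resolve mechanically.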

Regards,
John

On Mon, 27 Feb 2017 21:52:59 -0800 (PST), MDX
<speedy1024@gmail.com> declaimed the
following:

Yeah, so it looks like I was almost right in my confusion.
So, if I can prepare each instruction beforehand, can the PRUs be used for
servo control? Or do I have to learn Thumb to use the M4/M3 for that?

  Something wrong with using C? Let the compiler figure out the optimal
Thumb-2 instructions for the M4.

  If I trust the TRM I just downloaded, the unit has three PWM subsystems
on board; you may not need to do low-level programming of the pulses
themselves -- just transfer the set-up information to the PWM control
registers.
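
  As a sketch of what that set-up information might look like, the following picks a clock divider and computes period and compare counts. The 16-bit counter, the power-of-two dividers and the function name are assumptions for illustration, not values taken from the TRM:

```python
def pwm_settings(clock_hz, freq_hz, pulse_us, max_count=0xFFFF,
                 dividers=(1, 2, 4, 8, 16, 32, 64, 128)):
    """Pick the smallest divider whose period fits the counter.

    Returns (divider, period_counts, compare_counts). The counter width
    and divider list are hypothetical; take the real ones from the TRM.
    """
    for div in dividers:
        period = round(clock_hz / (div * freq_hz))
        if period <= max_count:
            # Fraction of the period during which the output is high.
            compare = round(period * pulse_us * freq_hz / 1_000_000)
            return div, period, compare
    raise ValueError("no divider gives a period that fits the counter")
```

  For example, with a hypothetical 100 MHz module clock, a 50 Hz servo frame and a 1.6 ms pulse, this gives a divider of 32, a period of 62500 counts and a compare value of 5000 counts.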