Validation of my high-level thinking on a device design?

Hi --

I have a project that requires me to climb several learning curves,
and I was hoping to get some help from you more seasoned folks. The
device I am building is a dedicated digital audio workstation that,
from a high-level user interface standpoint, will present a typical
DAW multi-track audio/MIDI sequencer coupled with a number of Ableton
Live-like functions for manipulating sequenced material flexibly and
dynamically during a performance. While this device will be useful
and fun to build in its own right, my primary objective is to take a
custom virtual machine I have developed, designed to be cross-platform
(both cross-OS and cross-processor), and put it through its paces.

My current high-level thinking on the device sees it including a
number of hardware components:

* a main board with a traditional Intel processor and up to 8GB of
memory that will run the user interface and the intricate object
model underlying the DAW/Live Performance functionality, and will
coordinate activity across the entire device

* a separate DSP processor subsystem (I will probably use one of
HawkBoard's OMAP-L138 boards as soon as I can get my hands on it).
This will be used to execute DSP algorithms and to manipulate digital
and analog audio I/O

* a MidiBox MIDI processor (one of the STM32 Arm-based ones). This
will be used to handle all MIDI processing and input/output

* possibly (will avoid this if I can) another Intel processor to
handle execution of VSTs and other commercial plugins, if the main
Intel processor can't live up to the stringent timing requirements
involved
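To make the division of labour concrete, here is a rough Python sketch of the node inventory above as a lookup table the VM could consult when deciding where a job runs. The role names, the `pick_node` helper, and the rule of preferring the most specialised node are all hypothetical; the optional second Intel board is modelled as absent, so plugin work falls back to the main board.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One processor in the device, described by the kinds of jobs it accepts."""
    name: str
    roles: set[str] = field(default_factory=set)
    present: bool = True


# Hypothetical inventory mirroring the component list; role names are illustrative.
NODES = [
    Node("main-intel", {"ui", "object-model", "coordination", "plugins"}),
    Node("omap-l138", {"dsp", "audio-io"}),
    Node("stm32-midibox", {"midi"}),
    Node("aux-intel", {"plugins"}, present=False),  # optional second Intel board
]


def pick_node(role: str, nodes: list[Node] = NODES) -> Optional[str]:
    """Return the most specialised present node that accepts `role`, or None.

    Nodes with fewer roles are preferred, so plugin work goes to the
    auxiliary Intel board when it exists and to the main board otherwise.
    """
    candidates = [n for n in nodes if n.present and role in n.roles]
    if not candidates:
        return None
    return min(candidates, key=lambda n: len(n.roles)).name
```

Marking `aux-intel` as present makes `pick_node("plugins")` return it instead of `main-intel`, which is the fallback behaviour described in the last bullet.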

The overall idea is that I will implement my VM on all of these
coprocessors and thus be able to coordinate activity throughout the
system on a distributed basis. I stress "high-level" because many of
the issues involved in actually putting together such a configuration,
powering it, and dissipating the heat it will generate are still ahead
of me, especially the hardware ones. I'm a software guy and am fairly
confident I can program all this once it is set up. It is the hardware
issues I am weak on, and I'm hoping for a bit of help from the more
experienced souls on this forum.

While working out how to establish data transfer and control
communication between the Intel processor on one board and the
OMAP-L138 on the other, I have gotten as far as realizing that PCI
Express is probably the way to go. To the main Intel processor, my DSP
board is just like a graphics coprocessor, and since PCI Express works
fine in that scenario, it ought to work fine in mine too. The Intel
board I am currently looking at is Zotac's LGA775-socket board with the
GeForce 9300 chipset. It has a single PCI Express slot, so I imagine I
should put a card there that has the proper interfacing to the DSP
board. Having gotten that far, I realized the same applies to the other
coprocessors I envision putting in the device: the STM32 Arm MidiBox
board and the second Intel processor (if I do end up adding one). So
this single PCI Express card has to be able to route data and control
transfers to any of three possible targets.
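One way to picture how a single card could serve three targets is the memory-window model that PCI Express devices such as graphics cards use: the device exposes a region of address space to the host through a Base Address Register (BAR), and logic on the card decides which target an access belongs to by its offset. The Python below only models that decode step; the 64 KiB BAR, the equal 16 KiB windows, and the target names are assumptions for illustration, not a worked-out design.

```python
from typing import NamedTuple


class Window(NamedTuple):
    target: str
    base: int  # offset within the BAR where this target's window starts
    size: int  # window length in bytes


# Hypothetical split of one 64 KiB BAR; the last 16 KiB is left unmapped.
WINDOWS = (
    Window("omap-l138", 0x0000, 0x4000),
    Window("stm32-midibox", 0x4000, 0x4000),
    Window("aux-intel", 0x8000, 0x4000),
)


def decode(bar_offset: int, windows: tuple[Window, ...] = WINDOWS) -> tuple[str, int]:
    """Map an offset inside the BAR to (target name, offset within that target)."""
    for w in windows:
        if w.base <= bar_offset < w.base + w.size:
            return w.target, bar_offset - w.base
    raise ValueError(f"offset {bar_offset:#x} is outside every window")
```

The appeal of this model is that the host side stays simple: software just reads and writes addresses, and all the routing lives in the decode logic on the card.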

Now, I know that at the very least this card needs a PCI Express
component that acts as the receiving end of a PCI Express transfer,
grabbing the serial data off the card's incoming pins and translating
it into a parallel stream that can be fed further up the pipeline. So
my first question to any OMAP-L138 mavens is: which facility on the
OMAP-L138 should that stream be directed to? Is it the uPP high-speed
parallel interface, or one of the other peripherals on the chip? Data
can travel across PCI Express at up to 5 GB/s per direction if 16
lines (or lanes, as they are called in PCI Express-speak) are used on
a first-generation (2.5 GT/s per lane) link; that is the raw signalling
rate, and 8b/10b encoding leaves about 4 GB/s of usable bandwidth. A
further question is what the connection between the PCI Express card
and the DSP board looks like. I'm presuming it's a cable of some kind
that plugs into one of the OMAP-L138 board's connectors, but I have no
idea whether I am correct about this, or whether what I am shooting
for is even doable. At the very least, some high-level validation (or
invalidation) of this general picture would be of immense help to me
at this stage.
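The bandwidth figure above is simple arithmetic, sketched here in Python: lanes times the per-lane signalling rate gives the raw rate in each direction, and the first two PCI Express generations lose 20% of that to 8b/10b encoding. Packet and protocol overhead are ignored. For scale, it also computes the payload rate of a hypothetical 64-channel, 24-bit, 96 kHz audio stream; those channel and format numbers are assumptions, not figures from the design.

```python
# Per-lane signalling rates in megatransfers per second for PCI Express
# generations 1 and 2. Both use 8b/10b line coding (8 data bits per 10 sent).
GEN_RATE_MTPS = {1: 2_500, 2: 5_000}


def pcie_bytes_per_sec(gen: int, lanes: int, encoded: bool = True) -> int:
    """One-direction bandwidth in bytes/s, ignoring packet/protocol overhead.

    With encoded=False the raw line rate is returned; with encoded=True the
    8b/10b overhead is removed.
    """
    bits = GEN_RATE_MTPS[gen] * 1_000_000 * lanes  # one bit per transfer per lane
    if encoded:
        bits = bits * 8 // 10
    return bits // 8


def audio_bytes_per_sec(channels: int, sample_rate: int, bits_per_sample: int) -> int:
    """Payload rate of uncompressed PCM audio in bytes/s."""
    return channels * sample_rate * bits_per_sample // 8


if __name__ == "__main__":
    raw = pcie_bytes_per_sec(gen=1, lanes=16, encoded=False)
    usable = pcie_bytes_per_sec(gen=1, lanes=16)
    audio = audio_bytes_per_sec(channels=64, sample_rate=96_000, bits_per_sample=24)
    print(f"Gen1 x16: {raw / 1e9:.1f} GB/s raw, {usable / 1e9:.1f} GB/s after 8b/10b")
    print(f"64 ch x 24-bit x 96 kHz: {audio / 1e6:.1f} MB/s")
```

Run directly, it prints 5.0 GB/s raw and 4.0 GB/s after encoding for a Gen1 x16 link, against about 18.4 MB/s for the example audio stream.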

My second question is more general: how might I take a single PCI
Express data feed and route it to one of three possible destinations
(OMAP-L138, MidiBox, or auxiliary Intel)? What kind of circuit needs
to sit between the PCI Express receiving circuitry and the various
connectors that lead to the three individual target coprocessors?
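At the logical level, the routing question describes a demultiplexer: each transfer has to carry, or imply, a destination, and something has to act on it. Here is a rough Python model of that decision using a hypothetical frame format (a 1-byte destination ID followed by a 2-byte big-endian payload length). The IDs, the header layout, and the `frame`/`demux` helpers are made up for illustration; the sketch models only the decision logic, not the circuit that would implement it.

```python
import struct
from collections import defaultdict

# Hypothetical destination IDs for the three coprocessors.
DEST_IDS = {0: "omap-l138", 1: "stm32-midibox", 2: "aux-intel"}
HEADER = struct.Struct(">BH")  # 1-byte destination, 2-byte payload length


def frame(dest: int, payload: bytes) -> bytes:
    """Prefix a payload with the hypothetical destination/length header."""
    return HEADER.pack(dest, len(payload)) + payload


def demux(stream: bytes) -> dict[str, list[bytes]]:
    """Split a byte stream of frames into per-target payload queues."""
    queues: dict[str, list[bytes]] = defaultdict(list)
    pos = 0
    while pos < len(stream):
        if pos + HEADER.size > len(stream):
            raise ValueError("truncated header")
        dest, length = HEADER.unpack_from(stream, pos)
        pos += HEADER.size
        if dest not in DEST_IDS:
            raise ValueError(f"unknown destination id {dest}")
        if pos + length > len(stream):
            raise ValueError("truncated payload")
        queues[DEST_IDS[dest]].append(stream[pos:pos + length])
        pos += length
    return dict(queues)
```

A truncated final frame or an unknown destination raises `ValueError` rather than being silently dropped, which makes link problems easier to spot while debugging.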

I'm approaching all of this from the top down and would like to have
at least a somewhat clearer picture of the significant functional
pieces that need to be in this design. Any help from you more
experienced folks would be greatly appreciated.

Thanks.

Mike