I tend to agree with this assessment. UARTs are more appropriate for what you are trying to do. That doesn’t mean there aren’t ways to cram a square peg into a round hole, but the UART is the more correct way of doing this. Bidirectional comms is what UARTs do, even though SPI is capable of it as well.
But simply saying “use the right tool for the right job” doesn’t give the whole picture. It’s also valuable to understand why the SPI slave mode isn’t working better. I don’t know the answer to that, but I have some suspicions. The following are my thoughts on what it MIGHT be, not a declaration of what is…
Asynchronous comms usually works like this: the receive side needs dedicated hardware that is always listening for incoming messages. That hardware then needs to collect those messages and queue them for retrieval by the processor, and ultimately by the application that is to receive the incoming data. In the case of UARTs, the UART peripheral built into the microcontroller is that special hardware. UARTs can work alone or in combination with Direct Memory Access (DMA) and a RAM-resident ring buffer. However, in the rare cases where the protocol you are speaking requires inter-character delay measurements, DMA is not an option. In those cases, each character has to be read directly off the UART and time-stamped so an inter-character time delta can be calculated for each byte received. That burdens the processor far more, because it has to poll the UART much more frequently to make sure every incoming character is caught. In most cases of serial comms, DMA is the way to go, so that multiple incoming characters can be buffered with fewer interruptions to the processor.
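To make the time-stamping idea concrete, here is a rough sketch in Python of what the inter-character delta calculation could look like once each byte has been stamped. The function names (`inter_char_deltas`, `split_frames`) and the gap threshold are made up for illustration, and splitting frames on a long gap is just one hypothetical use of the deltas, not something your protocol necessarily does. On real hardware the timestamps would come from something like `time.monotonic()` taken right after each single-byte read.

```python
def inter_char_deltas(stamped):
    """Pair each byte with the time elapsed since the previous byte.

    stamped: iterable of (timestamp_seconds, byte_value) in arrival order.
    Returns a list of (byte_value, delta_seconds); the first byte has
    delta None because nothing preceded it.
    """
    result = []
    previous_ts = None
    for ts, value in stamped:
        delta = None if previous_ts is None else ts - previous_ts
        result.append((value, delta))
        previous_ts = ts
    return result


def split_frames(stamped, max_gap_s):
    """Hypothetical use of the deltas: start a new frame whenever the gap
    between two bytes exceeds max_gap_s (an assumed, protocol-specific value).
    """
    frames = []
    current = bytearray()
    for value, delta in inter_char_deltas(stamped):
        if delta is not None and delta > max_gap_s and current:
            frames.append(bytes(current))
            current = bytearray()
        current.append(value)
    if current:
        frames.append(bytes(current))
    return frames
```

The point is that every single byte needs its own timestamp, which is exactly what DMA takes away from you: DMA hands you a block of bytes, not their arrival times.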
If SPI slave mode isn’t well supported on the AM335x, it might be that the SPI peripherals are not sophisticated enough to buffer incoming data at all, or that they can only hold 1 byte before they overrun. Or maybe the shortcomings are in your Linux distro’s SPI slave driver implementation. Again, I don’t know.
But one thing seems clear: the master is pushing data to the SPI at too high a data rate. UARTs typically run at 9600, 38.4k, 57.6k, or 115.2k baud. I’ve had success with 230.4k baud on the AM3352, using a non-DMA’d UART where I did have to do inter-character delay calculations on the received bytes, and it worked without missing a beat for weeks at a time. However, if your master is trying to push data on the SPI as fast as the processor can execute your code, that might be the reason you are losing data and getting the corruption you see.
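For a sense of scale, here is a quick back-of-the-envelope calculation of how much time the receiver gets per byte at those UART speeds. It assumes 8N1 framing (1 start bit, 8 data bits, no parity, 1 stop bit, so 10 bits on the wire per byte), which is my assumption and not something stated above; `byte_time_us` is just an illustrative helper name.

```python
def byte_time_us(baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Microseconds one UART character occupies on the wire.

    Assumes one start bit plus the given data, parity, and stop bits
    (the default is 8N1, i.e. 10 bits per character).
    """
    bits_per_char = 1 + data_bits + parity_bits + stop_bits
    return bits_per_char * 1_000_000 / baud


for baud in (9600, 38400, 57600, 115200, 230400):
    print(f"{baud:>6} baud: {byte_time_us(baud):8.1f} us per byte")
```

At 230.4k baud that works out to roughly 43 microseconds per byte. If a bit-banged SPI master is delivering bytes faster than the slave side can pick them up, the slave simply has less time per byte than that, and bytes get dropped.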
Doing as suggested above, adding intentional delays in your master between the high and low transitions on the clock line, essentially lowers the data rate and gives the slave more opportunity to catch the data. If that doesn’t work, you may need to add an explicit inter-character delay on the master side to ensure the slave has more than adequate time. It all depends on where the shortcomings actually are (hardware vs. Linux driver).
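Here is roughly what I mean, as a sketch only. It assumes the master is bit-banging SPI mode 0 (clock idles low, slave samples on the rising edge) and sends MSB first. The `set_clk` and `set_mosi` callables are hypothetical stand-ins for however you actually drive your GPIO pins, and the delay values are placeholders to tune, not recommendations.

```python
import time


def shift_out_byte(value, set_clk, set_mosi, half_period_s, sleep=time.sleep):
    """Clock one byte out MSB-first, holding each clock phase for
    half_period_s seconds. Assumes the clock line is already low.
    """
    for bit in range(7, -1, -1):
        set_mosi((value >> bit) & 1)  # put the data bit out while clock is low
        sleep(half_period_s)          # intentional delay: low phase
        set_clk(1)                    # rising edge: slave samples the bit
        sleep(half_period_s)          # intentional delay: high phase
        set_clk(0)                    # falling edge: ready for the next bit


def send_bytes(data, set_clk, set_mosi, half_period_s,
               inter_byte_s=0.0, sleep=time.sleep):
    """Send several bytes, optionally pausing inter_byte_s after each one
    to give the slave extra time to collect it.
    """
    for value in data:
        shift_out_byte(value, set_clk, set_mosi, half_period_s, sleep)
        if inter_byte_s:
            sleep(inter_byte_s)
```

The clock rate can be no higher than 1 / (2 × half_period_s), so raising `half_period_s` is the "slow the clock" knob and `inter_byte_s` is the inter-character delay knob. Keep in mind that sleeps from Linux userspace are not precise at very small values, so the real rate will usually come out lower than the arithmetic suggests.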
If neither of these things works, then moving to a UART should work just fine, and it also gives you bidirectional comms between the BBBs if you want it. For very short distances, you can connect the UARTs together without a transceiver (TTL-to-TTL). However, if you want to go more than a few feet reliably, you’ll want an RS-485 2-wire transceiver. I’m sure there are capes for them, as well as external hardware boards from places like eBay or Adafruit. You can even buy TTL-to-Bluetooth (HC-05) modules to make the transfer wireless.