Weird behaviour with pins and PRUs

Hello,

I am a beginner in PRU programming, and in embedded programming in general, so I apologize if I am missing something obvious.

I am trying to implement an audio processing algorithm (a CIC filter) that runs on the PRUs. To do so, I need to generate a CLK signal for the microphone. I generate it on PRU0, output it on pin P8.11, and connect the mic’s CLK pin directly to that pin. However, I also need to read the CLK signal from PRU1, so I figured I would also wire the CLK signal generated by PRU0 to pin P8.46, which I then poll from PRU1.

Doing this gives strange results though. As soon as I connect P8.11 (output CLK) to P8.46 (input CLK), the CLK signal on P8.11 gets stuck at Vdd. If I unplug it, it remains stuck at Vdd. Even restarting both programs (the clock generating one on PRU0 and the audio processing one on PRU1) does not fix the issue; only rebooting the BeagleBone does. I checked the voltage on P8.46 and it is always at 0.

I tried running only the PRU0 CLK program and the issue does not happen, so it looks like the PRU1 program is what causes it.

Here is the code for the PRU0 CLK program:

/* Code for the clock generated by PRU0 and sent to the microphone. */
.origin 0
.entrypoint TOP

#include "prudefs.hasm"

#define CYCLES 39
#define CLK_PIN r30.b1

TOP:
MOV r0, CYCLES
_LOOP:
SUB r0, r0, 1
QBNE _LOOP, r0, 0

// Toggle CLK signal
XOR CLK_PIN, CLK_PIN, 1 << 7

QBA TOP
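
As a rough check of what this loop produces, the toggle period can be estimated by counting instructions. The sketch below is written in C rather than PRU assembly so it can be run on any host. It assumes a 200 MHz PRU clock and one cycle per instruction in the loop; the function names are made up for illustration.

```c
#include <stdint.h>

/* Assumption: the PRU core runs at 200 MHz (5 ns per cycle) and every
 * instruction in the clock loop takes exactly one cycle. */
#define PRU_HZ 200000000.0

/* Cycles between two toggles of the CLK pin for a given CYCLES value:
 * 1 (MOV) + 2 per delay-loop iteration (SUB + QBNE) + 1 (XOR) + 1 (QBA). */
uint32_t half_period_cycles(uint32_t cycles)
{
    return 1u + 2u * cycles + 1u + 1u;
}

/* Resulting square-wave frequency in Hz (one full period = two toggles). */
double clk_frequency_hz(uint32_t cycles)
{
    return PRU_HZ / (2.0 * half_period_cycles(cycles));
}
```

Under these assumptions, CYCLES = 39 gives 81 cycles (405 ns) per half period, so a clock of roughly 1.23 MHz.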

And here is the code for the PRU1 audio program (sorry, it’s a little long):

#define PRU1_ARM_INTERRUPT 20

// Input pins offsets
#define CLK_OFFSET 1
#define DATA_OFFSET 0

// Register aliases
#define IN_PINS r31
#define SAMPLE_COUNTER r5
#define WAIT_COUNTER r6
#define TMP_REG r7
#define BYTE_COUNTER r8

#define HOST_MEM r20
// Host mem size is multiple of 8, this is ensured on the host side
#define HOST_MEM_SIZE r21
#define LOCAL_MEM r22
// Defined in page 19 of the AM335x PRU-ICSS Reference guide
#define LOCAL_MEM_ADDR 0x2000

#define INT0 r0
#define INT1 r1
#define INT2 r2
#define INT3 r3
#define LAST_INT r4

#define COMB0 r10
#define COMB1 r11
#define COMB2 r12
//#define COMB3 r13
#define LAST_COMB0 r14
#define LAST_COMB1 r15
#define LAST_COMB2 r16

// DEBUG (assumes pin P8.44)
#define SET_LED SET r30, r30, 3
#define CLR_LED CLR r30, r30, 3

.origin 0
.entrypoint TOP

TOP:
//MOV r31.b0, PRU1_ARM_INTERRUPT + 16
SET_LED
// ### Memory management ###
// Enable OCP master ports in SYSCFG register
// It is okay to use the r0 register here (which we use later too) because it merely serves as a means to temporarily hold the value at C4 + 4; the OCP masters are enabled by writing the correct data back to C4 + 4
LBCO r0, C4, 4, 4
CLR r0, r0, 4
SBCO r0, C4, 4, 4
// Load the local memory address in a register
MOV LOCAL_MEM, LOCAL_MEM_ADDR
// From local memory, grab the address of the host memory (passed by the host before this program started)
LBBO HOST_MEM, LOCAL_MEM, 0, 4
// Likewise, grab the host memory length
LBBO HOST_MEM_SIZE, LOCAL_MEM, 4, 4

// ### Set up start configuration ###
// Setup counters to 0 at first
LDI SAMPLE_COUNTER, 0
LDI BYTE_COUNTER, 0
// Set all integrator and comb registers to 0 at first
LDI INT0, 0
LDI INT1, 0
LDI INT2, 0
LDI INT3, 0
LDI COMB0, 0
LDI COMB1, 0
LDI COMB2, 0
//LDI COMB3, 0
LDI LAST_INT, 0
LDI LAST_COMB0, 0
LDI LAST_COMB1, 0
LDI LAST_COMB2, 0

// ### Signal processing ###
wait_edge:
// First wait for CLK = 0
WBC IN_PINS, CLK_OFFSET
// Then wait for CLK = 1
WBS IN_PINS, CLK_OFFSET

// Wait for t_dv time, since it can be at most 125ns, we have to wait for 25 cycles
LDI WAIT_COUNTER, 12 // Because 25 = 1 + 12*2 and the loop takes 2 one-cycle ops
wait_signal:
SUB WAIT_COUNTER, WAIT_COUNTER, 1
QBNE wait_signal, WAIT_COUNTER, 0

// Retrieve data from DATA pin (only one bit!)
AND TMP_REG, IN_PINS, 1 << DATA_OFFSET
LSR TMP_REG, TMP_REG, DATA_OFFSET
// Do the integrator operations
ADD SAMPLE_COUNTER, SAMPLE_COUNTER, 1
ADD INT0, INT0, TMP_REG
ADD INT1, INT1, INT0
ADD INT2, INT2, INT1
ADD INT3, INT3, INT2

// Branch for oversampling
QBNE wait_edge, SAMPLE_COUNTER, 64

// Reset sample counter once we reach R
LDI SAMPLE_COUNTER, 0

// 4 stage comb filter
SUB COMB0, INT3, LAST_INT
SUB COMB1, COMB0, LAST_COMB0
SUB COMB2, COMB1, LAST_COMB1
SUB TMP_REG, COMB2, LAST_COMB2

// Output the result to memory
// We write one word (4 B) from TMP_REG to HOST_MEM with an offset of BYTE_COUNTER
SBBO TMP_REG, HOST_MEM, BYTE_COUNTER, 4
// Increment the written bytes counter once the write operation is done
ADD BYTE_COUNTER, BYTE_COUNTER, 4
// First, check if we are about to overrun the buffer, that is, if HOST_MEM_SIZE - BYTE_COUNTER < 4
// If yes, send an interrupt to the host, and reset the byte counter/offset back to 0
// TODO: since HOST_MEM_SIZE is a multiple of 8, maybe we could just do an equality check ?
//SUB TMP_REG, HOST_MEM_SIZE, BYTE_COUNTER
//QBGE check_half, 4, TMP_REG // Jump to “check_half” if HOST_MEM_SIZE - BYTE_COUNTER >= 4
QBNE check_half, HOST_MEM_SIZE, BYTE_COUNTER
MOV r31.b0, PRU1_ARM_INTERRUPT + 16 // Interrupt the host, TODO: could be done in a safer way by writing to the host memory which buffer we’re in
LDI BYTE_COUNTER, 0 // Reset counter/offset, which will make us write to the beginning of host memory again
QBA continue_comb

// TODO: could be done in a more efficient way, by storing the half value in a register
check_half:
// If we have filled half of the buffer on the host side, send an interrupt. TMP_REG holds the host buffer size divided by 2; since the host-side memory length is a multiple of 8, half of it is a multiple of 4
LSR TMP_REG, HOST_MEM_SIZE, 1 // Shift right by 1 to divide by 2 (a shift by 2 would divide by 4)
QBNE continue_comb, TMP_REG, BYTE_COUNTER
// Interrupt the host to tell it we wrote to half of the buffer
MOV r31.b0, PRU1_ARM_INTERRUPT + 16

continue_comb:
// Update LAST_INT value and LAST_COMBs
// TODO: check this is correct, and this could perhaps be done in fewer instructions
MOV LAST_INT, INT3
MOV LAST_COMB0, COMB0
MOV LAST_COMB1, COMB1
MOV LAST_COMB2, COMB2

// Branch back to wait edge
QBA wait_edge

// Interrupt the host so it knows we’re done (note: unreachable as written, since the QBA above always branches back to wait_edge)
MOV r31.b0, PRU1_ARM_INTERRUPT + 16

HALT
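
To check the values the PRU1 program writes to host memory, it can help to have a host-side reference model of the same filter. The following is a minimal sketch in C that mirrors the register updates above (four integrators, decimation by 64, four comb stages, 32-bit wrap-around arithmetic). The struct and function names are hypothetical and are not part of the original program.

```c
#include <stdint.h>

#define CIC_R 64u  /* decimation ratio, matches the "QBNE wait_edge, SAMPLE_COUNTER, 64" check */

typedef struct {
    uint32_t integ[4];     /* INT0..INT3 */
    uint32_t last_int;     /* LAST_INT */
    uint32_t last_comb[3]; /* LAST_COMB0..LAST_COMB2 */
    uint32_t count;        /* SAMPLE_COUNTER */
} cic_state;

/* Feed one microphone bit (0 or 1). Returns 1 and stores a result in *out
 * on every CIC_R-th call, otherwise returns 0 and leaves *out untouched. */
int cic_push(cic_state *s, uint32_t bit, uint32_t *out)
{
    /* Integrator chain: each stage adds the already updated previous stage,
     * in the same order as the ADD instructions in the PRU code. */
    s->integ[0] += bit;
    s->integ[1] += s->integ[0];
    s->integ[2] += s->integ[1];
    s->integ[3] += s->integ[2];

    if (++s->count != CIC_R)
        return 0;
    s->count = 0;

    /* Comb chain at the decimated rate. Unsigned wrap-around is intended. */
    uint32_t comb0 = s->integ[3] - s->last_int;
    uint32_t comb1 = comb0 - s->last_comb[0];
    uint32_t comb2 = comb1 - s->last_comb[1];
    *out = comb2 - s->last_comb[2];

    /* Remember this round's values for the next decimated sample. */
    s->last_int = s->integ[3];
    s->last_comb[0] = comb0;
    s->last_comb[1] = comb1;
    s->last_comb[2] = comb2;
    return 1;
}
```

Start from a zeroed state (`cic_state s = {0};`) and call `cic_push` once per CLK edge. With a constant input of 1, the first output word is 766480 and, from the fourth output word on, the result settles at 64^4 = 16777216, which fits comfortably in a 32-bit register.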

If you don’t feel like reading everything, these are all the instructions that mention r31, the input pins register:

WBC IN_PINS, CLK_OFFSET

WBS IN_PINS, CLK_OFFSET

AND TMP_REG, IN_PINS, 1 << DATA_OFFSET

MOV r31.b0, PRU1_ARM_INTERRUPT + 16

I suppose the last instruction might cause problems, since the bit corresponding to pin P8.46 is in r31.b0. As far as I understand, though, writing to these bits does not actually change the value of the pins but only triggers an interrupt; I may be wrong. I also tried using pin P8.30 for the CLK input, which has an offset of 11 and so is not in r31.b0, but I still have the same problem.

I made sure to run these commands before running my programs:

config-pin -a P8.11 pruout
config-pin -a P8.46 pruin
config-pin -a P8.30 pruin

Feel free to ask for more information.

Greetings,
Loïc

How about the pin-multiplexing of your two PRU-pins? Are they really enabled as PRU in/outputs or are they still mapped to some other functions of the main core?

Yes, I checked that they are all enabled by calling config-pin -q on them, and this showed the right configuration. I started working on the setup again today, though, and now it works fine, even though I didn’t change anything in my code or my pinmux setup. Weird…

And I agree this is a bad idea because it is a huge waste of computing power. I have tried using the BBB’s built-in PWM pin which, as far as I know, can be configured to output a stable CLK signal with a 50% duty cycle by following this link: … But so far I haven’t been able to make it work because the files mentioned there are not present in my kernel (Linux beaglebone 4.4.91-ti-r133 #1 SMP Tue Oct 10 05:18:08 UTC 2017 armv7l GNU/Linux). I would like to use that PWM though. Is there a way to do it on more recent kernels?

I could also do both operations on one PRU, yes. However, since the audio processing steps involve memory writes to the host, I am worried these might not take deterministic time, which would prevent my clock from being stable.

And I agree this is a bad idea because it is a huge waste of computing power. I have tried using the BBB’s built-in PWM pin which, as far as I know, can be configured to output a stable CLK signal with a 50% duty cycle by following this link: … But so far I haven’t been able to make it work because the files mentioned there are not present in my kernel (Linux beaglebone 4.4.91-ti-r133 #1 SMP Tue Oct 10 05:18:08 UTC 2017 armv7l GNU/Linux). I would like to use that PWM though. Is there a way to do it on more recent kernels?

I have no idea, I do not use the hardware from within Linux but from bare-metal firmware…

I could also do both operations on one PRU, yes. However, since the audio processing steps involve memory writes to the host, I am worried these might not take deterministic time, which would prevent my clock from being stable.

There is some shared memory available that both the PRU and the main core can access. Since the PRU has priority when accessing this RAM, you can use it for a ring buffer with defined timing, which can also be read from the main core.
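
A ring buffer of this kind could look roughly like the sketch below: a single-producer, single-consumer queue where the PRU only ever writes `head` and the main core only ever writes `tail`. It is plain C meant to show the index logic. The struct layout, the slot count and the function names are all assumptions, and on real hardware the structure would have to be placed at the shared RAM address, with the producer side written in PRU code.

```c
#include <stdint.h>

#define RING_SLOTS 256u  /* hypothetical size; must be a power of two */

typedef struct {
    volatile uint32_t head;             /* written only by the producer (PRU) */
    volatile uint32_t tail;             /* written only by the consumer (main core) */
    volatile uint32_t data[RING_SLOTS]; /* one 32-bit sample per slot */
} ring_t;

/* Producer side: returns 1 on success, 0 if the buffer is full. */
int ring_put(ring_t *r, uint32_t sample)
{
    uint32_t head = r->head;
    if (head - r->tail == RING_SLOTS)
        return 0;                       /* consumer has fallen behind */
    r->data[head & (RING_SLOTS - 1u)] = sample;
    r->head = head + 1u;                /* publish only after the slot is written */
    return 1;
}

/* Consumer side: returns 1 and stores a sample in *sample, or 0 if empty. */
int ring_get(ring_t *r, uint32_t *sample)
{
    uint32_t tail = r->tail;
    if (tail == r->head)
        return 0;                       /* nothing new yet */
    *sample = r->data[tail & (RING_SLOTS - 1u)];
    r->tail = tail + 1u;                /* free the slot only after reading it */
    return 1;
}
```

Because the indices are free-running 32-bit counters, `head - tail` gives the fill level even after the counters wrap, and no interrupt is needed for the main core to find out how much data is available.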

On one PRU, you could use the PRU internal eCAP subsystem to generate the CLK output. Synchronize your software by reading back the eCAP counter.

BR

I could, yes, but I read somewhere (I can’t remember where exactly, but it was in some TI PowerPoint presentation) that reading the eCAP register takes at least 4 cycles, which is not ideal for my purposes. In the meantime I found a way to use the BBB’s built-in PWM as a CLK signal and it works now. Check this link if you’re interested. Thanks for your help nonetheless.

Read or write access from PRU to the PRU internal eCAP module takes exactly one cycle. You can find this information in the PRU TRM.

Please don’t confuse other readers.