How Fast?

A fundamental question that you have to answer for any processor intended for use in embedded or IoT projects is: how fast can the GPIO lines work?

Sometimes the answer isn't of much concern because what you want to do only works relatively slowly. Any application that is happy with response times in the tens of milliseconds will generally work with almost any processor. However, if you want to implement custom protocols or anything that needs microsecond responses, the question is much more important.

It is fairly easy to find out how fast a single GPIO line can be used if you have a logic analyzer or oscilloscope. All you have to do is run the program:

bcm2835_gpio_fsel(RPI_BPLUS_GPIO_J8_07, BCM2835_GPIO_FSEL_OUTP);
for(;;) {
 bcm2835_gpio_write(RPI_BPLUS_GPIO_J8_07, HIGH);
 bcm2835_gpio_write(RPI_BPLUS_GPIO_J8_07, LOW);
}

What you will see depends on which version of the Pi you are working with. 

The pulse widths are:

Pi      Pulse width (microseconds)
Zero    0.0625 to 0.125
1
2       0.0625 to 0.125
3

The pulses generated are not very even: a few short 0.0625 microsecond pulses and then one longer 0.125 microsecond pulse. This is presumably due to the way the CPU executes the instructions. It seems that the 0.125 microsecond pulse is generated by a 0.0625 microsecond pulse being missed every 1.56 microseconds.

You will also see a regular interruption to the pulse train every 0.125ms (Pi 2) and a few much bigger interruptions every few milliseconds, depending on the loading of the processor. There is more about the way CPU loading affects a program in the chapter on Near Real Time Linux.

It also doesn't make any difference if you use set and clr or any of the other functions. You can also use a while(1) loop in place of the for loop; it makes no difference to the overall timing.

So you can generate nanosecond pulses using the Pi, but not very accurately. For accuracy you have to move to the microsecond range, which is usually sufficient for most applications.

usleep & delayMicroseconds

To generate pulses of a known duration we need to pause the program between state changes. 

The simplest way of sleeping a thread for a number of microseconds is to use usleep, even if it is deprecated in POSIX.

To try this, include a call to usleep(10) to delay the pulse:

for(;;)
{
 bcm2835_gpio_set(RPI_BPLUS_GPIO_J8_07);
 usleep(10);
 bcm2835_gpio_clr(RPI_BPLUS_GPIO_J8_07);
 usleep(10);
}

You will discover that adding usleep(10) doesn't increase the pulse length by 10 microseconds but by around 80 microseconds. You will also discover that the longer glitches have gone.

What seems to be happening is that calling usleep yields the thread to the operating system and this incurs an additional 74 microsecond penalty due to calling the scheduler. There are also losses that are dependent on the time you set to wait - usleep only promises that your thread will not restart for at least the specified time. 

If you look at how the delay time relates to the average pulse length, things seem fairly simple:

 

The equation of the line is approximately

pulse length = delay+74

What this means is that if you want to set a delay of less than 74 microseconds, don't use usleep.
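If you don't have a logic analyzer to hand, you can get a rough idea of the usleep overhead in software. The following is a minimal sketch, not part of the bcm2835 library: the function name usleepOverhead is made up for illustration, it assumes a Linux system with glibc, and it times the calls with clock_gettime rather than by measuring pulses, so the figure it reports need not match the 74 microseconds read off the logic analyzer.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* assumption: glibc; exposes usleep and clock_gettime */
#endif
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average extra time, in microseconds, that
   usleep(requested_us) takes beyond the requested delay,
   measured over the given number of calls. */
double usleepOverhead(unsigned int requested_us, int calls) {
    struct timespec btime, etime;
    double total_us = 0.0;
    if (calls < 1) calls = 1;
    for (int k = 0; k < calls; k++) {
        clock_gettime(CLOCK_MONOTONIC, &btime);
        usleep(requested_us);
        clock_gettime(CLOCK_MONOTONIC, &etime);
        /* elapsed time for this call, converted to microseconds */
        total_us += (etime.tv_sec - btime.tv_sec) * 1e6
                  + (etime.tv_nsec - btime.tv_nsec) / 1e3;
    }
    return total_us / calls - requested_us;
}
```

Calling usleepOverhead(10, 1000) and printing the result gives the average overhead for a 10 microsecond request. Expect the value to vary with the Pi model and the load on the system.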

The delayMicroseconds function does not carry this fixed overhead for short delays:

void bcm2835_delayMicroseconds(uint64_t micros)

This uses a busy wait loop for times shorter than 450 microseconds and a system delay like usleep for longer periods.

We first need to look at how a busy wait works.

Busy Wait

For pulses of less than about 100 microseconds it is better to use a busy wait, i.e. a loop that does nothing. You have to be a little careful about how you insert a loop that does nothing because optimizing compilers have a tendency to take a null loop out in an effort to make your program run faster.

To stop an optimizing compiler from removing busy wait loops, make sure you always declare loop variables as volatile.

To generate a pulse of a given length you can use 

volatile int i;
for(;;)
{
    for(i=0;i<n;i++){};
    bcm2835_gpio_write(RPI_BPLUS_GPIO_J8_07, HIGH);
    for(i=0;i<n;i++){};
    bcm2835_gpio_write(RPI_BPLUS_GPIO_J8_07, LOW);
}

Where n is set to a value that depends on the machine it is running on. 

For the Pi 2 or Pi Zero for t greater than or equal to 0.5 microseconds you can work out n using

n = 100 * t

with t in microseconds.  Below 0.5 you can use 

n = 116*t - 9.3

but pulse length isn't reliable. 

For example if you want 1 microsecond pulses on a Pi 2 then n=100 and the result can be seen below:

 

Notice that the pulses vary from about 0.93 to 1 microsecond but this is accurate enough for most applications. 

Pi      n for t >= 0.5 microseconds
Zero    100*t
2       100*t
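The two fits given above for the Pi 2 and Pi Zero can be wrapped in a small helper so that the rest of a program can work in microseconds. This is only a sketch: the name loopsForDelay is invented for illustration, the constants are simply the measured values quoted above, and clamping to zero is an added assumption to cope with the short-delay fit going negative for very small t.

```c
/* Hypothetical helper: busy-wait loop count for a delay of t microseconds,
   using the empirical Pi 2 / Pi Zero fits quoted in the text. */
int loopsForDelay(double t) {
    double n = (t >= 0.5) ? 100.0 * t : 116.0 * t - 9.3;
    if (n < 0.0) {
        n = 0.0;               /* the short-delay fit is negative below about 0.08 us */
    }
    return (int) (n + 0.5);    /* round to the nearest whole loop */
}
```

For example, loopsForDelay(1.0) gives the n=100 used for 1 microsecond pulses. Remember that below 0.5 microseconds the pulse length isn't reliable, whatever value of n you use.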

 

Automatic Busy Wait Calibration

It is usually said that the big problem with using busy waits is that they depend on how fast the CPU is working. The other big problem with them is that they lock up the CPU and stop it from doing anything else, but this is a minor problem with a multicore processor. 

You can set constants at the start of a program to allow it to work with different versions of the Pi but it is also fairly simple to add a calibration routine. All you have to do is time how long a set number of loops take and then work out how many loops you need for a busy wait of 1 microsecond and use this to derive all other delay times. 

For example:

#include <stdio.h>
#include <stdlib.h>
#include <bcm2835.h>
#include <time.h>

#define BILLION 1000000000L

int main(int argc, char** argv) {
    struct timespec btime, etime;

    /* volatile stops the compiler optimizing the empty loop away */
    volatile int i;
    clock_gettime(CLOCK_REALTIME, &btime);
    for (i = 0; i < 10000000; i++) {
    }
    clock_gettime(CLOCK_REALTIME, &etime);
    /* elapsed time for 10,000,000 loops in nanoseconds */
    double nseconds = (double) (etime.tv_sec - btime.tv_sec) * BILLION +
          (double) (etime.tv_nsec - btime.tv_nsec);
    /* loops per microsecond, rounded to the nearest integer */
    int n = (int) (10 / nseconds * BILLION + 0.5);
    printf("time = %f (s)\n", nseconds / BILLION);
    printf("n = %d\n", n);
    return (EXIT_SUCCESS);
}

If you run this on, say, a Pi 2 you will see that n=100, i.e. you need 100 busy wait loops to delay for one microsecond.

You can also create a function that does the same job:

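int busyWaitCalibrate(void) {
    struct timespec btime, etime;
    /* volatile stops the compiler optimizing the empty loop away */
    volatile int i;
    clock_gettime(CLOCK_REALTIME, &btime);
    for (i = 0; i < 10000000; i++) {
    }
    clock_gettime(CLOCK_REALTIME, &etime);
    /* elapsed time for 10,000,000 loops in nanoseconds */
    double nseconds = (double) (etime.tv_sec - btime.tv_sec) * 1000000000L +
            (double) (etime.tv_nsec - btime.tv_nsec);
    /* loops per microsecond, rounded to the nearest integer */
    int n = (int) (10 / nseconds * 1000000000L + 0.5);
    return n;
}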

Note you need to include time.h to use the function i.e.

#include <time.h> 

Also notice that clock_gettime isn't part of the C standard and you have to select program type C in NetBeans to make it work.

If you are going to use busyWaitCalibrate it is a good idea to take a few samples and average them to make sure you get a sensible value for n.
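One way of averaging is sketched below. It is an illustration rather than a definitive version: the names calibrateOnce and busyWaitCalibrateAverage are invented here, and it uses CLOCK_MONOTONIC in place of CLOCK_REALTIME on the assumption that you don't want a change to the system clock during a run to distort the result.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* assumption: glibc; exposes clock_gettime */
#endif
#include <time.h>

/* Hypothetical helper: one calibration run, returning the number of
   loops needed for a 1 microsecond busy wait. */
double calibrateOnce(void) {
    struct timespec btime, etime;
    volatile int i;   /* volatile keeps the empty loop in the program */
    clock_gettime(CLOCK_MONOTONIC, &btime);
    for (i = 0; i < 10000000; i++) {
    }
    clock_gettime(CLOCK_MONOTONIC, &etime);
    double nseconds = (etime.tv_sec - btime.tv_sec) * 1e9
                    + (etime.tv_nsec - btime.tv_nsec);
    return 1e10 / nseconds;   /* 10,000,000 loops / elapsed microseconds */
}

/* Average several runs and round to the nearest whole loop count. */
int busyWaitCalibrateAverage(int samples) {
    double total = 0.0;
    if (samples < 1) samples = 1;
    for (int s = 0; s < samples; s++) {
        total += calibrateOnce();
    }
    return (int) (total / samples + 0.5);
}
```

A call such as busyWaitCalibrateAverage(5) at the start of the program is usually enough. A run that is interrupted by the scheduler takes longer and so gives a smaller value, so if the results vary a lot you might prefer to take the largest sample rather than the mean.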

delayMicroseconds

For delays of 1 microsecond up you can avoid all of the problems of calibrating your own busy wait and simply use delayMicroseconds. 

For delays of 1 to 450 microseconds this uses a busy wait with an automatic calibration. It simply sits in a loop reading the system clock until the required number of microseconds has gone by. Of course this isn't particularly accurate for a short wait, and a hand-constructed busy wait for loop can be set to create the time delay you want more accurately.

For example on a Pi Zero:

volatile int i;
for(;;)
{
 bcm2835_gpio_set(RPI_BPLUS_GPIO_J8_07);
 for(i=0;i<102;i++){};
 bcm2835_gpio_clr(RPI_BPLUS_GPIO_J8_07);
 for(i=0;i<102;i++){};
}

produces pulses that are measured as 0.95 to 1 microsecond.

The equivalent program using delayMicroseconds:

for(;;)
{
 bcm2835_gpio_set(RPI_BPLUS_GPIO_J8_07);
 bcm2835_delayMicroseconds(1);
 bcm2835_gpio_clr(RPI_BPLUS_GPIO_J8_07);
 bcm2835_delayMicroseconds(1);
}

produces pulses that are measured as 0.875 to 1.25 microseconds.

As you can see the simple busy wait is more accurate and more consistent than delayMicroseconds.

However as the delay increases the errors in delayMicroseconds become less important.

In practice you can generally use delayMicroseconds unless you are generating pulses of less than 10 microseconds and need the accuracy.
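To make the idea of a self-calibrating busy wait concrete, the following sketch shows a loop that polls a clock until the required time has gone by. It is not the library's implementation: the names nowNs and spinDelayMicroseconds are invented for illustration, and clock_gettime with CLOCK_MONOTONIC is an assumed stand-in for whatever clock delayMicroseconds reads.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* assumption: glibc; exposes clock_gettime */
#endif
#include <time.h>

/* Hypothetical helper: current CLOCK_MONOTONIC reading in nanoseconds. */
long long nowNs(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return (long long) t.tv_sec * 1000000000LL + t.tv_nsec;
}

/* Spin, without yielding to the scheduler, until at least
   micros microseconds have passed. */
void spinDelayMicroseconds(unsigned int micros) {
    long long end = nowNs() + (long long) micros * 1000LL;
    while (nowNs() < end) {
        /* busy wait: keep polling the clock */
    }
}
```

Because each pass round the loop includes a clock read, the delay can only be as fine as the time one read takes. This is consistent with the wider spread measured for short delays, and it is why a hand-tuned for loop does better at the 1 microsecond level.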