Today seems to have been mostly about porting. I’ve not really developed anything new, but I have learned a lot by moving things around between machines.
Yesterday I was a little disappointed with the untuned performance of the FFT code I had found and got running on the Freescale Freedom KL25Z, so I decided to give it a go on some other platforms. The first step was to move it to my development PC to see how that stacks up.
The code needed a few changes: using an operating system timer (gettimeofday()) rather than a hardware one, and sending output to the console rather than a serial port. The FFT code itself needed no changes, though.
```c
#include <stdio.h>
#include <math.h>
#include <sys/time.h>
#include "FFT.h"

#define POINTS 1024
#define BKSP 8
#define SCALE 4

#define PI 3.14159265358979323846
#define CIRCLE (2 * PI)

float data[POINTS];

/* Magnitude of one output value, used for the text plot. */
float height(int i)
{
    return fabs(data[i]);
}

/* Wall-clock time in microseconds (seconds kept modulo 1000 so the
   value stays small); replaces the KL25Z hardware timer. */
long when()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (tv.tv_sec % 1000) * 1000000L + tv.tv_usec;
}

void run(float freq)
{
    /* Test signal: `freq` whole cycles across the buffer, offset by SCALE. */
    float step = freq * CIRCLE / POINTS;
    for (int i = 0; i < POINTS; ++i) {
        data[i] = (sin(i * step) * SCALE * log(freq)) + SCALE;
    }

    long t = when();
    printf("start, t=%ld\n", t);
    vRealFFT(data, POINTS);
    t = when() - t;
    printf("stop, t=%ld\n", t);
    /* when() counts microseconds, so there are 1000000 ticks per second. */
    printf("FFT with %d points took %f seconds (%ld ticks at %ld ticks/sec)\r\n",
           POINTS, t / 1e6, t, 1000000L);

    /* Find the tallest bar so the plot can be scaled to it. */
    float max = 0;
    for (int i = 0; i < POINTS / 2; ++i) {
        if (height(i) > max)
            max = height(i);
    }

    /* Crude bar chart: one row per tenth of the maximum height. */
    for (float level = max; level > 0; level -= max / 10) {
        for (int i = 0; i < POINTS; i += 2) {
            putchar((height(i) > level) ? '|' : ' ');
        }
        printf("\r\n");
    }
    for (int i = 0; i < POINTS / 2; ++i) {
        putchar('-');
    }
    printf("\r\n");
}

int main()
{
    for (;;) {
        printf("Enter frequency:\r\n");
        /* Read a number digit by digit, echoing as the serial version did. */
        int number = 0;
        for (;;) {
            int c = getchar();
            if (c == EOF)          /* stop cleanly at end of input */
                return 0;
            putchar(c);
            if (c == BKSP) {
                number /= 10;
                continue;
            }
            if (c < '0' || c > '9') {
                printf("\r\n");
                break;
            }
            number *= 10;
            number += c - '0';
        }
        if (number > 0)
            run(1.0 * number);
    }
}
```
When I ran it, I got the expected output, but massively faster than on the KL25Z: somewhere between 100 and 150 µs to transform 1024 points, compared with over 100 ms on the KL25Z. That is several hundred times faster.
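A single measurement of a call that lasts only about 100 µs is easily disturbed by scheduling and cache effects, which may account for some of the 100 to 150 µs spread. A common alternative, sketched here as an assumption rather than what the harness above does, is to use POSIX clock_gettime() with CLOCK_MONOTONIC (unaffected by system clock adjustments) and average over many calls. `now_us`, `time_avg_us` and `example_workload` are hypothetical names. `example_workload` merely stands in for a call such as `vRealFFT(data, POINTS)`. Because that transform works in place, a real benchmark would probably refill the input before each call.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime on glibc */
#include <assert.h>
#include <time.h>

/* Hypothetical helper: monotonic time in microseconds. */
long long now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

/* Hypothetical stand-in for the code being timed. */
static volatile double sink;  /* stops the compiler discarding the loop */
void example_workload(void)
{
    double acc = 0.0;
    for (int i = 0; i < 10000; ++i)
        acc += i * 0.5;
    sink = acc;
}

/* Call fn `reps` times and return the mean duration of one call in
   microseconds. */
double time_avg_us(void (*fn)(void), int reps)
{
    assert(reps > 0);
    long long start = now_us();
    for (int r = 0; r < reps; ++r)
        fn();
    return (double)(now_us() - start) / reps;
}
```

For example, `time_avg_us(example_workload, 1000)` gives the average cost of one call over 1000 runs. That figure should be steadier than any single start/stop pair.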
Of course, to complete the test I had to run the same code on the Raspberry Pi. As expected, it fell somewhere in the middle, at about 3 ms. Note, though, that this test was done the “lazy way”: I copied the PC code over to a Raspberry Pi running Raspbian, compiled it there and ran it under Linux. I’m guessing it might be slightly faster if it had the whole machine to itself.
I might even try running it on the Arduino at some point. I assume that will be even slower than the KL25Z, though.
Later I decided to also port the LED traffic light example from the Arduino to the KL25Z. Unfortunately, the KL25Z comes without headers. Connecting wires directly to the holes in the PCB is possible, but clumsy.
So I soldered on some headers so that I could plug in hook-up wires, just as I did with the Arduino. The code is slightly modified to use the mbed abstractions (DigitalOut and wait()) rather than the Arduino ones, but the basic logic is identical.
```cpp
#include "mbed.h"

// One LED on each of pins PTB0, PTB1 and PTB2.
DigitalOut red(PTB0);
DigitalOut amber(PTB1);
DigitalOut green(PTB2);

// One bit per lamp, so combinations can be OR-ed together.
int red_flag = 1;
int amber_flag = 2;
int green_flag = 4;

// Light the lamps selected by `flags` for one second, then turn all off.
void show_colour(int flags)
{
    if (flags & red_flag) red = 1;
    if (flags & amber_flag) amber = 1;
    if (flags & green_flag) green = 1;
    wait(1);
    red = 0;
    amber = 0;
    green = 0;
}

int main()
{
    while (1) {
        show_colour(red_flag);
        show_colour(red_flag | amber_flag);
        show_colour(green_flag);
        show_colour(amber_flag);
    }
}
```
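Because each lamp has its own bit, the four-step sequence in main() can also be written as a table of masks. The host-side sketch below is for illustration only and uses no mbed calls, so the sequencing logic could be checked on a PC before flashing the board. `SEQUENCE`, `SEQUENCE_LEN` and `lamp_states` are hypothetical names, not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Same bit values as red_flag, amber_flag and green_flag above. */
enum { RED = 1, AMBER = 2, GREEN = 4 };

/* The sequence used in the mbed main loop:
   red, red + amber, green, amber, then back to red. */
static const int SEQUENCE[] = { RED, RED | AMBER, GREEN, AMBER };
#define SEQUENCE_LEN (sizeof SEQUENCE / sizeof SEQUENCE[0])

/* Decode the lamp pattern for a given step (wrapping around the table)
   into three on/off values. */
void lamp_states(size_t step, int *red, int *amber, int *green)
{
    assert(red && amber && green);
    int flags = SEQUENCE[step % SEQUENCE_LEN];
    *red   = (flags & RED)   != 0;
    *amber = (flags & AMBER) != 0;
    *green = (flags & GREEN) != 0;
}
```

On the board, the loop body would then become a single show_colour() call on the current table entry. Adding or reordering phases would mean editing the table rather than the control flow.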
Sounds like you’re testing the written-for-mbed FFT code on your desktop PC and RasPi?
I wonder if you’d get different performance figures if you tried http://www.fftw.org/ on your PC and RasPi? (I’ve not used it myself, I just happened to spot a link to it on another website, and it reminded me of your bat-detector project!)