SpeakUp – WiFi Log-on over Audio

Why does it have to be so hard to set up WiFi on a “thing”?

Consumer electronics is going connected and WiFi is the most common network. Bathroom scales, baby monitors, security cameras, connected kettles and educational robots are all connecting via WiFi, and I’ve lost track of the number of WiFi devices connected to our network – it’s certainly over fifty. But there’s a catch: almost without exception, getting these things onto your WiFi is a pain.

Most “things” don’t have a display or keyboard and rely on either Bluetooth (which tends to be fiddly) or a temporary WiFi network which you have to sign onto in order to pass the credentials of your own network to the thing. Even getting most TVs onto WiFi involves some annoying pecking around with a remote control to manipulate an on-screen keyboard.

But it doesn’t have to be this way!

WiFi Credentials over Audio

The purpose of this post is to introduce an idea I’ve got working: using computer-generated audio in a browser to send WiFi credentials to a thing. Basically this is old-school modem tech applied to the problem, and I was pointed down this path by my friend Ko Yiu Fai, who knows a thing or two about digital comms. The video above shows it in action. All the code required on the sending side runs entirely in the browser (app or desktop versions could readily be created) and the code in the “thing” receives the sounds and turns them back into WiFi sign-on details.

I’ve been careful to make the code in the “thing” very lightweight – all arithmetic is integer and the device just needs a MEMS or similar microphone (available for around 20 cents in volume) and the ability to sample the microphone input at 8,000 samples per second. Currently it is written for the ESP32 which is a commonly used embedded computer with WiFi capability. The ESP32 module in the video is a HELTEC OLED Board.

Other Good Ideas have Snags

Stepping back a little, I first started looking at this challenge while talking to Robotical.io who have a great little educational robot called Marty – please take a look – they would make a wonderful Christmas present for anyone into STEM subjects! They have a good way to get Marty up and running quickly in classroom situations but there are some cases, for instance where a tutor wants to take a Marty from student to student, where getting onto home WiFi super quickly would be a boon.

A couple of really good ideas have emerged in the past to ease this pain. The first, which I became aware of some years ago, is a very cunning scheme called BlinkUp, which has been patented by the clever folks at Electric Imp and uses light communication to pass WiFi credentials. An app on your phone turns the whole of the phone’s screen into a single light-source which blinks in a sequence. When the phone is held near the device, this transfers the WiFi credentials and allows connection without any other steps involved. Genius! It’s only used (as far as I’m aware) on things that contain the Electric Imp such as the iKettle – go on – you know you need remotely controllable coffee 🙂 And the presence of that patent is unfortunately enough to mean that the pain isn’t likely to be resolved by this advance.

An even cleverer method created by Texas Instruments and called SmartConfig uses aspects of WiFi communication which are not encrypted to send information that was never intended to be carried in that fashion. This link is an explanation of how it works and it really is rather cunning. Unfortunately there is a rider on the web-page which states that it isn’t certain to work in all circumstances and I guess that makes it a bit less brilliant than first thought.

Since the early days of computing there have been systems to transfer data over analog telephone lines. The old-school 300 baud telephone modems that emerged in the 1960s revolutionized the task of transmitting news reports and communicating with a stock exchange. More recently there have been some developments to enable audio communication of data with just a speaker and a microphone. A good example is an audio API called Chirp, and the company behind it has recently blogged about allowing Microsoft Azure IoT devices to use their technology to get onto WiFi. Unfortunately Chirp is another proprietary technology, and incorporating it into a product involves a royalty payment to Chirp.

SpeakUp is Open-Source

So, with all this in mind, I set about creating my own open-source method to allow WiFi credentials to be transferred over audio without any royalties. The principle I’ve used is based on the old, slow modems but since the WiFi credentials are really short, speed doesn’t matter all that much.

SpeakUp Binary FSK Modem Waveform

The waveform shown above is data collected from the microphone in my testing. The blue line is the raw samples, which you can see contain two distinct frequencies (1 kHz and 2 kHz in my version of this modem). The orange line is the output envelope of a low-pass filter which passes the lower of the two frequencies and rejects the upper one. The yellow and grey lines are peak and trough detection (max and min value filters) on this envelope – these allow conversion into 0 and 1 values.

In order to simplify detection of the communication speed (which might vary a little between devices due to differences in sampling rates) I have used Manchester encoding which is a rather neat way to ensure that there are only two types of “symbol”:

an up followed by a down
a down followed by an up

In each of these cases there is a transition in the middle of the symbol which can be used to recover the timing. So looking at the waveform shown above, it starts with an up/down symbol, then another up/down symbol, then a down/up symbol followed by another up/down. What this means is that, for any combination of bits to be encoded, there are only two lengths of “edge” (i.e. a period when the recovered signal is continuously high or continuously low): a short one and a long one, with the long one being exactly twice as long as the short one. This can be seen in the waveform above.
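
To make the short-edge/long-edge property concrete, here is a minimal sketch in Javascript (not taken from the SpeakUp source). It assumes that a 1 bit is sent as up/down and a 0 bit as down/up; the post doesn't say which polarity SpeakUp uses, and the function names are made up for illustration.

```javascript
// Manchester-encode an array of bits into half-symbol levels.
// ASSUMPTION: 1 -> up then down (1,0), 0 -> down then up (0,1).
function manchesterEncode(bits) {
  const halfLevels = [];
  for (const bit of bits) {
    if (bit) halfLevels.push(1, 0);
    else halfLevels.push(0, 1);
  }
  return halfLevels;
}

// Measure each "edge": a run of identical levels, in half-symbol units.
function runLengths(halfLevels) {
  const runs = [];
  let count = 0;
  for (let i = 0; i < halfLevels.length; i++) {
    count++;
    const isLast = i + 1 === halfLevels.length;
    if (isLast || halfLevels[i + 1] !== halfLevels[i]) {
      runs.push(count);
      count = 0;
    }
  }
  return runs;
}

// The four symbols described above: up/down, up/down, down/up, up/down.
const demoLevels = manchesterEncode([1, 1, 0, 1]);
console.log(demoLevels.join(""));            // 10100110
console.log(runLengths(demoLevels).join()); // 1,1,1,2,2,1
```

Whatever bits go in, every run comes out either one or two half-symbols long, which is what makes the clock easy to recover.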

Sending – Browser Code

The code that sends the WiFi credentials is written in Javascript. The form visible in the video simply takes the two pieces of information (SSID and password) and adds them into a short JSON string of the form:

{"s":"the-ssid-from-the-form","p":"the-password-from-the-form"}

I have then implemented a simple HDLC protocol which provides start/end framing of the information, a Cyclic Redundancy Check (CRC) to validate the data, and “bit-stuffing”, which ensures that too many consecutive 1s never occur, as a long run would disrupt the ability to recover the clock.
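
The three HDLC ingredients can be sketched roughly as follows. This is an illustration under assumptions, not the SpeakUp encoder: it assumes the conventional HDLC choices (a 0x7E flag, a 0 stuffed after five consecutive 1s, the 16-bit CRC-16/X-25 check sequence sent low byte first, and bits sent least-significant first), and the function names are hypothetical. The real source may differ in any of these details.

```javascript
// CRC-16/X-25 (ASSUMED variant): reflected polynomial 0x8408, init 0xFFFF, final XOR 0xFFFF.
function crc16x25(bytes) {
  let crc = 0xffff;
  for (const b of bytes) {
    crc ^= b;
    for (let i = 0; i < 8; i++) {
      crc = crc & 1 ? (crc >>> 1) ^ 0x8408 : crc >>> 1;
    }
  }
  return (crc ^ 0xffff) & 0xffff;
}

// Expand bytes into bits, least-significant bit first (ASSUMED bit order).
function bytesToBitsLsbFirst(bytes) {
  const bits = [];
  for (const b of bytes) {
    for (let j = 0; j < 8; j++) bits.push((b >> j) & 1);
  }
  return bits;
}

// Bit-stuffing: insert a 0 after every run of five 1s.
function stuffBits(bits) {
  const out = [];
  let ones = 0;
  for (const bit of bits) {
    out.push(bit);
    if (bit === 1) {
      ones++;
      if (ones === 5) {
        out.push(0);
        ones = 0;
      }
    } else {
      ones = 0;
    }
  }
  return out;
}

// The flag 0x7E as bits; its six 1s can never appear inside a stuffed body.
const HDLC_FLAG_BITS = [0, 1, 1, 1, 1, 1, 1, 0];

// Frame a text payload: flag, stuffed (payload + CRC), flag.
function hdlcFrame(text) {
  const payload = Array.from(new TextEncoder().encode(text));
  const crc = crc16x25(payload);
  const body = payload.concat([crc & 0xff, crc >> 8]);
  return HDLC_FLAG_BITS.concat(stuffBits(bytesToBitsLsbFirst(body)), HDLC_FLAG_BITS);
}

const demoFrame = hdlcFrame(JSON.stringify({ s: "my-ssid", p: "my-password" }));
console.log(demoFrame.length + " bits to send");
```

Because the flag is the only place six 1s in a row can occur, the receiver can find the start and end of a frame without knowing its length in advance.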

The output from the HDLC encoder is then converted to a frequency-shift keyed (FSK) signal by generating high and low frequency sine-waves for 1 and 0 levels respectively.
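
A minimal sketch of that modulation step might look like the following. The 1 kHz and 2 kHz tones come from the waveform description earlier in this post; the 44,100 Hz output rate, the 10 ms level duration and the function name are assumptions chosen purely for illustration.

```javascript
// Turn a sequence of 0/1 levels into FSK audio samples in the range -1..1.
// ASSUMPTIONS: sampleRate and levelSeconds defaults are illustrative, not SpeakUp's values.
function fskModulate(levels, { sampleRate = 44100, levelSeconds = 0.01 } = {}) {
  const freqForOne = 2000;  // Hz, high tone for a 1 level
  const freqForZero = 1000; // Hz, low tone for a 0 level
  const samplesPerLevel = Math.round(sampleRate * levelSeconds);
  const out = new Float32Array(levels.length * samplesPerLevel);
  let phase = 0; // carried across levels so the waveform has no jumps
  let n = 0;
  for (const level of levels) {
    const step = (2 * Math.PI * (level ? freqForOne : freqForZero)) / sampleRate;
    for (let i = 0; i < samplesPerLevel; i++) {
      out[n++] = Math.sin(phase);
      phase += step;
      if (phase >= 2 * Math.PI) phase -= 2 * Math.PI;
    }
  }
  return out;
}

const demoTone = fskModulate([1, 0, 1, 0]);
console.log(demoTone.length + " samples generated");
```

Carrying the phase from one level to the next avoids clicks at the frequency changes, which would otherwise add energy outside the two tones.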

Finally this waveform data is encoded into a WAV format audio buffer and that is then played as a sound using the Web Audio API supported by modern browsers.
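
For illustration, here is one way the WAV and playback step could be written. It is a sketch rather than the SpeakUp code: it assumes 16-bit mono PCM, and the names encodeWav and playWav are hypothetical. The playWav function only works in a browser, since it relies on the Web Audio API's AudioContext.

```javascript
// Pack samples in the range -1..1 into a 16-bit mono PCM WAV file held in an ArrayBuffer.
function encodeWav(samples, sampleRate) {
  const dataBytes = samples.length * 2;
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);
  const writeTag = (offset, tag) => {
    for (let i = 0; i < tag.length; i++) view.setUint8(offset + i, tag.charCodeAt(i));
  };
  writeTag(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // remaining file size
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true);            // size of the fmt chunk
  view.setUint16(20, 1, true);             // format 1 = PCM
  view.setUint16(22, 1, true);             // one channel
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // bytes per second
  view.setUint16(32, 2, true);             // bytes per sample frame
  view.setUint16(34, 16, true);            // bits per sample
  writeTag(36, "data");
  view.setUint32(40, dataBytes, true);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, Math.round(clamped * 32767), true);
  }
  return buffer;
}

// Browser only: decode the WAV buffer and play it through the default output.
async function playWav(wavBuffer) {
  const ctx = new AudioContext();
  const audio = await ctx.decodeAudioData(wavBuffer);
  const source = ctx.createBufferSource();
  source.buffer = audio;
  source.connect(ctx.destination);
  source.start();
}
```

Most browsers only allow audio to start in response to a user action, so a call like playWav(encodeWav(samples, 44100)) would normally sit inside the form's submit or click handler.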

Receiving – ESP32 Code

The receiving code has several jobs to do:

Sample the signal from the microphone quickly enough to avoid aliasing (I’ve chosen 8,000 samples per second which is 4x the maximum frequency of the signal)
Filter that signal. I actually use a high-pass filter now, contrary to the waveform shown above. It doesn’t make much difference, but I found that the combination of a small speaker (I’m using an X-mini battery powered speaker now but previously had it working on my laptop speaker) and the microphone I’m using (Sparkfun INMP401 / ADMP401) transmits the 2 kHz signal much more strongly than the 1 kHz one. Suppressing the 2 kHz signal therefore takes a sharper filter, which is undesirable for timing purposes as it is slower to respond.
Track the minimum and maximum values of the signal over time by following the amplitude of the envelope of the waveform. The use of Manchester encoding ensures that there is never a run of all high or all low frequencies so it isn’t too difficult to track this efficiently.
Vote on the current signal being a 1 or 0 by applying the rule that the voted level can only be called a 1 if three consecutive 1s are detected and the same for 0s.
Recover the clock from the signal. In the end I did this simply by counting the samples along an “edge” and allowing the count to be used in an averaging process if it is close to the expected minimum or maximum edge length. Since Manchester encoding ensures there are only two lengths of edge (as described above) it isn’t difficult to find the average for the long edges, which is exactly the same as the symbol period in samples.
Reassemble the received bits into an HDLC packet obeying the rules for removal of stuffed bits and checking the CRC received matches a new CRC calculated on the received data.
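
The min/max tracking, voting and clock-recovery steps above can be sketched with integer-only arithmetic, in the spirit of the ESP32 code. This is written in Javascript purely for illustration and is not the actual receiver: the decay shift of 6, the 25% tolerance, the 1/8 averaging weight and all of the names are assumptions.

```javascript
// Min/max tracking: follow the envelope's peaks and troughs, then slice at the midpoint.
// ASSUMPTION: peaks and troughs leak back towards the signal by 1/64 per sample.
function makeSlicer() {
  let lo = 0;
  let hi = 0;
  return function slice(envelope) {
    hi = envelope > hi ? envelope : hi - ((hi - envelope) >> 6);
    lo = envelope < lo ? envelope : lo + ((envelope - lo) >> 6);
    return envelope > ((hi + lo) >> 1) ? 1 : 0;
  };
}

// Voting: the output level only changes after three identical raw decisions in a row.
function makeVoter() {
  let voted = 0;
  let candidate = 0;
  let streak = 0;
  return function vote(raw) {
    if (raw === candidate) {
      streak++;
    } else {
      candidate = raw;
      streak = 1;
    }
    if (streak >= 3) voted = candidate;
    return voted;
  };
}

// Clock recovery: classify each edge as short or long and refine the symbol-length estimate.
// ASSUMPTION: edges within 25% of a symbol of the expected length are accepted.
function makeEdgeClassifier(expectedSymbolSamples) {
  let symbolSamples = expectedSymbolSamples;
  function classify(runSamples) {
    const half = symbolSamples >> 1;
    const tolerance = symbolSamples >> 2;
    let kind = "invalid";
    let estimate = 0;
    if (Math.abs(runSamples - symbolSamples) < tolerance) {
      kind = "long";
      estimate = runSamples;
    } else if (Math.abs(runSamples - half) < tolerance) {
      kind = "short";
      estimate = runSamples * 2;
    }
    // Integer running average: 7/8 old estimate, 1/8 new measurement, rounded.
    if (kind !== "invalid") symbolSamples = (symbolSamples * 7 + estimate + 4) >> 3;
    return kind;
  }
  return { classify, currentSymbolSamples: () => symbolSamples };
}
```

In a full receiver the slicer would be fed the filtered envelope one sample at a time, its output passed through the voter, and the length of each run of voted levels handed to the classifier.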

If all of this works out ok then a new frame is received which should correspond exactly to the JSON data that the sender sent. All that remains for the ESP32 code to do is to use this to connect to the WiFi network.
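
As a rough illustration of that last stage, the sketch below takes the bits found between the two framing flags, removes the stuffed bits, checks the CRC and pulls the SSID and password out of the JSON. It assumes the conventional HDLC choices (a 0 stuffed after five 1s, CRC-16/X-25 sent low byte first, bits least-significant first), which may not match the SpeakUp source, and the function names are hypothetical.

```javascript
// CRC-16/X-25 (ASSUMED variant): reflected polynomial 0x8408, init 0xFFFF, final XOR 0xFFFF.
function crc16x25(bytes) {
  let crc = 0xffff;
  for (const b of bytes) {
    crc ^= b;
    for (let i = 0; i < 8; i++) {
      crc = crc & 1 ? (crc >>> 1) ^ 0x8408 : crc >>> 1;
    }
  }
  return (crc ^ 0xffff) & 0xffff;
}

// Drop the bit that follows every run of five 1s (the stuffed 0 in a valid frame).
function unstuffBits(bits) {
  const out = [];
  let ones = 0;
  for (const bit of bits) {
    if (ones === 5) {
      ones = 0;
      continue;
    }
    out.push(bit);
    ones = bit === 1 ? ones + 1 : 0;
  }
  return out;
}

// Pack bits back into bytes, least-significant bit first (ASSUMED bit order).
function bitsToBytesLsbFirst(bits) {
  const bytes = [];
  for (let i = 0; i + 8 <= bits.length; i += 8) {
    let b = 0;
    for (let j = 0; j < 8; j++) b |= bits[i + j] << j;
    bytes.push(b);
  }
  return bytes;
}

// Returns { ssid, password } for a good frame, or null if the CRC or JSON is bad.
function decodeFrameBody(stuffedBits) {
  const bytes = bitsToBytesLsbFirst(unstuffBits(stuffedBits));
  if (bytes.length < 3) return null;
  const payload = bytes.slice(0, -2);
  const received = bytes[bytes.length - 2] | (bytes[bytes.length - 1] << 8);
  if (crc16x25(payload) !== received) return null;
  try {
    const { s, p } = JSON.parse(new TextDecoder().decode(Uint8Array.from(payload)));
    return { ssid: s, password: p };
  } catch (err) {
    return null;
  }
}
```

On the ESP32 the resulting SSID and password would then be handed to the WiFi connection call; a frame that fails the CRC is simply discarded and the sender can play the sound again.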

You may notice in the source that I have written modulation code for the ESP32 as well. I had considered two-way communication, but that isn’t needed in this application.

Source code

The source code is here.

Have fun with it and let me know in the comments if you make use of it!