The 10 Reasons I Ripped Out a £6k Lighting System

Back in 2012 my wife and I started a house renovation project in Edinburgh (Scotland), having moved to that great city the previous year. In our previous London abode we’d installed a wireless lighting system from a company called Rako and it worked well enough to convince us that wireless was the way to go. Now, having installed and maintained a system with over 150 wireless lighting control circuits over the past 7 years, I’ve finally had to admit that my technology choice was wrong. The Z-Wave wireless lighting system I installed just doesn’t work for a home like mine. I have now ripped out almost all of the Z-Wave units I installed – which had cost me more than £6k to buy and install. Read on to see the 10 reasons I finally kicked it out and what I have replaced it with.

My career in Wireless

Firstly, let me explain that I spent most of my career in the wireless (mobile phone) industry and consider myself to have a reasonable degree of expertise in the field. The company I founded (http://www.actix.com) has been a leading specialist in planning and optimising new wireless technologies for the past 25 years and has developed solutions for measurement and optimisation of everything from Analog to 5G.

Wireless technology is behind some of the most amazing developments of the last 50 years – things like the iPhone and satellite TV. So you may find it strange that I would be doubting the ability of a leading home-automation wireless technology to do something as straightforward as turning a light on and off. But unfortunately my experience has been exactly that. 

Z-Wave – The system I chose in 2012

Back in 2012 I asked for recommendations from the people I knew who were Home Automation pioneers – not least Pilgrim Beart (the founder of the Hive product, https://www.hivehome.com, now owned by BG) and Quentin Stafford-Fraser (co-inventor of the webcam). The consensus at the time was that one of the wireless technologies already on the market would become the de facto standard for domestic lighting control. That seemed pretty compelling, as companies like AeoTech, Fibaro, Philips and LightwaveRF were already producing wireless lighting systems, including (in the first two instances) control modules that could be retro-fitted into the pattress (wall-box) behind a light switch.

At that time there were essentially three main choices of wireless technology: Z-Wave, LightwaveRF and ZigBee. I discounted LightwaveRF as it was (still is?) very range-limited and my house is an old one built almost entirely from large stone blocks (even internally) that are very good at blocking RF signals. Between Z-Wave and ZigBee there wasn’t much to choose from a wireless perspective, but the decision was made for me when I looked at the manufacturer support for each technology. As mentioned above, companies like AeoTech and Fibaro already had modules (devices that turn on/off or dim a circuit) on the market that I thought would be ideal for my lighting control system, whereas such support for ZigBee was pretty much completely absent, save for a few hobbyist-level companies.

So I found a supplier (Vesternet – who have been great, by the way) and started ordering Z-Wave modules to try out. Fast-forward 18 months or so and I’d installed (with the help of a great electrician called Lorenzo) over 150 circuits with Z-Wave control. Many of the modules I installed in that period were Fibaro generation 1 dimmer and switch (relay) units. They had a few quirks, but I found that if I bought the right LED lamps (mainly Philips ones) and used the balun that you can buy to complement a Fibaro dimmer, then I could get (relatively) flicker-free dimming wirelessly.

Lorenzo is the guy on the roof harnessing the lightning. Picture by Rab (Ian or Yanny) Gamble.

Since Lorenzo was replacing most of the wiring in the house, I managed to get him to bring a lot of the control cables together in around five separate locations. This simplified the task of adding devices and also helped me to keep track of the actual units. To this day I have a spreadsheet with around 500 lines (around 3 lines per circuit) which details all of the lighting circuits in the house, the circuit breaker they are connected to, the type of device that controls the circuit and its location. Unfortunately I wasn’t 100% successful in getting the control points of all the lighting circuits into these five locations, and there are actually at least three units lost in the walls somewhere which respond to Z-Wave commands but don’t seem to control anything I can find!

Z-Wave Hubs

Having chosen a technology and the modules to turn things on and off (or dim them) the next step is to decide on a central hub or controller as all Z-Wave networks need one of those. At the time a popular one was called Vera and I initially opted for that. It had a reasonably good way of overcoming one of the problems of Z-Wave (adding a new module requires the hub and the module to be close together) which came in the form of a battery box that could power the hub while you moved it around to add modules (the AeoTech Z-Stick has a similar feature). Unfortunately Vera also had a number of downsides – a limited API to allow it to be controlled by other systems, a non-working backup system that forgot what modules had been added and a slow processor which meant adding multiple modules took a stupidly long time.

So began a waltz through a significant part of the available hub-landscape, taking in Fibaro’s HC2 hub, a Mac-based home automation app called Indigo, an open-source system called Domoticz and finally Home Assistant (recommended by Quentin who I mentioned previously – did I tell you he co-invented the web-cam – I did – but I guess that’s cool enough to be mentioned twice – right?). Each has its own strengths and weaknesses, but ultimately a lot of the problems I found with any of these systems came down to issues with Z-Wave itself, and while my sashays from one to another sometimes gave me the impression of progress, they generally just shifted the shadows around and left the underlying issues unresolved.

What did prove to be significant, though, was the hardware on which these systems are based. The Vera and Fibaro HC2 have their own hardware platforms which are not open, and this gave me quite a bit of trouble as I tried to work through the failure concentration problem that I will attempt to explicate later. The Mac-based system was good in the sense that it was based on Mac hardware, which I have found to be relatively reliable. However, it was, ironically, the failure of a Mac Mini which ultimately precipitated another switch that I had been mulling due to a lack of Z-Wave diagnostics.

The systems I moved to (Domoticz and then Home Assistant) were based on Raspberry Pi hardware and that has resulted in quite a few more problems – SD card corruption due to brown-outs, reboots at unfortunate times due to me attempting to automate system updates and hardware failure initiated by some circuit breaker issues I experienced.

Problem 1: Z-Wave Pairing

The process of adding a module to a Z-Wave network is to put the hub into “include” mode and then press a button on the module while it is close to the hub. That sounds simple enough but now imagine doing this for 150 modules scattered around a house, some behind wall-plates, others in waterproof boxes, concealed in ceilings, etc.

It was quite onerous to do it once but I think I’ve probably done it on average three times for every module in the house plus quite a few more times for modules that I’ve moved around, or have failed and had to be replaced. That’s getting on for 500 include processes. And that’s just to get the module visible to the system. Once it is included you generally have to locate the specific function(s) of the module you want to use (e.g. a motion sensor might have a temperature reading, a light-level and an alarm setting in addition to the motion detection function), then you give each function a name, then you need to indicate what room it is in, then you might want to group it with other module functions (so, for instance, a mood setting in the dining room sets the light levels on five different modules via a scene or group). All of that needs to be typed in and you will generally be doing this while wandering around, so you will do it on a laptop or tablet and sometimes up a ladder or in a cupboard.

I have now gotten to the stage where I dread some modules failing. The one that controls a water-feature in the garden is in a sealed box, the one that controls an outdoor heater is in a small loft area that can only be accessed by a rickety ladder, the one for the guest-room extractor fan is in a crawl space, etc. This can’t be the right way to do things, can it?

When I compare this with the way IT operations used to work, and with the way dev-ops works now, I think it is fair to say that home automation is just at the start and that things must change. It used to be normal for an IT person to sit physically at a computer for hours on end installing operating systems, applications and security software. Now this is done remotely and automatically on predominantly virtualized hardware. Furthermore, most apps now run in the cloud, and the management of the servers that run them is automated to the Nth degree. Home automation and IoT have a long way to go!
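One way to take the sting out of the naming, room assignment and grouping described above is to treat it the way dev-ops treats server configuration: keep it in a file and re-apply it rather than retyping it after every re-inclusion. The sketch below is a minimal Python illustration of that idea, assuming a CSV inventory with hypothetical columns (`node_id`, `function`, `name`, `room`, `group`). It only parses and validates the inventory and builds the groups; pushing the result into a particular hub would need that hub's own API, which isn't shown here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical inventory, one row per module function (column names are illustrative).
INVENTORY_CSV = """node_id,function,name,room,group
12,dimmer,Dining pendant,Dining Room,dining_mood
13,dimmer,Dining wall lights,Dining Room,dining_mood
14,switch,Hall ceiling,Hall,
27,motion,Hall PIR,Hall,
27,temperature,Hall temperature,Hall,
"""

def load_inventory(text):
    """Parse the inventory and refuse it if any friendly name is used twice."""
    rows = list(csv.DictReader(io.StringIO(text)))
    names = [r["name"] for r in rows]
    duplicates = {n for n in names if names.count(n) > 1}
    if duplicates:
        raise ValueError(f"duplicate names: {sorted(duplicates)}")
    return rows

def build_groups(rows):
    """Map each group (scene) name to the (node_id, function) pairs in it."""
    groups = defaultdict(list)
    for r in rows:
        if r["group"]:  # an empty group column means "not in any group"
            groups[r["group"]].append((int(r["node_id"]), r["function"]))
    return dict(groups)

rows = load_inventory(INVENTORY_CSV)
groups = build_groups(rows)
print(groups["dining_mood"])  # [(12, 'dimmer'), (13, 'dimmer')]
```

Because the file is plain text, it can live under version control alongside everything else, which also gives a history of what changed and when.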

Problem 2: Z-Wave Healing After Adding a Module

As mentioned, a common challenge with Z-Wave is that introducing a new module to the network requires the module and hub to be close together. This actually creates a lot of work. There are two main ways to handle this:

  1. bring the hub to the module once it is in-situ and operating
  2. power the module up when it is close to the hub and then move it later

Both options are problematic if you are relying on the mesh network capabilities to extend the range of the network because the mesh works by finding routes from one module to another so that it can communicate at distance.

In the first case the hub will think that it can communicate with the module directly, rather than through the mesh. In the second the module will have the wrong list of neighbors altogether.

In both cases, when a module is added, the hub always considers that it is in direct communication with the module, so it has to be disabused of this before it will operate properly once the hub (or module) is back in its proper place. The way to achieve this is to “heal” the network, which involves sending lots of messages around to discover which modules are neighbors of (can “hear”) each other. This leads to the next problem.
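A simplified model shows why a stale neighbor table matters. Assuming the hub simply picks the shortest-hop path over its neighbor table (real Z-Wave routing, with source routes and cached last-working routes, is more involved), a module included right next to the hub is recorded as a direct neighbor, and only a heal replaces that with the relayed path it really needs. The node IDs below are made up.

```python
from collections import deque

def find_route(neighbors, source, target):
    """Breadth-first search for the fewest-hop path from source to target.

    `neighbors` maps each node ID to the set of node IDs it can hear.
    Returns the list of nodes on the path, or None if target is unreachable.
    """
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

# Hub is node 1. Module 9 was included right next to the hub, so the
# stale table still records a direct 1 <-> 9 link that no longer exists.
stale = {1: {2, 9}, 2: {1, 3}, 3: {2, 9}, 9: {1, 3}}
# After a heal the table reflects where module 9 actually sits.
healed = {1: {2}, 2: {1, 3}, 3: {2, 9}, 9: {3}}

print(find_route(stale, 1, 9))   # [1, 9]: direct, but out of range in reality
print(find_route(healed, 1, 9))  # [1, 2, 3, 9]: relayed through the mesh
```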

Problem 3: Scale

With a small network a “heal” is relatively quick but as the network grows it can take a lot longer and if things have moved around or aren’t powered on then there can be big problems.

With just a few modules a full network heal took only a few minutes, but as I added more and more devices the heal took longer and longer and eventually it didn’t complete overnight. In the first version of Z-Wave a nightly heal was recommended, but I quickly went past the point where this was possible – well before I got to 100 modules, I think.
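A rough back-of-envelope model suggests why heals stop fitting into a night. Assuming each node is asked to check which of the other nodes it can hear, a full heal needs about n(n-1) probes, so doubling the network roughly quadruples the work. The `seconds_per_probe` figure is a made-up parameter, not a measurement; unresponsive or powered-off nodes push it up because each probe to them has to time out.

```python
def heal_probes(n_nodes):
    """Neighbor probes in a full heal, assuming each node checks every other node once."""
    return n_nodes * (n_nodes - 1)

def heal_hours(n_nodes, seconds_per_probe):
    """Rough heal duration in hours for an assumed average time per probe."""
    return heal_probes(n_nodes) * seconds_per_probe / 3600

# Illustrative only: 2 s per probe is a hypothetical figure.
for n in (25, 50, 100, 150):
    print(f"{n:>3} nodes: {heal_probes(n):>6} probes, ~{heal_hours(n, 2.0):.1f} h at 2 s/probe")
```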

I’ve read quite a bit about improvements made to the “heal” operation in Z-Wave Plus and indeed I did start buying Z-Wave Plus devices to replace failed Z-Wave ones. But I haven’t been able to detect any improvement, possibly because I now have a mixed network of Z-Wave and Z-Wave Plus devices. For completeness I did try creating yet another network with only Z-Wave Plus devices, but I didn’t get far enough testing it before I gave up due to other issues.

Problem 4: Resilience and Backup

Z-Wave is a proprietary protocol and this is blamed by several suppliers of home automation software for the lack of backup capabilities on their Z-Wave network hubs. It is also exacerbated by the many different approaches used for actual Z-Wave connection to the hub. For instance Domoticz, Home Assistant and others are open-source software packages that make use of the third-party library OpenZWave (OZW) to connect to your network.

Also, the actual hardware used to connect is generally a USB dongle (AeoTech Z-Stick or ZWave.Me for instance) or the RaZberry, which is a Raspberry Pi compatible add-on. Backing up each of these devices requires a different piece of software and, as far as I’m aware, can’t be done while the device is connected through OZW. So every time you change the configuration of your Z-Wave network you would need to bring down the home automation software and back up the memory on the USB dongle.

I think it is fair to say that no dev-ops person worth their salt would accept a system so prone to human error.

A further issue is that the configuration of the network (stored in non-volatile memory on the USB dongle) and the home automation software configuration (names of devices, rooms, groups, scenes, rules – for instance turning devices on and off at specific times) are stored, and backed up, separately from each other using different mechanisms. If these backups are out of step, it isn’t possible to restore the full capabilities of the system.
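One mitigation for the out-of-step problem is never to back the two halves up separately: capture them together in one archive with a manifest, so a restore always gets a matched pair. This Python sketch assumes the dongle's NVM image has already been dumped to a file with whatever tool its vendor supplies (with the home automation software stopped, as described above). It does not perform that dump, and the paths and file names are hypothetical.

```python
import hashlib
import json
import tarfile
import time
from pathlib import Path

def sha256_of(path):
    """Hex SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def bundle_backup(nvm_image, config_dir, out_dir):
    """Archive the dongle's NVM image and the hub software config together.

    Both halves go into one timestamped tarball with a manifest of hashes,
    so they can only be restored as a matched pair.
    """
    nvm_image, config_dir, out_dir = Path(nvm_image), Path(config_dir), Path(out_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    manifest = {
        "created": stamp,
        "nvm_image": {"name": nvm_image.name, "sha256": sha256_of(nvm_image)},
        "config_files": {
            str(p.relative_to(config_dir)): sha256_of(p)
            for p in sorted(config_dir.rglob("*")) if p.is_file()
        },
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = out_dir / f"manifest-{stamp}.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    archive = out_dir / f"zwave-backup-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(nvm_image, arcname=f"nvm/{nvm_image.name}")
        tar.add(config_dir, arcname="config")  # added recursively
        tar.add(manifest_path, arcname="manifest.json")
    return archive
```

On restore, the hashes in `manifest.json` can be checked against the extracted files before anything is written back to the dongle or the hub.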

Problem 5: Failure Concentration and Complexity

Once upon a time I had a friend who was sceptical about wireless lighting systems. I told him that the system was run by a computer, and this was when he pounced. “What if the power goes down?” he said. “Your computer will be off and you won’t be able to control the lights.” Of course I scoffed and pointed out that if there was no power there would be nothing to power the lights. 1-0 to me!

But I am the one who should have thought more carefully. Any network that depends on a single central hub (as Z-Wave does) has a single point of failure – which is an extension of his point I guess. Since a Z-Wave hub has a number of components, the failure of any of these can bring the network down. Over the years the main elements of the hub (Raspberry Pi – due to SD card corruption, Domoticz software – due to buggy updates and the USB stick – due to being physically damaged) have all been a problem for me.

A single software/hardware/connection failure at the hub means I can’t turn on any of the lights in the whole house. This is a huge problem for this technology!

Of course anyone familiar with Z-Wave will tell you that there is a way around this concentration issue by adding a secondary controller. But this leads me to the related problem. Complexity.

Adding a secondary controller is a non-trivial exercise and is not well supported by any of the home automation platforms I’ve tried. In practice the secondary capability is only really handled at the level of basic network configuration. Ideally you would host the secondary on different hardware and software, in case the failure is systematic, but this means setting up the entire system a second time.

I’ve tried several times to set up a secondary controller and actually have the system fail over to it, but every time it has either been too complex and time-consuming or it has just failed, when push came to shove, for reasons too difficult to identify.

Problem 6: Failure Rate

Another problem, if you aren’t tired of this yet, is the unacceptable rate of failure of modules in a moderately large network. I now have around 150 modules and I think it is safe to say that I haven’t had a single month in the last five years when no modules have failed. I can’t keep up with the rate of failure. Right now there are around 7 dead nodes on the network and at least three modules are indicating that they are functional but are not operating correctly.

Admittedly some of this was probably caused by a whole-street power failure last month, which also caused SD card corruption on a Raspberry Pi. But whatever the reason, even if each module has a mean time between failures (MTBF) of 100,000 hours (more than 10 years), combining 150 of them into one system means that the average time between module failures somewhere in the system is less than one month (100,000 ÷ 150 ≈ 667 hours, or about 28 days).

So the only ways to get anywhere near an MTBF of a year or more would be to reduce the number of modules by a factor of more than 10 or to massively increase the reliability of each one.
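The arithmetic is worth spelling out. Assuming independent modules with constant failure rates (the usual assumption behind quoting an MTBF), failure rates add, so the expected time until one of the modules fails is the single-module MTBF divided by the number of modules. The 100,000-hour figure is the one used above.

```python
HOURS_PER_YEAR = 24 * 365.25           # 8766 hours
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730.5 hours

def system_mtbf_hours(module_mtbf_hours, n_modules):
    """Mean time until one of n independent modules fails.

    Assumes constant (exponential) failure rates, so the rates add and
    the combined MTBF is the single-module MTBF divided by n.
    """
    return module_mtbf_hours / n_modules

def max_modules_for(target_hours, module_mtbf_hours):
    """Largest module count whose combined MTBF still meets the target."""
    return int(module_mtbf_hours // target_hours)

mtbf = system_mtbf_hours(100_000, 150)
print(f"150 modules: {mtbf:.0f} h = {mtbf / HOURS_PER_MONTH:.2f} months")
# prints: 150 modules: 667 h = 0.91 months

n = max_modules_for(HOURS_PER_YEAR, 100_000)
print(f"For a 1-year MTBF: at most {n} modules ({150 / n:.1f}x fewer)")
# prints: For a 1-year MTBF: at most 11 modules (13.6x fewer)
```

The same formula is the reasoning behind fewer, larger controllers: putting the same circuits on fewer units cuts the number of independent failure points, assuming each unit is roughly as reliable as a single module.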

Problem 7: Dead-Nodes

Now for another group of problems associated with modules that are either really dead, not dead but not communicating reliably, not dead but unreachable, not dead but have been off some time, etc. With any communications system there have to be some methods to handle communication or node failure which result in a degraded but still functional system. In a mesh network this is harder because a node which has failed or isn’t communicating reliably may be part of a path used to reach other nodes further away in the network.

The main way Z-Wave handles a node that stops communicating is to gradually reduce the rate at which communication is attempted until it gives up altogether. This might work OK in a small network, but in my experience giving up on nodes altogether probably costs me a couple of hours a month in maintenance. Every time it happens – and it happened to about 10 nodes when we had a power failure while I was away – I have to locate the module and physically press a button on it to wake the node and get it communicating again.
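The difference between giving up and merely capping the retry interval is easy to see in a small sketch. The exact back-off schedule used by Z-Wave controllers isn't given here, so the base delay, growth factor and limits below are hypothetical. The first policy ends with the node marked dead; the second never stops trying, which is closer to the WiFi behaviour described in the comparison table at the end.

```python
from itertools import islice

def retry_delays(base_s, factor, max_attempts=None, cap_s=None):
    """Yield the wait (in seconds) before each retry to an unresponsive node.

    max_attempts set: the sequence ends, i.e. the node is given up on.
    cap_s set and max_attempts None: delays stop growing at cap_s and
    retries continue for as long as the generator is consumed.
    """
    delay = base_s if cap_s is None else min(base_s, cap_s)
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        yield delay
        delay = delay * factor if cap_s is None else min(delay * factor, cap_s)
        attempt += 1

# Give-up policy: five retries, then the node is marked dead.
give_up = list(retry_delays(10, 2, max_attempts=5))
# Capped policy: never gives up; take the first six delays to inspect them.
keep_trying = list(islice(retry_delays(10, 2, cap_s=60), 6))
print(give_up)      # [10, 20, 40, 80, 160]
print(keep_trying)  # [10, 20, 40, 60, 60, 60]
```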

I believe that if the network were set up in polling mode then this wouldn’t be a problem. But polling is a time-consuming operation and is discouraged for all but the smallest Z-Wave networks. So anyone thinking about a large-ish network should build in time for spotting and fixing these issues and, if you don’t keep better records than me, a lot of that time will be spent wandering around trying to find the module which controls that light in the hall that used to work but now doesn’t.

Problem 8: Handling Power-Failures

The power failure I mentioned before was probably the main thing that pushed me into ditching the majority of the Z-Wave system. But a full power outage isn’t the only issue I’ve had of this kind. Several mini-outages have occurred due to circuit-breakers. These perform an important function of course and can trip the power if a person inadvertently touches a live wire or if there is water ingress into a light fitting, etc.

Since a number of our lighting circuits are outdoor we’ve had quite a few of these kinds of outage. Light fixtures such as pillar lights seem to last around five years before there is a good chance of water ingress. Up-lights set into paving probably last around the same amount of time. A set of Christmas lights we put in a tree in the garden seems to work for around six months before something trips its circuit-breaker. And the lights we put under a feature-wall in the garden didn’t work for more than a week before they had to be disconnected completely as they tripped the circuit-breaker almost immediately every time they came on.

Each time this happens on a less obvious circuit, the module ends up with a “dead” indicator in Domoticz, and the only way I’ve managed to revive these modules has been to open up the enclosure they are in and press the little button to get them talking. What a waste of time!
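Good records at least shorten the hunt. Given the list of nodes the hub reports as dead, a few lines of Python can turn a circuit inventory into a walk-list grouped by circuit breaker, with a note of where each module is hidden. The `Module` fields, breaker labels and locations are hypothetical, loosely based on the examples mentioned earlier; getting the dead-node list out of Domoticz or another hub is not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Module:
    node_id: int
    name: str
    breaker: str
    location: str

# Hypothetical rows from a circuit inventory spreadsheet.
INVENTORY = [
    Module(31, "Garden pillar lights", "B7", "sealed box by back door"),
    Module(32, "Water feature", "B7", "sealed box in rockery"),
    Module(45, "Outdoor heater", "B9", "small loft, rickety ladder"),
    Module(52, "Guest extractor fan", "B3", "crawl space"),
]

def modules_to_revisit(dead_node_ids, inventory):
    """Group the modules reported dead by breaker, keeping inventory order."""
    dead = set(dead_node_ids)
    by_breaker = {}
    for m in inventory:
        if m.node_id in dead:
            by_breaker.setdefault(m.breaker, []).append(m)
    return by_breaker

plan = modules_to_revisit([31, 32, 52], INVENTORY)
for breaker, modules in sorted(plan.items()):
    print(breaker, [(m.node_id, m.location) for m in modules])
```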

Problem 9: Poor Diagnostic Tools

Diagnostic tools are an important part of any technology that is more than moderately complicated. I would say that Z-Wave falls easily into the complicated bracket, but the tools available are poor at best. Some of the home-automation apps have rudimentary route-map and logging tools, and there are some log files that you can access from OpenZWave if you look hard, but in general you are pretty much on your own. The tools from ZWave.me are some of the best at the lower level, but their user interface and home automation are pretty poor. There are also some proprietary tools; I paid for a couple and they were not that helpful.

In the end a technology like Z-Wave which is marketed for the home user needs to “just-work”. If a regular user has to start connecting up test equipment or disconnecting their home automation system such that diagnostics can be run on part of it then there is a problem of maturity and quality in the current solutions.

I have found many occasions where things I thought I knew – like the routing the hub would take for a particular module – have proven incorrect – perhaps because a heal changed things unexpectedly – or because a new module was added that had a different performance characteristic. So either a technology like this “just works” or it becomes a millstone around the neck of the person tasked with keeping it working.
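One cheap diagnostic habit is to snapshot the routes before and after a heal (or after adding a module) and diff them, so unexpected changes are noticed before a light stops working. How you obtain the route table depends on the hub or tool in use and isn't shown; the sketch assumes each snapshot is a dict from node ID to the list of repeater node IDs between hub and node (an empty list meaning a direct link). The snapshot values are invented.

```python
def route_changes(before, after):
    """Compare two {node_id: [repeater ids]} snapshots.

    Returns (changed, lost, new):
      changed: node -> (old route, new route) where the route differs
      lost:    nodes present before but missing after
      new:     nodes present after but not before
    """
    changed = {
        n: (before[n], after[n])
        for n in before.keys() & after.keys()
        if before[n] != after[n]
    }
    lost = {n: before[n] for n in before.keys() - after.keys()}
    new = {n: after[n] for n in after.keys() - before.keys()}
    return changed, lost, new

before = {12: [], 13: [12], 27: [12, 14]}
after = {12: [], 13: [], 27: [15], 31: [27]}
changed, lost, new = route_changes(before, after)
for node, (old_route, new_route) in sorted(changed.items()):
    print(f"node {node}: {old_route or 'direct'} -> {new_route or 'direct'}")
```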

It is just such a millstone that I have finally rid myself of, and it feels good to no longer carry the burden.

Problem 10: The Cost

Z-Wave modules are not cheap. When I started buying them back in 2012 they were about £50 for one circuit. Amazingly, in 2020 they are still around £50. This is pretty surprising to me as there is now a lot of good competition from things like the Shelly 1 which is only €9 and the Sonoff which is about £12 for two. The Shelly 2.5 can control two circuits and costs around €65 in a four pack.
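Converting the pack prices quoted above into a per-circuit figure makes the gap clearer. Prices are left in their original currencies (no exchange rate is applied), and two assumptions are mine: that the Sonoff "two" are two single-circuit units, and that each Shelly 2.5 in the four-pack switches two circuits.

```python
# (price, currency, circuits controlled) for each option quoted above.
# Assumptions: the Sonoff pair is two single-circuit units, and each of
# the four Shelly 2.5 units in a pack switches two circuits.
OPTIONS = {
    "Z-Wave module": (50, "GBP", 1),
    "Shelly 1": (9, "EUR", 1),
    "Sonoff (two units)": (12, "GBP", 2),
    "Shelly 2.5 (four-pack)": (65, "EUR", 4 * 2),
}

def per_circuit(price, circuits):
    """Purchase price per controlled circuit, in the price's own currency."""
    return price / circuits

for name, (price, currency, circuits) in OPTIONS.items():
    print(f"{name}: {per_circuit(price, circuits):.2f} {currency} per circuit")
```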


This disturbs me. I don’t see much justification for the high price, and the only explanation I can think of is that the manufacturers haven’t yet woken up to the competition.

My New Lighting System

So, finally, we get onto what I’ve done to replace the Z-Wave system that I’ve mainly ripped out. It’s actually taken a lot of lockdown to get rid of almost everything but I’ve ended up with a predominantly WiFi-based system.

I could simply have built the system using Shelly 2.5 modules and Shelly Dimmers, as they are keenly priced and seem to work really well. Equally, Sonoff units would probably have been fine, but they are a bit chunkier and some have to be installed in wall boxes (pattress boxes).

Problem 6 (Failure Rate) would still have been an issue, though, and – as described before – even a 100,000 hour MTBF would have meant replacing one or more units every month. However, I had a fairly unusual situation that might help with this: as I mentioned earlier, our fantastic electrician Lorenzo had concentrated the connections in around 5 places around the house. So it was possible for me to install a single piece of hardware to control many circuits.

One possibility would have been a Shelly Pro, which can control 4 circuits. I tried this approach but didn’t really like it – for one thing it is designed for DIN rail mounting and I didn’t have the hardware installed to make this work well (I was replacing lots of individual modules, remember).

So, me being me, I decided to go it alone and design my own switching units based around the ESP32 microcontroller and solid-state relays. I call this LightScader because you can cascade up to four units together to control up to 32 circuits. It isn’t all that complicated to do, of course, but the result seems to be a pretty reliable system. I’ve gone for quite high-quality components in an attempt to improve reliability, and it still works out much, much cheaper than Z-Wave modules. I also ended up using my own software to reprogram the Shelly 2.5 modules (but not, yet, the Shelly Dimmers) as it was easy to build from the same code I used for the LightScader. Hopefully one day I will find time to document the new system 🙂
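The LightScader designs aren't reproduced in this post, so the following is only a guess at how cascaded units might be addressed: four units and 32 circuits imply 8 channels per unit, and a house-wide circuit number can then be split into a unit index and a channel with integer division. The constants and both functions are hypothetical.

```python
CHANNELS_PER_UNIT = 8  # hypothetical: 32 circuits / 4 cascaded units
MAX_UNITS = 4

def locate(circuit):
    """Split a 0-based circuit number into (unit index, channel on that unit)."""
    total = CHANNELS_PER_UNIT * MAX_UNITS
    if not 0 <= circuit < total:
        raise ValueError(f"circuit {circuit} is outside 0..{total - 1}")
    return divmod(circuit, CHANNELS_PER_UNIT)

def circuit_number(unit, channel):
    """Inverse of locate(): (unit, channel) back to a 0-based circuit number."""
    if not (0 <= unit < MAX_UNITS and 0 <= channel < CHANNELS_PER_UNIT):
        raise ValueError(f"no such unit/channel: {unit}/{channel}")
    return unit * CHANNELS_PER_UNIT + channel

print(locate(19))  # (2, 3)
```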

In addition I have around 20 Shelly 2.5 modules and a similar number of Shelly Dimmers which are installed in wall boxes and other out-of-the-way places as well as in foot-switch boxes to control table and floor lamps.

Relative Merits of Z-Wave vs WiFi

Issue | Z-Wave | WiFi
--- | --- | ---
Pairing | Node has to be close to hub | Node can be anywhere in WiFi coverage
Healing after Adding | A Z-Wave specific issue | No need to heal
Scale | A network of 100+ nodes is unworkable in my experience | Networks with over 200 WiFi nodes can work well
Backup | No solution | Can be backed up with any standard backup system
Failure Concentration | Failure of the hub leads to complete system failure | Still an issue if the WiFi network goes down, but it is possible to have more than one WiFi network and have nodes switch to another network on failure
Failure Rate | A 100,000 hour MTBF per module means roughly one module failure per month with 150 modules | My controller design handles 16 or more circuits, so failures will, hopefully, be less frequent
Dead Nodes | A problem unique to Z-Wave | Nodes never stop trying to connect as long as they are powered
Handling Power Failure | Mainly a problem with lower-cost hardware like 1st or 2nd generation Raspberry Pi | The machine running Home Assistant or similar can be a Linux server with redundant power supplies and an Uninterruptible Power Supply (UPS)
Poor Diagnostic Tools | Z-Wave products are immature | WiFi diagnostics are mature and there are a lot of good tools on the market
Cost | £50 per circuit | Around £15 per circuit on average for my new system

Some Potentially Useful Links

https://github.com/cdjackson/HABmin/wiki/Z-Wave-Network-Healing

https://community.openhab.org/t/extending-z-wave-with-secondary-controller-using-razberry-with-aeotec-z-stick/33670/28

https://drzwave.blog/2017/01/20/seven-habits-of-highly-effective-z-wave-networks-for-consumers/