Understanding Battery Performance of IoT Devices

I’ve been a firmware engineer at two wearable companies in the past, Pebble and Fitbit, and there was always one class of customer support tickets and user complaints that never went away: issues around battery life. It was a constant game of whack-a-mole, with every new firmware version introducing a battery regression.
Battery life is important for many products we use in our daily lives: our phones, cars, vacuums, watches, headphones, rings, mice, keyboards, and more are all becoming battery-operated devices. Although some of us might want to keep these things plugged in, an overwhelming number of customers are demanding wireless and battery-operated devices, so hardware companies are selling them.
The ideal situation is that for every new hardware release, every new firmware update, and for 99% of customers, there are no surprises around battery life.
In this post, I’ll cover how to start thinking about collecting metrics that contribute to battery life, how to dig into this data for individual devices, and finally how to aggregate all metrics from devices in the field to accurately predict the battery life of devices in the field for a given firmware release. All of these abilities will help projects optimize battery life and combat issues quickly when they arise, whether you have ten or a million devices in the field.
Why battery life is important
Users of IoT devices expect them to be set-it-and-forget-it devices. Once connectivity is set up and the product is onboarded, that should be it, aside from occasionally changing out the batteries. Batteries should then be replaced or recharged as infrequently as possible, and they should last the expected number of days. At Pebble, the packaging said our 130mAh watch should last 7 days, so that was our target. If a watch consistently lasted less than 7 days, an RMA was warranted.
It’s also important that the battery behaves in ways the customer expects. It was one thing for the Pebble watch to linearly drop from 100% to 0% over 6 days. It was an entirely different issue if the watch reported 90% for 3 days, then dropped to 10% and died on the 4th day.
Users expect batteries to be reliable and to not lie to them, but as engineers who work with hardware often, we know this isn’t the case. Batteries are hard, and it’s our job to make them seem reliable.
But batteries are hard
It’s true. Batteries make our lives a bit more miserable for a variety of reasons. To start, to measure the remaining capacity left in a battery, we typically only have voltage, which is a brittle measurement.
Here are just a few other ways in which batteries can be one of the most painful parts of building a hardware product.
Battery performance changes with temperature
Batteries have different properties when operating under different temperature conditions. For example, when Li-ion batteries are operating under cold temperatures, performance is greatly reduced, as shown in the chart below.
Image: Ricktek
Some batteries may also be damaged by overheating, and some by charging when too cold! They are temperamental creatures, and it’s best to try to give them a stable operating environment if possible, but that may be a pipe dream.
Thankfully at Pebble, the devices we worked on were almost always on a wrist, so we had a good source of temperature stability.
Batteries behave differently under load
The voltage reported by batteries may differ depending on the current draw from the batteries at the time of the measurement, as you can see in the chart below for Li-ion batteries.
Image: Ricktek
It’s ideal to measure the voltage during known or expected amounts of current draw.
For our Pebble watches, we did two things. First, we tried to optimize when we sampled the battery voltage to make sure that there was no massive current draw at the time of the reading. The largest current draws were during LCD backlight usage, vibe motor events, and intense CPU and graphics computations (such as Heiko Behrens’s Intrinsic Gravelty demo).
Second, for each battery voltage reading that we reported and used in calculations, we sampled the voltage many times in a short period and took the average of the samples. This helped filter out any noise from high power draws and low voltage readings that might have skewed our readings. In our case, the vibe motor and backlight display were the two that would really mess up our voltage readings.
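The averaging step described above could look something like the following. This is only a minimal sketch, not the Pebble implementation: the function name `battery_average_mv` is made up for illustration, and it assumes the caller has already collected a burst of raw readings (in millivolts) from the ADC in a tight loop.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: returns the mean of a burst of battery voltage
// samples, in millivolts. Returns 0 when there is nothing to average.
// The 32-bit sum is safe for bursts of tens of thousands of 16-bit samples.
uint16_t battery_average_mv(const uint16_t *samples_mv, size_t num_samples) {
  if (samples_mv == NULL || num_samples == 0) {
    return 0;
  }
  uint32_t sum_mv = 0;
  for (size_t i = 0; i < num_samples; i++) {
    sum_mv += samples_mv[i];
  }
  return (uint16_t)(sum_mv / num_samples);
}
```

A plain mean is the simplest choice. If single outliers (say, a reading taken mid-vibration) still skew the result, a median or a mean with the lowest and highest samples discarded are common alternatives.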
Not all batteries are equal
There will be both good and bad batches of batteries purchased from a vendor. Some might be exceptional super-hero batteries, and others might barely hit the threshold of the minimum Ah ratings. That’s just how it is, especially when projects are counting cents on their BOM.
Also, as you likely already know, batteries age over time and lose capacity with the number of cycles they go through. At Pebble, we took this into account by slowly updating the battery curve over time for different revisions of the hardware and then year over year, to make sure that we did our best to account for battery aging.
Measuring power consumption of the hardware
Let’s start trying to build a model for how long our hardware device will last when operating on battery power. The first step is to measure how much power the hardware consumes at a base level. This is when the device is operating in each of three major states: minimal capacity, normal capacity, and under strenuous load.
Having all three of these power consumption profiles helps paint a picture of how much power this piece of hardware might consume, and therefore how quickly it might drain a battery.
Power profiles for each component
The first step is to create baselines for how much power each component will consume over time. Although the spec sheets from the hardware vendor are great and will often tell you the power consumption, they aren’t always accurate, and different components from different batches will have different power usage characteristics.
The general steps to accomplish this are:
- Ensure that your development boards have power rails so that you can isolate many of the components and easily attach probes to them.
- Get a nice multimeter that can measure µA and mA.
- Write a special firmware that instruments a single component in various power modes, and determine the current draw for each mode. This should be the simplest possible firmware and ideally only contain driver code.
- Rinse and repeat with every component that could have a large impact on power consumption.
For example, the accelerometer was a component in a wearable device that could consume a lot of power if left in the wrong state for periods of time. We wrote a firmware that would set it to different sampling rates and record the current consumption, and then used this to determine at what sampling rate we could keep the accelerometer while still achieving our desired 7-day battery life.
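Once per-state current measurements exist, turning them into a rough battery life estimate is simple arithmetic: weight each state’s current by the fraction of time spent in it, then divide the battery capacity by that average. The sketch below shows one way to do this. The type and function names are hypothetical, and it ignores real-world effects such as temperature, battery aging, and regulator losses.

```c
#include <stddef.h>

// One operating state: its measured current draw and the fraction of time
// the device spends in it (fractions are expected to sum to 1.0).
typedef struct {
  double current_ma;
  double time_fraction;
} power_state_t;

// Time-weighted average current across all states, in mA.
double power_average_current_ma(const power_state_t *states, size_t num_states) {
  double avg_ma = 0.0;
  for (size_t i = 0; i < num_states; i++) {
    avg_ma += states[i].current_ma * states[i].time_fraction;
  }
  return avg_ma;
}

// Estimated runtime in hours for a battery of the given capacity.
// Returns 0 when the average current is not positive.
double power_estimated_life_hours(double capacity_mah, double avg_current_ma) {
  if (avg_current_ma <= 0.0) {
    return 0.0;
  }
  return capacity_mah / avg_current_ma;
}
```

With made-up numbers of 0.5 mA for 90% of the time and 3.0 mA for the remaining 10%, the average is 0.75 mA, so a 130 mAh battery would last about 173 hours, or just over 7 days. Working backwards, a 7-day (168-hour) target on 130 mAh leaves an average budget of roughly 0.77 mA.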
State of Charge (SoC) over Time
Once we determine whether or not our hardware can successfully meet our minimum requirements for battery life, we have to put this to the test from the other direction. The primary thing we’ll look into now is how to measure the state of charge (SoC) of the battery over time to ensure that the device can last as long as required.
One of the most common and simplest ways to measure the current capacity within a battery, or the SoC of a device, is by measuring the voltage of the battery during a known and consistent power draw, but it’s not the only way.
Coulomb Counting with Fuel Gauges
At Pebble, we had a fuel gauge on one of the watches. A fuel gauge is a nifty hardware component that can indicate the battery’s SoC and health. It can understand the battery’s current capacity in two ways: measuring the voltage from the battery and Coulomb counting, which measures how much current passes in and out of the battery over time.
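To make the idea of Coulomb counting concrete, here is a small software illustration of the principle: integrate signed current samples over time to get net charge. A real fuel gauge does this in dedicated hardware with far better accuracy. The names below are hypothetical and are not taken from any fuel gauge driver.

```c
#include <stdint.h>

// Net charge that has flowed into (+) or out of (-) the battery, stored in
// nanocoulombs: 1 uA flowing for 1 ms moves 1 nC of charge.
typedef struct {
  int64_t charge_nc;
} coulomb_counter_t;

// Accumulate one current sample. current_ua is positive while charging and
// negative while discharging; dt_ms is the time since the previous sample.
void coulomb_counter_add(coulomb_counter_t *cc, int32_t current_ua, uint32_t dt_ms) {
  cc->charge_nc += (int64_t)current_ua * (int64_t)dt_ms;
}

// Convert the accumulated charge to microamp-hours (1 uAh = 3,600,000 nC).
int64_t coulomb_counter_get_uah(const coulomb_counter_t *cc) {
  return cc->charge_nc / 3600000;
}
```

As a sanity check, feeding in a constant 10 mA discharge for one hour yields -10,000 µAh, which is the expected 10 mAh drained.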
For devices with large batteries, such as phones, e-mobility, cars, etc., fuel gauges are probably the way to go. They are reliable, and you can hand off the difficult task of measuring the battery’s current capacity to a device that was built to measure it.
So with all the praise of fuel gauges, why spend so much time talking about using voltage to measure a battery’s capacity? Because at Pebble, we were unable to use the fuel gauge, since it consumed more power than we would have liked. There are thousands of products out there that will run into the same issue, especially as people want to build sensors and IoT devices that last months and years on small batteries.
If you happen to be lucky enough to be able to use a fuel gauge and happen to be using Zephyr RTOS, definitely check out their new Fuel Gauge API, introduced recently in Zephyr v3.3.
State of charge with voltage
This is most commonly what companies have, since it’s very simple to get from the battery system and doesn’t consume any additional power. The problem with only monitoring the voltage is that it’s not easily human-understandable, doesn’t increase or drop linearly, and will change under operating conditions, as mentioned above.
We need something better if possible.
State of charge with voltage and battery curve
One thing that can be done to help convert voltage to a percentage is to come up with a battery curve. A battery curve is simple: it’s a map between a battery’s voltage and the relative percentage that likely pertains to that voltage. Products also usually have both a charge and a discharge curve.
A battery curve is what companies will have once they have a good understanding of their battery’s properties and enough data to generate a curve. It’s more easily understood by customer support teams and engineers that aren’t directly involved with the battery subsystem.
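In firmware, a battery curve is often just a small lookup table with linear interpolation between points. The sketch below shows that approach. The curve values are invented purely for illustration and do not describe any real battery; a real product would measure its own charge and discharge curves. The names are hypothetical as well.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint16_t voltage_mv;
  uint8_t percent;
} battery_curve_point_t;

// Illustrative discharge curve only. Points must be sorted by strictly
// ascending voltage, with non-decreasing percentages.
static const battery_curve_point_t s_discharge_curve[] = {
  {3300, 0}, {3500, 10}, {3600, 20}, {3700, 50}, {3900, 80}, {4200, 100},
};

// Map a voltage to a percentage by interpolating linearly between the two
// nearest curve points. Voltages outside the curve clamp to its end points.
uint8_t battery_curve_lookup_pct(const battery_curve_point_t *curve, size_t num_points,
                                 uint16_t voltage_mv) {
  if (curve == NULL || num_points == 0) {
    return 0;
  }
  if (voltage_mv <= curve[0].voltage_mv) {
    return curve[0].percent;
  }
  for (size_t i = 1; i < num_points; i++) {
    if (voltage_mv <= curve[i].voltage_mv) {
      const uint32_t span_mv = (uint32_t)(curve[i].voltage_mv - curve[i - 1].voltage_mv);
      const uint32_t span_pct = (uint32_t)(curve[i].percent - curve[i - 1].percent);
      const uint32_t offset_mv = (uint32_t)(voltage_mv - curve[i - 1].voltage_mv);
      return (uint8_t)(curve[i - 1].percent + (offset_mv * span_pct) / span_mv);
    }
  }
  return curve[num_points - 1].percent;
}
```

With this made-up table, 3650 mV falls halfway between the 20% and 50% points and maps to 35%. A separate table would be used while charging.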
A nice tool that I came across this year at Embedded World was from Qoitech, which builds a product to help users build charge and discharge curves under different environments. I believe their product is well worth the money if it can help companies translate a cryptic voltage reading into a percentage that everyone can understand.
Brief Primer on Metrics
Before delving into the following sections concerning capturing and aggregating metrics around battery life, let’s take a moment to briefly discuss what a metric is. It’s essential because I’ve encountered firmware engineers who haven’t given them much thought.
A metric is a measurement captured at runtime, and the process of combining large numbers of metrics and calculating statistics is called aggregation.
You can capture metrics about almost anything in your system. Common things that I like to measure in firmware are task runtimes, count of connectivity errors and time connected, peripheral utilization for power estimation, and more. Once all of these metrics are flowing out of a device and into a data warehouse, you can uncover trends in them!
However, capturing and aggregating metrics isn’t always as easy as it sounds. I recommend checking out a few resources if you want to learn more about best practices for collecting metrics.
One of the most important strategies from the above content is that these device and firmware metrics are typically sent up in a regular heartbeat, which is sent at a fixed interval, usually an hour, and ultimately deposited into a data warehouse.
For example, if you want to track BLE disconnections, you send the number of disconnections that occurred during the interval and only that interval. Same thing with how long the BLE chip was on: send the number of seconds during the interval it was on. Generating metrics at a fixed interval makes it trivial to perform aggregations on them, since it’s just simple math.
Take the total time connected and divide by the number of heartbeats, and you get the average time connected per heartbeat.
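On the device side, the "during the interval and only that interval" rule boils down to counters that are snapshotted and reset at every heartbeat. The following is a rough sketch of that pattern with hypothetical names. It leaves out locking and serialization, and it is not the API of any particular metrics library.

```c
#include <stdint.h>

// Metrics accumulated during the current heartbeat interval only.
typedef struct {
  uint32_t ble_disconnect_count;
  uint32_t ble_connected_secs;
} heartbeat_metrics_t;

static heartbeat_metrics_t s_current_interval;

// Called by the BLE stack glue code whenever a disconnection happens.
void heartbeat_record_ble_disconnect(void) {
  s_current_interval.ble_disconnect_count++;
}

// Called whenever a connected period ends (or at flush time for an open one).
void heartbeat_add_ble_connected_secs(uint32_t secs) {
  s_current_interval.ble_connected_secs += secs;
}

// Called once per heartbeat: returns the values to send and resets the
// counters so the next interval starts from zero.
heartbeat_metrics_t heartbeat_flush(void) {
  const heartbeat_metrics_t snapshot = s_current_interval;
  s_current_interval = (heartbeat_metrics_t){0};
  return snapshot;
}
```

Because every heartbeat covers the same amount of time, the backend can sum `ble_connected_secs` over all heartbeats and divide by the heartbeat count without needing to know anything about ordering or gaps.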
But we aren’t here to talk about Bluetooth disconnections. Let’s talk about power!
Metrics that typically contribute to power consumption
Here’s a semi-exhaustive list of items that I’ve tracked in the past that would help paint a picture for me and my colleagues about what was consuming battery life.
Connectivity & networking
This would cover any sort of wireless radio, such as Wi-Fi, LoRa, Bluetooth, ZigBee, LTE, etc.
- packets sent & received
- bytes sent & received
- number of connections and disconnections
- time spent in each state (advertising, connecting)
- radio strength & power settings
- throughput
- number of retry attempts
Peripheral usage
Here, we try to measure any peripheral that might consume significant amounts of power.
- sensor on/off time (accelerometer, gyroscope, GPS, cameras, compass, etc.)
- actuator on/off time
- actuator total distance
- display & backlight on/off time
- number of display refreshes
- camera on/off time
- storage read/writes/erases
CPU & Code usage
- CPU awake, sleep, and deep-sleep time
- time spent running each task in the system
- time running power-hungry blocks of code
- boot time
- number of logs written
- time spent blocked on mutexes or queues
- number of context switches to detect thrashing
Battery metrics for a single device
The most important use case of metrics is being able to debug individual device issues that come up, either internally or via customer support. I see most companies start with logs to diagnose customer issues, but using metrics is where the real value comes in. You can visualize and collect much more data, especially if the bandwidth limitations are strict (satellite connections, LTE, etc.).
For measuring battery life, the most important metric to capture is, of course, the SoC of the device. As stated above, this is typically sent first as a voltage reading, and eventually as a percentage once a battery curve is adopted. With both of these plotted alongside other metrics, you can quickly and easily see what metrics contribute to battery drain.
For instance, in the example above, our battery SoC % (blue line) is dropping rapidly. This can likely be attributed to the fact that the CPU is much more active during this window than it normally is, and that might be related to the number of bytes being written to the flash chip.
Knowing this, we can start digging into the other existing metrics, or adding more metrics! We should start capturing metrics for each module that writes to the flash, or maybe track which tasks are running while the high CPU utilization is taking place. You can of course track too many metrics within a single firmware, but that number is actually really high. With each metric only taking up 4-8 bytes per measurement per hour, I’ve worked on firmware that captures between 50-200 metrics.
As mentioned throughout the article, some projects will only record the voltage and send that as a metric. This works relatively well when digging into a single device, especially if the battery only lasts a few weeks and the metrics can be viewed over the entire period. It’s much more advantageous to record a percentage if possible, so try to build that battery curve!
Battery Life Metrics for an entire fleet
Trying to solve all battery problems on a per-device basis will only get you so far. No engineer has time to look at every device’s metrics every day to understand if battery life is getting better or worse over time, or whether a new firmware version introduced a regression or improvement, which is why we need to aggregate these battery metrics across an entire fleet of devices.
At the fleet level with a million devices, average battery life can be very difficult to determine. It can be made easier as long as you follow the do’s and don’ts outlined in the rest of the article and take some inspiration from my previous company’s learnings.
Don’t: Report the state of charge directly
The battery’s instantaneous voltage or percentage, reported directly, cannot be aggregated across the fleet.
Imagine your database receives the following data points from 4 devices every hour. Note that bold means an increase in the SoC percentage.
Device A | Device B | Device C | Device D |
75% | 23% | 92% | 5% |
72% | (missing) | 89% | (missing) |
67% | (missing) | 85% | **10%** |
34% | 19% | **100%** | 7% |
**78%** | **21%** | 97% | 5% |
If this is all put into the database and I had to write SQL to determine the average battery life drop per hour for every device and then aggregate it, I don’t think I’d be confident in my ability to do it, nor would I be confident that the database could compute it for a million devices with a few thousand data points apiece.
There are also a few other issues:
- Since we are required to calculate the deltas between every SoC reading, we cannot drop data, and all of the data has to be received and processed in order. This fact alone should be enough to scare anyone working in the firmware industry.
- What if a device goes offline for a day or two and comes back with a wildly different SoC? Do we assume a charger was never connected?
- How do we confidently know when a charger was attached?
In the image below, if we were to just report each SoC percentage data point, we would not know about the power bug and the subsequent charge event.
Ultimately, we are trying to calculate the first derivative of the battery percentage in our database. However, this calculation is susceptible to missing data points, which makes it nearly impossible. There is a better way.
Do: Report the delta of the state of charge
Instead of trying to calculate the first derivative in our database, calculate it on the device! Between two known moments in time, calculate the amount of battery that was depleted over the interval. There are two ideal units for this metric: a change in percentage, or Coulombs if using a fuel gauge.
I also highly advise that you standardize the interval duration to make the calculation even simpler. To understand how much simpler the calculation can be, let’s work through our 4 devices again. Note that bold means an increase in the SoC percentage.
Device A | Device B | Device C | Device D |
– | – | – | – |
-3% | (missing) | -3% | (missing) |
-5% | (missing) | -4% | **6%** |
-33% | -2% | **15%** | -3% |
**44%** | **2%** | -3% | -2% |
If all of these readings were taken over 1-hour intervals (e.g. Device A drained 3% of its battery in the first hour), then we can simply add up all of the readings in which there was not an increase in the SoC and divide by the number of those readings. That is 58% across 9 readings, which gives us something around 6% battery drain on average per hour.
It’s that simple. The same logic and method can be applied if the device is using a fuel gauge. Report the number of Coulombs consumed per hour, take the average, and that’s how much current is consumed per hour.
Here is a simple code snippet of what I’d imagine is the first iteration of this in my C firmware.
```c
#include <stdbool.h>
#include <stdint.h>

static void prv_device_metrics_flush_callback(bool is_flushing) {
  // SoC percentage captured at the start of the current heartbeat interval
  static int32_t s_prev_battery_pct;

  if (is_flushing) {
    // End of heartbeat interval: report the change in SoC (negative means drain)
    const int32_t current_battery_pct = battery_get_pct();
    const int32_t battery_delta = current_battery_pct - s_prev_battery_pct;
    device_metrics_set(kDeviceMetricId_BatteryLifeDrain, battery_delta);
  } else {
    // Start of heartbeat interval
    s_prev_battery_pct = battery_get_pct();
  }
}
```
This method can be applied to a fleet of one device or a large fleet of millions. It worked for us at Pebble! My favorite part about this method is that when you are given a delta and a duration of time, it’s trivial to calculate the expected battery life.
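To show how little math is involved, here is a sketch of that calculation. In practice it would more likely run as a query in the data warehouse than in C, and the function names are hypothetical. It averages the reported deltas while skipping intervals where the SoC rose, then divides 100% by the average drain per interval.

```c
#include <stddef.h>
#include <stdint.h>

// Average drain in percentage points per interval, ignoring any interval
// in which the SoC increased (a charger was likely connected).
double battery_avg_drain_pct(const int32_t *deltas_pct, size_t num_deltas) {
  int64_t total_drain_pct = 0;
  size_t num_used = 0;
  for (size_t i = 0; i < num_deltas; i++) {
    if (deltas_pct[i] <= 0) {
      total_drain_pct += -(int64_t)deltas_pct[i];
      num_used++;
    }
  }
  return (num_used == 0) ? 0.0 : (double)total_drain_pct / (double)num_used;
}

// Expected time from 100% to 0%, in hours, for a given average drain per
// interval. Returns 0 when the drain is not positive.
double battery_expected_life_hours(double avg_drain_pct, double interval_hours) {
  if (avg_drain_pct <= 0.0) {
    return 0.0;
  }
  return (100.0 / avg_drain_pct) * interval_hours;
}
```

Running this over the nine drain readings from the table above gives an average of about 6.4% per hour, or roughly 15.5 hours of expected battery life. That figure is dominated by Device A’s single -33% hour, which is exactly the kind of outlier worth digging into on a per-device basis.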
We were able to determine our expected battery life pretty accurately during internal testing, and with only about 1-2 days of testing if everyone at the company wore the watch (24 hours * 100 people is 2,400 data points if the battery is measured hourly).
Do note that both SoC and SoC delta should be reported. The first is useful for the per-device data, and the latter is useful for fleet-wide aggregations.
Do: Drop heartbeats with a battery charge event
Notice in the table in the previous section that there were times when some devices had their SoC % increase (noted in bold). That’s because a charger was connected during that interval. We also ignored them when computing the average battery drain. This was essential because we only want to add up the intervals in which the device was operating normally and on battery power.
Instead of ignoring these on the server, I’d highly suggest dropping the metric and not sending it at all, or somehow marking any SoC delta metric with a note that a charger was connected. This will enable the SQL wizard to easily ignore these events in the final calculations.
The thing to note is that dropping a few data points here and there ultimately doesn’t matter much. When thousands of devices are reporting data every hour, a few dropped hours here and there don’t meaningfully change the averages.
A more advanced code snippet that ignores battery charger events can be found in Memfault’s documentation.
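As a rough idea of what dropping those intervals on the device might look like, here is a minimal sketch. It is not the Memfault implementation, and all names are hypothetical. It remembers whether a charger was seen at any point during the interval and, if so, refuses to produce a delta.

```c
#include <stdbool.h>
#include <stdint.h>

// Per-interval bookkeeping for the SoC delta metric.
typedef struct {
  int32_t start_pct;   // SoC at the start of the interval
  bool charger_seen;   // true if a charger was connected at any point
} battery_interval_t;

// Call at the start of each heartbeat interval.
void battery_interval_start(battery_interval_t *iv, int32_t current_pct,
                            bool charger_connected) {
  iv->start_pct = current_pct;
  iv->charger_seen = charger_connected;
}

// Call from the charger-connected event handler.
void battery_interval_on_charger_connected(battery_interval_t *iv) {
  iv->charger_seen = true;
}

// Call at the end of the interval. Returns true and writes the delta when
// the interval ran purely on battery; returns false when it should be dropped.
bool battery_interval_end(const battery_interval_t *iv, int32_t current_pct,
                          int32_t *delta_pct_out) {
  if (iv->charger_seen) {
    return false;
  }
  *delta_pct_out = current_pct - iv->start_pct;
  return true;
}
```

Passing the charger state into `battery_interval_start` covers the case where the device is still on the charger when a new interval begins, so that interval is dropped as well.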
Comparing battery life across software versions
For most hardware companies, the hardware is a known quantity. It’s mostly designed in-house and only has a few revisions. If a single firmware was running on the device and the software was never updated, it would likely consume the same amount of power, on average, every day of its lifespan. That’s one thing that’s great about firmware.
But that isn’t how the Internet of Things works. IoT devices get updated with new firmware all the time. Software is what makes today’s hardware companies unique and valuable, so firmware updates are essential. However, with firmware updates come regressions and the ability to really screw things up, and the firmware projects I’ve worked on have shipped more regressions than we can count.
When sending metrics, be sure to attach a firmware version to each one of them (as mentioned in the heartbeat metrics article). The firmware version should be your primary dimension for determining whether battery life is getting better or worse.
One of the most stressful times at a hardware company is a firmware update rollout, because it could be the update that bricks thousands of devices or causes a major battery life regression. To mitigate the risk, collect data at all stages of the release and constantly look at the data. Even with a few thousand samples, you should be able to make data-driven decisions and minimize the stressful deployments.
Best Practices
Throughout the last ten or so years of doing firmware development for hardware devices and talking to tons of developers doing the same, here are a few best practices that I’d encourage every development team to adopt.
Don’t assume what worked for 1-100 devices works for fleets of 10k+ devices
I’ve talked to a lot of developers and teams, and all of us at Memfault have talked to plenty more, and the one resounding thing we hear and understand is that once the number of devices crosses into the thousands, early data systems start to break down or become prohibitively expensive.
Here are a few common general things I’ve seen fail a lot in the past:
- Generating metrics from logs: It’s easy to fall into this trap because it’s seemingly easy. Early projects implement logs, send them to S3, and then start building scripts to parse them. However, beyond a certain number of devices, this becomes a big data problem, and the brittle nature of logs makes these systems hard to use and maintain. Generate the metrics on the device.
- Using Grafana for fleet-wide aggregations: Prometheus, StatsD, Grafana, and similar tools were not designed to monitor massive numbers of devices. Rather, they were optimized for a handful of servers and services monitoring health metrics sent every few seconds. They are designed to track entities in real time, not to run the large historical analytical queries across many dimensions that are required to truly understand battery life. Really, really think twice before thinking Grafana is a one-stop shop for monitoring your devices.
- Sending random metrics and assuming you’ll make sense of them later: I’ve seen this time and time again. If a metric doesn’t track something useful or doesn’t have a clear denominator to build aggregates on, it won’t magically become useful and should be removed. It’s garbage in, garbage out. This is why I heavily suggest projects adopt some patterns of heartbeat metrics. They’ve worked incredibly well for me in the past, and are almost fool-proof against the issues faced in the embedded firmware world.
Implement monitoring at the start of a project
I once believed that OTA & monitoring were some of the last pieces that you needed to put in place before shipping hardware to production. I now know this is the completely wrong approach.
Get the hardware working at a minimal level first, then implement OTA, then build up a monitoring pipeline, and finally start working on real features that your end device should support.
This is the way we had done things at Pebble, and it was incredible. For every new watch developed after our first generation, it was bring-up, logs, metrics, coredumps, then building the core features on this foundation. We were so productive in those early months of developing a new product!
And of course, we had battery metrics being sent constantly as well. If the battery life plummeted on my watch during internal testing, I filed an internal support ticket and dug into the data to help fix the bug.
If we had not had our monitoring stack set up at the very beginning and had instead waited until just before production to set it up, I don’t think we ever would have shipped on time, and we would have had to cut a lot of features and be less ambitious.
Test internally. A lot.
Get as many hours reporting into your dataset as possible. Make sure people are using their devices actively, in a way that is at least similar to how your customers will ultimately use the devices.
One thing we took heavy advantage of at previous companies was beta testing groups. At Pebble, we had fans all over the world who were more than excited to help us test the latest firmware releases, even if it meant their firmware was a little less stable and oftentimes had worse battery life.
Package and ship firmware as quickly as possible
One thing we did extremely well at Pebble was shipping firmware every few days internally to our employees and every few weeks externally to our customers. The biggest advantage of shipping often is that every new release that went on employee or customer wrists had a small number of changes or commits.
If we introduced a major power regression in one of our internal builds, we’d only have 10-20 commits to look through to guess at which one it likely was. If we introduced a regression in our customer-facing build, we’d have probably 100 commits or so that we’d have to git bisect through. This was painful, but nothing was impossible.
The problem is shipping every 3-6 months. In that amount of time, you have hundreds if not thousands of commits that could cause regressions in various subsystems, and by that point, it’s almost guaranteed that there isn’t just a single issue affecting the battery performance.
Firmware updates are a blessing that can be a curse. With the right tools and data collection in place, shipping often is a blessing that enables you to find issues and fix them quickly.
Using Memfault as your monitoring stack
All of us at Memfault have thought about how to monitor battery life quite extensively, and we’ve built a product that we would have loved to use at our previous employers. Memfault can work on all kinds of embedded systems and across all types of network topologies, whether they are home-grown or standardized like Wi-Fi or Bluetooth.
To learn more about how you might instrument battery life with Memfault, which is quite similar to this post, check out the Memfault Documentation page, Tracking Battery Life with Memfault.
Conclusion
The world would be a better place if everything was plugged into a wall socket. But this is becoming less and less true every day. And as a consumer, I love it. I can dance freely and jump over couches while vacuuming the house with my wireless Dyson and Bluetooth headphones, and I know that the firmware engineers at these IoT companies are working hard to make sure the devices are reliable and have great battery life.
I hope this article has either helped paint a picture of the steps necessary to build and ship a great battery-operated device, or taught you a few new things to take back to your team to improve the battery life of a product you work on.