
Introduction/Problem
Longer useful life and improved reliability are becoming increasingly desirable product traits. Consumers expect higher-quality, more reliable electronics, appliances, and other devices on a tighter budget. Many of these applications include embedded electronics that contain on-board memory such as Flash or EEPROM. As system designers know, Flash and EEPROM have finite erase/write endurance, yet these memories are still necessary for storing data during operation and when the system is powered off. It has therefore become common to use wear-reduction techniques that can greatly increase embedded memory longevity. One common method of wear reduction is called wear-leveling.
Wear-leveling
When using EEPROM in a design, it's crucial to consider its endurance, typically rated at 100,000 cycles for MCU-embedded EEPROM and 1 million cycles for standalone EEPROM at room temperature. Designers must account for this by estimating the number of erase/write cycles over the typical lifetime of the application (sometimes called the mission profile) to determine what size of EEPROM they need and how to allocate data within the memory.
For instance, consider a commercial water metering system with four sensors covering different areas of a building. Each sensor generates a data packet per usage session, recording water volume, session duration, and timestamps. Each time a new session occurs, the packet stored in the EEPROM is appended with the new data until it becomes full. Data remains in the EEPROM until a central server requests a data pull, and the system is designed to pull data frequently enough to avoid overwriting existing data within each packet. Assuming a 10-year application lifespan and an average of 400 packets per sensor per day, the total reaches 400 × 365 × 10 = 1.46 million cycles per sensor, surpassing the 1-million-cycle rating typical of standalone EEPROM. To address this, and assuming you have excess space, you can create a software routine that spreads wear across additional blocks. This is called wear-leveling.
So, how is this implemented?
To implement wear-leveling for this application, you can select an EEPROM twice as large, which allows you to allocate two blocks for each sensor (a total of 2 million available cycles per sensor). This provides a buffer of additional cycles if needed: 2 million minus 1.46 million leaves an extra 540 thousand cycles per sensor in this example.
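The sizing arithmetic above is easy to capture in a small C helper. This is a sketch under the article's assumptions (400 packets per sensor per day, a 10-year mission, one erase/write cycle per packet, and a 1-million-cycle block rating); the macro and function names are illustrative, not from any library.

```c
#include <stdint.h>

/* Mission-profile inputs from the water-meter example (illustrative values). */
#define PACKETS_PER_DAY  400u
#define DAYS_PER_YEAR    365u
#define LIFETIME_YEARS   10u
#define BLOCK_ENDURANCE  UINT32_C(1000000)  /* typical standalone EEPROM rating */

/* Erase/write cycles one sensor needs over the mission profile.
 * The cast keeps the product 32-bit even where unsigned int is 16-bit. */
static uint32_t required_cycles(void)
{
    return (uint32_t)PACKETS_PER_DAY * DAYS_PER_YEAR * LIFETIME_YEARS;
}

/* Smallest number of blocks whose combined endurance covers 'cycles'. */
static uint32_t blocks_needed(uint32_t cycles, uint32_t endurance)
{
    return (cycles + endurance - 1u) / endurance;  /* ceiling division */
}

/* Spare cycles left once 'blocks' blocks are allocated to one sensor. */
static uint32_t cycle_margin(uint32_t cycles, uint32_t blocks, uint32_t endurance)
{
    return blocks * endurance - cycles;
}
```

With these inputs, `required_cycles()` returns 1,460,000, `blocks_needed()` rounds up to 2 blocks per sensor, and `cycle_margin()` reports the 540,000 spare cycles mentioned above. The explicit `uint32_t` cast matters on 8- and 16-bit MCUs, where `400u * 365u` would overflow a 16-bit `unsigned int`.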
You will then need some way to decide where to write new data so that wear is spread out. While you could write each block to its 1-million-cycle limit before proceeding to the next, this approach may lead to premature wear if some sensors generate more data than others. If you spread the wear evenly across the whole EEPROM, the overall application will last longer. Figure 1 illustrates this example, with four water meters sending data packets (in purple) back to the MCU across the communication bus. The data is stored in blocks within the EEPROM, and each block has a counter in the top left indicating the number of erase/write cycles it has experienced.
Figure 1 Commercial water metering: data packets are stored on an EEPROM that has twice as much space as required. Source: Microchip Technology
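To see why spreading wear across the whole EEPROM helps when sensors are unevenly busy, the following C sketch counts cycles for eight blocks under two policies: each sensor confined to its own pair of blocks, versus every write going to the least-worn block in the shared pool. The block count, traffic figures, and function names are hypothetical, and a real implementation would also need a table recording which block holds which sensor's data.

```c
#include <stdint.h>
#include <string.h>

#define NUM_SENSORS 4
#define NUM_BLOCKS  8   /* twice the space strictly required */

/* Shared pool: block with the fewest erase/write cycles (lowest index on ties). */
static int least_worn_block(const uint32_t counts[NUM_BLOCKS])
{
    int best = 0;
    for (int b = 1; b < NUM_BLOCKS; b++)
        if (counts[b] < counts[best])
            best = b;
    return best;
}

/* Fixed allocation: sensor s may only use its own pair, blocks 2s and 2s+1. */
static int fixed_block(const uint32_t counts[NUM_BLOCKS], int sensor)
{
    int a = 2 * sensor, b = a + 1;
    return counts[b] < counts[a] ? b : a;
}

/* Apply writes[s] packets for each sensor under one policy and return the
 * cycle count of the most-worn block. */
static uint32_t simulate(const uint32_t writes[NUM_SENSORS], int shared)
{
    uint32_t counts[NUM_BLOCKS];
    memset(counts, 0, sizeof counts);
    for (int s = 0; s < NUM_SENSORS; s++)
        for (uint32_t i = 0; i < writes[s]; i++)
            counts[shared ? least_worn_block(counts) : fixed_block(counts, s)]++;

    uint32_t worst = 0;
    for (int b = 0; b < NUM_BLOCKS; b++)
        if (counts[b] > worst)
            worst = counts[b];
    return worst;
}
```

With one sensor writing four times as much as the others (1600, 400, 400, 400 packets), the fixed scheme drives that sensor's two blocks to 800 cycles each, while the shared pool keeps every block at 350.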
There are two major types of wear-leveling: dynamic and static. Dynamic wear-leveling is simpler and best suited to spreading wear over a small region of the EEPROM, because it rotates writes only among the memory blocks whose data changes most often. It is easier to implement and requires less overhead, but it can result in uneven wear, which may be problematic, as illustrated in Figure 2.
Figure 2 Dynamic wear-leveling will spread wear over the memory blocks whose data changes most often leading to a failure to spread wear evenly. Source: Microchip Technology
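A minimal sketch of dynamic wear-leveling in C, assuming a hypothetical per-block record held in RAM: new data rotates only among blocks that do not hold static data, which is exactly what leaves the static blocks nearly unworn, as Figure 2 shows.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS 8

typedef struct {
    uint32_t erase_count;   /* erase/write cycles seen so far */
    bool     holds_static;  /* true if the block stores data that never changes */
} block_info;

/* Dynamic wear-leveling: choose the least-worn block among those NOT holding
 * static data. Static blocks are never moved, so their cycles stay low while
 * the others absorb every write. Returns -1 if every block holds static data. */
static int dynamic_pick(const block_info blocks[NUM_BLOCKS])
{
    int best = -1;
    for (int b = 0; b < NUM_BLOCKS; b++) {
        if (blocks[b].holds_static)
            continue;
        if (best < 0 || blocks[b].erase_count < blocks[best].erase_count)
            best = b;
    }
    return best;
}
```

If six of eight blocks hold unchanging data, 1,000 writes land entirely on the remaining two (500 cycles each) while the other six stay at zero.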
Static wear-leveling spreads wear over the entire EEPROM, extending the life of the whole device. It is recommended if the application can use the entire memory as storage (e.g., if no region must be reserved for vital, unchanging data), and it produces the highest endurance over the life of the application. However, it is more complex to implement and requires more CPU overhead.
Wear-leveling requires monitoring each memory block's erase/write count and allocation status, and storing that bookkeeping in non-volatile memory (NVM) would itself cause wear. There are many clever ways to handle this, but to keep things simple, let's assume you store this information in your MCU's RAM, which does not wear out. RAM loses its contents when power is removed, so you will need a circuit around your MCU that detects the onset of power loss early enough to transfer the current tracking state to NVM.
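One way to structure the RAM-resident bookkeeping is sketched below in C. The power-fail handler name is hypothetical (on real hardware it would be driven by whatever supply-monitoring interrupt the circuit provides), and the reserved NVM region is simulated by a static struct. A checksum lets the boot code reject a snapshot that was cut short by the power loss itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BLOCKS 8

/* Wear-leveling bookkeeping, kept in RAM during normal operation. */
typedef struct {
    uint32_t erase_count[NUM_BLOCKS];  /* erase/write cycles per block */
    uint8_t  owner[NUM_BLOCKS];        /* 0 = free, else sensor number (1-based) */
} wl_table;

/* Image saved to a reserved NVM region when power starts to fail. */
typedef struct {
    wl_table table;
    uint32_t checksum;                 /* detects a snapshot torn by power loss */
} wl_snapshot;

static wl_table    wl_ram;             /* live copy, updated on every write */
static wl_snapshot wl_nvm;             /* simulated NVM region (stand-in) */

/* Rotate-and-XOR checksum over the table bytes (illustrative; a CRC is stronger). */
static uint32_t table_checksum(const wl_table *t)
{
    const uint8_t *p = (const uint8_t *)t;
    uint32_t sum = 0x5A5A5A5Au;
    for (size_t i = 0; i < sizeof *t; i++)
        sum = ((sum << 1) | (sum >> 31)) ^ p[i];
    return sum;
}

/* Hypothetical handler for the power-fail interrupt: one NVM save per
 * power-down instead of one NVM write per bookkeeping update. */
static void wl_on_power_fail(void)
{
    wl_nvm.table = wl_ram;
    wl_nvm.checksum = table_checksum(&wl_ram);
}

/* At boot: restore the table if the snapshot is intact. A false return means
 * it is blank or torn, and the application needs a fallback (not shown). */
static bool wl_restore(void)
{
    if (table_checksum(&wl_nvm.table) != wl_nvm.checksum)
        return false;
    wl_ram = wl_nvm.table;
    return true;
}
```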
The software approach to wear-leveling
In a software approach to wear-leveling, the general idea is an algorithm that directs each new write to the block with the fewest writes. In static wear-leveling, each write stores data in the least-used location that is not currently allocated for anything else. The algorithm also moves data to a new, unused location when the difference in cycle counts between the most-used and least-used blocks grows too large. A counter tracks the number of cycles each block has been through, and when the counter reaches the maximum endurance rating, that block is assumed to have reached its expected lifetime and is retired.
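The static algorithm described above can be sketched in C as follows. This is one possible reading, not Microchip's implementation: `SWAP_THRESHOLD`, the `blk_t` record, and all function names are assumptions, and only the bookkeeping is shown (a real driver would also copy the payload and update its address map when data moves).

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS     8
#define ENDURANCE      1000000u  /* rated cycles; block retired on reaching this */
#define SWAP_THRESHOLD 1000u     /* largest tolerated most/least-worn gap (assumed) */

typedef struct {
    uint32_t count;      /* erase/write cycles so far */
    bool     allocated;  /* currently holds live data */
    bool     retired;    /* reached ENDURANCE; never selected again */
} blk_t;

static blk_t blk[NUM_BLOCKS];

/* Usable (non-retired) block with the lowest count, or the highest if want_max,
 * optionally restricted to free blocks. Ties go to the lowest index; -1 if none. */
static int pick(bool free_only, bool want_max)
{
    int best = -1;
    for (int b = 0; b < NUM_BLOCKS; b++) {
        if (blk[b].retired || (free_only && blk[b].allocated))
            continue;
        if (best < 0 || (want_max ? blk[b].count > blk[best].count
                                  : blk[b].count < blk[best].count))
            best = b;
    }
    return best;
}

/* One erase/write of block b; retire it once the rated endurance is reached. */
static void erase_write(int b)
{
    if (++blk[b].count >= ENDURANCE)
        blk[b].retired = true;
}

/* If the least-worn block holds (cold) data and lags the most-worn block by more
 * than SWAP_THRESHOLD, move that data into the most-worn free block so the
 * lightly used block can absorb future writes. */
static void rebalance(int just_written)
{
    int cold = pick(false, false);
    int hot  = pick(false, true);
    int dest = pick(true, true);
    if (cold < 0 || dest < 0 || cold == just_written || !blk[cold].allocated)
        return;
    if (blk[hot].count - blk[cold].count <= SWAP_THRESHOLD)
        return;
    erase_write(dest);
    blk[dest].allocated = true;
    blk[cold].allocated = false;
}

/* Store new data in the least-worn free block; returns its index or -1 if full. */
static int wl_write(void)
{
    int b = pick(true, false);
    if (b < 0)
        return -1;
    erase_write(b);
    blk[b].allocated = true;
    rebalance(b);
    return b;
}

/* Release a block once its data is no longer needed (e.g., after a server pull). */
static void wl_free(int b)
{
    if (b >= 0)
        blk[b].allocated = false;
}
```

As a usage example, seed blocks 0 to 5 with cold data (one cycle each) and then repeatedly write and free packets. The first swap happens on write 2,003, when block 6 reaches 1,002 cycles. At that point block 0's cold data moves into block 7, and the next packet lands in block 0.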
Wear-leveling is an effective method for reducing wear and improving reliability. As seen in Figure 3, it allows the entire EEPROM to reach the maximum endurance rating specified in the datasheet. Even so, there is room for improvement. The erase/write count of each block does not represent the actual physical health of the memory; it is only a rough indicator of the block's remaining life. As a result, the application will not detect failures that occur before the count reaches its maximum allowable value, nor can it use the full true life of each memory block.
Figure 3 Wear-leveling extending the life of EEPROM in application, including blocks of memory that have been retired (Red ‘X’s). Source: Microchip Technology
Because a cycle counter cannot detect physical wear-out, the software needs additional checks if high reliability is required. One method is to read back the block you just wrote and compare it to the original data, which costs bus time, CPU overhead, and additional RAM. To detect early-life failures, this readback must occur on every write, at least for some period after the application enters service. To detect cell wear-out failures, readbacks must again occur on every write once the write count approaches the endurance specification. Whenever a readback is skipped, wear-out goes undetected and corrupted data may be used. The following software flowchart illustrates an example of static wear-leveling, including the readback and comparison necessary to ensure high reliability.
Figure 4 Software flowchart illustrating static wear-leveling, including readbacks and comparisons of memory to ensure high reliability. Source: Microchip Technology
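The readback-and-compare step from the flowchart can be sketched as follows. The EEPROM is simulated by an array, `eeprom_write()` and `eeprom_read()` are hypothetical stand-ins for a real bus driver, and worn cells are modeled as bits stuck at 0. The `readback` buffer and the `memcmp()` call are exactly the RAM and CPU costs the text describes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 32
#define NUM_BLOCKS 8

/* Simulated EEPROM; stuck_low marks worn cells that can no longer hold a 1. */
static uint8_t eeprom[NUM_BLOCKS][BLOCK_SIZE];
static uint8_t stuck_low[NUM_BLOCKS][BLOCK_SIZE];

/* Hypothetical driver calls; on real hardware these are bus transactions. */
static void eeprom_write(int blk, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        eeprom[blk][i] = src[i] & (uint8_t)~stuck_low[blk][i];
}

static void eeprom_read(int blk, uint8_t *dst, size_t len)
{
    memcpy(dst, eeprom[blk], len);
}

/* Write, read the block back into RAM, and compare byte for byte.
 * Returns false on a mismatch, signaling the caller to retire the block. */
static bool write_verified(int blk, const uint8_t *src, size_t len)
{
    uint8_t readback[BLOCK_SIZE];  /* the extra RAM this method costs */
    if (len > BLOCK_SIZE)
        return false;
    eeprom_write(blk, src, len);
    eeprom_read(blk, readback, len);
    return memcmp(readback, src, len) == 0;
}
```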
The need to read back and compare the memory after each write can severely limit performance and consume system resources. Some solutions exist on the market. For example, some EEPROMs include error correction, which can typically correct a single-bit error within each group of a specified number of bytes (e.g., 4 bytes). Embedded memories use various error correction schemes, the most common being Hamming codes. Error correction works by storing additional bits, called parity bits, that are calculated from the data stored in the memory. When data is read back, the internal circuit recalculates the parity bits and compares them to the stored parity bits. A discrepancy indicates that an error has occurred, and the pattern of the discrepancy pinpoints the exact location of the error. The device can then automatically correct the single-bit error by flipping that bit, restoring the integrity of the data. This helps extend the life of a memory block. However, many EEPROMs give no indication that a correction took place, so error correction alone still doesn't let you detect a failing block before data is lost.
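To make the parity-and-syndrome mechanism concrete, here is a textbook Hamming(7,4) code in C that protects 4 data bits with 3 parity bits. It is an illustration only: the article does not say which code any particular EEPROM uses, and real devices protect wider words (such as the 4-byte example above) with correspondingly more parity bits. The `corrected` output is the kind of indication that many EEPROMs do not expose.

```c
#include <stdbool.h>
#include <stdint.h>

/* Codeword bit (pos - 1) holds Hamming position pos, for pos = 1..7. */
static unsigned bit(uint8_t cw, int pos) { return (cw >> (pos - 1)) & 1u; }

/* Encode nibble bits d1..d4 into positions 3, 5, 6, 7; parity at 1, 2, 4. */
static uint8_t hamming74_encode(uint8_t nibble)
{
    unsigned d1 = nibble & 1u, d2 = (nibble >> 1) & 1u;
    unsigned d3 = (nibble >> 2) & 1u, d4 = (nibble >> 3) & 1u;
    unsigned p1 = d1 ^ d2 ^ d4;  /* covers positions 1, 3, 5, 7 */
    unsigned p2 = d1 ^ d3 ^ d4;  /* covers positions 2, 3, 6, 7 */
    unsigned p4 = d2 ^ d3 ^ d4;  /* covers positions 4, 5, 6, 7 */
    return (uint8_t)(p1 | p2 << 1 | d1 << 2 | p4 << 3 | d2 << 4 | d3 << 5 | d4 << 6);
}

/* Recompute parity on read. A nonzero syndrome equals the position of a single
 * flipped bit, which is flipped back before the data bits are extracted. */
static uint8_t hamming74_decode(uint8_t cw, bool *corrected)
{
    unsigned s1 = bit(cw, 1) ^ bit(cw, 3) ^ bit(cw, 5) ^ bit(cw, 7);
    unsigned s2 = bit(cw, 2) ^ bit(cw, 3) ^ bit(cw, 6) ^ bit(cw, 7);
    unsigned s4 = bit(cw, 4) ^ bit(cw, 5) ^ bit(cw, 6) ^ bit(cw, 7);
    unsigned syndrome = s1 | s2 << 1 | s4 << 2;

    *corrected = syndrome != 0;
    if (syndrome)
        cw ^= (uint8_t)(1u << (syndrome - 1));
    return (uint8_t)(bit(cw, 3) | bit(cw, 5) << 1 | bit(cw, 6) << 2 | bit(cw, 7) << 3);
}
```

Any single flipped bit in a 7-bit codeword is restored. Two flipped bits, however, produce a misleading syndrome, which is why such schemes are rated for single-bit correction.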
A data-driven solution to wear-leveling software
To detect true physical wear-out, certain EEPROMs include a flag bit that can be read to learn that a single-bit error in a block has been detected and corrected. You can read back and check a single status register to see whether ECC was invoked during the last operation, which reduces the need to read back and compare entire memory blocks (Figure 5). When the flag shows that an error occurred within a block, you can treat the block as degraded and retire it. You therefore rely on data-based feedback to know when the memory is actually worn out, rather than on a blind counter. This largely eliminates the need to estimate the expected lifetime of memory in your designs. It is especially valuable for systems whose environments shift dramatically over the lifetime of the end application, such as the large temperature and voltage variations common in the manufacturing, automotive, and utilities industries. You can now extend the life of the memory cells all the way to true failure, potentially using the device even longer than the datasheet endurance specification.
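A sketch of the read-side logic, with the device simulated in RAM. `dev_read()` and `dev_ecs()` are hypothetical placeholders for the driver's block read and status-register read; the real command set and bit position must come from the specific device's datasheet. Because the flag means the error was *corrected*, the returned data is still good, so the sketch migrates it to a fresh block before the worn block is abandoned.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_BLOCKS 8
#define BLOCK_SIZE 16

/* Simulated device (hypothetical API, not a real part's command set). */
static uint8_t mem[NUM_BLOCKS][BLOCK_SIZE];
static bool    weak_cell[NUM_BLOCKS];  /* test hook: reads of this block need ECC */
static bool    ecs_flag;               /* models the status bit after the last read */

static void dev_write(int b, const uint8_t *src) { memcpy(mem[b], src, BLOCK_SIZE); }
static void dev_read(int b, uint8_t *dst)
{
    memcpy(dst, mem[b], BLOCK_SIZE);   /* data arrives already corrected */
    ecs_flag = weak_cell[b];
}
static bool dev_ecs(void) { return ecs_flag; }

/* Wear-leveling bookkeeping. */
static bool retired[NUM_BLOCKS];
static bool in_use[NUM_BLOCKS];

static int find_free(void)
{
    for (int b = 0; b < NUM_BLOCKS; b++)
        if (!retired[b] && !in_use[b])
            return b;
    return -1;
}

/* Read block b into dst. If the ECS flag reports a corrected error, retire b and
 * move the (still valid) data to a free block. Returns the block now holding the
 * data, or -1 if no spare block exists (dst still holds the corrected data). */
static int read_and_check(int b, uint8_t *dst)
{
    dev_read(b, dst);
    if (!dev_ecs())
        return b;
    retired[b] = true;
    in_use[b] = false;
    int fresh = find_free();
    if (fresh < 0)
        return -1;
    dev_write(fresh, dst);
    in_use[fresh] = true;
    return fresh;
}
```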
Figure 5 Wear-leveling with an EEPROM with ECC and status bit enables maximization of memory lifespan by running cells to failure, potentially increasing lifespan beyond datasheet endurance specification. Source: Microchip Technology
Microchip Technology, a semiconductor manufacturer with over 30 years of experience producing EEPROM, now offers multiple devices that provide a flag to tell the user when error correction has occurred, in turn alerting the application that a particular block of memory must be retired:
- I2C EEPROMs: 24CSM01 (1 Mbit), 24CS512 (512 Kbit), 24CS256 (256 Kbit)
- SPI EEPROMs: 25CSM04 (4 Mbit), 25CS640 (64 Kbit)
This data-driven approach to wear-leveling can extend the life of the memory beyond what standard wear-leveling achieves. It is also more reliable than classic wear-leveling because it uses actual data instead of arbitrary counts: if one block lasts longer than another, you can keep using that block until its cells wear out. It can also reduce bus time, CPU overhead, and required RAM, which in turn can lower power consumption and improve overall system performance. As shown in Figure 6, the software flow can be updated to accommodate this new status indicator.
Figure 6 Software flowchart illustrating a simplified static wear-leveling routine using an error correction status indicator. Source: Microchip Technology
As illustrated in the flowchart, using an error correction status (ECS) bit eliminates the need to read back data, store it in RAM, and compare it against the data just written, freeing up resources and creating a conceptually simpler software flow. A data readback is still required, because the status bit is only evaluated on reads, but the returned data can be discarded as it arrives; the software then simply reads the status bit, eliminating the additional RAM and CPU comparison overhead. How often the software checks the status bit varies with the defined block size, which in turn depends on the smallest file size the software handles.
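The simplified write flow can be sketched in C as below. All `dev_*` names are hypothetical, the device is simulated, and free-block allocation is omitted for brevity. The key difference from the compare-based version is that the readback loop throws each byte away, so no RAM buffer or `memcmp()` is needed; only the status bit is inspected afterward. Which operation the real ECS bit reflects, and how it is cleared, should be taken from the device datasheet.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_BLOCKS 8
#define BLOCK_SIZE 64

/* Simulated device; dev_* are hypothetical driver calls. */
static uint8_t mem[NUM_BLOCKS][BLOCK_SIZE];
static bool    weak[NUM_BLOCKS];   /* test hook: reading this block invokes ECC */
static bool    ecs;                /* status bit, refreshed by each read operation */

static void dev_write(int b, const uint8_t *src, size_t len) { memcpy(mem[b], src, len); }

/* Clock out one byte of a sequential read; index 0 starts a new read operation. */
static uint8_t dev_read_byte(int b, size_t i)
{
    if (i == 0)
        ecs = false;
    if (weak[b])
        ecs = true;
    return mem[b][i];
}
static bool dev_ecs(void) { return ecs; }

static bool     retired[NUM_BLOCKS];
static uint32_t counts[NUM_BLOCKS];

/* Least-worn non-retired block, or -1 if every block is retired. */
static int least_worn(void)
{
    int best = -1;
    for (int b = 0; b < NUM_BLOCKS; b++)
        if (!retired[b] && (best < 0 || counts[b] < counts[best]))
            best = b;
    return best;
}

/* Write, read back while discarding every byte, then check ECS. On ECS the
 * block is retired and the write retried elsewhere. Returns the block used. */
static int ecs_write(const uint8_t *src, size_t len)
{
    if (len == 0 || len > BLOCK_SIZE)
        return -1;
    for (;;) {
        int b = least_worn();
        if (b < 0)
            return -1;
        dev_write(b, src, len);
        counts[b]++;
        for (size_t i = 0; i < len; i++)
            (void)dev_read_byte(b, i);  /* data thrown away as it arrives */
        if (!dev_ecs())
            return b;
        retired[b] = true;              /* ECC was needed: cell is degraded */
    }
}
```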
The following are some advantages of the ECS bit:
- Maximize EEPROM block lifespan by running cells to failure
- Option to remove full block reads to check for data corruption, freeing up time on the communication bus
- If wear-leveling is unnecessary or too burdensome for the application, the ECS bit serves as a quick check of memory health, helping extend EEPROM block lifespan without tracking erase/write cycles
Reliability improvements with an ECS bit
Error correction implemented with a status indicator is a powerful tool for enhancing reliability and extending device life, especially when used in a wear-leveling scheme. Reliability improvements are highly desirable in automotive, medical, and other functional-safety applications, and are welcome to any designer seeking to build the best possible system for their application.
Eric Moser is a senior product marketing engineer for Microchip Technology Inc. and is responsible for guiding the business strategy and marketing of multiple EEPROM and Real Time Clock product lines. Moser has 8 years of experience at Microchip, spending five years as a test engineer in the 8-bit microcontroller group. Before Microchip, Moser worked as an embedded systems engineer in various roles involving automated testbed development, electronic/mechanical prognostics, and unmanned aerial systems. Moser holds a bachelor's degree in systems engineering from the University of Arizona.
The post Implementing enhanced wear-leveling on standalone EEPROM appeared first on EDN.