Error Checking and Correction

Whenever we send data -- whether it's audio signals over a phone line, a data stream or a legal document -- to someone else, we need to know that what arrives on the other end is identical to what we sent. Similarly, whenever we store data on disk or tape, we need assurance when we retrieve it that it hasn't been altered. Accurate data is absolutely essential for computations, record keeping, transaction processing and online commerce.

Unfortunately, storing and transmitting data both involve the actions of physical entities in the real world: electrons, photons, atoms, molecules, wires, contacts and more. This means there's always some degree of uncertainty because background noise is ever present in our physical universe and might alter or corrupt any given data bit.

Error Detection

Early in the computer revolution, some powerful techniques were developed first to detect and later to correct errors in data. The most obvious, and perhaps least efficient, way to find data changes is to repeat each unit of data multiple times and then compare the copies. This method is so inefficient that it's not used for error detection -- though the same idea is used in RAID-1 (disk mirroring) for fault tolerance.
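To make the repetition idea concrete, here is a minimal sketch in Python. The function names and the three-copy default are our own assumptions for illustration; they don't come from any particular system.

```python
def send_repeated(data: bytes, copies: int = 3) -> list[bytes]:
    """Return several identical copies of the data (hypothetical helper)."""
    return [bytes(data) for _ in range(copies)]


def copies_agree(received: list[bytes]) -> bool:
    """True only if every received copy matches the first one."""
    return all(copy == received[0] for copy in received)


received = send_repeated(b"HELLO")

# Simulate noise: flip the lowest bit of the first byte in the second copy.
damaged = bytearray(received[1])
damaged[0] ^= 0x01
received[1] = bytes(damaged)

print(copies_agree(send_repeated(b"HELLO")))  # clean copies agree
print(copies_agree(received))                 # the damaged copy is noticed
```

Note the cost: three copies mean sending or storing three times as much data just to notice one flipped bit.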

The best-known error-detection method is called parity, where a single extra bit is added to each byte of data and assigned a value of 1 or 0, typically according to whether there is an even or odd number of "1" bits. The receiving system calculates what the parity bit should be and, if the result doesn't match, then we know that at least one bit has been changed, but we don't know which bit is wrong. It's also possible that the data is entirely correct and the parity bit is garbled. If two bits have been altered, however, the changes cancel out: the data will be wrong but the parity bit won't signal an error. (See Finding a 2-bit error for more details.)
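The behavior described above can be sketched in a few lines of Python. This assumes even parity (the parity bit makes the total count of "1" bits even); the function names are ours, chosen for illustration.

```python
def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the total number of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte: int, parity: int) -> bool:
    """Recompute the parity on the receiving side and compare."""
    return even_parity_bit(byte) == parity


sent = 0b01000001                 # the letter 'A'
parity = even_parity_bit(sent)    # computed by the sender

one_flip = sent ^ 0b00000100      # noise changes one bit
two_flips = sent ^ 0b00000110     # noise changes two bits

print(parity_ok(sent, parity))       # intact data passes
print(parity_ok(one_flip, parity))   # single-bit error is caught
print(parity_ok(two_flips, parity))  # two-bit error slips through
```

The last check passes even though the data is wrong, which is exactly the blind spot of a single parity bit.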

Two other established error-detection techniques are checksum (add up all the bytes or words of the entire message, document or program and produce a single sum) and cyclic redundancy check, which operates on groups of bits at a time and uses polynomial division, not addition, keeping the remainder as the check value. Checksums and CRCs are calculated before and after transmission or duplication and then compared. However, checksums and CRCs alone can't protect data against deliberate tampering, since the algorithms are known and it's possible to introduce intentional changes that these methods won't detect. A more secure way would involve cryptographic hash functions, one-way mathematical operations for which it is impractical to construct a different message that yields the same result. Combining such a hash with a secret key, as a message authentication code does, precludes making undetectable alterations.
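The three approaches can be compared side by side with a short Python sketch. The one-byte additive checksum, the sample messages and the key are assumptions made up for this illustration; zlib.crc32 and the hmac module are part of Python's standard library. HMAC is one common way to combine a hash function with a secret key.

```python
import hashlib
import hmac
import zlib


def simple_checksum(data: bytes) -> int:
    """Add up every byte and keep the low 8 bits (illustrative only)."""
    return sum(data) % 256


original = b"PAY 12 TO BOB"
swapped = b"PAY 21 TO BOB"   # two adjacent bytes exchanged

# Addition doesn't care about order, so the simple checksum is fooled.
same_sum = simple_checksum(original) == simple_checksum(swapped)

# CRC-32 is based on division and notices the reordering.
same_crc = zlib.crc32(original) == zlib.crc32(swapped)

# A keyed hash (HMAC-SHA256): without the key, an attacker can't
# produce a matching tag for an altered message.
key = b"example-secret-key"   # hypothetical key for the demo


def make_tag(message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()


tag = make_tag(original)
tamper_detected = not hmac.compare_digest(tag, make_tag(swapped))

print(same_sum, same_crc, tamper_detected)
```

A CRC catches accidental damage like this swap, but anyone can recompute a CRC after changing the data. Only the keyed tag depends on something the attacker doesn't have.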

Error Correction

Thus techniques exist that will let us find errors in our data, but then what? One way to get the right stuff is just to ask the sending party or device to resend it. If there are a lot of errors or there is a long communications path, however, such as when we're sending data halfway around this noisy planet, retransmission can dramatically slow down communications.

What's needed is a system that will find any errors and correct them automatically. It turns out that we can create such algorithms (known as error-correcting codes, which is the other phrase that ECC sometimes stands for) at any degree of precision we want, but with a trade-off in efficiency: the more errors a code must handle, the more extra bits it has to carry. Most of the time, we settle for codes that can detect and correct errors in one bit and detect but not correct errors in two bits. (A simple illustration of how this works with a one-bit error is shown in How ECC Works.)
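As a rough sketch of single-bit correction, here is a Hamming(7,4) code in Python. It is one well-known error-correcting code, chosen here as an assumption for illustration; it is not necessarily the scheme a given device uses. Four data bits are protected by three parity bits, and the pattern of failed parity checks gives the position of the bad bit.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as 7 bits laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # checks positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: list[int]) -> tuple[list[int], int]:
    """Return (data bits, error position); position 0 means no error seen."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3   # 1-based position of the bad bit
    if error_pos:
        c[error_pos - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]], error_pos


data = [1, 0, 1, 1]
codeword = hamming74_encode(data)

# One bit damaged in transit: the decoder finds and repairs it.
one_error = list(codeword)
one_error[5] ^= 1
fixed, where = hamming74_decode(one_error)

# Two bits damaged: this small code "corrects" the wrong bit.
two_errors = list(codeword)
two_errors[0] ^= 1
two_errors[1] ^= 1
wrong, _ = hamming74_decode(two_errors)

print(fixed == data, where, wrong == data)
```

The two-error case shows why practical codes usually add one more overall parity bit: with it, a double error can at least be flagged as uncorrectable instead of being silently miscorrected.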

Today, ECC is used in many different devices, from CD players to computers. Perhaps its best-known use is in special ECC RAM for servers, in which the memory modules carry extra bits that hold the check code for each data word.

Looking to the future, ECC could gain increased visibility and adoption in the wireless communications market. That's because the popularity of wireless communications and the availability of wireless products are both predicted to grow considerably, but the bandwidth and throughput of wireless channels are expected to remain well below those of hard-wired connections. On a slow, noisy channel, retransmitting damaged data is costly, so correcting errors at the receiving end becomes more attractive.

Kay is a Computerworld contributing writer in Worcester, Mass. You can contact him at russkay@charter.net.
