[Encoding] Binary to Text Encoding


Binary, ASCII and Text Data

Binary data is a sequence of 8-bit bytes, where each byte can have any value from 0x00 to 0xFF. In general, we can’t assume much about this data, except that any byte could potentially have any value.

ASCII data represents text as a sequence of bytes. In the ASCII system, byte values in the range 0x00 to 0x7F represent English letters (upper and lower case), numerals, punctuation symbols and various ‘control codes’. Byte values of 0x80 and above have no defined meaning in ASCII.

Since ASCII data is not expected to contain byte values of 0x80 or greater, it is often called 7-bit data.

Printable characters in ASCII are the values in the range 0x21 to 0x7E, which covers upper and lower case letters, numerals and all standard punctuation. The space character (0x20) sits just below this range and is treated here as whitespace.

Text data is ASCII data which only contains printable and whitespace characters.
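The three categories above can be expressed as simple byte-range checks. The following is a minimal sketch; the function names and the choice of space, tab, carriage return and line feed as the whitespace set are assumptions made for illustration, not part of any standard API.

```python
# Hypothetical helpers that mirror the definitions above.
WHITESPACE = b" \t\r\n"  # assumed whitespace set for this sketch


def is_ascii(data: bytes) -> bool:
    """True if no byte has the 8th bit set (all values are below 0x80)."""
    return all(b < 0x80 for b in data)


def is_text(data: bytes) -> bool:
    """True if every byte is printable (0x21 to 0x7E) or whitespace."""
    return all(0x21 <= b <= 0x7E or b in WHITESPACE for b in data)


if __name__ == "__main__":
    for sample in (b"Hello, world!\n", b"\x00abc", b"caf\xe9"):
        print(sample, is_ascii(sample), is_text(sample))
```

Note that a control code such as 0x00 is valid ASCII but is not text under this definition.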

Binary-to-text encoding is the encoding of data as plain text. More precisely, it is an encoding of binary data into a sequence of printable characters. This is necessary when transferring data over channels or systems that do not allow binary data (such as email or NNTP) or are not 8-bit clean.

Problems with Binary data

If a system is designed to handle text data, it might make certain assumptions about that data. This can easily cause the system to fail if binary data is passed through.

Most computers store data in memory organized in 8-bit bytes. Files that contain machine-executable code and non-textual data typically contain all 256 possible 8-bit byte values. Many computer programs came to rely on the distinction between 7-bit text and 8-bit binary data, and would not function properly if non-ASCII bytes appeared in data that was expected to be ASCII text. For example, if the value of the 8th bit is not preserved, the program might interpret a byte value above 127 as a flag telling it to perform some function.
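To see why a channel that is not 8-bit clean is a problem, here is a small sketch that simulates one by clearing the 8th bit of every byte. The function name `strip_high_bit` is hypothetical and only models the failure; real systems can mangle data in other ways too.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit channel: keep only the low 7 bits of each byte."""
    return bytes(b & 0x7F for b in data)


original = bytes([0x48, 0x69, 0xFF, 0x80, 0xC3])  # "Hi" followed by three non-ASCII bytes
damaged = strip_high_bit(original)

if __name__ == "__main__":
    print(original.hex(), "->", damaged.hex())
    print("survived intact:", damaged == original)
```

ASCII text passes through such a channel unchanged, while every byte of 0x80 or above is silently turned into a different value.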

Solution to Binary Data problem

It is often desirable to be able to send non-textual data through text-based systems, such as when one attaches an image to an email message. To accomplish this, the 8-bit byte data is encoded into 7-bit ASCII characters. Upon safe arrival at its destination, it is decoded back to its 8-bit byte form. This process is referred to as binary-to-text encoding.
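The encode, transfer, decode round trip can be sketched with the base64 module from the Python standard library. The payload below is an arbitrary stand-in for an attachment, chosen so that every possible byte value appears once.

```python
import base64

payload = bytes(range(256))          # stand-in for binary data such as an image
encoded = base64.b64encode(payload)  # safe to pass through a text-only channel
decoded = base64.b64decode(encoded)  # recovered on the receiving side

if __name__ == "__main__":
    print("all encoded bytes are 7-bit:", all(b < 0x80 for b in encoded))
    print("round trip ok:", decoded == payload)
    print(len(payload), "bytes in,", len(encoded), "characters out")
```

The encoded form is larger than the input, which is the price paid for restricting the output to printable characters.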

Encoding Plain Text

Binary-to-text encodings are also used as a mechanism for encoding plain text. For example:
 - Some systems have a more limited character set they can handle; some cannot handle every printable ASCII character.
 - Some systems have limits on the number of characters per line, such as the “1000 characters per line” limit of some SMTP software, as allowed by RFC 2821.

By using a binary-to-text encoding on messages that are already plain text, then decoding on the other end, one can make such systems appear to be completely transparent. This is sometimes referred to as ‘ASCII armoring’.

For example, the ViewState component of ASP.NET uses base64 encoding to safely transmit text via HTTP POST, in order to avoid delimiter collision. 
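As an illustration of armoring text that would otherwise exceed a line-length limit, the sketch below uses `base64.encodebytes` from the Python standard library, which breaks its output into lines of at most 76 characters. The 2000-character input line is an invented example.

```python
import base64

# A single line of plain text that is far longer than a 1000-character limit.
long_line = ("x" * 2000).encode("ascii")

armored = base64.encodebytes(long_line)   # output is wrapped into short lines
restored = base64.decodebytes(armored)    # newlines are ignored when decoding

longest = max(len(line) for line in armored.splitlines())

if __name__ == "__main__":
    print("longest armored line:", longest)
    print("restored matches original:", restored == long_line)
```

The receiving side sees the original long line again after decoding, so the line-length restriction of the transport never affects the message content.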

Encoding Standards

Most binary-to-text encodings generate text containing only a subset of the printable ASCII characters. For example, base64 output contains only upper and lower case letters (A-Z, a-z), numerals (0-9) and the “+”, “/” and “=” symbols.
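The restricted alphabet can be checked directly. This sketch builds the set of characters listed above and confirms that the output of the standard library's `base64.b64encode` stays inside it; the name `ALPHABET` is introduced here for illustration.

```python
import base64
import string

# The 64 data characters plus "=" for padding, as byte values.
ALPHABET = set(
    (string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/=").encode("ascii")
)

encoded = base64.b64encode(bytes(range(256)))
used = set(encoded)

if __name__ == "__main__":
    print("only alphabet characters used:", used <= ALPHABET)
    print("ends with padding:", encoded.endswith(b"="))
```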

Base64 carries 6 bits of data in each output character. Some encodings (such as the original version of BinHex) use 4 bits instead of 6, mapping every possible sequence of 4 bits onto the 16 standard hexadecimal digits. Using 4 bits per encoded character leads to 50% longer output than base64, but simplifies encoding and decoding.
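The size difference follows from the bits carried per character: hexadecimal needs 2 characters per byte, while base64 needs 4 characters per 3 bytes. A quick check with the Python standard library, using an input length divisible by 3 so that base64 padding does not affect the ratio:

```python
import base64

data = bytes(range(256)) * 3         # 768 bytes, a multiple of 3

hex_out = base64.b16encode(data)     # 4 bits per character
b64_out = base64.b64encode(data)     # 6 bits per character

ratio = len(hex_out) / len(b64_out)

if __name__ == "__main__":
    print(len(data), len(hex_out), len(b64_out), ratio)  # 768 1536 1024 1.5
```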

Examples


  • Ascii85
  • Base16
  • Base32
  • Base64
  • Btoa
  • MIME
  • RFC 1751

(From https://en.wikipedia.org/wiki/Binary-to-text_encoding#Encoding_standards)
