Text Data Compression

TEXT DATA: Human-readable character sets are converted to binary so that the CPU can process them, which means every character a human reads is ultimately represented in base 2. An encoding may be written in hexadecimal notation, but that is only a shorthand; in the end all of the data is stored in base 2 binary format. Text data compression is lossless in nature.

In ASCII, the letters A-Z are 7-bit codes that are normally stored one per 8-bit byte, and text is passed around as binary data through these mappings. The mappings are fixed by the encoding standard and implemented by the operating system and its libraries.
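
As an illustration of this character-to-binary mapping, here is a minimal Python sketch that prints the 8-bit pattern for each character of a short string. The helper name to_bits is hypothetical, and UTF-8 is assumed as the encoding (it agrees with ASCII for A-Z).

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated 8-bit groups."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


print(to_bits("AZ"))  # 01000001 01011010
```

Here "A" is code 65 and "Z" is code 90, shown in base 2 and padded to 8 bits.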

Everything that is displayed, be it icons, symbols, canvas, or SVG, is bit-coded and processed as bit streams by the CPU. For processing, the computer converts all the TEXT DATA (human-readable format) into a long binary string.

Video and image compression commonly use LOSSY COMPRESSION ALGORITHMS, which discard some detail. Compression of text data can only be LOSSLESS COMPRESSION. One common technique is HUFFMAN CODING, which builds a HUFFMAN TREE from character frequencies and assigns shorter bit codes to more frequent characters, reducing the TEXT DATA SIZE without losing any DATA.
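
To make the idea concrete, here is a minimal Python sketch of HUFFMAN CODING. It is an illustration under simple assumptions, not a production compressor: the function names huffman_codes, encode, and decode are hypothetical, the bit stream is kept as a string of "0" and "1" characters, and the cost of storing the code table is ignored.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix-free code table from character frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct character still needs a 1-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {character: code so far}).
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, as in the HUFFMAN TREE construction.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text: str, codes: dict[str, str]) -> str:
    """Concatenate the code of every character."""
    return "".join(codes[ch] for ch in text)


def decode(bits: str, codes: dict[str, str]) -> str:
    """Walk the bit string, emitting a character whenever a full code is seen."""
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


text = "abracadabra"
codes = huffman_codes(text)
bits = encode(text, codes)
print(len(bits), "bits instead of", 8 * len(text))
print(decode(bits, codes) == text)  # True
```

For "abracadabra" the encoded stream is 23 bits, compared with 88 bits at 8 bits per character, and decoding returns the original string exactly.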

Text consists of human-readable characters (letters, digits, punctuation, and symbols) encoded according to standards such as ASCII and the Unicode UTF encodings (for example UTF-8).
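
The short Python sketch below shows how these encodings relate in practice: an ASCII character takes one byte in UTF-8, while other characters take two to four bytes. The helper name utf8_bytes and the sample characters are chosen only for illustration.

```python
def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a single character."""
    return ch.encode("utf-8")


# Samples: "A", e with acute accent, the euro sign, and an emoji.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    data = utf8_bytes(ch)
    print(f"U+{ord(ch):04X} -> {len(data)} byte(s): {data.hex()}")
```

The first line of output shows that "A" is the single byte 41 in hexadecimal, the same value it has in ASCII.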

Text Data ---> Compression Algo ---> REDUCED TEXT VOLUME (data size reduction)

Text compression is lossless compression: the ORIGINAL DATA can be perfectly reconstructed, without any information loss. Lossless compression of TEXTUAL DATA is based on the fact that REAL WORLD DATA exhibits STATISTICAL REDUNDANCY, meaning some characters and sequences occur far more often than others.
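
One rough way to see this redundancy is to measure how unevenly the characters of a text are distributed. The Python sketch below estimates the average number of bits per character that the character frequencies alone would require (the Shannon entropy of the frequency distribution). The function name entropy_bits_per_char is hypothetical, and the measure is used here only as an illustration; it ignores redundancy between neighbouring characters.

```python
import math
from collections import Counter


def entropy_bits_per_char(text: str) -> float:
    """Average bits per character implied by the character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(entropy_bits_per_char("abcd"))  # 2.0
print(entropy_bits_per_char("abab"))  # 1.0
```

Four equally likely characters need 2 bits each, and two need only 1 bit each. Real text falls well below the 8 bits per character it is usually stored in, which is the room a lossless compressor exploits.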

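Lossless compression is used by the ZIP format and the GNU tool GZIP, both of which typically rely on the DEFLATE algorithm, a combination of LZ77 and HUFFMAN CODING.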

TEXT DATA ---> Compression ---> De-compression ---> Original data
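
This round trip can be tried with the gzip module from the Python standard library. The sample sentence and its repetition count are arbitrary assumptions chosen to give the compressor something redundant to work with.

```python
import gzip

# Highly repetitive sample text, encoded to bytes before compression.
original = ("the quick brown fox jumps over the lazy dog. " * 50).encode("utf-8")

compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print(restored == original)  # True
```

The decompressed bytes are identical to the input, which is what makes the compression lossless.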

Hence TEXT COMPRESSION must be lossless: any loss of data during compression will result in faulty data after decompression.

The article above is rendered by integrating outputs of 1 HUMAN AGENT & 3 AI AGENTS, an amalgamation of HGI and AI to serve technology education globally.

(Article By : Himanshu N)