Wednesday, December 10, 2014

What is UTF-8 encoding

Each character we type is stored in the database and occupies some space.
Some characters require less space than others.

Earlier, ASCII (American Standard Code for Information Interchange) was used, but it supports a maximum of only 128 characters. ASCII is adequate for English. Other languages, such as Chinese and Japanese, have far more characters than ASCII can represent, and encoding them needs more space.

With UTF-8 (Unicode Transformation Format, 8-bit), we can represent every character in Unicode, which covers almost all languages in the world. The number 8 indicates that the encoding works in 8-bit units (bytes).

UTF-8 stores each character in 1 to 4 bytes. (Note: 1 byte = 8 bits.)
For example, the character 'A' requires 1 byte, while a Japanese character typically requires 3 bytes.
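The variable length described above can be checked with a short Python sketch. The helper name utf8_length and the sample characters are chosen here for illustration only; the code simply encodes each character and counts the resulting bytes.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes the given character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# Sample characters: Latin 'A', Telugu letter A, Indian rupee sign, a Japanese kanji.
for ch in ["A", "\u0c05", "\u20b9", "\u65e5"]:
    encoded = ch.encode("utf-8")
    # Print the code point, the byte count, and the raw bytes in hex.
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s) -> {encoded.hex(' ')}")
```

Characters outside the Basic Multilingual Plane, such as most emoji, are the ones that need the full 4 bytes.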

Note: The 128 ASCII characters have the same codes in UTF-8 and are encoded as the same single bytes. So ASCII is a subset of UTF-8.
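A minimal sketch of the subset relationship, assuming Python's built-in "ascii" and "utf-8" codecs. The function names are hypothetical and exist only for this example: the first checks that every ASCII code produces the same single byte under both encodings, and the second shows that text outside ASCII cannot be encoded as ASCII at all.

```python
def ascii_is_utf8_subset() -> bool:
    """Check that each of the 128 ASCII codes encodes to the same byte in UTF-8."""
    return all(
        chr(code).encode("ascii") == chr(code).encode("utf-8") == bytes([code])
        for code in range(128)
    )


def is_ascii_encodable(text: str) -> bool:
    """Return True if every character in text fits in ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(ascii_is_utf8_subset())
print(is_ascii_encodable("TELUGU"), is_ascii_encodable("\u0c05"))
```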

Even during PeopleSoft installation, while creating the database we are asked which type of database to create. Since most clients nowadays have a global presence, they generally prefer a Unicode database.

We also hear about UTF-16. It works in 16-bit units, so each character takes 2 or 4 bytes. UTF-16 is commonly used internally by operating systems such as Windows and by platforms such as Java.
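To compare the two encodings, the following sketch measures the same characters in UTF-8 and UTF-16. The helper encoded_sizes is a name made up for this example, and the "utf-16-le" codec is used so that no byte order mark is added to the count.

```python
def encoded_sizes(ch: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


# Latin 'A', Telugu letter A, and an emoji outside the Basic Multilingual Plane.
for ch in ["A", "\u0c05", "\U0001F600"]:
    utf8_size, utf16_size = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_size} byte(s), UTF-16 = {utf16_size} byte(s)")
```

English text is therefore smaller in UTF-8, while text in scripts such as Telugu is smaller in UTF-16.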

Below are the codes for two Telugu letters.
Character Name        Character   OS X Option Code   Win XP ALT Code   Entity     Hex Entity
TELUGU LETTER A       అ           Option+0C05        ALT+3077          &#3077;    &#x0C05;
TELUGU LETTER AA      ఆ           Option+0C06        ALT+3078          &#3078;    &#x0C06;

Below is the code for the Indian Rupee sign.
Char   Dec    Hex    Entity     Name
₹      8377   20B9   &#8377;    INDIAN RUPEE SIGN

The word "Telugu" written as HTML numeric character references (decimal Unicode code points, not UTF-8 bytes):
Telugu script: &#3108; &#3142; &#3122; &#3137; &#3095; &#3137;
English letters: &#84; &#69; &#76; &#85; &#71; &#85;

Below is the HTML version, where the browser converts the above codes into characters.

Telugu script: తెలుగు
English letters: TELUGU
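The conversion from codes to characters can be reproduced in Python. This is only a sketch: the helper names from_code_points and to_references are invented for the example, while html.unescape is the standard library function that resolves HTML character references. The last line also prints the actual UTF-8 bytes, which differ from the decimal codes listed above.

```python
import html

TELUGU_CODES = [3108, 3142, 3122, 3137, 3095, 3137]
ENGLISH_CODES = [84, 69, 76, 85, 71, 85]


def from_code_points(codes: list) -> str:
    """Build a string from a list of decimal Unicode code points."""
    return "".join(chr(code) for code in codes)


def to_references(text: str) -> str:
    """Write each character as an HTML decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


telugu = from_code_points(TELUGU_CODES)
english = from_code_points(ENGLISH_CODES)

print(telugu, english)
# html.unescape turns the references back into the original characters.
print(html.unescape(to_references(telugu)) == telugu)
# The UTF-8 bytes that are actually stored for the Telugu word.
print(telugu.encode("utf-8").hex(" "))
```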
