Every character we type is stored in the database and occupies some space.
Some characters need less space and some need more.
Earlier, ASCII (American Standard Code for Information Interchange) was used, but it is a 7-bit code that supports at most 128 characters. That is enough for English, but languages such as Chinese and Japanese have far more characters than ASCII can represent.
With UTF-8 (Unicode Transformation Format, 8-bit), we can represent every character in the Unicode standard, which covers almost all of the world's writing systems. The 8 refers to the 8-bit (1-byte) code units that UTF-8 uses, not to a fixed size per character.
UTF-8 stores each character in 1 to 4 bytes. (Note: 1 byte = 8 bits.)
For example, the character 'A' requires 1 byte, while most Japanese characters require 3 bytes.
Note: The 128 ASCII characters are encoded in UTF-8 as exactly the same single bytes. So ASCII is a subset of UTF-8.
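To make the 1-to-4-byte range concrete, here is a minimal Python sketch that counts the bytes UTF-8 needs for a few characters. The helper name `utf8_length` and the sample characters (other than 'A', the Telugu letter and the Rupee sign) are my own picks for illustration.

```python
# Sample characters chosen for illustration: Latin, accented Latin,
# Telugu, the Rupee sign, a Japanese kanji and an emoji.
samples = ["A", "é", "అ", "₹", "日", "😀"]


def utf8_length(ch: str) -> int:
    """Return how many bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


for ch in samples:
    # ord() gives the Unicode code point; print it in the usual U+XXXX form.
    print(f"U+{ord(ch):04X} {ch!r}: {utf8_length(ch)} byte(s)")
```

Running it shows 1 byte for 'A', 2 for 'é', 3 for the Telugu letter, the Rupee sign and the kanji, and 4 for the emoji.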
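The subset relationship can be checked directly. The sketch below (the function name is hypothetical, chosen for this example) encodes each of the 128 ASCII code points both ways and compares the bytes, then shows that ASCII cannot encode a Telugu letter at all.

```python
def same_bytes_in_ascii_and_utf8(code_point: int) -> bool:
    """True if the character has identical bytes in ASCII and in UTF-8."""
    ch = chr(code_point)
    return ch.encode("ascii") == ch.encode("utf-8")


# Code points 0..127 are the whole ASCII range.
all_match = all(same_bytes_in_ascii_and_utf8(cp) for cp in range(128))
print("All 128 ASCII characters match:", all_match)

# Anything outside that range cannot be encoded as ASCII.
try:
    "అ".encode("ascii")
    ascii_can_encode_telugu = True
except UnicodeEncodeError:
    ascii_can_encode_telugu = False
print("ASCII can encode TELUGU LETTER A:", ascii_can_encode_telugu)
```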
Even in a PeopleSoft installation, we are asked which type of database to create when we create the database. Since most clients nowadays have a global presence, they usually prefer a Unicode database.
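We also hear about UTF-16. It is another encoding of the same Unicode characters, but it uses 16-bit code units, so each character takes 2 or 4 bytes. UTF-16 is typically used internally by operating systems and runtimes such as Windows and Java.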
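As a rough comparison of the two encodings, the sketch below measures the same text in UTF-8 and UTF-16. The helper `encoded_sizes` is a name invented for this example, and "utf-16-le" is used so that Python does not prepend a 2-byte byte order mark to the result.

```python
def encoded_sizes(text: str) -> tuple:
    """Return (bytes in UTF-8, bytes in UTF-16) for the given text."""
    utf8_size = len(text.encode("utf-8"))
    # "utf-16-le" omits the byte order mark that plain "utf-16" would add.
    utf16_size = len(text.encode("utf-16-le"))
    return utf8_size, utf16_size


for text in ["TELUGU", "తెలుగు", "😀"]:
    utf8_size, utf16_size = encoded_sizes(text)
    print(f"{text!r}: UTF-8 = {utf8_size} bytes, UTF-16 = {utf16_size} bytes")
```

English text is smaller in UTF-8 (6 bytes against 12), while the Telugu word is smaller in UTF-16 (12 bytes against 18), so neither encoding is always the more compact one.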
Below are the codes for two Telugu letters.
Character Name      Character  OS X Option Code  Win XP ALT Code  Entity    Hex Entity
TELUGU LETTER A     అ          Option+0C05       ALT+3077         &#3077;   &#x0C05;
TELUGU LETTER AA    ఆ          Option+0C06       ALT+3078         &#3078;   &#x0C06;
Below is the code for the Indian Rupee sign.
Char  Dec   Hex   Entity  Name
₹     8377  20B9          INDIAN RUPEE SIGN
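The columns in these tables can all be derived from the character itself. Here is a small Python sketch, with a hypothetical helper named `describe`, that produces the decimal code, hex code, numeric entities and official Unicode name using the standard `unicodedata` module.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return the decimal, hex, entity and name values for one character."""
    code_point = ord(ch)
    return {
        "char": ch,
        "dec": code_point,
        "hex": f"{code_point:04X}",
        "entity": f"&#{code_point};",            # decimal numeric entity
        "hex_entity": f"&#x{code_point:04X};",   # hexadecimal numeric entity
        "name": unicodedata.name(ch),
    }


for ch in ["అ", "ఆ", "₹"]:
    print(describe(ch))
```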
Telugu word, one code point at a time: త ె ల ు గ ు
English word, one code point at a time: T E L U G U
Below is how the same code points appear in an HTML page once they are combined and the page is encoded as UTF-8.
Telugu word encoded as UTF-8: తెలుగు
English word encoded as UTF-8: TELUGU
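The round trip between HTML numeric codes and UTF-8 text can be sketched in Python as follows. The function `to_numeric_references` is a name made up for this example, and decimal references are an assumption on my part, since the post does not say whether decimal or hex codes were used; `html.unescape` is the standard library call that turns such references back into characters.

```python
import html


def to_numeric_references(text: str) -> str:
    """Write each code point of the text as a decimal HTML reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


# The Telugu word, spelled out by code point to avoid copy/paste surprises.
telugu = "\u0C24\u0C46\u0C32\u0C41\u0C17\u0C41"

refs = to_numeric_references(telugu)   # what would be typed in the HTML source
decoded = html.unescape(refs)          # what the browser shows
utf8_bytes = decoded.encode("utf-8")   # what is stored when saved as UTF-8

print(refs)
print(decoded)
print(len(utf8_bytes), "bytes in UTF-8")
```

The six code points of the Telugu word take 18 bytes in UTF-8, whereas the six letters of TELUGU take only 6.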