UTF-8: Difference between revisions

From ICANNWiki
Jessica (talk | contribs)
Created page with "'''UTF-8''' refers to Unicode Transformation Format 8-bit is a variable-width encoding that can represent every character in the Unicode character set that was designed for ba..."
 
Jessica (talk | contribs)
No edit summary
 
(One intermediate revision by the same user not shown)
Latest revision as of 17:09, 12 May 2021

UTF-8 (Unicode Transformation Format, 8-bit) is a variable-width character encoding that can represent every character in the Unicode character set. It was designed for backward compatibility with ASCII.

Overview

UTF-8 encodes each Unicode character as a sequence of one to four octets (bytes). The number of octets depends on the integer value (code point) assigned to the character: lower values need fewer octets. UTF-8 is the default encoding for XML and has been the dominant character encoding on the web since 2010.[1]
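The relationship between a character's integer value and its encoded length can be illustrated with a short Python sketch. The function name `utf8_octet_count` is hypothetical and chosen for this example; the range boundaries are those of the standard UTF-8 scheme, and the loop checks them against Python's built-in encoder. The function only counts octets and does not reject surrogate values, which UTF-8 cannot encode.

```python
def utf8_octet_count(code_point: int) -> int:
    """Return how many octets UTF-8 uses for a given Unicode code point."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point <= 0x7F:      # ASCII range
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4                    # up to U+10FFFF

# Compare the table above with Python's own UTF-8 encoder.
# Samples: "A", e with acute accent, euro sign, an emoji.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    encoded = ch.encode("utf-8")
    assert len(encoded) == utf8_octet_count(ord(ch))
    print(f"U+{ord(ch):04X} uses {len(encoded)} octet(s): {encoded.hex(' ')}")
```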

W3C has offered several reasons for the popularity of UTF-8:

  1. An HTML page can use only one encoding; because UTF-8 supports many languages, it can accommodate pages and forms in any mixture of those languages.
  2. Barriers to using Unicode are very low; by January 2012, Google reported that over 60% of the Web in their sample used UTF-8.
  3. ASCII is a subset of UTF-8; all ASCII characters in UTF-8 use the same bytes as in an ASCII encoding, which helps with interoperability.
  4. The HTML5 specification says "Authoring tools should default to using UTF-8 for newly-created documents."[2]
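
The ASCII-compatibility point in the list above can be checked directly. The following is a minimal Python sketch with an arbitrary sample string and variable names chosen for illustration: it encodes the same ASCII-only text both ways and compares the resulting bytes.

```python
text = "Hello, UTF-8!"  # sample string containing only ASCII characters

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For ASCII-only text the two encodings produce identical bytes,
# so data written as ASCII can be read back as UTF-8 unchanged.
same = ascii_bytes == utf8_bytes
decoded = ascii_bytes.decode("utf-8")

# Non-ASCII characters use only octets with the high bit set (>= 0x80),
# so they cannot be mistaken for ASCII characters.
high_bit_only = all(b >= 0x80 for b in "\u00e9".encode("utf-8"))

print(same, decoded == text, high_bit_only)
```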

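References

  1. About utf-8. https://www.utf8.com/
  2. Why choose UTF-8, W3C. https://www.w3.org/International/questions/qa-choosing-encodings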