Unicode Character "󠀁" U+E0001 Language Tag

Unicode Version 15.1

󠀁

Summary

The Unicode character at code point U+E0001 is Language Tag. It belongs to the Tags block and the Common script, its general category is Format, and it is deprecated. Its UTF-8 encoding is 0xF3 0xA0 0x80 0x81 and its UTF-16 encoding is 0xDB40 0xDC01.

General Properties

Code Point	U+E0001
Version Added	3.1
Name	Language Tag
Block	Tags
General Category	Format
Canonical Combining Class	Not Reordered
Bidirectional Class	Boundary Neutral
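
The general properties above can be cross-checked with Python's standard unicodedata module. This is only a minimal sketch: the module reports short property codes (Cf for Format, BN for Boundary Neutral, 0 for Not Reordered), and the data it returns comes from the Unicode version bundled with the interpreter, which is assumed here to include this code point.

```python
import unicodedata

# U+E0001 LANGUAGE TAG, built from its code point to avoid pasting an invisible character
ch = chr(0xE0001)

name = unicodedata.name(ch)           # character name
category = unicodedata.category(ch)   # general category, short code
combining = unicodedata.combining(ch) # canonical combining class, as an integer
bidi = unicodedata.bidirectional(ch)  # bidirectional class, short code

print(name, category, combining, bidi)
```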

Encodings

HTML Decimal Encoding	&#917505;
HTML Hex Encoding	&#xE0001;
UTF-8 Encoding	0xF3 0xA0 0x80 0x81
UTF-16 Encoding	0xDB40 0xDC01
UTF-32 Encoding	0x000E0001
Java Escape	\udb40\udc01
C/C++ Escape	\U000E0001
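
The encodings in this table can all be derived from the code point. The sketch below is illustrative rather than authoritative: it uses Python's built-in codecs for the byte sequences and works out the UTF-16 surrogate pair by hand, so the arithmetic behind 0xDB40 0xDC01 is visible. The variable names are chosen here for illustration.

```python
CODE_POINT = 0xE0001
ch = chr(CODE_POINT)

# Byte sequences from the built-in codecs (big-endian, no byte order mark)
utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-be")
utf32 = ch.encode("utf-32-be")

# Surrogate pair by hand: subtract 0x10000, then split the 20-bit result
offset = CODE_POINT - 0x10000
high = 0xD800 + (offset >> 10)    # top 10 bits
low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits

# HTML numeric character references
html_decimal = f"&#{CODE_POINT};"
html_hex = f"&#x{CODE_POINT:X};"

print(utf8.hex(" "), utf16.hex(" "), utf32.hex(" "))
print(hex(high), hex(low), html_decimal, html_hex)
```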

Unicode Properties

NFC Quick Check	Yes
NFD Quick Check	Yes
NFKC Quick Check	Yes
NFKD Quick Check	Yes
Numeric Type	None
Numeric Value	NaN
Joining Type	Transparent
Line Break	Combining Mark
Case Ignorable	Yes
Changes When NFKC Casefolded	Yes
Script	Common
Script Extensions	Common
Indic Syllabic Category	Other
Default Ignorable Code Point	Yes
Vertical Orientation	Rotated
Grapheme Cluster Break	Control
Word Break	Format
Sentence Break	Format
Deprecated	Yes
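
A Quick Check value of Yes for all four normalization forms means the character is already normalized in each of them, and a Numeric Type of None means it has no numeric value. As a hedged sketch, both can be observed with Python's unicodedata module, assuming the interpreter's bundled Unicode data covers this code point.

```python
import unicodedata

ch = chr(0xE0001)  # U+E0001 LANGUAGE TAG

# Normalizing a one-character string should leave it unchanged in every form
forms = ("NFC", "NFD", "NFKC", "NFKD")
unchanged = {form: unicodedata.normalize(form, ch) == ch for form in forms}

# numeric() returns the supplied default when the character has no numeric value
numeric = unicodedata.numeric(ch, None)

print(unchanged, numeric)
```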