C++ - std::u32string conversion to/from std::string and std::u16string
I need to convert between UTF-8, UTF-16 and UTF-32 for different APIs/modules, and since I know I have the option to use C++11, I am looking at the new string types.
It looks like I can use string, u16string and u32string for UTF-8, UTF-16 and UTF-32. I also found codecvt_utf8 and codecvt_utf16, which look able to do a conversion between char or char16_t and char32_t, and what looks like a higher-level wstring_convert, but that only appears to work with bytes/std::string, and there is not a great deal of documentation.
Am I meant to use wstring_convert somehow for the UTF-16 ↔ UTF-32 and UTF-8 ↔ UTF-32 cases? I only really found examples for UTF-8 to UTF-16, and I am not sure they are even correct on Linux, where wchar_t is normally considered UTF-32... Or should I use the more complex codecvt facets directly?
Or is this all still not in a usable state, so that I should stick with my own existing small routines that use 8-, 16- and 32-bit unsigned integers?
If you read the documentation at cppreference.com for wstring_convert, codecvt_utf8, codecvt_utf16 and codecvt_utf8_utf16, the pages include a table that tells you what you can use for the various UTF conversions.
And yes, you would use std::wstring_convert to facilitate the conversion between the various UTFs. Despite its name, it is not limited to std::wstring; it operates on any std::basic_string type (which std::string, std::wstring and the std::uXXstring types are all based on).
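To illustrate that point, here is a minimal sketch, assuming a C++11 to C++17 toolchain (std::wstring_convert and the codecvt_utf* facets are deprecated as of C++17), that instantiates std::wstring_convert with char32_t, so the "wide" side is std::u32string rather than std::wstring. The function names u32_to_utf8 and utf8_to_u32 are hypothetical, chosen only for this example.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper names; the library pieces used are std::wstring_convert
// and std::codecvt_utf8 (both deprecated since C++17, but still shipped).
// Elem = char32_t, so the wide string type is std::u32string, not std::wstring.

// UTF-32 -> UTF-8 bytes stored in a std::string
std::string u32_to_utf8(const std::u32string &wide)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(wide);
}

// UTF-8 bytes -> UTF-32; throws std::range_error on malformed input
std::u32string utf8_to_u32(const std::string &bytes)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(bytes);
}
```

For example, U"\u00E9\U0001F600" (two code points) becomes the 6-byte UTF-8 string C3 A9 F0 9F 98 80 and converts back unchanged.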
Class template std::wstring_convert performs conversions between the byte string std::string and the wide string std::basic_string<Elem>, using an individual code conversion facet Codecvt. std::wstring_convert assumes ownership of the conversion facet, and cannot use a facet managed by a locale. The standard facets suitable for use with std::wstring_convert are std::codecvt_utf8 for UTF-8/UCS2 and UTF-8/UCS4 conversions and std::codecvt_utf8_utf16 for UTF-8/UTF-16 conversions.
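The UCS-2 versus UTF-16 distinction in that quote matters for characters outside the Basic Multilingual Plane. The sketch below (the helper name utf8_to_utf16 is hypothetical) uses std::codecvt_utf8_utf16, which produces a surrogate pair for such characters, and catches the std::range_error that wstring_convert::from_bytes throws on malformed input when the converter was constructed without an error string.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: UTF-8 -> UTF-16 via std::codecvt_utf8_utf16, which
// emits surrogate pairs for code points above U+FFFF. Returns false instead
// of propagating the std::range_error that from_bytes throws on bad input
// (this converter is default-constructed, i.e. no byte-error string given).
bool utf8_to_utf16(const std::string &in, std::u16string &out)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    try {
        out = conv.from_bytes(in);
        return true;
    } catch (const std::range_error &) {
        return false;
    }
}
```

With this sketch, the 4-byte UTF-8 sequence for U+1F600 yields the two units 0xD83D 0xDE00, while a lone 0xFF byte (never valid in UTF-8) makes the function return false.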
For example:
#include <codecvt>
#include <locale>
#include <string>

// Note: std::wstring_convert and the <codecvt> facets are deprecated since C++17.
typedef std::string u8string;

u8string to_utf8(const std::u16string &s)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(s);
}

u8string to_utf8(const std::u32string &s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(s);
}

std::u16string to_utf16(const u8string &s)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(s);
}

std::u16string to_utf16(const std::u32string &s)
{
    // Go through UTF-8: std::codecvt_utf16 writes big-endian bytes by default,
    // so reinterpreting its output as char16_t would byte-swap on little-endian hosts.
    return to_utf16(to_utf8(s));
}

std::u32string to_utf32(const u8string &s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(s);
}

std::u32string to_utf32(const std::u16string &s)
{
    // Same endianness concern as above, so convert via UTF-8 instead.
    return to_utf32(to_utf8(s));
}
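Because these facets are deprecated as of C++17, keeping small hand-written routines, as the question suggests, is still a reasonable option, at least for UTF-16 ↔ UTF-32, which only involves surrogate-pair arithmetic. A minimal sketch with hypothetical function names; throwing std::range_error on invalid input is an illustrative choice that mirrors wstring_convert.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical hand-written UTF-32 -> UTF-16, independent of <codecvt>
// and of byte order (it works on code units, not bytes).
std::u16string utf32_to_utf16(const std::u32string &in)
{
    std::u16string out;
    out.reserve(in.size());
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::range_error("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp)); // BMP: one unit
        } else {
            cp -= 0x10000; // remaining 20 bits are split across a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical hand-written UTF-16 -> UTF-32; rejects unpaired surrogates.
std::u32string utf16_to_utf32(const std::u16string &in)
{
    std::u32string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF) { // high surrogate: a low one must follow
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::range_error("unpaired high surrogate");
            char32_t lo = in[++i];
            out.push_back(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            throw std::range_error("unpaired low surrogate");
        } else {
            out.push_back(u);
        }
    }
    return out;
}
```

Working on char16_t/char32_t code units rather than on a byte string sidesteps the endianness and reinterpret_cast issues that arise when std::codecvt_utf16 output is treated as char16_t data.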