C++ - std::u32string conversion to/from std::string and std::u16string
I need to convert between UTF-8, UTF-16 and UTF-32 for different APIs/modules, and since I now have the option to use C++11, I am looking at the new string types.
It looks like I can use string, u16string and u32string for UTF-8, UTF-16 and UTF-32. I also found codecvt_utf8 and codecvt_utf16, which look able to convert between char or char16_t and char32_t, and what looks like a higher-level wstring_convert, but that appears to work only with bytes/std::string, and there is not a great deal of documentation.
Am I meant to use wstring_convert somehow for the UTF-16 ↔ UTF-32 and UTF-8 ↔ UTF-32 cases? I only found examples for UTF-8 to UTF-16, and I am not sure those are even correct on Linux, where wchar_t is normally considered UTF-32... Or should I do something more complex with the codecvt facets directly?
Or is this still not in a usable state, and should I stick with my own existing small routines using 8-, 16- and 32-bit unsigned integers?
If you read the documentation at cppreference.com for wstring_convert, codecvt_utf8, codecvt_utf16, and codecvt_utf8_utf16, the pages include a table that tells you which facet you can use for each of the various UTF conversions.

And yes, you would use std::wstring_convert to facilitate the conversion between the various UTFs. Despite its name, it is not limited to std::wstring; it operates with any std::basic_string type (which std::string, std::wstring, and the std::uXXstring types are all based on).
Class template std::wstring_convert performs conversions between a byte string, std::string, and a wide string, std::basic_string<Elem>, using an individual code conversion facet Codecvt. std::wstring_convert assumes ownership of the conversion facet, and cannot use a facet managed by a locale. The standard facets suitable for use with std::wstring_convert are std::codecvt_utf8 for UTF-8/UCS2 and UTF-8/UCS4 conversions and std::codecvt_utf8_utf16 for UTF-8/UTF-16 conversions.
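One thing the description above leaves implicit is what happens when the input is malformed. As a minimal sketch (the function names decode_strict and decode_or_fallback are made up for illustration): a default-constructed std::wstring_convert throws std::range_error when a conversion fails, while one constructed with error strings returns the matching error string instead. Note that these facilities are deprecated since C++17, so newer compilers may warn.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: UTF-8 -> UTF-32, throwing on malformed input.
// The converter owns its codecvt_utf8 facet; no locale is involved.
std::u32string decode_strict(const std::string& bytes) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(bytes);  // throws std::range_error on failure
}

// Hypothetical helper: same conversion, but a failed conversion returns
// the wide error string passed to the constructor (here U+FFFD) instead
// of throwing. The whole result is replaced, not just the bad bytes.
std::u32string decode_or_fallback(const std::string& bytes) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>
        conv("?", U"\uFFFD");
    return conv.from_bytes(bytes);
}
```

The fallback form is convenient for logging or display paths; the throwing form is usually the safer default when the converted text is stored or passed on.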
For example, a full set of conversion helpers:

    #include <codecvt>
    #include <locale>
    #include <string>

    typedef std::string u8string;

    // UTF-16 -> UTF-8
    u8string to_utf8(const std::u16string &s)
    {
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
        return conv.to_bytes(s);
    }

    // UTF-32 -> UTF-8
    u8string to_utf8(const std::u32string &s)
    {
        std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
        return conv.to_bytes(s);
    }

    // UTF-8 -> UTF-16
    std::u16string to_utf16(const u8string &s)
    {
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
        return conv.from_bytes(s);
    }

    // UTF-32 -> UTF-16. codecvt_utf16 writes a UTF-16 byte stream, big-endian
    // by default. std::little_endian makes the bytes match the in-memory layout
    // of char16_t on little-endian hosts; drop that flag on big-endian hosts.
    std::u16string to_utf16(const std::u32string &s)
    {
        std::wstring_convert<std::codecvt_utf16<char32_t, 0x10ffff, std::little_endian>, char32_t> conv;
        std::string bytes = conv.to_bytes(s);
        return std::u16string(reinterpret_cast<const char16_t*>(bytes.c_str()), bytes.length()/sizeof(char16_t));
    }

    // UTF-8 -> UTF-32
    std::u32string to_utf32(const u8string &s)
    {
        std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
        return conv.from_bytes(s);
    }

    // UTF-16 -> UTF-32. The same byte-order assumption as above applies.
    std::u32string to_utf32(const std::u16string &s)
    {
        const char16_t *pdata = s.c_str();
        std::wstring_convert<std::codecvt_utf16<char32_t, 0x10ffff, std::little_endian>, char32_t> conv;
        return conv.from_bytes(reinterpret_cast<const char*>(pdata), reinterpret_cast<const char*>(pdata+s.length()));
    }
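The two UTF-16/UTF-32 overloads above go through a byte buffer and a reinterpret_cast, which ties them to the host byte order. If that is a concern, the surrogate-pair arithmetic is small enough to write by hand, much like the "own small routines" the question mentions. The following is only a sketch of that alternative, not part of the original answer; the names utf32_to_utf16_portable and utf16_to_utf32_portable are hypothetical, and throwing std::range_error on invalid input is an assumed policy chosen to mirror wstring_convert.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: UTF-32 -> UTF-16 without any byte reinterpretation.
std::u16string utf32_to_utf16_portable(const std::u32string& in) {
    std::u16string out;
    out.reserve(in.size());
    for (char32_t cp : in) {
        // Reject values outside Unicode and lone surrogate code points.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::range_error("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // 20 bits remain, split into two 10-bit halves
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper: UTF-16 -> UTF-32, validating surrogate pairing.
std::u32string utf16_to_utf32_portable(const std::u16string& in) {
    std::u32string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t unit = in[i];
        if (unit >= 0xD800 && unit <= 0xDBFF) {  // high surrogate
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::range_error("unpaired high surrogate");
            char32_t low = in[++i];
            out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            throw std::range_error("unpaired low surrogate");
        } else {
            out.push_back(unit);  // BMP code unit maps one-to-one
        }
    }
    return out;
}
```

Because these routines work on code units rather than bytes, they behave the same on little-endian and big-endian hosts and do not depend on the deprecated <codecvt> header.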