Add warnings about UTF-16 vs UTF-8 strings

This commit aims to address #1348 via a number of strategies:

* Documentation is updated to warn about UTF-16 vs UTF-8 problems
  between JS and Rust. In particular, it now notes that `as_string` and
  the conversion of string arguments are lossy when a JS string contains
  lone surrogates.

* A `JsString::is_valid_utf16` method was added to test whether
  `as_string` is lossless or not.
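The lossiness comes from a mismatch: a JS string is a sequence of UTF-16 code units that need not be well-formed, while a Rust `String` must be valid UTF-8. The std-only sketch below models a JS string as a `&[u16]` slice (an assumption; it does not call into `wasm-bindgen` or `js-sys`). The helpers `lossy_to_string` and `is_lossless` are hypothetical and only illustrate what "lossless" means here: re-encoding the converted string reproduces the original code units.

```rust
/// Hypothetical stand-in for a lossy UTF-16 -> UTF-8 conversion:
/// each lone surrogate becomes U+FFFD REPLACEMENT CHARACTER.
fn lossy_to_string(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

/// A conversion is lossless exactly when re-encoding the result as UTF-16
/// yields the original code units again.
fn is_lossless(units: &[u16]) -> bool {
    lossy_to_string(units).encode_utf16().eq(units.iter().copied())
}

fn main() {
    // "a" followed by a lone high surrogate: a legal JS string, not legal UTF-8.
    let broken: [u16; 2] = [0x0061, 0xD800];
    assert_eq!(lossy_to_string(&broken), "a\u{FFFD}");
    assert!(!is_lossless(&broken));

    // Well-formed text, including a surrogate pair, survives unchanged.
    let ok: Vec<u16> = "a🥑".encode_utf16().collect();
    assert!(is_lossless(&ok));
}
```

Because a lone surrogate is always replaced with U+FFFD, this round-trip test gives the same answer as checking the code units for well-formed UTF-16.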

The intention is that most of `wasm-bindgen`'s default behavior will
remain unchanged, but where necessary, bindings will use `JsString`
instead of `str`/`String` and manually check `is_valid_utf16`. It's also
hypothesized that this is relatively rare and not too performance
critical, so an optimized intrinsic for `is_valid_utf16` is not yet
provided.
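A binding that must not silently corrupt its input could follow a "check, then convert" pattern. The sketch below again models the JS string as a `&[u16]` slice, and `name_arg_to_string` is a hypothetical binding helper, not a `wasm-bindgen` API. It uses std's `String::from_utf16`, which performs the validity check and the conversion in one pass and reports lone surrogates as an error instead of replacing them.

```rust
use std::string::FromUtf16Error;

/// Hypothetical binding helper: accepts raw UTF-16 (standing in for a
/// `JsString`) and refuses to convert it if the conversion would be lossy.
fn name_arg_to_string(units: &[u16]) -> Result<String, FromUtf16Error> {
    // `String::from_utf16` fails on lone surrogates, so an `Ok` result
    // re-encodes to exactly the same code units.
    String::from_utf16(units)
}

fn main() {
    let valid: Vec<u16> = "Ferris".encode_utf16().collect();
    match name_arg_to_string(&valid) {
        Ok(s) => println!("converted losslessly: {}", s),
        Err(e) => println!("rejected: {}", e),
    }

    // A lone low surrogate is rejected rather than replaced with U+FFFD.
    let lone: [u16; 1] = [0xDC00];
    assert!(name_arg_to_string(&lone).is_err());
}
```

Returning an error leaves the decision to the caller, who can still opt into a lossy conversion explicitly.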

Closes #1348
Alex Crichton
2019-04-01 11:09:57 -07:00
parent c5f18b6099
commit 44738e049a
6 changed files with 89 additions and 1 deletion


@@ -541,3 +541,15 @@ fn raw() {
    );
    assert!(JsString::raw_0(&JsValue::null().unchecked_into()).is_err());
}

#[wasm_bindgen_test]
fn is_valid_utf16() {
    assert!(JsString::from("a").is_valid_utf16());
    assert!(JsString::from("").is_valid_utf16());
    assert!(JsString::from("🥑").is_valid_utf16());
    assert!(JsString::from("Why hello there this, 🥑, is 🥑 and is 🥑").is_valid_utf16());
    assert!(JsString::from_char_code1(0x00).is_valid_utf16());
    assert!(!JsString::from_char_code1(0xd800).is_valid_utf16());
    assert!(!JsString::from_char_code1(0xdc00).is_valid_utf16());
}
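The test cases in `is_valid_utf16` can be mirrored with std alone, which also shows why the avocado emoji passes while `0xd800` and `0xdc00` fail: U+1F951 is encoded as a well-formed surrogate pair, whereas a surrogate without a partner is not. This sketch models each `from_char_code1` call as a one-element `u16` slice; `is_valid_utf16` and `utf16` here are hypothetical helpers, not the `js-sys` implementation.

```rust
/// Hypothetical std-only analogue of `JsString::is_valid_utf16`:
/// true when every code unit decodes, i.e. there are no lone surrogates.
fn is_valid_utf16(units: &[u16]) -> bool {
    char::decode_utf16(units.iter().copied()).all(|r| r.is_ok())
}

/// Encodes a Rust string as UTF-16 code units, like a JS string literal.
fn utf16(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}

fn main() {
    assert!(is_valid_utf16(&utf16("a")));
    assert!(is_valid_utf16(&utf16("")));
    // "🥑" (U+1F951) is the surrogate pair 0xD83E 0xDD51.
    assert_eq!(utf16("🥑"), [0xD83E_u16, 0xDD51_u16]);
    assert!(is_valid_utf16(&utf16("🥑")));
    assert!(is_valid_utf16(&[0x0000]));
    // Lone high and low surrogates are rejected...
    assert!(!is_valid_utf16(&[0xD800]));
    assert!(!is_valid_utf16(&[0xDC00]));
    // ...and so is a pair in the wrong order.
    assert!(!is_valid_utf16(&[0xDC00, 0xD800]));
}
```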