mirror of
https://github.com/fluencelabs/wasm-bindgen
synced 2025-07-31 20:11:55 +00:00
Merge pull request #1416 from alexcrichton/js-string-valid-utf16
Add warnings about UTF-16 vs UTF-8 strings
@@ -20,3 +20,30 @@ with handles to JavaScript string values, use the `js_sys::JsString` type.
```js
{{#include ../../../../examples/guide-supported-types-examples/str.js}}
```

## UTF-16 vs UTF-8

Strings in JavaScript are encoded as UTF-16, but with one major exception: they
can contain unpaired surrogates. For some Unicode characters UTF-16 uses two
16-bit values. These are called "surrogate pairs" because they always come in
pairs. In JavaScript, it is possible for these surrogate pairs to be missing the
other half, creating an "unpaired surrogate".

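To make the terminology concrete, here is a minimal sketch in plain Rust (standard library only, no `wasm-bindgen` involved). The helper names `is_surrogate` and `utf16_units` are hypothetical and exist only for this illustration.

```rust
/// Hypothetical helper: true if `unit` is one half of a surrogate pair.
/// Surrogates occupy the range 0xD800..=0xDFFF.
fn is_surrogate(unit: u16) -> bool {
    (0xD800..=0xDFFF).contains(&unit)
}

/// Hypothetical helper: encodes a Rust string as UTF-16 code units.
fn utf16_units(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}
```

With these helpers, `utf16_units("a")` yields a single 16-bit value, while `utf16_units("\u{1F600}")` yields the pair `0xD83D, 0xDE00`. A sequence holding only `0xD83D` is an unpaired surrogate, which is why `String::from_utf16(&[0xD83D])` returns an error in Rust.
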
When passing a string from JavaScript to Rust, it uses the `TextEncoder` API to
|
||||
convert from UTF-16 to UTF-8. This is normally perfectly fine... unless there
|
||||
are unpaired surrogates. In that case it will replace the unpaired surrogates
|
||||
with U+FFFD (<28>, the replacement character). That means the string in Rust is
|
||||
now different from the string in JavaScript!
|
||||
|
||||
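The effect of that replacement can be modelled in plain Rust. This sketch does not call `TextEncoder`; it assumes that `String::from_utf16_lossy` from the standard library is a fair stand-in, since it also substitutes U+FFFD for each unpaired surrogate. The function names are hypothetical.

```rust
/// Hypothetical model of the JS-to-Rust conversion: unpaired surrogates
/// become U+FFFD, everything else is preserved.
fn to_rust_string_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

/// Hypothetical check: does the string come back unchanged after being
/// converted to a Rust `String` and re-encoded as UTF-16?
fn survives_round_trip(units: &[u16]) -> bool {
    let back: Vec<u16> = to_rust_string_lossy(units).encode_utf16().collect();
    back == units
}
```

For the units `[0x61, 0xD83D, 0x62]` (an `a`, a lone high surrogate, a `b`), the lossy conversion produces `"a\u{FFFD}b"`, so the round trip no longer matches the original units.
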
If you want to guarantee that the Rust string is the same as the JavaScript
string, you should instead use `js_sys::JsString` (which keeps the string in
JavaScript and doesn't copy it into Rust).

If you want to access the raw value of a JS string, you can use `JsString::iter`,
which returns an `Iterator<Item = u16>`. This perfectly preserves everything
(including unpaired surrogates), but it does not do any encoding (so you
have to do that yourself!).

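One way to do that conversion yourself is `std::char::decode_utf16`, which accepts any iterator of `u16`. The sketch below is plain Rust: the generic parameter stands in for whatever `JsString::iter` yields, and `decode_strict` is a hypothetical name, not part of `js_sys`.

```rust
use std::char::decode_utf16;

/// Hypothetical helper: decodes UTF-16 code units into a `String`,
/// failing with the offending code unit at the first unpaired surrogate
/// instead of silently replacing it.
fn decode_strict<I>(units: I) -> Result<String, u16>
where
    I: IntoIterator<Item = u16>,
{
    decode_utf16(units)
        // Turn each decoding error into the raw unpaired surrogate value.
        .map(|r| r.map_err(|e| e.unpaired_surrogate()))
        // Collecting into `Result` stops at the first error.
        .collect()
}
```

Returning an error rather than a replacement character is a design choice for this sketch: it lets the caller decide whether to reject the string, substitute U+FFFD, or keep working with the raw code units.
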
If you simply want to ignore strings which contain unpaired surrogates, you can
use `JsString::is_valid_utf16` to test whether the string contains unpaired
surrogates or not.

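The "ignore invalid strings" approach might look roughly like the following in plain Rust. Both functions are hypothetical stand-ins operating on `&[u16]` slices; they do not call `JsString::is_valid_utf16`, they only illustrate the same kind of check on raw code units.

```rust
use std::char::decode_utf16;

/// Hypothetical stand-in for the validity test: true when the code units
/// contain no unpaired surrogates.
fn units_are_valid_utf16(units: &[u16]) -> bool {
    decode_utf16(units.iter().copied()).all(|r| r.is_ok())
}

/// Hypothetical filter: drops every input with an unpaired surrogate and
/// converts the rest to Rust strings.
fn keep_valid(inputs: &[Vec<u16>]) -> Vec<String> {
    inputs
        .iter()
        .filter(|units| units_are_valid_utf16(units))
        // Lossy conversion changes nothing here: only valid inputs remain.
        .map(|units| String::from_utf16_lossy(units))
        .collect()
}
```

Given the inputs `hi`, a lone `0xD83D`, and the pair `0xD83D, 0xDE00`, `keep_valid` keeps the first and third and skips the second.
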
@@ -8,6 +8,9 @@ Copies the string's contents back and forth between the JavaScript
garbage-collected heap and the Wasm linear memory with `TextDecoder` and
`TextEncoder`

> **Note**: Be sure to check out the [documentation for `str`](str.html) to
> learn about some caveats when working with strings between JS and Rust.

## Example Rust Usage

```rust