Simon_Horup_Es1
Feb 06, 2017 05:05 AM
Hi! For some Unicode code points that are encoded as 4 bytes in UTF-8, such as 🌸, 🎧 and 𐄁, Airtable’s LEN formula function appears to return 2 instead of the expected 1. As far as I can tell, this is not the case for characters encoded as 2 or 3 bytes in UTF-8, such as ⛄ or Đ, for which LEN correctly returns 1.
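One possible explanation, which is an assumption on my part and not something Airtable has confirmed, is that LEN counts UTF-16 code units, the way JavaScript's string length does. Code points above U+FFFF (exactly the ones that need 4 bytes in UTF-8) take two UTF-16 code units. A minimal Ruby sketch of that comparison, where utf16_units is a helper name made up for illustration:

```ruby
# Hypothetical helper: number of UTF-16 code units in a string.
# This is what JavaScript's String length reports; whether Airtable's LEN
# works the same way is an assumption.
def utf16_units(str)
  str.encode("UTF-16LE").bytesize / 2
end

samples = [
  "\u{1F338}", # cherry blossom, 4 bytes in UTF-8
  "\u{1F3A7}", # headphone, 4 bytes in UTF-8
  "\u{10101}", # a code point from the Aegean Numbers block, 4 bytes in UTF-8
  "\u{26C4}",  # snowman without snow, 3 bytes in UTF-8
  "\u{0110}"   # Latin capital letter D with stroke, 2 bytes in UTF-8
]

samples.each do |ch|
  puts format("U+%04X  code points: %d  UTF-8 bytes: %d  UTF-16 units: %d",
              ch.ord, ch.length, ch.bytesize, utf16_units(ch))
end
```

If the assumption holds, the last column should match what LEN returns for each character.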
I’ve created a sample Airtable here that highlights the problem.
I discovered this when using LEN("⭐️⭐️⭐️⭐️⭐️") as an answer to this post, where it returns 10 instead of the expected 5. To find characters to test with, I used the UTF-8 ranges to generate some code points at a local Ruby IRB prompt. E.g. "\u{0110}".bytesize returns 2, which tells you that Đ is a 2-byte UTF-8 encoded character.
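For anyone who wants to reproduce the test characters, here is a rough sketch of the IRB approach. The helper name utf8_bytes and the chosen code points are only illustrative. The last part assumes that the star pasted above is U+2B50 followed by the U+FE0F variation selector, which I have not verified; if so, each visible star is already two code points, which would account for a count of 10 without any 4-byte characters being involved.

```ruby
# Hypothetical helper: UTF-8 byte length of a single code point.
def utf8_bytes(codepoint)
  [codepoint].pack("U").bytesize
end

# One code point from each UTF-8 length range:
# U+0000..U+007F (1 byte), U+0080..U+07FF (2 bytes),
# U+0800..U+FFFF (3 bytes), U+10000..U+10FFFF (4 bytes).
[0x0041, 0x0110, 0x26C4, 0x1F338].each do |cp|
  puts format("U+%04X %s: %d byte(s)", cp, [cp].pack("U"), utf8_bytes(cp))
end

# Assumption: each star is U+2B50 plus the U+FE0F variation selector.
stars = "\u{2B50}\u{FE0F}" * 5
puts "code points in five stars: #{stars.length}"
puts "UTF-8 bytes in five stars: #{stars.bytesize}"
```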