LEN returns 2 for 4-byte UTF-8 encoded characters

Topic Labels: Formulas
Simon_Horup_Es1
6 - Interface Innovator

Hi! For some Unicode code points that are encoded as 4 bytes in UTF-8, such as 🌸, 🎧 and 𐄁, Airtable’s LEN formula function appears to return 2 instead of the expected 1. As far as I can tell, this is not the case for 2-byte or 3-byte UTF-8 encoded characters, such as Đ, for which LEN correctly returns 1.
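One possible explanation, which is an assumption and not something Airtable has confirmed here, is that LEN counts UTF-16 code units instead of code points. Code points above U+FFFF take 4 bytes in UTF-8 and two code units (a surrogate pair) in UTF-16, which would match the results described above. A minimal Ruby sketch of that comparison, where `utf16_units` is a hypothetical helper name chosen for illustration:

```ruby
# Hypothetical helper: number of UTF-16 code units in a string.
# This is what a length function would report if it measured
# UTF-16 code units instead of Unicode code points (an assumption
# about LEN, not confirmed behaviour).
def utf16_units(str)
  # Each UTF-16 code unit is 2 bytes.
  str.encode("UTF-16LE").bytesize / 2
end

# Characters taken from the post above.
samples = ["Đ", "🌸", "🎧", "𐄁"]

samples.each do |ch|
  puts format("U+%04X code points=%d utf8 bytes=%d utf16 units=%d",
              ch.ord, ch.length, ch.bytesize, utf16_units(ch))
end
```

If the assumption holds, the characters for which LEN returns 2 should be exactly those where the UTF-16 unit count is 2 while the code point count is 1.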

I’ve created a sample Airtable here that highlights the problem.

z61XdrG.png

I discovered this when using LEN("️") as an answer to this post, where it returns 10 instead of the expected 5. To find characters to test with, I used the UTF-8 ranges to generate some code points from a local Ruby IRB prompt. E.g. "\u{0110}".bytesize returns 2, which shows that Đ is a 2-byte UTF-8 encoded character.
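The approach of picking test characters from the UTF-8 ranges can be sketched roughly as follows. The range boundaries are the standard UTF-8 ones; the names `UTF8_RANGES` and `sample_char` are hypothetical and only used for illustration:

```ruby
# Standard UTF-8 code point ranges, keyed by encoded length in bytes.
# Note: the 3-byte range includes the surrogates U+D800..U+DFFF,
# which are not valid characters and should be skipped when sampling.
UTF8_RANGES = {
  1 => 0x0000..0x007F,
  2 => 0x0080..0x07FF,
  3 => 0x0800..0xFFFF,
  4 => 0x10000..0x10FFFF
}.freeze

# Hypothetical helper: returns the character `offset` code points
# into the range for the given UTF-8 byte length.
def sample_char(byte_length, offset = 0)
  codepoint = UTF8_RANGES.fetch(byte_length).first + offset
  [codepoint].pack("U") # pack("U") builds a UTF-8 string from a code point
end

(1..4).each do |n|
  ch = sample_char(n, 0x41)
  puts format("%d-byte range: U+%04X bytesize=%d", n, ch.ord, ch.bytesize)
end
```

Each sampled character can then be pasted into a LEN formula to see which byte-length classes give an unexpected result.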
