LEN returns 2 for 4-byte UTF-8 encoded characters

February 6, 2017

Simon_Horup_Es1
Known Participant

Hi! For some Unicode code points that are encoded as 4 bytes in UTF-8, such as 🌸, 🎧 and 𐄁, Airtable’s LEN formula function appears to return 2 instead of the expected 1. As far as I can tell, this does not happen for 2-byte or 3-byte UTF-8 encoded characters such as Đ or ⛄; for those, LEN correctly returns 1.
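One possible explanation, offered as an assumption rather than anything Airtable has confirmed: LEN may count UTF-16 code units, the way JavaScript's `String.prototype.length` does. Every character that needs 4 bytes in UTF-8 lies outside the Basic Multilingual Plane, and UTF-16 stores such characters as a surrogate pair, i.e. two code units. This Ruby sketch prints the UTF-8 byte count, the code point count, and the UTF-16 code unit count for the characters above; `utf16_units` is a hypothetical helper written only for this illustration.

```ruby
# Hypothesis (not confirmed by Airtable): LEN counts UTF-16 code units,
# like JavaScript's String.prototype.length.

# Hypothetical helper: number of UTF-16 code units in a string.
# Every UTF-16 code unit is 2 bytes, so halve the UTF-16LE byte size.
def utf16_units(str)
  str.encode("UTF-16LE").bytesize / 2
end

# Đ, ⛄, 🌸, 🎧, 𐄁 written as escapes so the source is unambiguous.
samples = ["\u{0110}", "\u{26C4}", "\u{1F338}", "\u{1F3A7}", "\u{10101}"]

samples.each do |s|
  printf("%s U+%04X  utf8_bytes=%d  code_points=%d  utf16_units=%d\n",
         s, s.ord, s.bytesize, s.length, utf16_units(s))
end
```

If the hypothesis holds, the `utf16_units` column should match what LEN returns: 2 for 🌸, 🎧 and 𐄁, and 1 for Đ and ⛄, while `code_points` stays at 1 for all five.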

I’ve created a sample Airtable here that highlights the problem.

I discovered this while using LEN("⭐️⭐️⭐️⭐️⭐️") in an answer to this post, where it returns 10 instead of the expected 5. To find characters to test with, I used the UTF-8 ranges to generate some code points at a local Ruby IRB prompt. For example, "\u{0110}" gives you Đ, and "\u{0110}".bytesize returns 2, confirming that Đ is a 2-byte UTF-8 encoded character.
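The range check can be reproduced in IRB or as a script. This sketch is an illustration, not the exact commands used above: it prints the byte length of the first and last code point in each UTF-8 length class using `Integer#chr` and `String#bytesize`. It also looks at the star string, assuming each ⭐️ was entered as U+2B50 followed by the emoji variation selector U+FE0F, as emoji pickers often do. Under that assumption each star is two BMP code points (3 bytes each in UTF-8), so the count of 10 would come from the variation selector rather than from 4-byte encoding.

```ruby
# UTF-8 length classes (RFC 3629):
#   U+0000..U+007F    -> 1 byte
#   U+0080..U+07FF    -> 2 bytes
#   U+0800..U+FFFF    -> 3 bytes (surrogates U+D800..U+DFFF are not encodable)
#   U+10000..U+10FFFF -> 4 bytes
boundaries = [0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF]

boundaries.each do |cp|
  ch = cp.chr(Encoding::UTF_8)
  puts format("U+%04X -> %d byte(s)", cp, ch.bytesize)
end

# Assumption: each star is U+2B50 followed by variation selector U+FE0F.
star = "\u{2B50}\u{FE0F}"
five = star * 5
puts "code points: #{five.length}"
puts "UTF-16 code units: #{five.encode('UTF-16LE').bytesize / 2}"
puts "UTF-8 bytes: #{five.bytesize}"
```

With the variation selector present, the code point count and the UTF-16 count are both 10, so this example alone cannot tell the two counting methods apart; the 4-byte characters from the first paragraph can.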
