Help

How To Remove Extra Characters?

5242 11
cancel
Showing results for 
Search instead for 
Did you mean: 
M_k
11 - Venus
11 - Venus

Hi

I have text paragraphs in the Long Text field that has extra and odd characters.

Is there a way to remove those unwanted symbols and characters?

Apparently, there is one way to do this, by using the Substitute function.

Can someone help me with that, in terms of writing the formula?

Is there another way to do this?

Thank you,
Mary K

11 Replies 11

Can you post an example of the text with odd characters to be removed?

Hi @Bill.French

Here is an example of the text:

image

This is how it should look like:

Yukon’s Tombstone Territorial Park was established as the result of the Tr’ondëk Hwëch’in land claim agreement, and the park lies entirely within that First Nation’s traditional territory. The diverse landscape supports a variety of wildlife, including Dall sheep, caribou, moose, black and grizzly bears, and dozens of bird species.

Thank you,
Mary K

Please paste a real copy of the bad text here so I can test it.

@M_k,

image

That example indicates whatever is producing or importing this content has character encoding issues. While you can write formulas to overcome these instances and transform the data into something more accurate, that approach is not likely to be sustainable. You would have to create formulas for all 256 possible “errors” that could be encountered and that’s not practical.

These are neither extra nor are they odd; they are encoded to render as HTML character entities.

The better approach is to not allow such encoding issues invade the database in the first place. Tell me how these instances have occurred?

Hi @Bill.French

This is an example of text from a website and by using regex in Integromat, it is going to populate a Long Text field in Airtable.

I guess the other way would be to manually make the corrections, since it is not all the words.

Thank you,
Mary K

Yep, I figured it might be something like this.

What’s happening is Integromat is reading the content and perhaps escaping HTML tags, but leaving HTML entities. You need to apply a function call like escapeHTML() at the capture point in the Integromat process. This assumes, of course, that Integromat is able to transform HTML entities into their actual character values. According to this, the only reference I could find about HTML entities with Integromat, it appears this may not be possible unless you want everything converted to plain text.

I was able to write a script block that performs these types of transformations, but that approach is like a wet bandaid - it will not be complete and will forever be subject to failure.

image

Hi @Bill.French

Thank you for your help.

So, if Integromat coverts the text to plain text and I will be placing the paragraph, as text in Airtable, would there be a difference? Since it is just text that I will need in Airtable.

Would there be any upcoming issues or issues in general with the above option?

Another idea:
The text will first be in GS spreadsheet, can GS remove the HTML entities?

Or is there any other way in GS, that can remove the unwanted characters?

Just brainstorming possible solutions.

Thank you,
Mary

No, but this assumes Integromat’s conversion to plain text transforms something like &rquos; to '. If it doesn’t, you are still in the weeds. Also, there may be some formatting you want to preserve like links. Plain text transformation - if it works in Integromat - will probably sanitize the content including links, special punctuation, and perhaps paragraph formatting, and bullet-lists. It might be ugly.

Many messages ago I asked - How does this content get created? It appears this is still not revealed but this is important because there’s a point where the encoding issues initially arise. It’s very difficult to advise you further without knowing more about the lifecycle of this content.

Given that this content is entering “GS spreadsheet” which I assume to mean Google Sheets, it got there through some process that is likely the source of the encoding problems. If true, the remedy should ideally be at the point of creation, not downstream in Integromat.

If users are manually copying and pasting web content into a Google sheet, they are likely pasting into spreadsheet cells “with formatting”. This can [sometimes] cause the encoding issue you are seeing.

Google Apps Script has the ability to remedy character encoding challenges, but it requires javascript development. Rather than leaping to this conclusion or any other implementation approach, you should fully map the content lifecycle to determine how best to eliminate this issue. Or, you could just roll the dice and hope all is corrected with plain text in Integromat.

Hi @Bill.French

The source of this issue is being looked into, I just thought that perhaps something could have been done in Airtable or Google Sheets.

It seems the resolution is a bit complex. If all else fails, I may have to manually make the text changes.

I really appreciate your help with this issue.

Thank you,
Mary