I want to clean up a free-text field column for text analysis and am trying to drop all digits except when they are followed by a specific letter. In the example below, I want to drop the first numbers but keep both 120f and 119 f, as they are meaningful values.
Using the following regex:
Phrase = "14.300.114 .temperature is not 120f but 119 f...";
Show( Phrase );
Regex( Phrase, "\d+([^\d+(f|\sf)])", "", GLOBALREPLACE );
gives me this result:
"114 .temperature is not 120f but 119 f...". The temperatures are successfully kept, but not all digits from the first digit string are dropped.
My first iteration looked like this:
Regex( Phrase, "\d+([^\d+f])", "", GLOBALREPLACE );
which resulted in:
".temperature is not 120f but f..."
So this was successful at removing all digits, but is not keeping the "119 f" case (with the space between the number and the "f").
Can someone help me understand how to build the expression that correctly drops all digits except when followed by "f" or by " f"?
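Try using a negative lookahead. Inside square brackets, "(", "|", "+" and ")" are literal characters, so [^\d+(f|\sf)] is a single negated character class rather than an alternation, and it also consumes the character that follows the digits. A lookahead checks what follows without consuming it: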
Phrase = "14.300.114 .temperature is not 120f but 119 f...";
Show( Phrase );
result = Regex( Phrase, "(?!\d+\s*f)\d+", "", GLOBALREPLACE );
Show( result );
Output:
Phrase = "14.300.114 .temperature is not 120f but 119 f...";
result = ".. .temperature is not 120f but 119 f...";
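One caveat with the pattern above: it keeps digits before any "f", including the "f" that starts a longer word such as "for". A minimal sketch of a stricter variant follows, assuming JMP's regex engine supports the \b word-boundary anchor as Perl-style engines do. Phrase2 and result2 are hypothetical names used only for this illustration.

```jsl
// Hypothetical test phrase: the 5 in "5 for" should be dropped, 120f should be kept.
Phrase2 = "5 for 120f";
// \b after f requires the f to end a word, so the f in "for" no longer protects the 5.
result2 = Regex( Phrase2, "(?!\d+\s*f\b)\d+", "", GLOBALREPLACE );
Show( result2 );
```

If the word boundary behaves as expected, the original phrase should give the same result as before, because both "120f" and "119 f" end with a standalone "f".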