I have a text paragraph containing several numbers in different sentences and I would like to extract those numbers and get them stacked in a column in a new tabel.
The sentence looks like this:
The bags are weighed on a mobile scale at storage location. Each 5. bag is spear sampled. The primary sample of A: 10 kg, B: 20 kg, C: 30 kg, D: 40 kg and E: 50 kg was dried for 24 hours at 105 C. The sample was subject to repeated splitting, leaving some 800-1000 grams which was further milled to pass a 100-mesh test sieve. The crushed sample is blended in a V-blender for 20 minutes and transferred to 100 ml waterproof plastic bottle and sealed.
In addition the number of different primary samples can vary from 1 up to 20 or 30.
Very thankful if somebody could point me in the right direction with a script. Tried to write it with the help of Copilot, but did not get the result I wanted.
Thanks a heap!
Here is one way to do this:
names default to here(1);
paragraph="The bags are weighed on a mobile scale at storage location. Each 5. bag is spear sampled.
The primary sample of A: 10 kg, B: 20 kg, C: 30 kg, 40 kg and E: 50 kg was dried for 24 hours at 105 C.
The sample was subject to repeated splitting, leaving some 800-1000 grams which was further milled to
pass a 100-mesh test sieve. The crushed sample is blended in a V-blender for 20 minutes and transferred
to 100 ml waterproof plastic bottle and sealed.";
theNumbers = {};
i = 1;
While( Word( i, paragraph, " ." ) != "",
theWord = Word( i, paragraph, " ." );
If( Is Missing( Num( theWord ) ) == 0,
Insert Into( theNumbers, Num( theWord ) )
);
i++;
);
Show( theNumbers );
I am sure there is someone who will also provide a RegEx() solution.
I tried expanding on your suggestion, but couldn't get it to work. Unfortunately, my findings are confusing me, so I hoped it was possible to get some more input, please.
The below snippet will find "10 kg". So far so good.
Names Default To Here( 1 );
text = "The bags are weighed on a mobile scale at storage location. Each 5. bag is spear sampled.
The primary sample of A: 10 kg, B: 20 kg, C: 30 kg, 40 kg and E: 50 kg was dried for 24 hours at 105 C.
The sample was subject to repeated splitting, leaving some 800-1000 grams which was further milled to
pass a 100-mesh test sieve. The crushed sample is blended in a V-blender for 20 minutes and transferred
to 100 ml waterproof plastic bottle and sealed.";
// Use a regular expression to find all the numbers, including decimals
matches = Regex( text, "\d+([.,]\d+)?\s*kg" );
Show( matches );Let's try scripting this into a loop to find all weights associated with "kg" in the string. I can find all individual words, and I can combine them into pairs. Within these pairs, I thought it should be smooth sailing to find the pairs like "10 kg", "20 kg", "30 kg" and so on, but the code is stubbornly refusing to find any matches and put them into the matches={} list.
I've tried moving the regexPattern inside the If function, and I've tried wrapping "combined" as well as the "regexPattern" variable in Eval().
Names Default To Here( 1 );
// the text
text = "The bags are weighed on a mobile scale at storage location. Each 5. bag is spear sampled.
The primary sample of A: 10 kg, B: 20 kg, C: 30 kg, 40 kg and E: 50 kg was dried for 24 hours at 105 C.
The sample was subject to repeated splitting, leaving some 800-1000 grams which was further milled to
pass a 100-mesh test sieve. The crushed sample is blended in a V-blender for 20 minutes and transferred
to 100 ml waterproof plastic bottle and sealed.";
// break text into words and print each of them. works as expected
words = Words( text );
For( i = 1, i <= N Items( words ), i++,
Print( words[i] )
);
// regex pattern variable
regexPattern = "\d+([.,]\d+)?\s*kg";
// empty list to store matches
matches = {};
// loop over all words in text to create and print pairs of words (we're looking for "10 kg", "20 kg" and so on). works fine
For( i = 1, i <= N Items( words ) - 1, i++,
combined = words[i] || " " || words[i + 1];
Print( "Checking: " || combined );
);
// THIS PART DOESN'T WORK
// loop once more over all words from text to create pairs of words and look for the regexPattern
For( i = 1, i <= N Items( words ) - 1, i++,
combined = words[i] || " " || words[i + 1];
// check if the current combined string matches the pattern (e.g., "10 kg")
If( Regex Match( combined, regexPattern ),
Insert Into( matches, combined );
Print( "Match found: " || combined );
,
// else
Print( "No match for: " || combined );
);
);What am I overlooking?
I haven't gone over everything, but just from a cursory look at the script, "." is a wildcard, so you'd need to escape it.
Thanks, but that's what I wanted to demonstrate with the first snippet: It does correctly identify the first match to "a number + kg" (i.e., "10 kg") in the example. If it works once, shouldn't it work for all pairs of tokens/words?
Oh, yes, sorry. I should wake up and have my coffee before posting here. I ran into this type of thing before and put up an answer (while I was at my last job) here.
Haha, that's nothing to worry about! Always good to have friendly input :)
I don't know if what you are trying to do is to exercise RegEx, or if you are trying to actually find a solution of finding all references to kg values. If the latter is the case, the below JSL is a very efficient solution
Names Default To Here( 1 );// the text
text = "The bags are weighed on a mobile scale at storage location. Each 5. bag is spear sampled. The primary sample of A: 10 kg, B: 20 kg, C: 30 kg, 40 kg and E: 50 kg was dried for 24 hours at 105 C. The sample was subject to repeated splitting, leaving some 800-1000 grams which was further milled to pass a 100-mesh test sieve. The crushed sample is blended in a V-blender for 20 minutes and transferred to 100 ml waterproof plastic bottle and sealed.";
// break text into words and print each of them. works as expected
words = Words( text );
// regex pattern variable
regexPattern = "\d+([.,]\d+)?\s*kg";
// empty list to store matches
matches = {};
potential = loc(words,"kg");
If( length(potential) > 0,
for each( {kg}, potential,
if( isMissing(words[kg-1])==0,
insert into(matches, words[kg -1] || " kg")
)
)
);
Actually, both. Thank you for your help and patience. I've never used "loc()", so that's good learning.
Copy pasting your entire code block and appending "Print( matches)" gives {"40 kg", "50 kg"}. This is consistent with "potential = Loc( words, "kg" );" only finding "[32, 36]". Interesting.