- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
How do I identify numbers in a paragraph and extract as column in new tabel?
I have a text paragraph containing several numbers in different sentences and I would like to extract those numbers and get them stacked in a column in a new tabel.
The sentence looks like this:
The bags are weighed on a mobile scale at storage location. Each 5. bag is spear sampled. The primary sample of A: 10 kg, B: 20 kg, C: 30 kg,
In addition the number of different primary samples can vary from 1 up to 20 or 30.
Very thankful if somebody could point me in the right direction with a script. Tried to write it with the help of Copilot, but did not get the result I wanted.
Thanks a heap!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I identify numbers in a paragraph and extract as column in new tabel?
Here is one way to do this:
names default to here(1);
paragraph="The bags are weighed on a mobile scale at storage location. Each 5. bag is spear sampled.
The primary sample of A: 10 kg, B: 20 kg, C: 30 kg, 40 kg and E: 50 kg was dried for 24 hours at 105 C.
The sample was subject to repeated splitting, leaving some 800-1000 grams which was further milled to
pass a 100-mesh test sieve. The crushed sample is blended in a V-blender for 20 minutes and transferred
to 100 ml waterproof plastic bottle and sealed.";
theNumbers = {};
i = 1;
While( Word( i, paragraph, " ." ) != "",
theWord = Word( i, paragraph, " ." );
If( Is Missing( Num( theWord ) ) == 0,
Insert Into( theNumbers, Num( theWord ) )
);
i++;
);Show( theNumbers );
I am sure there is someone who will also provide a RegEx() solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I identify numbers in a paragraph and extract as column in new tabel?
I tried expanding on your suggestion, but couldn't get it to work. Unfortunately, my findings are confusing me, so I hoped it was possible to get some more input, please.
The below snippet will find "10 kg". So far so good.
Names Default To Here( 1 );
text = "The bags are weighed on a mobile scale at storage location. Each 5. bag is spear sampled. The primary sample of A: 10 kg, B: 20 kg, C: 30 kg, 40 kg and E: 50 kg was dried for 24 hours at 105 C. The sample was subject to repeated splitting, leaving some 800-1000 grams which was further milled to pass a 100-mesh test sieve. The crushed sample is blended in a V-blender for 20 minutes and transferred to 100 ml waterproof plastic bottle and sealed.";
// Use a regular expression to find all the numbers, including decimals
matches = Regex( text, "\d+([.,]\d+)?\s*kg" );
Show( matches );
Let's try scripting this into a loop to find all weights associated with "kg" in the string. I can find all individual words, and I can combine them into pairs. Within these pairs, I thought it should be smooth sailing to find the pairs like "10 kg", "20 kg", "30 kg" and so on, but the code is stubbornly refusing to find any matches and put them into the matches={} list.
I've tried moving the regexPattern inside the If function, and I've tried wrapping "combined" as well as the "regexPattern" variable in Eval().
Names Default To Here( 1 );// the text
text = "The bags are weighed on a mobile scale at storage location. Each 5. bag is spear sampled. The primary sample of A: 10 kg, B: 20 kg, C: 30 kg, 40 kg and E: 50 kg was dried for 24 hours at 105 C. The sample was subject to repeated splitting, leaving some 800-1000 grams which was further milled to pass a 100-mesh test sieve. The crushed sample is blended in a V-blender for 20 minutes and transferred to 100 ml waterproof plastic bottle and sealed.";
// break text into words and print each of them. works as expected
words = Words( text );
For( i = 1, i <= N Items( words ), i++,
Print( words[i] )
);
// regex pattern variable
regexPattern = "\d+([.,]\d+)?\s*kg";
// empty list to store matches
matches = {};
// loop over all words in text to create and print pairs of words (we're looking for "10 kg", "20 kg" and so on). works fine
For( i = 1, i <= N Items( words ) - 1, i++,
combined = words[i] || " " || words[i + 1];
Print( "Checking: " || combined );
);
// THIS PART DOESN'T WORK
// loop once more over all words from text to create pairs of words and look for the regexPattern
For( i = 1, i <= N Items( words ) - 1, i++,
combined = words[i] || " " || words[i + 1];
// check if the current combined string matches the pattern (e.g., "10 kg")
If( Regex Match( combined, regexPattern ),
Insert Into( matches, combined );
Print( "Match found: " || combined );
,
// else
Print( "No match for: " || combined );
);
);
What am I overlooking?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I identify numbers in a paragraph and extract as column in new tabel?
I haven't gone over everything, but just from a cursory look at the script, "." is a wildcard, so you'd need to escape it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I identify numbers in a paragraph and extract as column in new tabel?
Thanks, but that's what I wanted to demonstrate with the first snippet: It does correctly identify the first match to "a number + kg" (i.e., "10 kg") in the example. If it works once, shouldn't it work for all pairs of tokens/words?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I identify numbers in a paragraph and extract as column in new tabel?
Oh, yes, sorry. I should wake up and have my coffee before posting here. I ran into this type of thing before and put up an answer (while I was at my last job) here.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I identify numbers in a paragraph and extract as column in new tabel?
Haha, that's nothing to worry about! Always good to have friendly input
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I identify numbers in a paragraph and extract as column in new tabel?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I identify numbers in a paragraph and extract as column in new tabel?
I don't know if what you are trying to do is to exercise RegEx, or if you are trying to actually find a solution of finding all references to kg values. If the latter is the case, the below JSL is a very efficient solution
Names Default To Here( 1 );// the text
text = "The bags are weighed on a mobile scale at storage location. Each 5. bag is spear sampled. The primary sample of A: 10 kg, B: 20 kg, C: 30 kg, 40 kg and E: 50 kg was dried for 24 hours at 105 C. The sample was subject to repeated splitting, leaving some 800-1000 grams which was further milled to pass a 100-mesh test sieve. The crushed sample is blended in a V-blender for 20 minutes and transferred to 100 ml waterproof plastic bottle and sealed.";
// break text into words and print each of them. works as expected
words = Words( text );
// regex pattern variable
regexPattern = "\d+([.,]\d+)?\s*kg";
// empty list to store matches
matches = {};
potential = loc(words,"kg");
If( length(potential) > 0,
for each( {kg}, potential,
if( isMissing(words[kg-1])==0,
insert into(matches, words[kg -1] || " kg")
)
)
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I identify numbers in a paragraph and extract as column in new tabel?
Actually, both. Thank you for your help and patience. I've never used "loc()", so that's good learning.
Copy pasting your entire code block and appending "Print( matches)" gives {"40 kg", "50 kg"}. This is consistent with "potential = Loc( words, "kg" );" only finding "[32, 36]". Interesting.
- Why is your solution not finding "10 kg", "20 kg" and "30 kg"?
- Why is my solution not finding anything?