cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar

How do I identify numbers in a paragraph and extract as column in new tabel?

I have a text paragraph containing several numbers in different sentences and I would like to extract those numbers and get them stacked in a column in a new tabel.

The sentence looks like this: 

 

The bags are weighed on a mobile scale at storage location. Each 5. bag is spear sampled. The primary sample of A: 10 kg, B: 20 kg, C: 30 kg, 40 kg and E: 50 kg was dried for 24 hours at 105 C. The sample was subject to repeated splitting, leaving some 800-1000 grams which was further milled to pass a 100-mesh test sieve. The crushed sample is blended in a V-blender for 20 minutes and transferred to 100 ml waterproof plastic bottle and sealed.

 

In addition the number of different primary samples can vary from 1 up to 20 or 30.

Very thankful if somebody could point me in the right direction with a script. Tried to write it with the help of Copilot, but did not get the result I wanted.

Thanks a heap!

 

15 REPLIES 15
mmarchandFSLR
Level IV

Re: How do I identify numbers in a paragraph and extract as column in new tabel?

It's the commas.  "kg," throws it off.

txnelson
Super User

Re: How do I identify numbers in a paragraph and extract as column in new tabel?

The Words() function is only using spaces as the delimiters, therefore, kg words with commas( kg, ) or periods( kg. ) are leaving those characters in the extracted word.  This can easily be changed by adding the comma and period as delimiters to check for along with the space

Change 

words = Words( text );

to

words = Words( text," ,." );
Jim
jthi
Super User

Re: How do I identify numbers in a paragraph and extract as column in new tabel?

You can use one while loop for something like this (this will modify the original string so you might want to use copy of it)

Names Default To Here(1);

text = "The bags are weighed on a mobile scale at storage location. Each 5. bag is spear sampled. The primary sample of A: 10 kg, B: 20 kg, C: 30 kg, 40 kg and E: 50 kg was dried for 24 hours at 105 C. The sample was subject to repeated splitting, leaving some 800-1000 grams which was further milled to pass a 100-mesh test sieve. The crushed sample is blended in a V-blender for 20 minutes and transferred to 100 ml waterproof plastic bottle and sealed.";

res = {};
While(!Is Missing(match = Regex(text, "\d+.kg")),
	Insert Into(res, match);
	text = Substr(text, Contains(text, match) + Length(match));
);

show(res);

Or Pat Match if you can figure out how it works (I have never really been able to do that properly)

Names Default To Here(1);

text = "The bags are weighed on a mobile scale at storage location. Each 5. bag is spear sampled. The primary sample of A: 10 kg, B: 20 kg, C: 30 kg, 40 kg and E: 50 kg was dried for 24 hours at 105 C. The sample was subject to repeated splitting, leaving some 800-1000 grams which was further milled to pass a 100-mesh test sieve. The crushed sample is blended in a V-blender for 20 minutes and transferred to 100 ml waterproof plastic bottle and sealed.";

found_list = {};
Pat Match(
	text,
	Pat Any("., ") + Pat Regex("(\d+.kg)") >> match + Pat Any("., ") + Pat Test(
		Insert Into(found_list, match);
		0;
	)
);

Show(found_list);
-Jarmo
jthi
Super User

Re: How do I identify numbers in a paragraph and extract as column in new tabel?

You can use Words for this but you have to have a list of non-numeric characters. You can then use Concat Items() to turn the list you get back into a string

 

Names Default To Here(1);

str = "The bags are weighed on a mobile scale at storage location. Each 5. bag is spear sampled. The primary sample of A: 10 kg, B: 20 kg, C: 30 kg, 40 kg and E: 50 kg was dried for 24 hours at 105 C. The sample was subject to repeated splitting, leaving some 800-1000 grams which was further milled to pass a 100-mesh test sieve. The crushed sample is blended in a V-blender for 20 minutes and transferred to 100 ml waterproof plastic bottle and sealed.";

matchchar = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" || Get Punctuation Characters() || Get Whitespace Characters();
nums = Words(str,  matchchar);
// {"5", "10", "20", "30", "40", "50", "24", "105", "800", "1000", "100", "20", "100"}

Concat Items(nums, ", "); // "5, 10, 20, 30, 40, 50, 24, 105, 800, 1000, 100, 20, 100"

 

If more complicated matching is required, Regex() or Pat Match() are options but in my opinion it isn't really worth it to learn Pat Match() with it's current documentation and examples found from documentation. Add flag to Regex Match() to find all non-overlapping occurances of pattern would make tasks like this much easier to complete.

-Jarmo

Re: How do I identify numbers in a paragraph and extract as column in new tabel?

Thanks for all input so far - testing it now

Ressel
Level VI

Re: How do I identify numbers in a paragraph and extract as column in new tabel?

@CalibrationBear, have you found a solution and will you share it? Thanks