Moving a subset of data from one table to a second table's columns with a transp...

mostarr · May 6, 2021 03:50 PM

I'm sorry for posting this again. I've posting bits and pieces of my workflow, masked, here, for help, and I'm still struggling. I have difficulty checking responses as I have had bad experiences on StackOverflow being yelled at and such for no real reason. I'm sure the scars will heal with time. Anyway, I'm around and I will get to your responses eventually.

To get to the technical point:

I'm trying to copy the dates some samples were run (each sample should have been run on 3 separate occasions assuming everything went well during the lab portion of the project, which I was not around for) into three separate columns of a manifest file which keeps track of ongoing metadata for our project, from what I have gleaned, such that each column is one instance of data collection. There are multiple donors each with a donor ID, and some redundancy in the data like multiple collection channels on the instrument for the "same data point" (depends on how you look at it), though I think that should be it.

Here is my pseudocode I have been trying to work off of and it has been rough so far. I don't know why. I'm not a seasoned computer scientist, developer, or software engineer. Any help is appreciated.

DataTypes:
	(associative array) DonorID{
		(key) Month1: (value) date1 (empty at start)
		(key) Month2: (value) date2 (empty at start)
		(key) Month3: (value) date3 (empty at start)
	}



find unique donor IDs in data
for each donor ID{
	get (i.e. subset) data keyed on donor ID;
	for Month in 1, 2, 3 (strings){
		if no date in donorID associative array{
			add date from data;
		}else if date already{
			is the date in the data table cell the same as in the associative array entry?
				if yes, do nothing (indicates different target channel)
				if no, ping user
					Multiple run dates for same data point is a problem.
		}
	}
}
Copy from data to manifest:
	loop over DonorID associative array:
		for each donorID:
			in Month 1, 2, or 3:
				add date1 2 or 3 to "Actual Month [1, 2, or 3] Stability Test Date" in manifest
				donorID is still row identifier
				date (1, 2, or 3) is now column identifier

brady_brady · May 6, 2021 05:16 PM

Hi,

Welcome to the community! I'm sorry you've had negative experiences on other boards when asking for help. While the JMP User Community is a welcoming and helpful place, there are some "best practices" that should maximize your chances of obtaining the information you seek.

Asking specific, vice more general, questions, is the way to go if you can pull it off--though I do realize that sometimes one "doesn't know what (s)he doesn't know". Posting a considerable chunk of pseudocode can frighten people away... if you can ask your question in no other way, it will help greatly to attach a "before" and "after" file--and screenshots--so viewers will have concrete examples of a "start" table (or text data) and a "finish" table. This would definitely make a real difference in this particular case. If you have to "sanitize" the datasets, that is fine (best, really). As long as the point you're trying to illustrate can still be made, it's all good.

If you ask specific, well-targeted questions and include illustrative files and screenshots, chances are that many on here will jump in to help. I can think of more than a few offhand who can't really resist that!

Cheers,

Brady

mostarr · May 11, 2021 06:58 PM

Thank you, Brady.

mostarr · May 11, 2021 07:11 PM

Hi everyone. Here is my code so far:

Names Default To Here( 1 );
dt_data = Open( "[filepath, no issues here]" );
dt_manifest = Open( "[filepath, no issues here]" );
Summarize( dt_data, IDs = by( :ID ) ); //this finds all the unique IDs in the data file, which we will need //[no issues here]
//First, populate the temporary data structure--associative array--from the data file which we will then copy over to the manifest file.
For( i = 1, i <= N Items( IDs ), i++,
	ID = IDs[i];
	data_array = Associative Array( {"data1", "data2", "data3"}, {{}, {}, {}} ); //it appears in the print statements we get 3 copies, indicating this loop works
	For( data = data_array << First, !Is Empty( data ), data = data_array << Next( data ), //looping through data_array's keys is clunky but done this way in JMP			
		Write( ID ); //just a check; works
		data_array[data] = :Run date << Where(:ID == ID & :Timepoint == data); //does not work
		//...continue with further code to transfer from LBCM_array to manifest
	);
);

I've scrubbed it so it's confidential (no worries spying on corporate information ;)) while retaining structure and information (name 'a' name, label to label).

My problem is with this line at the moment:

data_array[data] = :Run date << Where(:ID == ID & :Timepoint == data);

It throws the error message

.
Name Unresolved: Run date in access or evaluation of 'Run date' , :Run date/*###*/

In the following script, error marked by /*###*/
...
data_array[data] = :Run date/*###*/ <<Where( :ID == ID & :Timepoint == data ...

:Run date is a data column (header) in my data table (specifically, dt_data) with experiment run dates. I'm trying to transfer these to the second file, as each data point should have been collected thrice/three times and that is why data_array has three keys and three empty values when it is initialized. This is where the data goes, this associative array (data_array). It is then transferred to dt_manifest but that is for later, once this part is working.

For now it seems I am having trouble accessing, in the line

data_array[data] = :Run date << Where(:ID == ID & :Timepoint == data);

the appropriate selection of rows and columns I am interested in. Is the problem statement clear enough at this point in time for anyone in the audience to pitch in and offer a solution or contribution as to how to specify in a more syntactically (and possibly semantically) correct way how to select the subset of data I am interested in (from data_dt)?

Don't worry about the transpose issue. I'll deal with that later, and I think honestly it should be quite easy once the data is in an aa (associative array). At this point I'm still learning how to manipulate row/column selections, subsets of data tables, etc. related issues.

One more note: If it seems odd that dates and data and arrays are being mixed up, it's most probably because I chose poor names for the scrubbing/anonymization titles/labels of the data columns or relevant data entities to keep my company's secrets safe. I'm still a rookie at this whole process I'd say, and I prefer to err on the safe side.

I will continue to work on the problem from my end.

Many thanks,

Mike

txnelson · May 11, 2021 10:25 PM

Mike,

I remain lost a bit on why you want to use an Associative Array, but regardless, I believe the below code will move into the array, the date values for the 3 measurement rows for each ID. I made up some data that I hope resembles the layout of your data. I am only guessing based upon your statements and code usage.

Here is the script. The above data table is generated at the beginning of the script, so the guts of your code, with my modifications comes below the data table generation.

Names Default To Here( 1 );
dt_data = New Table( "Example",
	Add Rows( 6 ),
	New Column( "ID", Character, "Nominal", Set Values( {"A", "A", "A", "B", "B", "B"} ) ),
	New Column( "Run Date",
		Numeric,
		"Continuous",
		Format( "m/d/y", 12 ),
		Input Format( "m/d/y" ),
		Set Values( [3692304000, 3694982400, 3697401600, 3700080000, 3702672000, 3705350400] )
	),
	New Column( "TimePoint", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [., ., ., ., ., .] ) )
);

//dt_manifest = Open( "[filepath, no issues here]" );

Summarize( dt_data, IDs = by( :ID ) ); //this finds all the unique IDs in the data file, which we will need //[no issues here]
//First, populate the temporary data structure--associative array--from the data file which we will then copy over to the manifest file.

For( i = 1, i <= N Items( IDs ), i++,
	ID = IDs[i];
	theRows = dt_data << get rows where( :ID == ID );
	data_array = Associative Array( {"data1", "data2", "data3"}, {{}, {}, {}} );
	
	k=0;
	
	For( data = data_array << First, !Is Empty( data ), data = data_array << Next( data ), //looping through data_array's keys is clunky but done this way in JMP
		k++;
		show( ID ); //just a check; works
		data_array[data] = :Run date[theRows[k]];
		//...continue with further code to transfer from LBCM_array to manifest
	);
	show(data_array);
);

The results of the code is listed in the log

ID = "A";
ID = "A";
ID = "A";
data_array = ["data1" => 3692304000, "data2" => 3694982400, "data3" => 3697401600];
ID = "B";
ID = "B";
ID = "B";
data_array = ["data1" => 3700080000, "data2" => 3702672000, "data3" => 3705350400];

The numbers in the data_array are the numerical values for the Run Dates.

Jim

mostarr · May 14, 2021 06:31 PM

Hi Jim. Thank you for the response. Would you mind explaining in human English how your code works? My interpretation is: All rows where the ID is unique are placed in theRows. Then, an index k is instantiated. It is incremented inside the loop where the run dates are added to the data array. Inside this loop, now, instead of what I had before, each unique value in theRows is used as an index to access :Run date and that result is populated into data array.

This would make sense to me if only I studied it longer. It's not too clear at first glance. Not sure. But you do explain at the end of your post with a one-line conclusion.

After I replace all the scrubbed variables with my real variables in my code, I get the error:

invalid subscript (must be number or list of numbers) in access or evaluation of 'theRows[k]' , theRows[/*###*/k]

and I don't know why.

I'm still confused and I'll continue to study this bit of code. It's odd, I've taken a fair amount of coding classes in college, but I've not maintained my knowledge base so I suppose it could have eroded over time.

Thank you for your help!

Michael

PS: As to why associative arrays: Thought it would be an efficient way to transfer data as a JMP beginner with experience more along the lines of Java/Python. Would you recommend directly manipulating the tables? I'm so far along this route, though, that I wonder if it would be just flat out interesting to complete the task at hand in this manner.

brady_brady · May 12, 2021 11:34 AM

Hi,

You write:

My problem is with this line at the moment:

data_array[data] = :Run date << Where(:ID == ID & :Timepoint == data);

I see a couple of potential issues... be sure to look into these:

1) :Run Date, as written, refers to a column in whichever data table is current at the time this line of code is interpreted. It seems from the error message that at the time of interpretation of this line of code, the current data table does not contain a column called "Run Date". I see that you open two tables; always make sure that the intended table is indeed the current table, if leaving it up to JMP which table to use. You can force any table you want to be the current table, using the Current Data Table command:

dt1 = open("firstfile");
dt2 = open("secondfile");

Current Data Table (dt1); // sets first table current
Current Data Table (dt2); //sets second table current

You need not leave it up to JMP to decide, however. You can explicitly specify the table to be used, current or not, using a table:column format, as in:

dt1:Run Date

2) Once you fix the table currency issue, you'll need to use the << get rows where ( ) message, sent to the appropriate data table, to return rows meeting some criteria. Often these are used as indices, as in your case. Assuming that you want to work with dt1, the code then would appear as:

data_array[data] = dt1:Run Date[dt1 << Get rows Where(:ID == ID & :Timepoint == data)];

This line does 3 jobs:

1) Gets the rows in dt1 meeting some criteria, returning them in a vector. Note that because the << get rows where() message is sent to dt1, it is understood that :ID and :Timepoint are also in dt1.

2) Uses the vector returned in step 1 above as indices to dt1:Run Date, which is the Run Date column in dt1. Because we've explicitly scoped Run Date to the dt1 table using dt1:Run Date, it does not matter whether dt1 happens to be the current table, or not. Completion of this 2nd step returns a vector of values--exactly the ones we want--from the Run Date column.

3) Assigns the vector of Run Date values returned in step 2 above as the value of a specific key, data, in the associative array, data_array.

Cheers,

Brady

mostarr · May 14, 2021 06:54 PM

Thank you, Brady. That all makes sense to me.

It looks like a few of those things were in fact issues in my code. I've addressed them and am heading along. I have new issues that I will post once I figure them out a bit.

Mike

mostarr · May 14, 2021 07:08 PM

Hi all. I'm making progress, actually, for once, instead of being stuck on something without knowing which direction to go.

At the moment one of my key/value pairs does not fit, so I am getting incorrect results from one of my array calls. I am trying to debug this so I can subset/filter the data table correctly down to what I want for transfer to the associative array.

Many thanks for all your help.

Mike

mostarr · May 14, 2021 08:00 PM

The below works for me...

Names Default To Here( 1 );
dt_data = Open("...");
Summarize( dt_data, IDs = by(dt_data:ID)); //this finds all the unique donor IDs in the data file, which we will need
//First, populate the temporary data structure--associative array--from the data file which we will then copy over to the manifest data file.
For(i = 1, i <= N Items(IDs), i++,
    ID = num(IDs[i]); //the type conversion here is important, or else a key-value call in the associative array below doesn't work
	data_array = Associative Array( {"data1", "data2", "data3"}, {{}, {}, {}} ); //it appears in the print statements we get 3 copies, indicating this loop works
	For( data = data_array << First, !Is Empty( data ), data = data_array << Next( data ), //looping through data_array's keys is clunky but done this way in JMP
	    data_array[data] = dt_data:Run date[dt_data << Get rows Where(:ID == ID & :Timepoint == data)]
	);
);

except now I have multiple copies of each data (date time point) entry in the data_array. I wonder how to scrub that clean, make only one copy of each date/datum in data_array?

I have been trying an if statement, using an As Date() statement to compare and format the dates for comparison and entry to make sure that we only get one unique value. But that has given me problems so far (i.e. the array turns into either error or empty again).

I am really happy with the progress thus far, though.

@txnelson: Sorry, I like your method and thank you for your help, but you'll notice I was able to fandangle it without your indexing "k" technique =). (I wonder how your method fairs when there are multiple dates of the same value, which is what I have in my case?)

@brady_brady: Notice I have implemented data table identifiers before the column calls where appropriate. Thank you for teaching me this basic concept; I was dying of curiosity about it and Googled to no end with no result until now.

Mike

Moving a subset of data from one table to a second table's columns with a transpose

Re: Moving a subset of data from one table to a second table's columns with a transpose

Re: Moving a subset of data from one table to a second table's columns with a transpose

Re: Moving a subset of data from one table to a second table's columns with a transpose

Re: Moving a subset of data from one table to a second table's columns with a transpose

Re: Moving a subset of data from one table to a second table's columns with a transpose

Re: Moving a subset of data from one table to a second table's columns with a transpose

Re: Moving a subset of data from one table to a second table's columns with a transpose

Re: Moving a subset of data from one table to a second table's columns with a transpose

Re: Moving a subset of data from one table to a second table's columns with a transpose