cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
Jeff_Perkinson
Community Manager Community Manager
On converting a list of associative arrays to a data table

I recently posted an add-in to log each launch of JMP.

As I was developing it, I thought I was being clever by storing the logging information as an associative array.

Here's an example of what's in a single log file.

["Bitness" => "64-bit", 
 "Build Information" => "Jan  9 2017, 00:53:08, Release, JMP    ", 
 "Date" => "06/06/2017 08:05:03 AM", 
 "Host" => "Windows", 
 "Hostname" => "computer1", 
 "JMP Version" => "13.1.0", 
 "Product Name" => "Standard", 
 "Username" => "joeuser"]

An associative array makes the log self-documenting as each piece of data is labeled. It also makes the log extensible. So, in the future you can easily add more information to the logs and any existing program will either adapt or ignore it.

It's pretty easy to read a directory full of log files into a list of associative arrays like this:

logfiles = Files In Directory( "./Logs" );
logs = {};
For( i = 1, i <= N Items( logfiles ), i++,
	Insert Into( logs, Parse( Load Text File( "./Logs/" || logfiles[i] ) ) )
);

 This results in a structure that looks like this:

logs = {
["Bitness" => "64-bit", 
 "Build Information" => "Jan  9 2017, 00:53:08, Release, JMP    ", 
 "Date" => "06/06/2017 02:53:43 PM", 
 "Host" => "Windows", 
 "Hostname" => "MFIDT0259", 
 "JMP Version" => "13.1.0", 
 "Product Name" => "Standard", 
 "Username" => "pol"], 
["Bitness" => "64-bit", 
 "Build Information" => "Jan  9 2017, 00:53:08, Release, JMP    ", 
 "Date" => "06/06/2017 08:30:03 AM", 
 "Host" => "Windows", 
 "Hostname" => "MFIDT0259", 
 "JMP Version" => "13.1.0", 
 "Product Name" => "Standard", 
 "Username" => "pol"]
};

What I didn't think about though is that it's not quite as easy to convert that list of associative arrays into a data table with columns named after the keys and a row for each item in the list. It can be done with a lot of looping over the list and the arrays, but I was hoping for something a bit more elegant.

So, I did something similar to what many of you do by posting a question in the Discussions forum. I sent email to our development team and other experts at SAS asking how they would do it.

On Jun 6, 2017, at 6:16 PM, Jeff Perkinson wrote:
list_of_associative_arrays = {
["key1" => "a", "key2" => "b"], 
["key1" =>"c", "key2" => "d"], 
["key2" => “f", "key1" => “e", "key3" => "g”], 
 };
 
How would you convert this into this data table?
 
JMPScreenSnapz102.png
 
Notice a few things:
  • The columns are named based on the keys of the arrays
  • The keys appear in different orders
  • The number of keys in each associative array varies
I was hoping to get:
  1. JSL that was relatively easy to interpret when reading it.
  2. JSL that was relatively easy to write, with the fewest statements.
  3. Efficiency(ish) without a lot of looping.
I knew it could be done with a series of loops, and @DonMcCormack  provided that solution
But some others provided interesting ideas that are worth exploring.

@brady_brady had a couple of solutions.

One was a creative hack that counted on a highly structured problem, and converted the keys and values to text strings and then to a data table that he manipulated. 
l = {
["key1" => "c", "key2" => "d", "key3" => "g2", "key4" => "h"], 
["key1" => "a", "key2" => "e", "key3" => "g3", "key4" => "o"], 
["key1" => "b", "key2" => "f", "key3" => "g1", "key4" => "i"], 
["key1" => "c", "key2" => "a", "key3" => "g6", "key4" => "k"]
};

nt = New Table( "x", new column("Column 1", character), new column("Group", character) );
n_l = nitems(l);
n_k = nitems(l[1]<<get keys);

vals = words(char(l), "{} [\!"=>,]")[2::2*n_l*n_k::2];
refs = words(char(l), "{} [\!"=>,]")[1::2*n_l*n_k-1::2];

column(nt,1)<<set values(vals);
column(nt,2)<<set values(refs);

nt << Split(
Split By( :Group ),
Split( :Column 1 ),
outputTableName("Challenge")
);

close(nt,nosave);
The other was, in his own words, unexciting and had a pair of nested loops. But, notice the use of data table scripting in the final statement. You’ll see that again in our winning solution.
l = {
["key1" => "a", "key2" => "b"], 
["key2" => "f", "key1" => "e", "key3" => "g"], 
["key1" => "c", "key2" => "d", "key4" => "h"], 
["key2" => "f2", "key1" => "e2", "key3" => "g2"]
};

nt = New Table( "challenge" );
colList = {};

For( i = 1, i <= N Items( l ), i++,
nt << addrows( 1 );
ki = l[i] << get keys;
For( j = 1, j <= N Items( ki ), j++,
If( !Contains( colList, ki[j] ),
nt << New Column( ki[j], character );
Insert Into( colList, ki[j] );
);
Column( nt, ki[j] )[i] = l[i][ki[j]];
);
);

nt << delete column( 1 );
My choice for the winning solution is a combination of solutions from @Craige_Hales and @joseph_morgan.
l = {
       ["key1" => "a", "key2" => "b"],
       ["key1" =>"c", "key2" => "d"],
       ["key2" => "f", "key1" => "e", "key3" => "g"]  
};
 
// get unique column names – alas, cannot just count items in associative arrays
colnm=l[1]; i=1;
j(nitems(l),1,colnm<<insert(l[i++]));
 
// create blank datatable
// Note: assuming column names have a fixed prefix (in this case key) – cheating perhaps??
dt=newtable("Jeff");
dt<<addmultiplecolumns("key",nitems(colnm),character);
 
// add rows but need to transform list to form that addrows() expects
for(i=1,i<=nitems(l),i++,
       tmp=l[i]<<getcontents;
       tmplt={}; k=1;
       tmplt[1::nitems(tmp)]=expr(expr(parse(tmp[k][1]))=expr(tmp[k++][2]));
       dt<<addrows(evalexpr(tmplt))
);
This is really clever in that it takes advantage of the implicit loop in the J() function. This is a trick that I continue to forget but always try to remember. However, he does have that final loop and a number of Expr() functions that make it more difficult to read.
Also, he admits to cheating in assuming that the keys are consistently named. I’ll forgive that as it's a problem that’s easily overcome with a loop over the data structure he’s created:
colnames=colnm<<get keys;
for(i=1, i<=ncol(dt), i++,
   column(dt,i)<< set name(colnames[i])
);
Now, @Craige_Hales' solution:
list = {
["key9" => "k9r1", "key2" => "k2r1"],
["key2" => "k2r2", "key1" => "k1r2", "key3" => "k3r2"],
["key1" => "k1r3", "key2" => "k2r3", "key4" => "k4r3"],
["key2" => "k2r4", "key1" => "k1r4", "key7" => "k7r4"]
};
 
nonunique = list<<getkeys;
unique=associativearray();
while( nitems(k = removefrom(nonunique)), unique<<insert(associativearray(k[1])));
 
unique = unique<<getkeys;
dt = New Table( "challenge" );
while( nitems(k=removefrom(unique,1)), dt<<newcolumn(k[1],character));
dt<<addrows(nitems(list));
 
foreachrow(
       i = row();
       dt[i,list[i]<<getkeys] = list[i]<<getvalues;
);
I like this because the final loop is over the data table and uses the data table scripting we first saw in @brady_brady's solution. You may disagree, but to me that feels better than looping over a JSL data structure. Perhaps it’s because it’s fewer characters, and that makes it easier to read. 
But, he uses a couple of clunky While() loops to get the keys and name the columns. So, I’ll take as the winner a combination of @Craige_Hales' and @joseph_morgan's solutions:
// Revised
list = {
["key9" => "k9r1", "key2" => "k2r1"],
["key2" => "k2r2", "key1" => "k1r2", "key3" => "k3r2"],
["key1" => "k1r3", "key2" => "k2r3", "key4" => "k4r3"],
["key2" => "k2r4", "key1" => "k1r4", "key7" => "k7r4"]
};
 
unique=list[1];
i=1;
j(nitems(list),1,unique<<insert(list[i++]));

dt = New Table( "challenge" );

colnames=unique<<get keys;

dt<<addmultiplecolumns("key",nitems(colnames),character);

for(i=1, i<=ncol(dt), i++,
column(dt,i)<< set name(colnames[i])
);


dt<<addrows(nitems(list));
 
foreachrow(
       i = row();
       dt[i,list[i]<<getkeys] = list[i]<<getvalues;
);
If the J() function – and its implicit loop – makes it more difficult for you to read, @joseph_morgan points out that it could be replaced with a  For() loop.
for( i=2,i<=nitems(list),i++, unique<<insert(list[i]));
This was really instructive for me to study each solution and see a number of tricks here that I may be able to use later. Take some time and do the same for yourself.
Like any language, there are lots of ways to accomplish the same thing. Each has its advantages. The trick is to know them and apply them where they are most advantageous.
Last Modified: Jun 16, 2017 1:29 PM
Comments
Phil_Brown
Super User (Alumni)

Very interesting! If I may offer yet another approach, similar to Brady's but slightly different.

 

list = {
["key9" => "k9r1", "key2" => "k2r1"], 
["key2" => "k2r2", "key1" => "k1r2", "key3" => "k3r2"], 
["key1" => "k1r3", "key2" => "k2r3", "key4" => "k4r3"], 
["key2" => "k2r4", "key1" => "k1r4", "key7" => "k7r4"]
}; 


newdt = New Table();
cols ={};
For( i = 1, i <= N Items( list ), i++,
	newdt << addRows( 1 );
	cKey = list[i] << first;
	While( !Is Empty( cKey ),
		If( !Contains(cols, cKey),
			newdt << newColumn( cKey, Character );
			InsertInto( cols, cKey);
		);
		Column( cKey )[i] = list[i][ckey];
		cKey = list[i] << next( cKey );
	);
);

newdt << deleteColumns( 1 );
newdt << reOrderByName;

 

Jeff_Perkinson
Community Manager

Nice @Phil_Brown!

 

I like the use of the <<First and <<Next() to iterate through the associative arrays.

 

Another one of those tricks I forget to remember. 

Phil_Brown
Super User (Alumni)

I also like the fact that you don't need to know how many elements there are. The While( IsEmpty(...) ) allows the array length to be dynamic.

nathan-clark
Level VI

Wow, @Jeff_Perkinson and @Phil_Brown, I came looking for some general stuff on indexing through associative arrays and I got so much more!

As a recent convert to associative arrays (so many nested lists in my past :-/ ) I'm trying to use them more often and there must be at least 4 separate ideas in this thread alone I can merge into various sets of my code!

 

Thanks for posting this :)

BioData41
Level II

I ran into a few issues with the recommended approach in the original post, related to this loop where values get inserted to table for each row:

 

foreachrow(
       i = row();
       dt[i,list[i]<<getkeys] = list[i]<<getvalues;
);

 

  1. This throws as soon as a missing value (or absence of a key) is encountered in one of the associative arrays (rows).  
  2. Only adds string (character) data to the table columns.  Any data in the source that is numeric will just be ignored, resulting in empty columns where values are numeric.  
  3. The column types are left as character and are not coerced to numeric if entirely numeric data is encountered.  

I ended up with the following, which seems to address these issues:

 

aaListToDataTable = function ({aaList}, {Default Local}, 
	unique=aaList[1];
	for (i=1, i<=nitems(aaList), i++, 
		unique<<insert(aaList[i]);
	);
	colnames=unique<<get keys;
	
	dt = New Table( "challenge" );
	dt<<addmultiplecolumns("key",nitems(colnames),character);
	for(i=1, i<=ncol(dt), i++,
		column(dt,i)<< set name(colnames[i])
	);

	dt<<addrows(nitems(aaList));
	
	for(i=1, i<=nitems(aaList), i++, 
		for(j=1, j<=nitems(colnames), j++, 
			if (! Is Missing (aaList[i][colnames[j]]), 
				column(dt, colnames[j])[i] = aaList[i][colnames[j]];
			);
		);
	);
	
	return (dt);
	
);

Didn't test this for performance on particularly large data sets.  

 

Jbritsch04
Level I

Hi there. I was reading through this and it feels more complicated than it needs to be. I was able to get the solution below working, pretty minimal amount of code and it can probably be decreased a bit further with more effort.

 

//perc_array is my associative array. it has two levels if you want to call them that.  the first is a text string, the second is numeric;

perc_array= [1 => "Sum(Qty 1kHz)", 2 => "Sum(Qty 1KHZ_LT50)", 3 => "Sum(Qty 50Hz_High)", 4 => "Sum(Qty 50Hz_Low)", 5 => "Sum(Qty DC)", 6 => "Sum(Qty ERROR)", 7 => "Sum(Qty GA1GA2)", 8 => "Sum(Qty Idd)", 9 => "Sum(Qty Idd_Low)", 10 => "Sum(Qty THD)", "Sum(Qty 1kHz)" => 0.00202316156420758, "Sum(Qty 1KHZ_LT50)" => 0.000757081600136874, "Sum(Qty 50Hz_High)" => 0.0176374349048836, "Sum(Qty 50Hz_Low)" => 0.00692494412780564, "Sum(Qty DC)" => 0.00482906851160752, "Sum(Qty ERROR)" => 0.00296416694290878, "Sum(Qty GA1GA2)" => 0.124725985649668, "Sum(Qty Idd)" => 0.00232257236652159, "Sum(Qty Idd_Low)" => 0.000327213233957462, "Sum(Qty THD)" => 0.000032079728819359];

//the denominator below is how many "columns" of keys you have or how many levels; ncheck=n items(perc_Array())/2;

//xx is the variable I use to know how many "rows" i have in the associative array; dt_sort=new table("SortFT", Add Rows( xx-1 ), New Column( "Label", Character), New Column( "%", Numeric, Format("Percent",2)); ); for(i=1, i<=ncheck, i++, dt_sort:Label[i]=perc_array[i]; dt_sort:Name("%")[i]=perc_array[dt_sort:Label[i]];
);

Jake