On converting a list of associative arrays to a data table

Jeff_Perkinson · Jun 16, 2017 01:29 PM

I recently posted an add-in to log each launch of JMP.

As I was developing it, I thought I was being clever by storing the logging information as an associative array.

Here's an example of what's in a single log file.

["Bitness" => "64-bit", 
 "Build Information" => "Jan  9 2017, 00:53:08, Release, JMP    ", 
 "Date" => "06/06/2017 08:05:03 AM", 
 "Host" => "Windows", 
 "Hostname" => "computer1", 
 "JMP Version" => "13.1.0", 
 "Product Name" => "Standard", 
 "Username" => "joeuser"]

An associative array makes the log self-documenting as each piece of data is labeled. It also makes the log extensible. So, in the future you can easily add more information to the logs and any existing program will either adapt or ignore it.

It's pretty easy to read a directory full of log files into a list of associative arrays like this:

logfiles = Files In Directory( "./Logs" );
logs = {};
For( i = 1, i <= N Items( logfiles ), i++,
	Insert Into( logs, Parse( Load Text File( "./Logs/" || logfiles[i] ) ) )
);

This results in a structure that looks like this:

logs = {
["Bitness" => "64-bit", 
 "Build Information" => "Jan  9 2017, 00:53:08, Release, JMP    ", 
 "Date" => "06/06/2017 02:53:43 PM", 
 "Host" => "Windows", 
 "Hostname" => "MFIDT0259", 
 "JMP Version" => "13.1.0", 
 "Product Name" => "Standard", 
 "Username" => "pol"], 
["Bitness" => "64-bit", 
 "Build Information" => "Jan  9 2017, 00:53:08, Release, JMP    ", 
 "Date" => "06/06/2017 08:30:03 AM", 
 "Host" => "Windows", 
 "Hostname" => "MFIDT0259", 
 "JMP Version" => "13.1.0", 
 "Product Name" => "Standard", 
 "Username" => "pol"]
};

What I didn't think about though is that it's not quite as easy to convert that list of associative arrays into a data table with columns named after the keys and a row for each item in the list. It can be done with a lot of looping over the list and the arrays, but I was hoping for something a bit more elegant.

So, I did something similar to what many of you do by posting a question in the Discussions forum. I sent email to our development team and other experts at SAS asking how they would do it.

On Jun 6, 2017, at 6:16 PM, Jeff Perkinson wrote:

list_of_associative_arrays = {

["key1" => "a", "key2" => "b"],

["key1" =>"c", "key2" => "d"],

["key2" => “f", "key1" => “e", "key3" => "g”],

};

How would you convert this into this data table?

Notice a few things:

The columns are named based on the keys of the arrays

The keys appear in different orders

The number of keys in each associative array varies

I was hoping to get:

JSL that was relatively easy to interpret when reading it.
JSL that was relatively easy to write, with the fewest statements.
Efficiency(ish) without a lot of looping.

I knew it could be done with a series of loops, and @DonMcCormack provided that solution.

But some others provided interesting ideas that are worth exploring.

@brady_brady had a couple of solutions.

One was a creative hack that counted on a highly structured problem, and converted the keys and values to text strings and then to a data table that he manipulated.

l = {
["key1" => "c", "key2" => "d", "key3" => "g2", "key4" => "h"], 
["key1" => "a", "key2" => "e", "key3" => "g3", "key4" => "o"], 
["key1" => "b", "key2" => "f", "key3" => "g1", "key4" => "i"], 
["key1" => "c", "key2" => "a", "key3" => "g6", "key4" => "k"]
};

nt = New Table( "x", new column("Column 1", character), new column("Group", character) );
n_l = nitems(l);
n_k = nitems(l[1]<<get keys);

vals = words(char(l), "{} [\!"=>,]")[2::2*n_l*n_k::2];
refs = words(char(l), "{} [\!"=>,]")[1::2*n_l*n_k-1::2];

column(nt,1)<<set values(vals);
column(nt,2)<<set values(refs);

nt << Split(
Split By( :Group ),
Split( :Column 1 ),
outputTableName("Challenge")
);

close(nt,nosave);

The other was, in his own words, unexciting and had a pair of nested loops. But, notice the use of data table scripting in the final statement. You’ll see that again in our winning solution.

l = {
["key1" => "a", "key2" => "b"], 
["key2" => "f", "key1" => "e", "key3" => "g"], 
["key1" => "c", "key2" => "d", "key4" => "h"], 
["key2" => "f2", "key1" => "e2", "key3" => "g2"]
};

nt = New Table( "challenge" );
colList = {};

For( i = 1, i <= N Items( l ), i++,
nt << addrows( 1 );
ki = l[i] << get keys;
For( j = 1, j <= N Items( ki ), j++,
If( !Contains( colList, ki[j] ),
nt << New Column( ki[j], character );
Insert Into( colList, ki[j] );
);
Column( nt, ki[j] )[i] = l[i][ki[j]];
);
);

nt << delete column( 1 );

My choice for the winning solution is a combination of solutions from @Craige_Hales and @joseph_morgan.

First @joseph_morgan's:

l = {
       ["key1" => "a", "key2" => "b"],
       ["key1" =>"c", "key2" => "d"],
       ["key2" => "f", "key1" => "e", "key3" => "g"]  
};
 
// get unique column names – alas, cannot just count items in associative arrays
colnm=l[1]; i=1;
j(nitems(l),1,colnm<<insert(l[i++]));
 
// create blank datatable
// Note: assuming column names have a fixed prefix (in this case key) – cheating perhaps??
dt=newtable("Jeff");
dt<<addmultiplecolumns("key",nitems(colnm),character);
 
// add rows but need to transform list to form that addrows() expects
for(i=1,i<=nitems(l),i++,
       tmp=l[i]<<getcontents;
       tmplt={}; k=1;
       tmplt[1::nitems(tmp)]=expr(expr(parse(tmp[k][1]))=expr(tmp[k++][2]));
       dt<<addrows(evalexpr(tmplt))
);

This is really clever in that it takes advantage of the implicit loop in the J() function. This is a trick that I continue to forget but always try to remember. However, he does have that final loop and a number of Expr() functions that make it more difficult to read.

Also, he admits to cheating in assuming that the keys are consistently named. I’ll forgive that as it's a problem that’s easily overcome with a loop over the data structure he’s created:

colnames=colnm<<get keys;
for(i=1, i<=ncol(dt), i++,
   column(dt,i)<< set name(colnames[i])
);

Now, @Craige_Hales' solution:

list = {
["key9" => "k9r1", "key2" => "k2r1"],
["key2" => "k2r2", "key1" => "k1r2", "key3" => "k3r2"],
["key1" => "k1r3", "key2" => "k2r3", "key4" => "k4r3"],
["key2" => "k2r4", "key1" => "k1r4", "key7" => "k7r4"]
};
 
nonunique = list<<getkeys;
unique=associativearray();
while( nitems(k = removefrom(nonunique)), unique<<insert(associativearray(k[1])));
 
unique = unique<<getkeys;
dt = New Table( "challenge" );
while( nitems(k=removefrom(unique,1)), dt<<newcolumn(k[1],character));
dt<<addrows(nitems(list));
 
foreachrow(
       i = row();
       dt[i,list[i]<<getkeys] = list[i]<<getvalues;
);

I like this because the final loop is over the data table and uses the data table scripting we first saw in @brady_brady's solution. You may disagree, but to me that feels better than looping over a JSL data structure. Perhaps it’s because it’s fewer characters, and that makes it easier to read.

But, he uses a couple of clunky While() loops to get the keys and name the columns. So, I’ll take as the winner a combination of @Craige_Hales' and @joseph_morgan's solutions:

// Revised
list = {
["key9" => "k9r1", "key2" => "k2r1"],
["key2" => "k2r2", "key1" => "k1r2", "key3" => "k3r2"],
["key1" => "k1r3", "key2" => "k2r3", "key4" => "k4r3"],
["key2" => "k2r4", "key1" => "k1r4", "key7" => "k7r4"]
};
 
unique=list[1];
i=1;
j(nitems(list),1,unique<<insert(list[i++]));

dt = New Table( "challenge" );

colnames=unique<<get keys;

dt<<addmultiplecolumns("key",nitems(colnames),character);

for(i=1, i<=ncol(dt), i++,
column(dt,i)<< set name(colnames[i])
);


dt<<addrows(nitems(list));
 
foreachrow(
       i = row();
       dt[i,list[i]<<getkeys] = list[i]<<getvalues;
);

If the J() function – and its implicit loop – makes it more difficult for you to read, @joseph_morgan points out that it could be replaced with a For() loop.

for( i=2,i<=nitems(list),i++, unique<<insert(list[i]));

This was really instructive for me to study each solution and see a number of tricks here that I may be able to use later. Take some time and do the same for yourself.

Like any language, there are lots of ways to accomplish the same thing. Each has its advantages. The trick is to know them and apply them where they are most advantageous.