Subscribe Bookmark RSS Feed

Appending non-duplicate (single column ID) rows to a main table without creating a new table?

cdwhite

Community Trekker

Joined:

May 3, 2015

Scenario: need to append daily survey data files (that include all previous days) to a main table with only non-duplicative respondent ID records and without creating a new table (because I do not want to lose selected rows in the main table that are hidden or exluded

Two attempts:

Concatenate mainains the main table but brings over all records

Join brings only the new records but creates a new table so row characteristics (hide, exlcude) are lost

1 ACCEPTED SOLUTION

Accepted Solutions
cdwhite

Community Trekker

Joined:

May 3, 2015

Solution

Thank you.  Added new column designated as row state data type to original data table and then used the copy to and from features in the row states menu (by clicking on row state column in column list box on left, not actual column header) to copy over and then re-apply the row states after joining.  The script for the row append join to bring only the new records from a survey data set (and ignore all previously appended cases) into the main data table:

Data Table( "Main survey data set" ) << Join(

  With( Data Table( "Secondary survey data set" ) ),

  Merge Same Name Columns,

  Copy second table formula( 0 ),

  Suppress main table formula evaluation( 0 ),

  By Matching Columns( :ResponseID = :ResponseID ),

  Drop multiples( 0, 0 ),

  Name( "Include non-matches" )(0, 1),

  Preserve main table order( 1 )

)

After joining the two files, copied/restored the row states from the row state column using the row state menu.  Worked perfectly.  Thank you!

10 REPLIES
KarenC

Super User

Joined:

Feb 10, 2013

I find joining and concatenating tables a bit of an art (i.e., trial and error until I get what I really want).  However, I would recommend that you look up the column type "row states" as you can save all of your current row states (i.e., your hidden and excluded row states) in a column that you can reapply as needed to your table.

cdwhite

Community Trekker

Joined:

May 3, 2015

Solution

Thank you.  Added new column designated as row state data type to original data table and then used the copy to and from features in the row states menu (by clicking on row state column in column list box on left, not actual column header) to copy over and then re-apply the row states after joining.  The script for the row append join to bring only the new records from a survey data set (and ignore all previously appended cases) into the main data table:

Data Table( "Main survey data set" ) << Join(

  With( Data Table( "Secondary survey data set" ) ),

  Merge Same Name Columns,

  Copy second table formula( 0 ),

  Suppress main table formula evaluation( 0 ),

  By Matching Columns( :ResponseID = :ResponseID ),

  Drop multiples( 0, 0 ),

  Name( "Include non-matches" )(0, 1),

  Preserve main table order( 1 )

)

After joining the two files, copied/restored the row states from the row state column using the row state menu.  Worked perfectly.  Thank you!

pmroz

Super User

Joined:

Jun 23, 2011

You can do set operations using associative arrays.  Couldn't find a slick way to append one table to another.  Had to plow through the data using JSL loops.

dt_original = New Table( "Original", Add Rows( 4 ),

    New Column( "Column ID", Numeric, Continuous, Format( "Best", 12 ),

           Set Values( [1, 2, 3, 4] )

     ),

    New Column( "Result", Character, Nominal, Set Values( {"a", "b", "c", "d"} ) )

);

dt_new = New Table( "NewData", Add Rows( 10 ),

    New Column( "Column ID", Numeric, Continuous, Format( "Best", 12 ),

           Set Values( [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] )

     ),

    New Column( "Result", Character, Nominal,

           Set Values( {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j"} )

     )

);

// Determine what rows are in the new table that aren't in the old one

a_original = associative array(column(dt_original, "Column ID"));

a_new      = associative array(column(dt_new, "Column ID"));

a_new << remove(a_original);

new_rows = a_new << get keys;

// There has to be a faster way to append one table to another in place

ni = nrows(dt_original);

nr = nitems(new_rows);

dt_original << add rows(nr);

for (i = 1, i <= ncols(dt_original), i++,

    one_col = column(dt_original, i) << get name;

    for (k = 1, k <= nitems(new_rows), k++,

         m = new_rows[k];

         n = k + ni;

        column(dt_original, one_col)[n] = column(dt_new, one_col)[m];

    );

);

David_Burnham

Super User

Joined:

Jul 13, 2011

When performing a join you can avoid creating a new table by using the  option "update the main table with data from second table". 

-Dave
pmroz

Super User

Joined:

Jun 23, 2011

I tried using the update main table option but it still creates a new table.

Data Table( "Original" ) << Join(

     With( Data Table( "NewData" ) ),

     Update,

     By Matching Columns( :Column ID = :Column ID ),

     Drop multiples( 1, 1 ),

     Name( "Include non-matches" )(0, 1),

     Preserve main table order( 1 )

);

8791_Join-Update.png

robot

Community Trekker

Joined:

Feb 27, 2012

Hi cdwhite,

I wrote a script a while ago that may help.  The script below will delete duplicate rows while preserving the original data table, including formulas and row states.  Using the Concatenate function followed by this script may solve your problem.

Names Default To Here( 1 );

// Create data table and new data.

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

dt << Select Randomly( 10 );

dt2 = dt << Subset( (Selected Rows), Output Table Name( "New Data" ) );

dt << Clear Select;

dt2 << Add Rows( {:name = "PETER", :age = 14, :sex = "M", :height = 61, :weight = 124} );

// Randomly set row states in orginial data.

dt << Select Randomly( 10 ) << Hide << Exclude << Clear Select;

// Add new data.

dt << Concatenate( dt2, Append to First Table );

Close( dt2, No Save );

// Delete duplicate rows.

dt = Current Data Table();

col_min = Parse( "Col Min( Row(), :Name(\!"" || Concat Items( dt << get column names( string ), "\!"), :Name(\!"" ) || "\!"))" );  // Create a JSL expression to determine the minimum row number by all other columns.

duplicates = [];  // Create matrix to store duplicate rows.

For Each Row( If( col_min != Row(), duplicates |/= Row() ) );  // Compare each row to "col_min" expression above.  If the current row does not match the minimum row by all other columns, that row will be added to the "duplicates" matrix.

dt << delete rows( duplicates ); // Delete duplicates.

nn = N Rows( duplicates );

New Window( "Delete Duplicate Rows", <<Modal, Text Box( "Done." ), Text Box( Concat( "Deleted ", Char( nn ), " rows." ) ), Button Box( "OK" ) );

Martin

Community Trekker

Joined:

Aug 13, 2013

Hello Robot, could you explain what these two lines do?

Thx

1. col_min = Parse( "Col Min( Row(), :Name(\!"" || Concat Items( dt << get column names( string ), "\!"), :Name(\!"" ) || "\!"))" );

2. For Each Row( If( col_min != Row(), duplicates |/= Row() ) );

Martin

Community Trekker

Joined:

Aug 13, 2013

So I think I have figured it out.  Line 1 produced a string "Col Min( Row(), :name, :age, :sex, :height, :weight )" that will be used in (2) to determin the row number of a previous row where all columns match.  And, 2 will add the row number of the duplicate row to an array "duplicates".

This is quite cleaver.  I wonder what happens when there are triplicates.  Will it always keep the first instance and delete the second and third?  I will need to try it.

Thank you robot!

robot

Community Trekker

Joined:

Feb 27, 2012

Hi Martin,

You are correct.  Those lines do exactly what you deduced.  I added some comments to help make it more clear to others.

I am glad you find it useful!