topic Re: substracting values depending on a variable that starts by 'c from a transactional dataset in Discussions
https://community.jmp.com/t5/Discussions/substracting-values-depending-on-a-variable-that-starts-by-c/m-p/45588#M26052
<P>Below is a script that takes the UnBooking records, and removes their UnBooking quantity from the previous booking record for the given StockCode. It then sets the UnBooking quantity to zero. If no previous Booking record is found, the assumption is made that the record existed prior to the start of this data table, and sets the UnBooking's quantity to zero.</P>
<P> </P>
<P>The issue with this script is that it isn't very efficient. On the 54,000+ records, it took my old system 55 minutes. Other methodologies could be used that would increase the speed, but I strove to develop something that followed your implied methodology.</P>
<P> </P>
<P>The script is also incomplete in fully dealing with the reconcilliation of the data. That is, if the UnBooking quantity is 10 (in your data, -10) and the previous Booking quantity is 7, the script does not take the 3 unreconcilled products and continue backwards to find the next Booking record, and subtract the 3 unreconcilled products from that Booking. That could be added if you choose to implement my methodolgy.</P>
<PRE><CODE class=" language-jsl">Names Default To Here( 1 );
dt = Current Data Table();
// Record the starting time to allow the reporting of the amount of time it takes to
// perform all of the calculations
start = Today();
// Create a new column that represents the row number in the data table
dt << New Column( "RowNum", formula( Row() ) );
dt:RowNum << delete Property( "Formula" );
// Find all of the observations in the data table that are UnBookings
UnBookList = dt << get rows where( Uppercase( Left( :InvoiceNo, 1 ) ) == "C" & :Quantity < 0 );
// Loop across all of the UnBookings and subtract the UnBooking quantity from the
// Booking order record
For( corrections = 1, corrections <= N Rows( UnBookList ), corrections++,
// Print out the processing progress
If( Mod( corrections, 100 ) == 0,
time = Format( Today() - start, "hr:m:s" );
Show( time );
);
// Get the values of specific values for the current row being processed
curRowNum = UnBookList[corrections];
curCustomerID = dt:CustomerID[curRowNum];
curStockCode = dt:StockCode[curRowNum];
// Find the last row from the CustomerID's order for the StockCode being UnBooked
last row = Max( dt << get rows where( :quantity > 0 & curCustomerID == :CustomerID & curStockCode == :StockCode & :RowNum < curRowNum ) );
// If the record was found, then subtract (Sum the negative quantity) from the ordered quantity.
// Also, set the current rows quantity to zero
If( Is Missing( Last row ) == 0,
dt:Quantity[Last row] = Sum( dt:Quantity[Last row], dt:Quantity[curRowNum] );
);
// Set the current row's quantity value to zero
dt:Quantity[curRowNum] = 0;
);
// Print out the results
Show( N Rows( dt << get rows where( :quantity < 0 & Uppercase( Left( :InvoiceNo, 1 ) ) == "C" ) ) );
// Show the time it took to process the file
time = Format( Today() - start, "hr:m:s" );
Show( time );
// Delete the utility column RowNum
dt << Delete Columns( "RowNum" );</CODE></PRE>
<P> </P>
<P> </P>Fri, 06 Oct 2017 05:01:46 GMTtxnelson2017-10-06T05:01:46Zsubstracting values depending on a variable that starts by 'c from a transactional dataset
https://community.jmp.com/t5/Discussions/substracting-values-depending-on-a-variable-that-starts-by-c/m-p/45531#M26033
<P>Hello, I have a dataframe with a column named 'InvoiceNumber' (invoiceNo, example <FONT color="#99CC00"><SPAN>541431</SPAN></FONT>). And another column named ' Stockcode'.</P><P> </P><P>When InvoiceNo starts by a C (example <FONT color="#FF0000"><SPAN>C541433</SPAN></FONT>) and it matches a previous StockCode <U>I have to cancel the 'Quantity' and/or 'Unitprice' from my analysis</U>. (it means a customer RETURNED the item). I'm trying to do an RFM analysis and a Market Basket Analysis but first, I need to take this into consideration first. How can I substract the total spent per CustomerID in both Quantity and unitprice when they return an item (and I have a row with a Invoiceno starting by 'C'.</P><P> </P><P>How can I solve this problem? I have 500K+ rows of transactional data.</P><P> </P><TABLE><TBODY><TR><TD>InvoiceNo</TD><TD><FONT color="#FFCC99">StockCode</FONT></TD><TD>Description</TD><TD><FONT color="#FFCC99">Quantity</FONT></TD><TD>InvoiceDate</TD><TD><FONT color="#FFCC99">UnitPrice</FONT></TD><TD>CustomerID</TD><TD>Country</TD></TR><TR><TD><FONT color="#99CC00">541431</FONT></TD><TD><FONT color="#FFCC99">23166</FONT></TD><TD>MEDIUM CERAMIC TOP STORAGE JAR</TD><TD><FONT color="#FFCC99">74215</FONT></TD><TD>18/1/2011 10:01</TD><TD><FONT color="#FFCC99">1.04</FONT></TD><TD>12346</TD><TD>United Kingdom</TD></TR><TR><TD><FONT color="#FFFF00"><FONT color="#FF0000">C541433</FONT></FONT></TD><TD><FONT color="#FFCC99">23166</FONT></TD><TD>MEDIUM CERAMIC TOP STORAGE JAR</TD><TD><FONT color="#FFCC99">-74215</FONT></TD><TD>18/1/2011 10:17</TD><TD><FONT color="#FFCC99">1.04</FONT></TD><TD>12346</TD><TD>United Kingdom</TD></TR><TR><TD>537626</TD><TD>85116</TD><TD>BLACK CANDELABRA T-LIGHT HOLDER</TD><TD>12</TD><TD>7/12/2010 14:57</TD><TD>2.1</TD><TD>12347</TD><TD>Iceland</TD></TR><TR><TD>537626</TD><TD>22375</TD><TD>AIRLINE BAG VINTAGE JET SET BROWN</TD><TD>4</TD><TD>7/12/2010 14:57</TD><TD>4.25</TD><TD>12347</TD><TD>Iceland</TD></TR><TR><TD>537626</TD><TD>71477</TD><TD>COLOUR GLASS. STAR T-LIGHT HOLDER</TD><TD>12</TD><TD>7/12/2010 14:57</TD><TD>3.25</TD><TD>12347</TD><TD>Iceland</TD></TR><TR><TD>537626</TD><TD>22492</TD><TD>MINI PAINT SET VINTAGE </TD><TD>36</TD><TD>7/12/2010 14:57</TD><TD>0.65</TD><TD>12347</TD><TD>Iceland</TD></TR><TR><TD>537626</TD><TD>22771</TD><TD>CLEAR DRAWER KNOB ACRYLIC EDWARDIAN</TD><TD>12</TD><TD>7/12/2010 14:57</TD><TD>1.25</TD><TD>12347</TD><TD>Iceland</TD></TR><TR><TD>537626</TD><TD>22772</TD><TD>PINK DRAWER KNOB ACRYLIC EDWARDIAN</TD><TD>12</TD><TD>7/12/2010 14:57</TD><TD>1.25</TD><TD>12347</TD><TD>Iceland</TD></TR><TR><TD>537626</TD><TD>22773</TD><TD>GREEN DRAWER KNOB ACRYLIC EDWARDIAN</TD><TD>12</TD><TD>7/12/2010 14:57</TD><TD>1.25</TD><TD>12347</TD><TD>Iceland</TD></TR><TR><TD>537626</TD><TD>22774</TD><TD>RED DRAWER KNOB ACRYLIC EDWARDIAN</TD><TD>12</TD><TD>7/12/2010 14:57</TD><TD>1.25</TD><TD>12347</TD><TD>Iceland</TD></TR><TR><TD>537626</TD><TD>22775</TD><TD>PURPLE DRAWERKNOB ACRYLIC EDWARDIAN</TD><TD>12</TD><TD>7/12/2010 14:57</TD><TD>1.25</TD><TD>12347</TD><TD>Iceland</TD></TR><TR><TD>537626</TD><TD>22805</TD><TD>BLUE DRAWER KNOB ACRYLIC EDWARDIAN</TD><TD>12</TD><TD>7/12/2010 14:57</TD><TD>1.25</TD><TD>12347</TD><TD>Iceland</TD></TR><TR><TD>537626</TD><TD>22725</TD><TD>ALARM CLOCK BAKELIKE CHOCOLATE</TD><TD>4</TD><TD>7/12/2010 14:57</TD><TD>3.75</TD><TD>12347</TD><TD>Iceland</TD></TR><TR><TD>537626</TD><TD>22726</TD><TD>ALARM CLOCK BAKELIKE GREEN</TD><TD>4</TD><TD>7/12/2010 14:57</TD><TD>3.75</TD><TD>12347</TD><TD>Iceland</TD></TR><TR><TD>537626</TD><TD>22727</TD><TD>ALARM CLOCK BAKELIKE RED </TD><TD>4</TD><TD>7/12/2010 14:57</TD><TD>3.75</TD><TD>12347</TD><TD>Iceland</TD></TR><TR><TD>537626</TD><TD>22728</TD><TD>ALARM CLOCK BAKELIKE PINK</TD><TD>4</TD><TD>7/12/2010 14:57</TD><TD>3.75</TD><TD>12347</TD><TD>Iceland</TD></TR><TR><TD>537626</TD><TD>22729</TD><TD>ALARM CLOCK BAKELIKE ORANGE</TD><TD>4</TD><TD>7/12/2010 14:57</TD><TD>3.75</TD><TD>12347</TD><TD>Iceland</TD></TR></TBODY></TABLE><P> </P><P> </P>Thu, 05 Oct 2017 13:59:00 GMThttps://community.jmp.com/t5/Discussions/substracting-values-depending-on-a-variable-that-starts-by-c/m-p/45531#M26033Alexllorens2017-10-05T13:59:00ZRe: substracting values depending on a variable that starts by 'c from a transactional dataset
https://community.jmp.com/t5/Discussions/substracting-values-depending-on-a-variable-that-starts-by-c/m-p/45540#M26035
<P>Hi, Alexllorens!</P><P> </P><P>Is this something you do once, or something you do repeatedly? I assume the answer is "repeatedly". If so, you should write a JSL script.</P><P> </P><P>Is it possible for a customer to return less than a full order? In other words, if a customer ordered 50 units and returned 49, would you want to include the 1 remainder in your accounting?</P>Thu, 05 Oct 2017 15:03:43 GMThttps://community.jmp.com/t5/Discussions/substracting-values-depending-on-a-variable-that-starts-by-c/m-p/45540#M26035kevin_c_anderso2017-10-05T15:03:43ZRe: substracting values depending on a variable that starts by 'c from a transactional dataset
https://community.jmp.com/t5/Discussions/substracting-values-depending-on-a-variable-that-starts-by-c/m-p/45588#M26052
<P>Below is a script that takes the UnBooking records, and removes their UnBooking quantity from the previous booking record for the given StockCode. It then sets the UnBooking quantity to zero. If no previous Booking record is found, the assumption is made that the record existed prior to the start of this data table, and sets the UnBooking's quantity to zero.</P>
<P> </P>
<P>The issue with this script is that it isn't very efficient. On the 54,000+ records, it took my old system 55 minutes. Other methodologies could be used that would increase the speed, but I strove to develop something that followed your implied methodology.</P>
<P> </P>
<P>The script is also incomplete in fully dealing with the reconcilliation of the data. That is, if the UnBooking quantity is 10 (in your data, -10) and the previous Booking quantity is 7, the script does not take the 3 unreconcilled products and continue backwards to find the next Booking record, and subtract the 3 unreconcilled products from that Booking. That could be added if you choose to implement my methodolgy.</P>
<PRE><CODE class=" language-jsl">Names Default To Here( 1 );
dt = Current Data Table();
// Record the starting time to allow the reporting of the amount of time it takes to
// perform all of the calculations
start = Today();
// Create a new column that represents the row number in the data table
dt << New Column( "RowNum", formula( Row() ) );
dt:RowNum << delete Property( "Formula" );
// Find all of the observations in the data table that are UnBookings
UnBookList = dt << get rows where( Uppercase( Left( :InvoiceNo, 1 ) ) == "C" & :Quantity < 0 );
// Loop across all of the UnBookings and subtract the UnBooking quantity from the
// Booking order record
For( corrections = 1, corrections <= N Rows( UnBookList ), corrections++,
// Print out the processing progress
If( Mod( corrections, 100 ) == 0,
time = Format( Today() - start, "hr:m:s" );
Show( time );
);
// Get the values of specific values for the current row being processed
curRowNum = UnBookList[corrections];
curCustomerID = dt:CustomerID[curRowNum];
curStockCode = dt:StockCode[curRowNum];
// Find the last row from the CustomerID's order for the StockCode being UnBooked
last row = Max( dt << get rows where( :quantity > 0 & curCustomerID == :CustomerID & curStockCode == :StockCode & :RowNum < curRowNum ) );
// If the record was found, then subtract (Sum the negative quantity) from the ordered quantity.
// Also, set the current rows quantity to zero
If( Is Missing( Last row ) == 0,
dt:Quantity[Last row] = Sum( dt:Quantity[Last row], dt:Quantity[curRowNum] );
);
// Set the current row's quantity value to zero
dt:Quantity[curRowNum] = 0;
);
// Print out the results
Show( N Rows( dt << get rows where( :quantity < 0 & Uppercase( Left( :InvoiceNo, 1 ) ) == "C" ) ) );
// Show the time it took to process the file
time = Format( Today() - start, "hr:m:s" );
Show( time );
// Delete the utility column RowNum
dt << Delete Columns( "RowNum" );</CODE></PRE>
<P> </P>
<P> </P>Fri, 06 Oct 2017 05:01:46 GMThttps://community.jmp.com/t5/Discussions/substracting-values-depending-on-a-variable-that-starts-by-c/m-p/45588#M26052txnelson2017-10-06T05:01:46ZRe: substracting values depending on a variable that starts by 'c from a transactional dataset
https://community.jmp.com/t5/Discussions/substracting-values-depending-on-a-variable-that-starts-by-c/m-p/45590#M26053
<P>I didn't fully understand all the nuances, specifically what consitutes the composite primary key for identifying records that should be adjusted, nor what adjustments should be made.</P>
<P>But it may be that a development of the code below could do it. Not a fair comparison, but it takes less than a second to do what it does:</P>
<PRE><CODE class=" language-jsl">NamesDefaultToHere(1);
dt = DataTable("Original");
// Find and select all the rows for a cancelled transaction
dt << selectWhere(startsWith(:InvoiceNo, "C"));
// Put these transactions into another table
dt2 = dt << Subset(SelectedRows(1), Columns(:Stock Code, :Customer ID, :UnitPrice));
dt2 << setName("Cancelled");
// Delete the cancelled transactions in dt
dt << deleteRows;
// Join dt2 to dt
dt3 = dt << Join(
With( dt2 ),
By Matching Columns(
:StockCode = :StockCode,
:CustomerID = :CustomerID,
:UnitPrice = :UnitPrice
),
Drop multiples( 0, 0 ),
Include Nonmatches( 1, 0 ),
Preserve main table order( 1 ),
OutputTableName("Work with Me")
);
Close(dt, NoSave);
Close(dt2, NoSave);</CODE></PRE>
<P> </P>Fri, 06 Oct 2017 11:10:30 GMThttps://community.jmp.com/t5/Discussions/substracting-values-depending-on-a-variable-that-starts-by-c/m-p/45590#M26053ian_jmp2017-10-06T11:10:30ZRe: substracting values depending on a variable that starts by 'c from a transactional dataset
https://community.jmp.com/t5/Discussions/substracting-values-depending-on-a-variable-that-starts-by-c/m-p/45630#M26069
<P>Thanks Kevin! yes I would include that 1 'left' in the accounting</P><BLOCKQUOTE><HR />kevin_c_anderso wrote:<BR /><P>Hi, Alexllorens!</P><P> </P><P>Is this something you do once, or something you do repeatedly? I assume the answer is "repeatedly". If so, you should write a JSL script.</P><P> </P><P>Is it possible for a customer to return less than a full order? In other words, if a customer ordered 50 units and returned 49, would you want to include the 1 remainder in your accounting?</P><HR /></BLOCKQUOTE><P> </P>Sat, 07 Oct 2017 04:43:43 GMThttps://community.jmp.com/t5/Discussions/substracting-values-depending-on-a-variable-that-starts-by-c/m-p/45630#M26069Alexllorens2017-10-07T04:43:43ZRe: substracting values depending on a variable that starts by 'c from a transactional dataset
https://community.jmp.com/t5/Discussions/substracting-values-depending-on-a-variable-that-starts-by-c/m-p/45631#M26070
<P>Below is a new version that fully reconciles the data in a single pass of the data table. It takes about 30 seconds to process</P>
<PRE><CODE class=" language-jsl">Names Default To Here( 1 );
dt = Current Data Table();
// Add a RowNum column to permit the reordering back to
// the original order after processing
dt << New Column( "RowNum", formula( Row() ) );
dt:RowNum << delete property( "formula" );
// Sort the data so that processing can start at the
// top of the data, and flow downward, and the required
// data are grouped together and in row order
dt << Sort(
By( :CustomerID, :StockCode, :RowNum ),
Order( Ascending, Ascending, Descending ),
Replace Table( 1 )
);
// Find the starting row, that is, the first row with a
// CustomerID value
Start = 1;
While( Is Missing( dt:CustomerID[Start] ) == 1, Start++ );
// Loop across all rows, starting with the first non missing
// CustomerID
For( RowCNT = Start, RowCNT <= N Rows( dt ), RowCNT++,
// When the CustomerID or StockCode changes, or if this is
// the first loop, zero out the UnBookingAmount
If(
Try( :CustomerID[RowCNT - 1], "" ) != :CustomerID[RowCNT] |
Try( :StockCode[RowCNT - 1], "" ) != :StockCode[RowCNT] | RowCNT == Start,
UnBookingAmount = 0
);
// If this is a Change Invoice, then add the change to
// the running UnBookingAmount (Summing a negative amount actually makes
// the value of UnBookingAmount even more negative)
If( Starts With( :InvoiceNo[RowCNT], "C" ) == 1,
UnBookingAmount = Sum( UnBookingAmount, :Quantity[RowCNT] );
:Quantity[RowCNT] = 0;
, // Else
// Reduct the order by the current value of UnBookingAmount
:Quantity[RowCNT] = Sum( UnBookingAmount, :Quantity[RowCNT] );
// If the Quantity is less than zero, that means more
// UnBooking amount needs to be subtracted on earlier
// orders so reset the UnBookingAmountt remaining to be reconciled,
// and set the current order to 0
If( :Quantity[RowCNT] < 0,
UnBookingAmount = :Quantity[RowCNT];
:Quantity[RowCNT] = 0;
, // Else
// If the Quantity is >= 0 that means all UnBookings
// have been handled, so set the UnBookingAmount to 0
UnBookingAmount = 0
);
);
);
// Reorder the table back to the original order
dt << Sort( By( :RowNum ), Order( Ascending ), Replace Table( 1 ) );
// Delete the no longer needed RowNum column
dt << delete columns( "RowNum" );</CODE></PRE>Sat, 07 Oct 2017 15:08:56 GMThttps://community.jmp.com/t5/Discussions/substracting-values-depending-on-a-variable-that-starts-by-c/m-p/45631#M26070txnelson2017-10-07T15:08:56Z