Subscribe Bookmark RSS Feed

Re: substracting values depending on a variable that starts by 'c from a transactional dataset

Alexllorens

Occasional Contributor

Joined:

Oct 3, 2017

Hello, I have a dataframe with a column named 'InvoiceNumber' (invoiceNo, example 541431). And another column named ' Stockcode'.

 

When InvoiceNo starts by a C (example C541433) and it matches a previous StockCode I have to cancel the 'Quantity' and/or 'Unitprice' from my analysis. (it means a customer RETURNED the item). I'm trying to do an RFM analysis and a Market Basket Analysis but first, I need to take this into consideration first. How can I substract the total spent per CustomerID in both Quantity and unitprice when they return an item (and I have a row with a Invoiceno starting by 'C'.

 

How can I solve this problem? I have 500K+ rows of transactional data.

 

InvoiceNoStockCodeDescriptionQuantityInvoiceDateUnitPriceCustomerIDCountry
54143123166MEDIUM CERAMIC TOP STORAGE JAR7421518/1/2011 10:011.0412346United Kingdom
C54143323166MEDIUM CERAMIC TOP STORAGE JAR-7421518/1/2011 10:171.0412346United Kingdom
53762685116BLACK CANDELABRA T-LIGHT HOLDER127/12/2010 14:572.112347Iceland
53762622375AIRLINE BAG VINTAGE JET SET BROWN47/12/2010 14:574.2512347Iceland
53762671477COLOUR GLASS. STAR T-LIGHT HOLDER127/12/2010 14:573.2512347Iceland
53762622492MINI PAINT SET VINTAGE 367/12/2010 14:570.6512347Iceland
53762622771CLEAR DRAWER KNOB ACRYLIC EDWARDIAN127/12/2010 14:571.2512347Iceland
53762622772PINK DRAWER KNOB ACRYLIC EDWARDIAN127/12/2010 14:571.2512347Iceland
53762622773GREEN DRAWER KNOB ACRYLIC EDWARDIAN127/12/2010 14:571.2512347Iceland
53762622774RED DRAWER KNOB ACRYLIC EDWARDIAN127/12/2010 14:571.2512347Iceland
53762622775PURPLE DRAWERKNOB ACRYLIC EDWARDIAN127/12/2010 14:571.2512347Iceland
53762622805BLUE DRAWER KNOB ACRYLIC EDWARDIAN127/12/2010 14:571.2512347Iceland
53762622725ALARM CLOCK BAKELIKE CHOCOLATE47/12/2010 14:573.7512347Iceland
53762622726ALARM CLOCK BAKELIKE GREEN47/12/2010 14:573.7512347Iceland
53762622727ALARM CLOCK BAKELIKE RED 47/12/2010 14:573.7512347Iceland
53762622728ALARM CLOCK BAKELIKE PINK47/12/2010 14:573.7512347Iceland
53762622729ALARM CLOCK BAKELIKE ORANGE47/12/2010 14:573.7512347Iceland

 

 

5 REPLIES
kevin_c_anderso

Community Trekker

Joined:

Jun 5, 2014

Hi, Alexllorens!

 

Is this something you do once, or something you do repeatedly?  I assume the answer is "repeatedly".  If so, you should write a JSL script.

 

Is it possible for a customer to return less than a full order?  In other words, if a customer ordered 50 units and returned 49, would you want to include the 1 remainder in your accounting?

Alexllorens

Occasional Contributor

Joined:

Oct 3, 2017

Thanks Kevin! yes I would include that 1 'left' in the accounting


kevin_c_anderso wrote:

Hi, Alexllorens!

 

Is this something you do once, or something you do repeatedly?  I assume the answer is "repeatedly".  If so, you should write a JSL script.

 

Is it possible for a customer to return less than a full order?  In other words, if a customer ordered 50 units and returned 49, would you want to include the 1 remainder in your accounting?


 

Highlighted
txnelson

Super User

Joined:

Jun 22, 2012

Below is a script that takes the UnBooking records, and removes their UnBooking quantity from the previous booking record for the given StockCode.  It then sets the UnBooking quantity to zero.  If no previous Booking record is found, the assumption is made that the record existed prior to the start of this data table, and sets the UnBooking's quantity to zero.

 

The issue with this script is that it isn't very efficient.  On the 54,000+ records, it took my old system 55 minutes.  Other methodologies could be used that would increase the speed, but I strove to develop something that followed your implied  methodology.

 

The script is also incomplete in fully dealing with the reconcilliation of the data.  That is, if the UnBooking quantity is 10 (in your data, -10) and the previous Booking quantity is 7, the script does not take the 3 unreconcilled products and continue backwards to find the next Booking record, and subtract the 3 unreconcilled products from that Booking.  That could be added if you choose to implement my methodolgy.

Names Default To Here( 1 );
dt = Current Data Table();

// Record the starting time to allow the reporting of the amount of  time it takes to 
// perform all of the calculations
start = Today();

// Create a new column that represents the row number in the data table
dt << New Column( "RowNum", formula( Row() ) );
dt:RowNum << delete Property( "Formula" );

// Find all of the observations in the data table that are UnBookings
UnBookList = dt << get rows where( Uppercase( Left( :InvoiceNo, 1 ) ) == "C" & :Quantity < 0 );

// Loop across all of the UnBookings and subtract the UnBooking quantity from the
// Booking order record
For( corrections = 1, corrections <= N Rows( UnBookList ), corrections++, 

	// Print out the processing progress
	If( Mod( corrections, 100 ) == 0,
		time = Format( Today() - start, "hr:m:s" );
		Show( time );
	);
	
	// Get the values of specific values for the current row being processed
	curRowNum = UnBookList[corrections];
	curCustomerID = dt:CustomerID[curRowNum];
	curStockCode = dt:StockCode[curRowNum];
	
	// Find the last row from the CustomerID's order for the StockCode being UnBooked
	last row = Max( dt << get rows where( :quantity > 0 & curCustomerID == :CustomerID & curStockCode == :StockCode & :RowNum < curRowNum ) );
	
	// If the record was found, then subtract (Sum the negative quantity) from the ordered quantity.
	// Also, set the current rows quantity to zero
	If( Is Missing( Last row ) == 0,
		dt:Quantity[Last row] = Sum( dt:Quantity[Last row], dt:Quantity[curRowNum] );
	);
	
	// Set the current row's quantity value to zero
	dt:Quantity[curRowNum] = 0;
);

// Print out the results 
Show( N Rows( dt << get rows where( :quantity < 0 & Uppercase( Left( :InvoiceNo, 1 ) ) == "C" ) ) );

// Show the time it took to process the file
time = Format( Today() - start, "hr:m:s" );
Show( time );

// Delete the utility column RowNum
dt << Delete Columns( "RowNum" );

 

 

Jim
ian_jmp

Staff

Joined:

Jun 23, 2011

I didn't fully understand all the nuances, specifically what consitutes the composite primary key for identifying records that should be adjusted, nor what adjustments should be made.

But it may be that a development of the code below could do it. Not a fair comparison, but it takes less than a second to do what it does:

NamesDefaultToHere(1);

dt = DataTable("Original");
// Find and select all the rows for a cancelled transaction
dt << selectWhere(startsWith(:InvoiceNo, "C"));
// Put these transactions into another table
dt2 = dt << Subset(SelectedRows(1), Columns(:Stock Code, :Customer ID, :UnitPrice));
dt2 << setName("Cancelled");
// Delete the cancelled transactions in dt
dt << deleteRows;
// Join dt2 to dt
dt3 = dt << Join(
			With( dt2 ),
			By Matching Columns(
				:StockCode = :StockCode,
				:CustomerID = :CustomerID,
				:UnitPrice = :UnitPrice
			),
			Drop multiples( 0, 0 ),
			Include Nonmatches( 1, 0 ),
			Preserve main table order( 1 ),
			OutputTableName("Work with Me")
		);
Close(dt, NoSave);
Close(dt2, NoSave);

 

txnelson

Super User

Joined:

Jun 22, 2012

Below is a new version that fully reconciles the data in a single pass of the data table.  It takes about 30 seconds to process

Names Default To Here( 1 );
dt = Current Data Table();

// Add a RowNum column to permit the reordering back to
// the original order after processing
dt << New Column( "RowNum", formula( Row() ) );
dt:RowNum << delete property( "formula" );

// Sort the data so that processing can start at the
// top of the data, and flow downward, and the required 
// data are grouped together and in row order
dt << Sort(
	By( :CustomerID, :StockCode, :RowNum ),
	Order( Ascending, Ascending, Descending ),
	Replace Table( 1 )
);

// Find the starting row, that is, the first row with a
// CustomerID value
Start = 1;
While( Is Missing( dt:CustomerID[Start] ) == 1, Start++ );

// Loop across all rows, starting with the first non missing
// CustomerID
For( RowCNT = Start, RowCNT <= N Rows( dt ), RowCNT++,
	
	// When the CustomerID or StockCode changes, or if this is 
	// the first loop, zero out the UnBookingAmount
	If(
		Try( :CustomerID[RowCNT - 1], "" ) != :CustomerID[RowCNT] |
		Try( :StockCode[RowCNT - 1], "" ) != :StockCode[RowCNT] | RowCNT == Start,
		UnBookingAmount = 0
	);
	
	// If this is a Change Invoice, then add the change to 
	// the running UnBookingAmount (Summing a negative amount actually makes
	// the value of UnBookingAmount even more negative)
	If( Starts With( :InvoiceNo[RowCNT], "C" ) == 1,
		UnBookingAmount = Sum( UnBookingAmount, :Quantity[RowCNT] );
		:Quantity[RowCNT] = 0;
	, // Else
	
		// Reduct the order by the current value of UnBookingAmount
		:Quantity[RowCNT] = Sum( UnBookingAmount, :Quantity[RowCNT] );
		
		// If the Quantity is less than zero, that means more
		// UnBooking amount needs to be subtracted on earlier
		// orders so reset the UnBookingAmountt remaining to be reconciled,
		// and set the current order to 0
		If( :Quantity[RowCNT] < 0,
			UnBookingAmount = :Quantity[RowCNT];
			:Quantity[RowCNT] = 0;
		, // Else
		
			// If the Quantity is >= 0 that means all UnBookings 
			// have been handled, so set the UnBookingAmount to 0
			UnBookingAmount = 0
		);
	);
);

// Reorder the table back to the original order
dt << Sort( By( :RowNum ), Order( Ascending ), Replace Table( 1 ) );

// Delete the no longer needed RowNum column
dt << delete columns( "RowNum" );
Jim