Subscribe Bookmark RSS Feed

I attach a dataset: How to substract duplicates of invoices starting by a 'letter' ?

Alexllorens

Occasional Contributor

Joined:

Oct 3, 2017

Hello, I have a dataframe with a column named 'InvoiceNumber' (invoiceNo, example 541431). And another column named ' Stockcode'.

 

When InvoiceNo starts by a C (example C541433) and it matches a previous StockCode I have to cancel the 'Quantity' and/or 'Unitprice' from my analysis. (it means a customer RETURNED the item). I'm trying to do an RFM analysis and a Market Basket Analysis but first, I need to take this into consideration first.

 

How can I solve this problem? I have 500K+ rows of transactional data.

 

InvoiceNoStockCodeDescriptionQuantityInvoiceDateUnitPriceCustomerIDCountry
54143123166MEDIUM CERAMIC TOP STORAGE JAR7421518/1/2011 10:011.0412346United Kingdom
C54143323166MEDIUM CERAMIC TOP STORAGE JAR-7421518/1/2011 10:171.0412346United Kingdom
53762685116BLACK CANDELABRA T-LIGHT HOLDER127/12/2010 14:572.112347Iceland
53762622375AIRLINE BAG VINTAGE JET SET BROWN47/12/2010 14:574.2512347Iceland
53762671477COLOUR GLASS. STAR T-LIGHT HOLDER127/12/2010 14:573.2512347Iceland
53762622492MINI PAINT SET VINTAGE 367/12/2010 14:570.6512347Iceland
53762622771CLEAR DRAWER KNOB ACRYLIC EDWARDIAN127/12/2010 14:571.2512347Iceland
53762622772PINK DRAWER KNOB ACRYLIC EDWARDIAN127/12/2010 14:571.2512347Iceland
53762622773GREEN DRAWER KNOB ACRYLIC EDWARDIAN127/12/2010 14:571.2512347Iceland
53762622774RED DRAWER KNOB ACRYLIC EDWARDIAN127/12/2010 14:571.2512347Iceland
53762622775PURPLE DRAWERKNOB ACRYLIC EDWARDIAN127/12/2010 14:571.2512347Iceland
53762622805BLUE DRAWER KNOB ACRYLIC EDWARDIAN127/12/2010 14:571.2512347Iceland
53762622725ALARM CLOCK BAKELIKE CHOCOLATE47/12/2010 14:573.7512347Iceland
53762622726ALARM CLOCK BAKELIKE GREEN47/12/2010 14:573.7512347Iceland
53762622727ALARM CLOCK BAKELIKE RED 47/12/2010 14:573.7512347Iceland
53762622728ALARM CLOCK BAKELIKE PINK47/12/2010 14:573.7512347Iceland
53762622729ALARM CLOCK BAKELIKE ORANGE47/12/2010 14:573.7512347Iceland

 

 

1 REPLY
vince_faller

Super User

Joined:

Mar 17, 2015

It doesn't look like it's a duplicate.  It looks like the quantity is negative so it seems like it's already compensated.