I have a dataset with redundant data that needs to be cleaned up before running analysis. One of the issues is that sometimes there are two records which are mostly identical except for a few fields. I would like to do something like the following...
:ID = :ID
& :Date = :Date
& :Depth = :Depth
& : Parameter = "pH"
& :Activity Type = "Field"
:Column10 = "Use"
:Column10 = "Don't Use"
I'm having trouble figuring out what function to use to find the rows where the three columns match. Once those three columns have been matched, I would like to put a value in an additional column indicating which of the rows have values that are equal to "pH" and "field" in the Parameter and Activity Type columns respectively.
I may be going about this in completely the wrong way so any advice/suggestions/shots in the dark for me to look into, are most welcome! I have done a good bit of internet searching on this one and haven't been able to come up with anything, but I may not be asking the question right. So if you have suggestions as to how to research this issue, that would also be very welcome.
Thanks in advance!
I would sort the data based upon ID, Date and Depth, and then create a new column that does the comparison and chooses if it should be used or not:
Sort the data
Create the new column with the formula:
If(:ID==Lag(:ID) & : Date==Lag(: Date) & : Depth==Lag(: Depth) & : Parameter =="pH" & :Activity Type =="Field", 1, 0)
then you can subset or exclude as desired to do your analyses.
Thanks so much, Jim. This worked great. I actually had to change the activity type to say not equal to the other value in the column for some reason (:Activity Type !="sample") other wise all values in the formula column were 0. That may have been more of an issue with my data. Not sure. Either way, thanks so much for your help and showing me how to use a new-to-me function! I will be using this a lot in the future.