ealindahl
Level II

Delete strings in a list that contain a word

I have a list of file names from which I want to select certain files. The first 4 digits are the year, the underscore acts as a delimiter, "abc" is just an identifier for the type of file, the 3-digit number is the ID, and "summary" or "result" refers to the kind of file.

First, I only want to look at the "result" files. I couldn't figure out how to remove all the "summary" files from my list. 

{"2021_abc_027_result.csv", "2021_abc_027_summary.csv", "2021_abc_028_result.csv",
"2021_abc_028_summary.csv", "2021_abc_029_result.csv", "2021_abc_029_summary.csv",
"2021_abc_038_result.csv", "2021_abc_038_summary.csv", "2021_abc_040_result.csv",
"2021_abc_040_summary.csv", "2021_abc_041_result.csv", "2021_abc_041_summary.csv",
"2021_abc_042_result.csv", "2021_abc_042_summary.csv", "2021_abc_043_result.csv",
"2021_abc_043_summary.csv", "2021_abc_044_result.csv", "2021_abc_044_summary.csv",
"2021_abc_045_result.csv", "2021_abc_045_summary.csv", "2021_abc_046_result.csv",
"2021_abc_046_summary.csv", "2021_abc_047_result.csv", "2021_abc_047_summary.csv",
"2021_abc_048_result.csv", "2021_abc_048_summary.csv", "2021_abc_049_result.csv",
"2021_abc_049_summary.csv", "2021_abc_050_result.csv", "2021_abc_050_summary.csv",
"2021_abc_051_result.csv", "2021_abc_051_summary.csv", "2021_abc_052_result.csv",
"2021_abc_052_summary.csv", "2021_abc_053_result.csv", "2021_abc_053_summary.csv",
"2021_abc_054_result.csv", "2021_abc_054_summary.csv", "2021_abc_055_result.csv",
"2021_abc_055_summary.csv", "2021_abc_056_result.csv", "2021_abc_056_summary.csv",
"2021_abc_057_result.csv", "2021_abc_057_summary.csv", "2021_abc_058_result.csv",
"2021_abc_058_summary.csv", "2021_abc_059_result.csv", "2021_abc_059_summary.csv",
"2021_abc_060_result.csv", "2021_abc_060_summary.csv", "2021_abc_061_result.csv",
"2021_abc_061_summary.csv", "2021_abc_062_result.csv", "2021_abc_062_summary.csv",
"2021_abc_063_result.csv", "2021_abc_063_summary.csv", "2021_abc_064_result.csv",
"2021_abc_064_summary.csv", "2021_abc_065_result.csv", "2021_abc_065_summary.csv",
"2021_abc_067_result.csv"}


Second, I want to select the files based on the year and 3 digit code.

 

I have done something similar in the past to select files using regex where there was only one number in the string so I used 

For( i = 1, i <= N Items( FileList ), i++,
	FileList[i] = regex(FileList[i], "([0-9]+)", "\1")
);

but in this case I end up with a list of "2021", "2021",...

Ideally I would end up with a list looking like 

{"2021_027 ", "2021_027 ", "2021_028 ",
"2021_028 ", "2021_029 ", "2021_029 ",
"2021_038 ", "2021_038 ", "2021_040 ",
"2021_040 ", "2021_041 ", "2021_041 ",
"2021_042 ", "2021_042 ", "2021_043 ",
"2021_043 ", "2021_044 ", "2021_044 ",
"2021_045 ", "2021_045 ", "2021_046 ",
"2021_046 ", "2021_047 ", "2021_047 ",
"2021_048 ", "2021_048 ", "2021_049 ",
"2021_049 ", "2021_050 ", "2021_050 ",
"2021_051 ", "2021_051 ", "2021_052 ",
"2021_052 ", "2021_053 ", "2021_053 ",
"2021_054 ", "2021_054 ", "2021_055 ",
"2021_055 ", "2021_056 ", "2021_056 ",
"2021_057 ", "2021_057 ", "2021_058 ",
"2021_058 ", "2021_059 ", "2021_059 ",
"2021_060 ", "2021_060 ", "2021_061 ",
"2021_061 ", "2021_062 ", "2021_062 ",
"2021_063 ", "2021_063 ", "2021_064 ",
"2021_064 ", "2021_065 ", "2021_065 ",
"2021_067 "}

and then I could select from that list the files I want to use.

 

Also any tips on learning how to regex better are welcome.

Thank you!

4 REPLIES
Thierry_S
Super User

Re: Delete strings in a list that contain a word

Hi,

 

Here is a brute-force approach to your problem that bypasses the need for regex (I'm not too good with those).

Names default to Here (1);

StartList = {"2021_abc_027_result.csv", "2021_abc_027_summary.csv", "2021_abc_028_result.csv",
"2021_abc_028_summary.csv", "2021_abc_029_result.csv", "2021_abc_029_summary.csv",
"2021_abc_038_result.csv", "2021_abc_038_summary.csv", "2021_abc_040_result.csv",
"2021_abc_040_summary.csv", "2021_abc_041_result.csv", "2021_abc_041_summary.csv",
"2021_abc_042_result.csv", "2021_abc_042_summary.csv", "2021_abc_043_result.csv",
"2021_abc_043_summary.csv", "2021_abc_044_result.csv", "2021_abc_044_summary.csv",
"2021_abc_045_result.csv", "2021_abc_045_summary.csv", "2021_abc_046_result.csv",
"2021_abc_046_summary.csv", "2021_abc_047_result.csv", "2021_abc_047_summary.csv",
"2021_abc_048_result.csv", "2021_abc_048_summary.csv", "2021_abc_049_result.csv",
"2021_abc_049_summary.csv", "2021_abc_050_result.csv", "2021_abc_050_summary.csv",
"2021_abc_051_result.csv", "2021_abc_051_summary.csv", "2021_abc_052_result.csv",
"2021_abc_052_summary.csv", "2021_abc_053_result.csv", "2021_abc_053_summary.csv",
"2021_abc_054_result.csv", "2021_abc_054_summary.csv", "2021_abc_055_result.csv",
"2021_abc_055_summary.csv", "2021_abc_056_result.csv", "2021_abc_056_summary.csv",
"2021_abc_057_result.csv", "2021_abc_057_summary.csv", "2021_abc_058_result.csv",
"2021_abc_058_summary.csv", "2021_abc_059_result.csv", "2021_abc_059_summary.csv",
"2021_abc_060_result.csv", "2021_abc_060_summary.csv", "2021_abc_061_result.csv",
"2021_abc_061_summary.csv", "2021_abc_062_result.csv", "2021_abc_062_summary.csv",
"2021_abc_063_result.csv", "2021_abc_063_summary.csv", "2021_abc_064_result.csv",
"2021_abc_064_summary.csv", "2021_abc_065_result.csv", "2021_abc_065_summary.csv",
"2021_abc_067_result.csv"};

ResultList = {};

For (i = 1, i <= N Items (StartList), i++,

	if (contains (StartList [i], "result"), insert into (ResultList, StartList[i]))
	
);

Show (ResultList); // Yields a clean list with results only

FileList = {};

For (i= 1 , i <= N Items (ResultList), i++,
	
	Insert into (FileList, Word (1, ResultList [i], "_") || "_" || Word (3, ResultList [i], "_"))
);

Show (FileList); // Yields the file Year and Numerical ID

You can then use that list to feed into a ListBox or directly retrieve the corresponding file with the Contains () function.
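As a rough sketch of the Contains() lookup mentioned above: because FileList and ResultList are built in the same order, the position of a key in FileList can index straight into ResultList. The shortened lists and the `wanted` key below are stand-ins for illustration only.

```jsl
Names Default To Here( 1 );

// Shortened stand-ins for the two parallel lists built by the script above
ResultList = {"2021_abc_027_result.csv", "2021_abc_028_result.csv", "2021_abc_040_result.csv"};
FileList = {"2021_027", "2021_028", "2021_040"};

wanted = "2021_040"; // hypothetical key, e.g. picked by the user in a ListBox

// Contains() on a list returns the position of the item, or 0 if it is absent
pos = Contains( FileList, wanted );
If( pos > 0,
	chosenFile = ResultList[pos];
	Show( chosenFile ),
	Print( "No file found for " || wanted )
);
```

This relies on the two lists staying in step, so any filtering should be applied to both or done before FileList is built.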

I'll read more about regex for the next time.

Best,

TS

Thierry R. Sornasse
Georg
Level VII

Re: Delete strings in a list that contain a word

Here's one regex approach:

 

Names Default To Here( 1 );

Assign(
	file_lst,
	{"2021_abc_027_result.csv", "2021_abc_027_summary.csv", "2021_abc_028_result.csv", "2021_abc_028_summary.csv", "2021_abc_029_result.csv",
	"2021_abc_029_summary.csv", "2021_abc_038_result.csv", "2021_abc_038_summary.csv", "2021_abc_040_result.csv", "2021_abc_040_summary.csv",
	"2021_abc_041_result.csv", "2021_abc_041_summary.csv", "2021_abc_042_result.csv", "2021_abc_042_summary.csv", "2021_abc_043_result.csv",
	"2021_abc_043_summary.csv", "2021_abc_044_result.csv", "2021_abc_044_summary.csv", "2021_abc_045_result.csv", "2021_abc_045_summary.csv",
	"2021_abc_046_result.csv", "2021_abc_046_summary.csv", "2021_abc_047_result.csv", "2021_abc_047_summary.csv", "2021_abc_048_result.csv",
	"2021_abc_048_summary.csv", "2021_abc_049_result.csv", "2021_abc_049_summary.csv", "2021_abc_050_result.csv", "2021_abc_050_summary.csv",
	"2021_abc_051_result.csv", "2021_abc_051_summary.csv", "2021_abc_052_result.csv", "2021_abc_052_summary.csv", "2021_abc_053_result.csv",
	"2021_abc_053_summary.csv", "2021_abc_054_result.csv", "2021_abc_054_summary.csv", "2021_abc_055_result.csv", "2021_abc_055_summary.csv",
	"2021_abc_056_result.csv", "2021_abc_056_summary.csv", "2021_abc_057_result.csv", "2021_abc_057_summary.csv", "2021_abc_058_result.csv",
	"2021_abc_058_summary.csv", "2021_abc_059_result.csv", "2021_abc_059_summary.csv", "2021_abc_060_result.csv", "2021_abc_060_summary.csv",
	"2021_abc_061_result.csv", "2021_abc_061_summary.csv", "2021_abc_062_result.csv", "2021_abc_062_summary.csv", "2021_abc_063_result.csv",
	"2021_abc_063_summary.csv", "2021_abc_064_result.csv", "2021_abc_064_summary.csv", "2021_abc_065_result.csv", "2021_abc_065_summary.csv",
	"2021_abc_067_result.csv"}
);

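For( i = 1, i <= N Items( file_lst ), i++,
	// \0 is the whole match, \1 to \3 are the captured groups; summary files do not match and print as missing (.)
	Print( Regex( file_lst[i], "^([0-9]+)_([a-z]{3})_([0-9]{3}).*result", "Fullname: \0 Part1: \1 Part2: \2 Part3: \3" ) )
);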
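The same pattern can probably produce the "year_id" strings asked for in the question by changing only the replacement string. This is a minimal sketch on a shortened sample list; `key_lst` and `key` are names made up for the example.

```jsl
Names Default To Here( 1 );

// Shortened sample of the file list from the question
file_lst = {"2021_abc_027_result.csv", "2021_abc_027_summary.csv", "2021_abc_028_result.csv"};

key_lst = {};
For( i = 1, i <= N Items( file_lst ), i++,
	// \1 is the year and \3 is the three-digit id; \2 (the "abc" tag) is dropped
	key = Regex( file_lst[i], "^([0-9]+)_([a-z]{3})_([0-9]{3}).*result", "\1_\3" );
	// Regex() returns missing when the pattern does not match (the summary files)
	If( !Is Missing( key ), Insert Into( key_lst, key ) );
);

Show( key_lst );
```

The original attempt with "([0-9]+)" returned only "2021" because a single group captures just the first run of digits; a second group is needed to reach the id.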
Georg
gzmorgan0
Super User (Alumni)

Re: Delete strings in a list that contain a word

Here is another approach. I like to use the RegexMatch(SourceStr, patStr) function. Dot-asterisk (.*) matches any run of characters, much like % in a SQL LIKE clause. Since you wish to remove items from your list, you need to start from the end of the list; otherwise each removal shifts the remaining items and your index no longer points at the item you meant to test.

Names Default to Here(1);

FileList = {"2021_abc_027_result.csv", "2021_abc_027_summary.csv", "2021_abc_028_result.csv",
"2021_abc_028_summary.csv", "2021_abc_029_result.csv", "2021_abc_029_summary.csv",
"2021_abc_038_result.csv", "2021_abc_038_summary.csv", "2021_abc_040_result.csv",
"2021_abc_040_summary.csv", "2021_abc_041_result.csv", "2021_abc_041_summary.csv",
"2021_abc_042_result.csv", "2021_abc_042_summary.csv", "2021_abc_043_result.csv",
"2021_abc_043_summary.csv", "2021_abc_044_result.csv", "2021_abc_044_summary.csv",
"2021_abc_045_result.csv", "2021_abc_045_summary.csv", "2021_abc_046_result.csv",
"2021_abc_046_summary.csv", "2021_abc_047_result.csv", "2021_abc_047_summary.csv",
"2021_abc_048_result.csv", "2021_abc_048_summary.csv", "2021_abc_049_result.csv",
"2021_abc_049_summary.csv", "2021_abc_050_result.csv", "2021_abc_050_summary.csv",
"2021_abc_051_result.csv", "2021_abc_051_summary.csv", "2021_abc_052_result.csv",
"2021_abc_052_summary.csv", "2021_abc_053_result.csv", "2021_abc_053_summary.csv",
"2021_abc_054_result.csv", "2021_abc_054_summary.csv", "2021_abc_055_result.csv",
"2021_abc_055_summary.csv", "2021_abc_056_result.csv", "2021_abc_056_summary.csv",
"2021_abc_057_result.csv", "2021_abc_057_summary.csv", "2021_abc_058_result.csv",
"2021_abc_058_summary.csv", "2021_abc_059_result.csv", "2021_abc_059_summary.csv",
"2021_abc_060_result.csv", "2021_abc_060_summary.csv", "2021_abc_061_result.csv",
"2021_abc_061_summary.csv", "2021_abc_062_result.csv", "2021_abc_062_summary.csv",
"2021_abc_063_result.csv", "2021_abc_063_summary.csv", "2021_abc_064_result.csv",
"2021_abc_064_summary.csv", "2021_abc_065_result.csv", "2021_abc_065_summary.csv",
"2021_abc_067_result.csv"};


patStr = "2021.*result.*";
For( i = N Items( FileList ), i >= 1, i--,
	SourceStr = FileList[i];
	ret = Try( Regex Match( SourceStr, patStr )[1], "" );
	If( ret != SourceStr, Remove From( FileList, i ) );
); // end For i
Show( FileList );
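To illustrate why the loop runs backwards, here is a small sketch of what can go wrong going forwards. The three-item `demo` list is made up for the example and deliberately has two "summary" items in a row.

```jsl
Names Default To Here( 1 );

// Hypothetical list with two consecutive items to remove
demo = {"a_summary", "b_summary", "c_result"};

// Forward loop: after removing item 1, "b_summary" slides into position 1,
// but i has already moved on to 2, so "b_summary" is never tested
For( i = 1, i <= N Items( demo ), i++,
	If( Contains( demo[i], "summary" ), Remove From( demo, i ) )
);

Show( demo ); // "b_summary" survives the forward pass
```

With the file list from the question a forward loop happens to work, since every summary file is followed by a result file, but that depends on the ordering of the data.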

 

Craige_Hales
Super User

Re: Delete strings in a list that contain a word

Three high quality solutions:

@Thierry_S - easy to understand and maintain

@Georg - exactly how regex replacement is used

@gzmorgan0  - efficient removal from a list by working backwards

 

You can mix-and-match these ideas to make an ideal solution that is easy to maintain, efficient, and clearly represents the problem you are solving.
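One possible combination, offered only as a sketch: remove non-result files with a backward loop, test with Contains() instead of a regex, and build the keys with Word(). The shortened list and the name `KeyList` are assumptions for the example.

```jsl
Names Default To Here( 1 );

// Shortened sample of the file list from the question
FileList = {"2021_abc_027_result.csv", "2021_abc_027_summary.csv",
"2021_abc_028_result.csv", "2021_abc_028_summary.csv"};

// Walk backwards so removing an item never shifts the items still to be visited
For( i = N Items( FileList ), i >= 1, i--,
	If( !Contains( FileList[i], "_result" ), Remove From( FileList, i ) )
);

// Build "year_id" keys from the 1st and 3rd underscore-delimited words
KeyList = {};
For( i = 1, i <= N Items( FileList ), i++,
	Insert Into( KeyList, Word( 1, FileList[i], "_" ) || "_" || Word( 3, FileList[i], "_" ) )
);

Show( FileList, KeyList );
```

Because KeyList is built after the filtering, its positions line up with FileList, so a key's position can be used to look up the matching file name.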

Craige