tarkan_bih
Level III

Regex on files in a directory

Hi,

 

I am trying to write a script that opens a file whose name matches a regular expression (the file name starts with "dat"). My code is listed below, and it does not appear to be working. Is there something special that needs to be done when using regex on a list?

 

dir = pick directory();
files = files in directory(dir);

datafile = regex(files, "(^dat.*)","\1");
open(dir||datafile);

1 ACCEPTED SOLUTION

ih
Super User (Alumni)

Re: Regex on files in a directory

Here is one way to implement @cwillden's solution with regex():

 

Names default to here( 1 );
dir = pick directory();
files = files in directory(dir);

for(i=1, i <= N Items(files), i++,
	If(!Is missing(regex(files[i],"^dat.*")), //if first 3 letters of file name are "dat"
		open(dir||files[i])
	)
);


4 REPLIES
cwillden
Super User (Alumni)

Re: Regex on files in a directory

I have always found regular expressions frustrating, so I avoid them as much as possible :).  I could definitely be wrong, but I don't think they are meant to be applied across all items in a list.  The easiest solution would probably be to loop through your files and skip any that don't start with "dat", using something like this:

for(i=1, i <= N Items(files), i++,
	If(Substr(files[i],1,3)=="dat", //if first 3 letters of file name are "dat"
		open(dir||files[i])
	)
);
-- Cameron Willden
Re: Regex on files in a directory

If this case is as simple as stated (all target file names begin with "dat"), then either method works and both are easy to use. (One method finds a matching file name with character functions, and the other uses regular expressions.) If the case is not actually that simple, or becomes more complex over time, the character-function approach might become more difficult or even impossible. That is where regex would be the superior approach.
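
As a rough sketch of a case where character functions get awkward, suppose the target files must start with "dat_", continue with exactly four digits, and end in ".csv". The file names and the naming rule here are hypothetical, made up only for illustration; the loop and the Is Missing( Regex() ) test are the same as in the solutions above.

```jsl
Names Default To Here( 1 );

// Hypothetical file names, for illustration only.
files = {"dat_2021.csv", "data_notes.txt", "dat_old.csv", "summary.csv"};
matches = {};

For( i = 1, i <= N Items( files ), i++,
	// Regex() returns missing when the pattern does not match the string.
	// Pattern: starts with "dat_", then exactly four digits, then ".csv" at the end.
	If( !Is Missing( Regex( files[i], "^dat_\d{4}\.csv$" ) ),
		Insert Into( matches, files[i] )
	)
);

Show( matches );
```

Doing the same check with Substr() would need separate tests for the prefix, the digits, and the extension, which is where a single pattern starts to pay off.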

(Note that JMP also defines 'patterns' of text, with a full complement of functions for building and using patterns.)
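
For comparison, here is a minimal sketch of the same "starts with dat" test written with JMP's pattern functions instead of Regex(). It assumes Pat Match() returns 1 on a match and 0 otherwise, and that Pat Pos( 0 ) anchors the match at the start of the string; the file names are hypothetical.

```jsl
Names Default To Here( 1 );

// Hypothetical file names, for illustration only.
files = {"dat_2021.csv", "mydat.csv", "summary.csv"};
matches = {};

// Pat Pos( 0 ) matches only at position 0, so "dat" must begin the name.
starts with dat = Pat Pos( 0 ) + "dat";

For( i = 1, i <= N Items( files ), i++,
	If( Pat Match( files[i], starts with dat ),
		Insert Into( matches, files[i] )
	)
);

Show( matches );
```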

I don't think that regular expressions are frustrating. The syntax of regex is simple enough. I think that using regex, or any similar method, to find target patterns in real-world text is frustrating. But that is because of the mess in the unstructured data rather than the tool. If you must analyze text, then knowledge of and skill with regex are valuable.

Re: Regex on files in a directory

Adding to the previous solutions, you probably want to work with the data tables after importing them with the Open() function. I would capture the data table references in another list. I will use the second solution as an example but obviously the same thing would work with the first solution as well.

 

Names Default to Here( 1 );
dir = Pick Directory();
files = Files in Directory( dir );
data files = List();

For( i = 1, i <= N Items( files ), i++,
	If( !Is Missing( Regex( files[i], "^dat.*" ) ),
		ref = Open(dir||files[i]);
		Insert Into( data files, ref );
	);
);
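
Once the references are in a list, later steps can loop over that list rather than searching for the tables again. The sketch below is only an illustration: it fills the list with one table from the JMP sample data folder (assuming the sample data is installed) instead of files picked from a directory, then reports each table's name and row count.

```jsl
Names Default To Here( 1 );

data files = List();

// Stand-in for the directory loop above: open one sample table
// (assumes the JMP sample data folder is installed).
ref = Open( "$SAMPLE_DATA/Big Class.jmp" );
Insert Into( data files, ref );

// Work with each stored reference in turn.
For( i = 1, i <= N Items( data files ), i++,
	dt = data files[i];
	Show( dt << Get Name, N Rows( dt ) );
);
```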