cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
nathan-clark
Level VI

import csv and only change format of few columns

I have some csv files that need to be imported and there are a few fields that correlate to software versions. However they are in the csv like x.y.z and are not encapsulated in quotes so JMP tries to bring it in as a numeric data type. When it does so, JMP believes the number is a date and converts it to a date so it's very difficult to convert back to a usable version.

 

If I manually import a file, I can use the preview to force the columns I need to the data types I want. However, the resulting import script has 'new column()' for every single column, and I either have to omit the entire thing so JMP just imports the file OR have all 186 columns listed just so I can control the import of a few.

Where the real issue comes in is that some of these files have different columns ... some are in common, some are not, but in all cases the version columns are in common AND I need to control how they are brought in.

 

Below is what I would like to be able to do (or something like this) but the total number of columns is really 186...

Open(
	filepath,
	columns(
		New Column( "ALGORITHM_VERSION", Character, "Nominal" ),
		New Column( "CLUST_ALG_VERSION", Character, "Nominal" ),
		New Column( "CYTOFILEIO_VERSION", Character, "Nominal" ),
		New Column( "RELEASE_VERSION", Character, "Nominal" )
	),
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( Comma, CSV( 0 ) ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 1 ),
		Column Names Start( 1 ),
		Data Starts( 2 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
)
1 ACCEPTED SOLUTION

Accepted Solutions
nathan-clark
Level VI

Re: import csv and only change format of few columns

Yes, unfortunately, I won't have tons of information on the organization. However, I checked through JMP support as well, and found a solution.

 

Since the Open() command tries to be intelligent and get the right format for each column, the suggestion was to do Multiple File Import(). Since it's focus is speed it doesn't take the time to identify data types as much, so the format x.y.z looks like character at first blush so it works! The irony is that I was burned by this a while back and had to add code to a script if MFI was used to correct some of the things that Open() correctly identified.

In this instance, that "flaw" was a benefit. I'm not sure if it will be the 100% always do what I need solution, but it fixed the problem easily here. A shout out to Wendy Murphrey who gave me the idea.

View solution in original post

5 REPLIES 5
txnelson
Super User

Re: import csv and only change format of few columns

My error, this will not correct columns that have a structure of x.y.z.

Here is a script that should work to properly change the columns to character.

Names Default To Here( 1 );
dt = Open(
	filepath, 
	
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( Comma, CSV( 0 ) ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 1 ),
		Column Names Start( 1 ),
		Data Starts( 2 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);

For( i = 1, i <= N Cols( dt ), i++,
	If(
		Contains(
			Uppercase( Column( dt, i ) << get name ),
			"_VERSION"
		) & Column( dt, i ) << get data type == "Numeric",
		Column( dt, i ) << set format( "Fixed Dec", 18, 0 );
		Column( dt, i ) << data type( "Character" );
	)
);

 

Jim

Re: import csv and only change format of few columns

Could you use a formula like

Char( Month( :colName ) ) || "." || Char( Day( :colName ) ) || "." ||
Char( Year( :colName ) )

to convert back?

nathan-clark
Level VI

Re: import csv and only change format of few columns

Looking at some of the layout, that would work for some of them, but not all of them. and, of course, i won't know the layout i'm getting until it gets processed.

 

Right now we have code for each avenue of CSV and we just need to send the files down the right path to get handled ... but since it should be like 90% the same for all of them, it seems to be a shame there isn't a more dynamic solution.

Re: import csv and only change format of few columns

There are options. If I only knew the patterns for the version numbers and nothing else (# of columns, names, data types, etc.) you could use Load Text File, pass through the data once to find where they were, then build the import code accordingly. Or, you could read all the columns as text then check each of them for possible version numbers (converting to numeric when warranted). Either approach could be slow for very large files, but robust - you won't lose information. The more you knew about the possible location of the version number, the more efficient either of these approaches could be made.

nathan-clark
Level VI

Re: import csv and only change format of few columns

Yes, unfortunately, I won't have tons of information on the organization. However, I checked through JMP support as well, and found a solution.

 

Since the Open() command tries to be intelligent and get the right format for each column, the suggestion was to do Multiple File Import(). Since it's focus is speed it doesn't take the time to identify data types as much, so the format x.y.z looks like character at first blush so it works! The irony is that I was burned by this a while back and had to add code to a script if MFI was used to correct some of the things that Open() correctly identified.

In this instance, that "flaw" was a benefit. I'm not sure if it will be the 100% always do what I need solution, but it fixed the problem easily here. A shout out to Wendy Murphrey who gave me the idea.