I have a txt file that I would like to import into JMP for analysis, but some of the columns do not line up. How do I line them up?
The delimiters should be space, spaces, and comma, but with ", " the space gets treated as text. How do I overcome this problem?
dtTlog = Open(
    D,
    Import Settings(
        End Of Line( CRLF, CR, LF ),
        End Of Field( Spaces, Space, CSV( 1 ), Comma ),
        Strip Quotes( 1 ),
        Use Apostrophe as Quotation Mark( 0 ),
        Use Regional Settings( 0 ),
        Scan Whole File( 1 ),
        Treat empty columns as numeric( 0 ),
        CompressNumericColumns( 0 ),
        CompressCharacterColumns( 0 ),
        CompressAllowListCheck( 0 ),
        Labels( 1 ),
        Column Names Start( 1 ),
        Data Starts( 2 ),
        Lines To Read( "All" ),
        Year Rule( "20xx" )
    )
);
The issue you are having is that you are using both spaces and commas as delimiters, and in column f you have rows where a comma and a space sit side by side. An example is
4, 7
Column C1c reads in the 4. JMP then sees the comma as one delimiter, immediately followed by the space as a second delimiter; because there is no value between the two delimiters, it writes a missing value to column f, and the 7 then becomes the value for column C2c.
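To see the mechanics, here is a minimal sketch you can paste into a script window. The two lines of data are made up, and Char To Blob is used only so the example needs no file on disk; the delimiter list mirrors yours.
// Hypothetical header row plus one data row in which each comma is
// immediately followed by a space. With both Space and Comma declared
// as field delimiters, the empty field between them shifts the values over.
Open(
    Char To Blob( "a b c\!n4, 7, 9" ),
    Import Settings(
        End Of Field( Spaces, Space, Comma ),
        Labels( 1 )
    )
);
The data row splits into five fields instead of three, which is the same misalignment you are describing.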
Below is a script that reads in all of the data, removes all commas, saves the result to a temporary file, and then reads it back in using your original import settings.
names default to here(1);
d= "<your path and file name>";
dt = Open(
    d,
    columns( New Column( "c000001", Character, "Nominal" ) ),
    Import Settings(
        End Of Line( CRLF, CR, LF ),
        End Of Field( CSV( 0 ) ),
        Treat Leading Zeros as Character( 1 ),
        Strip Quotes( 1 ),
        Use Apostrophe as Quotation Mark( 0 ),
        Use Regional Settings( 0 ),
        Scan Whole File( 1 ),
        Treat empty columns as numeric( 0 ),
        CompressNumericColumns( 0 ),
        CompressCharacterColumns( 0 ),
        CompressAllowListCheck( 0 ),
        Labels( 0 ),
        Column Names Start( 0 ),
        First Named Column( 0 ),
        Data Starts( 1 ),
        Lines To Read( "All" ),
        Year Rule( "20xx" )
    )
);
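// Replace every comma in the single text column with a space (and trim the result)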
For Each Row( :c000001 = Trim( Substitute( :c000001, ",", " " ) ) );
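// Save the cleaned rows to a temporary text file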
dt << Save( "$TEMP/interim file.txt" );
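// Second pass: re-import the cleaned file using the original space/comma delimiter settings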
dtTlog = Open(
    "$TEMP/interim file.txt",
    Import Settings(
        End Of Line( CRLF, CR, LF ),
        End Of Field( Spaces, Space, CSV( 1 ), Comma ),
        Strip Quotes( 1 ),
        Use Apostrophe as Quotation Mark( 0 ),
        Use Regional Settings( 0 ),
        Scan Whole File( 1 ),
        Treat empty columns as numeric( 0 ),
        CompressNumericColumns( 0 ),
        CompressCharacterColumns( 0 ),
        CompressAllowListCheck( 0 ),
        Labels( 1 ),
        Column Names Start( 2 ),
        Data Starts( 3 ),
        Lines To Read( "All" ),
        Year Rule( "20xx" )
    )
);
Okay, I see. I understand your suggestion, but it gives me the alert shown above.
// For every character column that has no formula, replace each comma with a space
For Each( {col},
    dt << Get Column Reference( dt << Get Column Names( Character ) ),
    If( !Is Empty( col << Get Formula ),
        Continue()
    );
    For Each Row( dt, col[] = Substitute( col[], ",", " ", <<IGNORECASE ) );
);
I replaced that line with this code, and it's working now. Thanks.
I had a slight error in my original code. I have gone back and corrected it; please copy it and verify that it works. The fix you put in place is good; however, note that the original code does not read in each column separately: it reads all of the data in each row into one column and then works with the result as a data table that has just one column.
Alternatively, one could use Load Text File to open the file as text, remove the commas with Substitute Into, and finally Open the text into a data table.
original posts:
.csv Import error / @Craige_Hales
File Import: specify NAN? / @jthi
txtfile = Load Text File("$DOWNLOADS/log - question2.txt");
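// Replace every comma in the raw text with a space before importing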
Substitute Into( txtfile, ",", " " );
Open(
    Char To Blob( txtfile ),
    Import Settings(
        End Of Field( Spaces, Space )
    )
);
Text To Columns: handle missing data correctly
"correctly"?
For your task, Text to Columns works like a charm:
dt = Open(
    "$DOWNLOADS/log - question2.txt",
    Import Settings(
        End Of Field( Tab ),
        Labels( 0 )
    )
);
// split columns via Text to Columns
dt << Text to Columns( columns( :c000001 ), Delimiters( " ", "," ) );
dt << Delete Columns( :c000001 );
// move column names up
dt << Move up;