cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
SunnyR
Level III

Extracting words from a line- as column name and its corresponding numbers as values into the column

I have a column named "Fault"- where I have fault codes extracted from various devices. One of the fault code from a particular day is given below.

 

"inverter Inverter 20 below limit: 0.96 Kilowatts/inverter (58.9 % of 1.63 Kilowatts) at 11/30/2019 7:30:00 AM (sun az: 123.4 el: 3.7) Duration: 4.00 hours inverter Inverter 23 below limit: 0.95 Kilowatts/inverter (57.9 % of 1.63 Kilowatts) at 11/30/2019 7:30:00 AM (sun az: 123.4 el: 3.7) Duration: 3.50 hours"

 

The goal is to perform an analysis to tabulate the Faulty inverters  by Date and its cumulative under performance and duration; later use the other values to diagnose the fault.

 

I need to extract the details as follows in the format  ----Column name : Value

Inverter: 20

Below limit (kW/Inverter) : 0.96

% of Max: 58.9%

Date 11/30/2019

Time : 7:30:00 AM

Sun Az : 123.4

Elevation:3.7

Duration (hrs) : 4

 

If there is a second inverter which also has a fault code then ( see second line in the fault code above)

Inverter: 23

Below limit (kW/Inverter) : 0.95

% of Max: 57.9%

Date 11/30/2019

Time : 7:30:00 AM

Sun Az : 123.4

Elevation:3.7

Duration (hrs) : 3.50

 

Sometimes there might be 5 or 10 inverters ( out of 70) which could give similar codes in a single line.

Can you let me know how to get this done. Is it only through JSL? or can I use formulas? Either way- I need some help.

1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

I had some spare time......not much else to do with Covid-19 still around, so I wrote a bit of JSL that should give you a good start with reading in your data

Names Default To Here( 1 );
dt = Open(
	
	"Path to the data file",
	columns( New Column( "c000001", Character, "Nominal" ) ),
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( CSV( 0 ) ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 0 ),
		Column Names Start( 1 ),
		Data Starts( 1 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);

// Create the new output data table
dtLong = New Table( "Long Data", 
New Column( "Label", character ), 
New Column( "Value", character ), 
New Column( "Line" ),
New Column( "Instance" )	
);

// A function that returns the next word in the line and
// then removes the word from the front of the target line
getWord = Function( {},
	theWord = Trim( Word( 1, target, " " ) );
	target = Trim( Substr( target, Length( theWord ) + 1 ) );
	theWord;
);


// Loop through all rows in the input data table
For( i = 1, i <= N Rows( dt ), i++,
	instance = 0;
	target = Column( dt, 1 )[i];
	While( Word( 1, target, " " ) != "",
		dtLong << add rows( 1 );
		dtLong:line[N Rows( dtLong )] = i;
		theWord = getWord();
		Match( theWord,
			"inverter",
				instance++;
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:Label[N Rows( dtLong )] = theWord;
				theWord = getWord();
				theWord = getword();
				dtLong:value[N Rows( dtLong )] = theWord;,
			"below",
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:Label[N Rows( dtLong )] = theWord || " limit (KW/Inverter)";
				theWord = getWord();
				theWord = getWord();
				dtLong:value[N Rows( dtLong )] = theWord;
				// Get the % of Max
				dtLong << Add Rows( 1 );
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Instance[N Rows( dtLong )] = instance;
				theWord = getWord();
				theWord = getWord();
				dtLong:Label[N Rows( dtLong )] = "% of Max";
				dtLong:value[N Rows( dtLong )] = Substr( theWord, 2 );
				// Skip forward to the Date
				For( k = 1, k <= 5, k++,
					theWord = getWord()
				);
				// Get the Date
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = "Date";
				dtLong:value[N Rows( dtLong )] = theWord;
				// Get the Time
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = "Time";
				dtLong:value[N Rows( dtLong )] = theWord || getWord();
				// Get the Sun Az
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = Substr( theWord, 2 ) || " " || getWord();
				dtLong:value[N Rows( dtLong )] = getWord();
				// Get the Elevation
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = "Elevation";
				dtLong:value[N Rows( dtLong )] = Word( -1, getWord(), ")" );,
			"Duration:",
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:Label[N Rows( dtLong )] = theWord || " (hrs)";
				dtLong:value[N Rows( dtLong )] = getWord();
				theWord = getWord();
		)
		
		;
	);
);
Jim

View solution in original post

7 REPLIES 7
David_Burnham
Super User (Alumni)

Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

You should be able to make a lot of progress on this using the formula editor - take a look at the WORD function

-Dave
SunnyR
Level III

Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

Dave,

  Yes, I could if I have just 1 or 2 faulty inverters, But then if for example I have 23 fault codes in a single line for a single date then I need to create so many columns individually and then write up formulas for each of the 23 inverters. Can you clarify if I am understanding your solution.  I also added another line which summarizes the goal of the analysis after extraction.

txnelson
Super User

Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

I have had to handle data like this many times.  The approach that I think has to be taken is to read all of the data from your input file into a JMP data table, where each complete line is read into a single column.  You will end up with a data table with one column, and the number of rows equal to the number of lines in the input data table.

Next, you create a new table with 3 columns, Label, Value and Line.  Then, using JSL that scans the contents of each row in the one column, one creates a new row in the new table, and populates the Label(i.e. Inverter, Below limit(KW/Inverter) etc.), the Value(i.e. 20, 0.96, etc.) and the Line, which would just be the current row number.....Row().  There may also have to be an additional output column that would be based upon this being the 1st Inverter on the line, or the 2nd inverter.....etc.

This can be done using a While() function in combination of the Word() function scanning for ":" as the delimiter.  The odd numbered words would be the Labels, the even numbered will be the Values.

Once this new table is created, you can easily move into a one row per inverter, using Tables=>Split.

Jim
SunnyR
Level III

Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

Jim

 Thank you for your suggestion. I will try. As I am new to scripting-I will go through examples to figure it out.

txnelson
Super User

Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

I had some spare time......not much else to do with Covid-19 still around, so I wrote a bit of JSL that should give you a good start with reading in your data

Names Default To Here( 1 );
dt = Open(
	
	"Path to the data file",
	columns( New Column( "c000001", Character, "Nominal" ) ),
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( CSV( 0 ) ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 0 ),
		Column Names Start( 1 ),
		Data Starts( 1 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);

// Create the new output data table
dtLong = New Table( "Long Data", 
New Column( "Label", character ), 
New Column( "Value", character ), 
New Column( "Line" ),
New Column( "Instance" )	
);

// A function that returns the next word in the line and
// then removes the word from the front of the target line
getWord = Function( {},
	theWord = Trim( Word( 1, target, " " ) );
	target = Trim( Substr( target, Length( theWord ) + 1 ) );
	theWord;
);


// Loop through all rows in the input data table
For( i = 1, i <= N Rows( dt ), i++,
	instance = 0;
	target = Column( dt, 1 )[i];
	While( Word( 1, target, " " ) != "",
		dtLong << add rows( 1 );
		dtLong:line[N Rows( dtLong )] = i;
		theWord = getWord();
		Match( theWord,
			"inverter",
				instance++;
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:Label[N Rows( dtLong )] = theWord;
				theWord = getWord();
				theWord = getword();
				dtLong:value[N Rows( dtLong )] = theWord;,
			"below",
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:Label[N Rows( dtLong )] = theWord || " limit (KW/Inverter)";
				theWord = getWord();
				theWord = getWord();
				dtLong:value[N Rows( dtLong )] = theWord;
				// Get the % of Max
				dtLong << Add Rows( 1 );
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Instance[N Rows( dtLong )] = instance;
				theWord = getWord();
				theWord = getWord();
				dtLong:Label[N Rows( dtLong )] = "% of Max";
				dtLong:value[N Rows( dtLong )] = Substr( theWord, 2 );
				// Skip forward to the Date
				For( k = 1, k <= 5, k++,
					theWord = getWord()
				);
				// Get the Date
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = "Date";
				dtLong:value[N Rows( dtLong )] = theWord;
				// Get the Time
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = "Time";
				dtLong:value[N Rows( dtLong )] = theWord || getWord();
				// Get the Sun Az
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = Substr( theWord, 2 ) || " " || getWord();
				dtLong:value[N Rows( dtLong )] = getWord();
				// Get the Elevation
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = "Elevation";
				dtLong:value[N Rows( dtLong )] = Word( -1, getWord(), ")" );,
			"Duration:",
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:Label[N Rows( dtLong )] = theWord || " (hrs)";
				dtLong:value[N Rows( dtLong )] = getWord();
				theWord = getWord();
		)
		
		;
	);
);
Jim
SunnyR
Level III

Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

Amazing! It works.Thank you for your patience in writing this long code to accomplish my request. I am trying to implement this to other Alert codes I have.Thank you so much.It saved me eons.

Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

Hi SunnyR,

 

I know this is an old post, but I found the problem interesting and wanted to share an idea I sometime use in situations similar to yours. It's very powerful (and simple) IF the data is well-structured, and not too complex--yours qualifies.

 

NOTE: this particular solution assumes your logs currently lie in a single column of data, which is the only column of data in the table. As you mention, different rows may contain different numbers of inverter logs.

 

I noticed that the word "hours" occurs only at the end of each inverter's log, so we can break log records this way and stack the data. Once that is done, we'll break up the text to keep the numbers. It is important that all the logs have the same text and measurements, always in the same order--so these will be aligned in columns when we break the text.

 

We'll convert all columns to numeric, which makes non-numeric data obvious, that is, missing. We will delete non-numeric data, rename the columns, and we'll be done.

 

I've attached a table with a table script so you can see how this works.

 

Names Default To Here( 1 );

dt = :log << get data table;   // for use within a table script. otherwise, use dt = current data table();

// can break up the logs at "hours". Replace with ~ or some other unused character to delimit.
For( i = 1, i <= N Row( dt ), i++,				
	dt:log[i] = Substitute( dt:log[i], "hours", "~" )		
);

//split the logs using this delimiter
dt << Text to Columns( delimiters( "~" ), columns( :log ) );

// some logs had more inverters than others, so the stack my have empty data in some rows. Delete them.
dt2 = dt << stack( columns( (dt << get column  names)[2 :: N Col( dt )] ) );
dt2 << delete rows(dt2 << get rows where (:data == "")); 

//using parens and spaces as delimiters, break up each log
dt2 << text to columns( delimiters( " ()" ), columns( :data ) );

//deal with date and time. They're always in the same columns, making this easier
dt2 << New Column( "date", formula( Informat( :data 14 ) ) );
dt2 << New Column( "time", formula( Informat( :data 15 || " " || :data 16 ) ) );
dt2:date << Delete Formula;
dt2:time << Delete Formula;
dt2 << delete columns( {"data 14", "data 15"} );  //else these persist later, as they are numeric

//Remove non-numeric columns by changing all columns to numeric and removing those containing only missing data.
//Proceed "backward" as deletions change indices.
For( i = N Col( dt2 ), i >= 1, i--,
	Column( dt2, i ) << set data type( numeric );
	If( N Row( Loc( dt2:i << get values ) ) == 0, dt2 << delete columns( i ) );
);

//rename columns and reformat date and time.
colNames = Words( "Inverter,Below Limit(kw/inverter),% of max,Max,Sun Az,Elevation,Duration", "," );
For( i = 1, i <= N Items( colNames ), i++, Column( dt2, i ) << set name( colNames[i] ) );
dt2:date << Format( "m/d/y" );
dt2:time << Format( "h:m:s" );