Subscribe Bookmark RSS Feed

Help w/ parsing ambiguous .dat file? convert units, add labels?

aliegner

Community Trekker

Joined:

Nov 13, 2015

So the previous engineer left & I'm looking through some of his scripts he wrote to help us with .dat files.  He's got one that parses and formats data out into a data table, but I think it needs a little more work to be more user friendly.

The .dat file is a tab-delineated data set with an ambigious amount of measurements.  I mean that the amount and label for each measurement in the .dat changes from product to product.  Like one set of files might have S1 through S5, others might be S11 to S26, etc.  Basically, the labels and amount of labels change w/ each data set.  It looks like the previous script accounts for all this.

How would I adjust it at the end to then convert the units and add "label (mm)" in the column label?  Current measurements is all in mils (1/1000 inch).

advise?  I'm new to .jsl and trying to figure this all out.

I attached some white washed copies if that helps.

10710_parser help.png

1 ACCEPTED SOLUTION

Accepted Solutions
Solution

While your former coworker's code looks good, its commented poorly.  Rather than waste my time trying to figure it out, I just added the functionality your looking for on the end.  Just stick the below code to the end.

It's a better programming method anyways since it's always best to keep functional programming separate.  One code to parse, one code to convert, ect.  Keeps the code clean, easier to debug, refactor, read, ect.

This code (1) finds all columns that start with "S"

(2) adds a new column converting the previous one to inch and attaching the "(inch)" label (could easily change to convert to (mm).

(3) deletes the mils columns.  You can just remove the last two lines of code if you want to keep them.

//Added Portion

//Find S columns

S_columns = {};

for(i=1, i<=ncols(PEMdata_all), i++,

       colname = column(PEMdata_all, i) << Get Name;

       If(!ismissing(regex(colname, "S.*")),

              insertinto(S_columns,colname)

       )

);

//Create new columns wtih conversions

for(i=1, i<=nitems(S_columns), i++,

       newcol = PEMdata_all << New Column(S_columns[i] || " (inch)", Numeric, Continuous,
          Formula(
Column(PEMdata_all, S_columns[i])[] * 1000));

       //Optionally delete previous mils columns

      newcol << Delete Formula;

      PEMdata_all << Delete Columns( Column(PEMdata_all, S_columns[i]) );

);

6 REPLIES
Solution

While your former coworker's code looks good, its commented poorly.  Rather than waste my time trying to figure it out, I just added the functionality your looking for on the end.  Just stick the below code to the end.

It's a better programming method anyways since it's always best to keep functional programming separate.  One code to parse, one code to convert, ect.  Keeps the code clean, easier to debug, refactor, read, ect.

This code (1) finds all columns that start with "S"

(2) adds a new column converting the previous one to inch and attaching the "(inch)" label (could easily change to convert to (mm).

(3) deletes the mils columns.  You can just remove the last two lines of code if you want to keep them.

//Added Portion

//Find S columns

S_columns = {};

for(i=1, i<=ncols(PEMdata_all), i++,

       colname = column(PEMdata_all, i) << Get Name;

       If(!ismissing(regex(colname, "S.*")),

              insertinto(S_columns,colname)

       )

);

//Create new columns wtih conversions

for(i=1, i<=nitems(S_columns), i++,

       newcol = PEMdata_all << New Column(S_columns[i] || " (inch)", Numeric, Continuous,
          Formula(
Column(PEMdata_all, S_columns[i])[] * 1000));

       //Optionally delete previous mils columns

      newcol << Delete Formula;

      PEMdata_all << Delete Columns( Column(PEMdata_all, S_columns[i]) );

);

aliegner

Community Trekker

Joined:

Nov 13, 2015

Awesome!  This works exactly what I was looking for.  I added a repeated section to convert the "C" columns.  Cheers!!

However, the only problem left is that its picking up one of the other 'name' columns (Carrier #).  What would the line of script be to search for "C.*" but exclude "Carrier #"?

msharp

Super User

Joined:

Jul 28, 2015

In the red text above you mentioned you only wanted the S columns.  Guess that was a misunderstanding on my part.

Options: Assuming you want to make the same conversion you could simply run through all columns after the first four label columns.

conv_columns = {};

for(i=5, i<=ncols(PEMdata_all), i++,

     colname = column(PEMdata_all, i) << Get Name;

     insertinto(conv_columns,colname)

);

Or you could change the regex:
"CY.*" Looks like it would work assuming all C columns of interest actually start with CY. 

"\w+\d\d" Would match at least one or more word character followed immediately by at least two number characters.  This would match both 'CY22' and 'S12' for example, but not 'Carrier'


I'm not going to go into too much detail but there are many ways to parse strings. Regular expressions is a subject that can and has filled many many books.  That being said the basics can be learned quickly and taking an hour to watch a youtube video or read the Regex section of the JSL Scripting Guide would be worth your time if you plan to do any amount of coding.  Regular expressions are used in many programming languages, so the knowledge is transferable.

aliegner

Community Trekker

Joined:

Nov 13, 2015

Thanks msharp,

Yea, all the S's go together, the CY's, the T's, the CFW's, etc.  I just ended up repeating the structure for each type.  Maybe not the most elegant, but it works.

Any tips for how to then add more columns that compute the max of each group?  Like a new column for the max of the CY's for each row?

From there, I could try to program it to pop up a new summary window showing the measurement in Column 1 (CY, T, S, CFW), then the count, mean, USL, and CPK, hopefully.

Any tips?

ian_jmp

Staff

Joined:

Jun 23, 2011

Here's an example to give you some ideas:

NamesDefaultToHere(1);

// Make a table with some 10 columns

v = J(100, 10, RandomNormal());

dt = AsTable(v);

// Make a new column in dt that contains the maximum value of just columns 4, 5 and 6

v = :Col4 << getValues;

v = v || (:Col5 << getValues);

v = v || (:Col6 << getValues);

vMax = Transpose(VMax(Transpose(v)));

dt << NewColumn("Max of 3, 4 and 5", Numeric, Continuous, Values(vMax));

msharp

Super User

Joined:

Jul 28, 2015

First off, one should never repeat code.  If you are going to do something multiple times this is a good signal to create a function.  I also included a snippit of Ian's code to help you out with the Max thing as well.

I didn't debug this before posting, but this should work nicely for you.

//Create Column Conversion Function

convertColumns = Function({PEMdata_all, Letter, Unit, Conversion},

  //Find "X" columns

  S_columns = {};

  v = {};

   for(i=1, i<=ncols(PEMdata_all), i++,

    colname = column(PEMdata_all, i) << Get Name;

       If(!ismissing(regex(colname, Letter || ".*")),

            insertinto(S_columns,colname);

       v = v || colname << Get Values;

  ));

   //Add VMax Column

  vMax = Transpose(VMax(Transpose(v)));

  PEMdata_all << New Column(Letter || " Max", Numeric, Continuous, Values(vMax));

    insert into(S_columns,Letter || " Max"); //add max column so it also converts

    //Create new columns wtih conversions

    for(i=1, i<=nitems(S_columns), i++,

      newcol = PEMdata_all << New Column(S_columns[i] || " ("||Unit||")", Numeric, Continuous,

      Formula(Column(PEMdata_all, S_columns[i])[] * Conversion));

          //Optionally delete previous mils columns

        newcol << Delete Formula;

        PEMdata_all << Delete Columns( Column(PEMdata_all, S_columns[i]) );

  );

);

//Run through Columns of interest

Letters = {"S", "CY"};

Unit = "inch";

Conversion = 1000;

For(i=1,i<=nitems(Letters),i++,

       convertColumns(PEMdata_all,Letters[i],Unit,Conversion);

);