cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

Discussions

Solve problems, and share tips and tricks with other JMP users.
%3CLINGO-SUB%20id%3D%22lingo-sub-762060%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3EJMP18%3A%20Python%3A%20Konvertierung%20von%20Pandas%20in%20DataTable%20%C3%BCberspringt%20Spalten%20mit%20fehlenden%20Werten%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-762060%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3E%3CPRE%3E%3CCODE%20class%3D%22%20language-jsl%22%3Eimport%20jmp%2C%20numpy%20as%20np%2C%20pandas%20as%20pd%0Aimport%20jmputils%0A%0Adef%20p2j(df)%3A%0A%09dt2%20%3D%20jmp.DataTable('dt2'%2C%20df.shape%5B0%5D)%0A%09%23%20get%20the%20column%20names%20from%20the%20data%20frame%0A%09names%20%3D%20list(df.columns)%0A%09%23%20loop%20across%20the%20columns%20of%20the%20data%20frame%0A%09for%20j%20in%20range(%20df.shape%5B1%5D)%3A%0A%09%09%23%20check%20if%20the%20coulumn%20data%20type%20is%20string%20or%20numeric%0A%09%09if%20is_string_dtype(df%5Bnames%5Bj%5D%20%5D)%3A%0A%09%09%09dt2.new_column(names%5Bj%5D%2C%20jmp.DataType.Character)%0A%09%09else%3A%0A%09%09%09dt2.new_column(names%5Bj%5D%2C%20jmp.DataType.Numeric)%0A%09%09%23%20populate%20the%20JMP%20column%20with%20data%0A%09%09dt2%5Bj%5D%20%3D%20list(df.iloc%5B%3A%2Cj%5D)%0A%09return(dt2)%0A%0Afname%20%3D%20r%22export.csv%22%3B%0Adf%20%3D%20pd.read_csv(fname)%0Afilt_csv%20%3D%20p2j(df)%3B%3C%2FCODE%3E%3C%2FPRE%3E%3CP%3EVerwenden%20Sie%20den%20obigen%20Code%2C%20um%20eine%20CSV-Datei%20einzulesen.%20Die%20p2j-Funktion%20scheint%20col3%20nicht%20korrekt%20zu%20importieren%2C%20da%20Werte%20fehlen%3C%2FP%3E%3CP%3E%3CSPAN%20class%3D%22lia-inline-image-display-wrapper%20lia-image-align-inline%22%20image-alt%3D%22minion_0-1717096857056.png%22%20style%3D%22width%3A%20400px%3B%22%3E%3Cspan%20class%3D%22lia-inline-image-display-wrapper%22%20image-alt%3D%22minion_0-1717096857056.png%22%20style%3D%22width%3A%20286px%3B%22%3E%3Cimg%20src%3D%22https%3A%2F%2Fcommunity.jmp.com%2Ft5%2Fimage%2Fserverpage%2Fimage-id%2F64646iEC1656B3AE21CCFD%2Fimage-size%2Fmedium%3Fv%3Dv2%26amp%3Bpx%3D400%22%20role%3D%22button%22%20title%3D%22minion_0-1717096857056.png%22%20alt%3D%22minion_0-1717096857056.png%22%20%2F%3E%3C%2Fspan%3E%3C%2FSPAN%3E%3C%2FP%3E%3CP%3E%20%3C%2FP%3E%3CP%3EInhalt%20der%20CSV-Datei%3C%2FP%3E%3CP%3E%22Spalte1%22%2C%20%22Spalte2%22%2C%20%22Spalte3%22%3CBR%20%2F%3E%20%22ABC%22%3CBR%20%2F%3E%20%22a%22%2C%22b%22%2C%22%22%3CBR%20%2F%3E%20%22ein%22%2C%20%22b%22%2C%20%22e%22%3CBR%20%2F%3E%20%22a%22%2C%22b%22%2C%22%22%3C%2FP%3E%3CP%3E%20%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-LABS%20id%3D%22lingo-labs-762060%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3E%3CLINGO-LABEL%3EAutomatisierung%20und%20Skripting%3C%2FLINGO-LABEL%3E%3C%2FLINGO-LABS%3E%3CLINGO-SUB%20id%3D%22lingo-sub-762069%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3EBetreff%3A%20JMP18%3A%20Python%3A%20Konvertierung%20von%20Pandas%20in%20DataTable%20%C3%BCberspringt%20Spalten%20mit%20fehlenden%20Werten%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-762069%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3E%3CP%3EDie%20Verwendung%20des%20obigen%20p2j(%20)%20und%20einer%20aktualisierten%20Datei%20export.csv%20mit%20einer%20Spalte%20numerischer%20Daten%2C%20die%20als%20ersten%20Wert%20einen%20fehlenden%20Wert%20hat%2C%20ergibt%20Folgendes%3A%3C%2FP%3E%0A%3CP%3E%20%3C%2FP%3E%0A%3CP%3E%3CSPAN%20class%3D%22lia-inline-image-display-wrapper%20lia-image-align-inline%22%20image-alt%3D%22Paul_Nelson_0-1717118219358.png%22%20style%3D%22width%3A%20400px%3B%22%3E%3Cspan%20class%3D%22lia-inline-image-display-wrapper%22%20image-alt%3D%22Paul_Nelson_0-1717118219358.png%22%20style%3D%22width%3A%20400px%3B%22%3E%3Cimg%20src%3D%22https%3A%2F%2Fcommunity.jmp.com%2Ft5%2Fimage%2Fserverpage%2Fimage-id%2F64647i688228392B6A53E0%2Fimage-size%2Fmedium%3Fv%3Dv2%26amp%3Bpx%3D400%22%20role%3D%22button%22%20title%3D%22Paul_Nelson_0-1717118219358.png%22%20alt%3D%22Paul_Nelson_0-1717118219358.png%22%20%2F%3E%3C%2Fspan%3E%3C%2FSPAN%3E%3C%2FP%3E%0A%3CP%3E%3CSPAN%3E%25%20mehr%20~%2FDownloads%2Fexport.csv%3C%2FSPAN%3E%3C%2FP%3E%0A%3CP%3E%3CSPAN%3E%22Spalte1%22%2C%20%22Spalte2%22%2C%20%22Spalte3%22%2C%20%22Spalte4%22%3C%2FSPAN%3E%3C%2FP%3E%0A%3CP%3E%3CSPAN%3E%22ABC%22%2C%3C%2FSPAN%3E%3C%2FP%3E%0A%3CP%3E%3CSPAN%3E%22a%22%2C%22b%22%2C%22%22%2C2.718%3C%2FSPAN%3E%3C%2FP%3E%0A%3CP%3E%3CSPAN%3E%22a%22%2C%22b%22%2C%22e%22%2C1.618%3C%2FSPAN%3E%3C%2FP%3E%0A%3CP%3E%3CSPAN%3E%22a%22%2C%22b%22%2C%22%22%2C3.14159%3C%2FSPAN%3E%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-SUB%20id%3D%22lingo-sub-762067%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3EBetreff%3A%20JMP18%3A%20Python%3A%20Konvertierung%20von%20Pandas%20in%20DataTable%20%C3%BCberspringt%20Spalten%20mit%20fehlenden%20Werten%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-762067%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3E%3CP%3EHier%20wird%20der%20aktualisierte%20Code%20angezeigt.%20Wenn%20der%20Spaltentyp%20nicht%20numerisch%20oder%20eine%20Zeichenfolge%20ist%2C%20wird%20gepr%C3%BCft%2C%20bis%20wir%20einen%20nicht%20fehlenden%20Wert%20erhalten%2C%20bevor%20der%20Spaltentyp%20als%20Zeichen%20oder%20numerisch%20bestimmt%20wird.%3C%2FP%3E%0A%3CP%3E%20%3C%2FP%3E%0A%3CPRE%3Eimport%20jmp%0Aimport%20numpy%20as%20np%2C%20pandas%20as%20pd%0Afrom%20pandas.api.types%20import%20is_string_dtype%0Afrom%20pandas.api.types%20import%20is_numeric_dtype%0A%0Adef%20p2j(df)%3A%0A%09dt2%20%3D%20jmp.DataTable('dt2'%2C%20df.shape%5B0%5D)%0A%09%23%20get%20the%20column%20names%20from%20the%20data%20frame%0A%09names%20%3D%20list(df.columns)%0A%09%23%20loop%20across%20the%20columns%20of%20the%20data%20frame%0A%09for%20j%20in%20range(%20df.shape%5B1%5D)%3A%0A%09%09%23%20check%20if%20the%20coulumn%20data%20type%20is%20string%20or%20numeric%0A%09%09if%20is_numeric_dtype(%20df%5B%20names%5Bj%5D%20%5D%20)%3A%0A%09%09%09dt2.new_column(names%5Bj%5D%2C%20jmp.DataType.Numeric)%0A%09%09%09dt2%5Bj%5D%20%3D%20list(df.iloc%5B%3A%2Cj%5D)%09%09%09%23%20populate%20the%20JMP%20column%20with%20data%0A%09%09elif%20is_string_dtype(%20df%5B%20names%5Bj%5D%5D%20)%3A%0A%09%09%09dt2.new_column(names%5Bj%5D%2C%20jmp.DataType.Character)%0A%09%09%09dt2%5Bj%5D%20%3D%20list(df.iloc%5B%3A%2Cj%5D)%09%09%09%23%20populate%20the%20JMP%20column%20with%20data%0A%09%09else%3A%0A%09%09%09col%20%3D%20list()%0A%09%09%09dtype%20%3D%20None%0A%09%09%09for%20i%20in%20range(dt2.nrows)%3A%0A%09%09%09%09cell_value%20%3D%20df.iloc%5Bi%2Cj%5D%0A%09%09%09%09%23%20delay%20setting%20column%20type%20until%20we%20see%20first%20non-null%20value%0A%09%09%09%09if%20pd.isna(%20cell_value%20)%3A%0A%09%09%09%09%09col.append(None)%0A%09%09%09%09else%3A%0A%09%09%09%09%09if%20not%20dtype%3A%0A%09%09%09%09%09%09if%20is_numeric_dtype(%20cell_value%20)%3A%0A%09%09%09%09%09%09%09dtype%20%3D%20jmp.DataType.Numeric%0A%09%09%09%09%09%09else%3A%0A%09%09%09%09%09%09%09%23%20assume%20it's%20character%20-%20could%20be%20datetime64%2C...%20%0A%09%09%09%09%09%09%09dtype%20%3D%20jmp.DataType.Character%0A%09%09%09%09%09col.append(%20cell_value%20)%0A%09%09%09print(col)%0A%09%09%09if%20dtype%3A%20%0A%09%09%09%09dt2.new_column(names%5Bj%5D%2C%20dtype)%0A%09%09%09%09dt2%5Bj%5D%20%3D%20col%09%09%09%09%23%20populate%20the%20JMP%20column%20with%20data%0A%09%09%0A%09return(dt2)%0A%0Afname%20%3D%20r%22export.csv%22%3B%0Adf%20%3D%20pd.read_csv(jmp.HOME%20%2B%20'Downloads%2F'%20%2B%20fname)%0Aprint(df)%0Aprint(df.dtypes)%0Aprint(%20f'col1%20is%20a%20string%3A%20%7Bis_string_dtype(df%5Bnames%5B0%5D%5D)%7D'%20)%0Aprint(%20f'col2%20is%20a%20string%3A%20%7Bis_string_dtype(df%5Bnames%5B1%5D%5D)%7D'%20)%0Aprint(%20f'col3%20is%20a%20string%3A%20%7Bis_string_dtype(df%5Bnames%5B2%5D%5D)%7D'%20)%0A%0Afilt_csv%20%3D%20p2j(df)%3B%3C%2FPRE%3E%0A%3CP%3EDies%20f%C3%BChrt%20zu%20den%20Ergebnissen%3A%3C%2FP%3E%0A%3CP%3E%20%3C%2FP%3E%0A%3CPRE%3E%2F*%3A%0A%0A%20%20col1%20col2%20col3%0A0%20%20%20%20a%20%20%20%20b%20%20%20%20c%0A1%20%20%20%20a%20%20%20%20b%20%20NaN%0A2%20%20%20%20a%20%20%20%20b%20%20%20%20e%0A3%20%20%20%20a%20%20%20%20b%20%20NaN%0A%0A%0Acol1%20%20%20%20object%0Acol2%20%20%20%20object%0Acol3%20%20%20%20object%0Adtype%3A%20object%0A%0A%0Acol1%20is%20a%20string%3A%20True%0A%0A%0Acol2%20is%20a%20string%3A%20True%0A%0A%0Acol3%20is%20a%20string%3A%20False%0A%0A%0A%5B'c'%2C%20None%2C%20'e'%2C%20None%5D%0A%3C%2FPRE%3E%0A%3CP%3EBeachten%20Sie%2C%20dass%20ich%20in%20meiner%20Funktion%20das%20numerische%20NaN%20in%20None%20konvertiert%20habe.%20Der%20JMP%20DataTable-Code%20akzeptiert%20None%20sowohl%20in%20numerischen%20als%20auch%20in%20Zeichenspalten%20und%20macht%20das%20%E2%80%9ERichtige%E2%80%9C.%20Indem%20er%20ein%20NaN%20in%20eine%20numerische%20Spalte%20und%20eine%20leere%20Zeichenfolge%20%E2%80%9E%E2%80%9C%20in%20eine%20Zeichenspalte%20einf%C3%BCgt.%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-SUB%20id%3D%22lingo-sub-762062%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3EBetreff%3A%20JMP18%3A%20Python%3A%20Konvertierung%20von%20Pandas%20in%20DataTable%20%C3%BCberspringt%20Spalten%20mit%20fehlenden%20Werten%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-762062%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3E%3CP%3EDie%20kurze%20Antwort%20ist%2C%20dass%20%3CFONT%20face%3D%22inherit%22%3Eis_string_dtype(%20col3%20)%20False%20zur%C3%BCckgibt.%20Es%20ist%20keine%20Zeichenfolgenspalte%2C%20sondern%20eine%20Spalte%20mit%20Objekten%2C%20einige%20sind%20Zeichenfolgen%2C%20einige%20sind%20NaN.%20Der%20obige%20Code%20behandelt%20alles%2C%20was%20keine%20Zeichenfolge%20ist%2C%20als%20numerisch.%20Die%20NaNs%20werden%20ordnungsgem%C3%A4%C3%9F%20als%20fehlend%20eingetragen%2C%20aber%20die%20Zeichenwerte%20%E2%80%9Ec%E2%80%9C%20und%20%E2%80%9Ee%E2%80%9C%20sind%20f%C3%BCr%20eine%20numerische%20Spalte%20ung%C3%BCltig%20und%20werden%20zu%20fehlenden%20Werten.%20Ich%20suche%20nach%20einer%20besseren%20Antwort%20zum%20Lesen%20der%20Pandas-Spalte%20als%20Zeichen%2C%20ohne%3C%2FFONT%3E%20%3CFONT%20face%3D%22inherit%22%3Ejeden%20Wert%20zu%20testen.%3C%2FFONT%3E%3C%2FP%3E%3C%2FLINGO-BODY%3E
Choose Language Hide Translation Bar
minion
Level II

JMP18: python: conversion from pandas to dataTable skips columns with missing values

import jmp, numpy as np, pandas as pd
import jmputils

def p2j(df):
	dt2 = jmp.DataTable('dt2', df.shape[0])
	# get the column names from the data frame
	names = list(df.columns)
	# loop across the columns of the data frame
	for j in range( df.shape[1]):
		# check if the coulumn data type is string or numeric
		if is_string_dtype(df[names[j] ]):
			dt2.new_column(names[j], jmp.DataType.Character)
		else:
			dt2.new_column(names[j], jmp.DataType.Numeric)
		# populate the JMP column with data
		dt2[j] = list(df.iloc[:,j])
	return(dt2)

fname = r"export.csv";
df = pd.read_csv(fname)
filt_csv = p2j(df);

Using above code to read in a csv file. The p2j function seems to not import col3 correctly as it has missing values

minion_0-1717096857056.png

 

csv file contents

"col1","col2","col3"
"a","b","c"
"a","b",""
"a","b","e"
"a","b",""

 

1 ACCEPTED SOLUTION

Accepted Solutions

Re: JMP18: python: conversion from pandas to dataTable skips columns with missing values

Updated code shown here, if the column type is not numeric or string, check until we get a non-missing value before determining the column type as character or numeric.

 

import jmp
import numpy as np, pandas as pd
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype

def p2j(df):
	dt2 = jmp.DataTable('dt2', df.shape[0])
	# get the column names from the data frame
	names = list(df.columns)
	# loop across the columns of the data frame
	for j in range( df.shape[1]):
		# check if the coulumn data type is string or numeric
		if is_numeric_dtype( df[ names[j] ] ):
			dt2.new_column(names[j], jmp.DataType.Numeric)
			dt2[j] = list(df.iloc[:,j])			# populate the JMP column with data
		elif is_string_dtype( df[ names[j]] ):
			dt2.new_column(names[j], jmp.DataType.Character)
			dt2[j] = list(df.iloc[:,j])			# populate the JMP column with data
		else:
			col = list()
			dtype = None
			for i in range(dt2.nrows):
				cell_value = df.iloc[i,j]
				# delay setting column type until we see first non-null value
				if pd.isna( cell_value ):
					col.append(None)
				else:
					if not dtype:
						if is_numeric_dtype( cell_value ):
							dtype = jmp.DataType.Numeric
						else:
							# assume it's character - could be datetime64,... 
							dtype = jmp.DataType.Character
					col.append( cell_value )
			print(col)
			if dtype: 
				dt2.new_column(names[j], dtype)
				dt2[j] = col				# populate the JMP column with data
		
	return(dt2)

fname = r"export.csv";
df = pd.read_csv(jmp.HOME + 'Downloads/' + fname)
print(df)
print(df.dtypes)
print( f'col1 is a string: {is_string_dtype(df[names[0]])}' )
print( f'col2 is a string: {is_string_dtype(df[names[1]])}' )
print( f'col3 is a string: {is_string_dtype(df[names[2]])}' )

filt_csv = p2j(df);

This gives the results:

 

/*:

  col1 col2 col3
0    a    b    c
1    a    b  NaN
2    a    b    e
3    a    b  NaN


col1    object
col2    object
col3    object
dtype: object


col1 is a string: True


col2 is a string: True


col3 is a string: False


['c', None, 'e', None]

Notice in my function I converted the NaN which is numeric to None.  The JMP DataTable code will accept None either in Numeric columns or Character columns and does the 'right' thing.  By putting a NaN in a numeric column and "" empty string in a Character column.

View solution in original post

3 REPLIES 3

Re: JMP18: python: conversion from pandas to dataTable skips columns with missing values

The short answer is that is_string_dtype( col3 ) returns False.  It is not a string column, it is a column of objects, some are strings some are NaN.  The code above treats everything that is not a string as numeric. The NaN's go in properly as missing, but the character values 'c' and 'e' are invalid for a numeric column and become missing values.  I'm looking for a better answer on reading the pandas column as character, without testing every value.

Re: JMP18: python: conversion from pandas to dataTable skips columns with missing values

Updated code shown here, if the column type is not numeric or string, check until we get a non-missing value before determining the column type as character or numeric.

 

import jmp
import numpy as np, pandas as pd
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype

def p2j(df):
	dt2 = jmp.DataTable('dt2', df.shape[0])
	# get the column names from the data frame
	names = list(df.columns)
	# loop across the columns of the data frame
	for j in range( df.shape[1]):
		# check if the coulumn data type is string or numeric
		if is_numeric_dtype( df[ names[j] ] ):
			dt2.new_column(names[j], jmp.DataType.Numeric)
			dt2[j] = list(df.iloc[:,j])			# populate the JMP column with data
		elif is_string_dtype( df[ names[j]] ):
			dt2.new_column(names[j], jmp.DataType.Character)
			dt2[j] = list(df.iloc[:,j])			# populate the JMP column with data
		else:
			col = list()
			dtype = None
			for i in range(dt2.nrows):
				cell_value = df.iloc[i,j]
				# delay setting column type until we see first non-null value
				if pd.isna( cell_value ):
					col.append(None)
				else:
					if not dtype:
						if is_numeric_dtype( cell_value ):
							dtype = jmp.DataType.Numeric
						else:
							# assume it's character - could be datetime64,... 
							dtype = jmp.DataType.Character
					col.append( cell_value )
			print(col)
			if dtype: 
				dt2.new_column(names[j], dtype)
				dt2[j] = col				# populate the JMP column with data
		
	return(dt2)

fname = r"export.csv";
df = pd.read_csv(jmp.HOME + 'Downloads/' + fname)
print(df)
print(df.dtypes)
print( f'col1 is a string: {is_string_dtype(df[names[0]])}' )
print( f'col2 is a string: {is_string_dtype(df[names[1]])}' )
print( f'col3 is a string: {is_string_dtype(df[names[2]])}' )

filt_csv = p2j(df);

This gives the results:

 

/*:

  col1 col2 col3
0    a    b    c
1    a    b  NaN
2    a    b    e
3    a    b  NaN


col1    object
col2    object
col3    object
dtype: object


col1 is a string: True


col2 is a string: True


col3 is a string: False


['c', None, 'e', None]

Notice in my function I converted the NaN which is numeric to None.  The JMP DataTable code will accept None either in Numeric columns or Character columns and does the 'right' thing.  By putting a NaN in a numeric column and "" empty string in a Character column.

Re: JMP18: python: conversion from pandas to dataTable skips columns with missing values

Using the above p2j( ) and an updated export.csv with the a column of numeric data having a missing value as the first value gives:

 

Paul_Nelson_0-1717118219358.png

% more ~/Downloads/export.csv

"col1","col2","col3","col4"

"a","b","c",

"a","b","",2.718

"a","b","e",1.618

"a","b","",3.14159

Recommended Articles