cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

Discussions

Solve problems, and share tips and tricks with other JMP users.
%3CLINGO-SUB%20id%3D%22lingo-sub-762060%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3EJMP18%3A%20Python%3A%20la%20conversi%C3%B3n%20de%20pandas%20a%20dataTable%20omite%20columnas%20con%20valores%20faltantes%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-762060%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3E%3CPRE%3E%3CCODE%20class%3D%22%20language-jsl%22%3Eimport%20jmp%2C%20numpy%20as%20np%2C%20pandas%20as%20pd%0Aimport%20jmputils%0A%0Adef%20p2j(df)%3A%0A%09dt2%20%3D%20jmp.DataTable('dt2'%2C%20df.shape%5B0%5D)%0A%09%23%20get%20the%20column%20names%20from%20the%20data%20frame%0A%09names%20%3D%20list(df.columns)%0A%09%23%20loop%20across%20the%20columns%20of%20the%20data%20frame%0A%09for%20j%20in%20range(%20df.shape%5B1%5D)%3A%0A%09%09%23%20check%20if%20the%20coulumn%20data%20type%20is%20string%20or%20numeric%0A%09%09if%20is_string_dtype(df%5Bnames%5Bj%5D%20%5D)%3A%0A%09%09%09dt2.new_column(names%5Bj%5D%2C%20jmp.DataType.Character)%0A%09%09else%3A%0A%09%09%09dt2.new_column(names%5Bj%5D%2C%20jmp.DataType.Numeric)%0A%09%09%23%20populate%20the%20JMP%20column%20with%20data%0A%09%09dt2%5Bj%5D%20%3D%20list(df.iloc%5B%3A%2Cj%5D)%0A%09return(dt2)%0A%0Afname%20%3D%20r%22export.csv%22%3B%0Adf%20%3D%20pd.read_csv(fname)%0Afilt_csv%20%3D%20p2j(df)%3B%3C%2FCODE%3E%3C%2FPRE%3E%3CP%3EUsando%20el%20c%C3%B3digo%20anterior%20para%20leer%20en%20un%20archivo%20csv.%20La%20funci%C3%B3n%20p2j%20parece%20no%20importar%20col3%20correctamente%20porque%20le%20faltan%20valores%3C%2FP%3E%3CP%3E%3CSPAN%20class%3D%22lia-inline-image-display-wrapper%20lia-image-align-inline%22%20image-alt%3D%22minion_0-1717096857056.png%22%20style%3D%22width%3A%20400px%3B%22%3E%3Cspan%20class%3D%22lia-inline-image-display-wrapper%22%20image-alt%3D%22minion_0-1717096857056.png%22%20style%3D%22width%3A%20286px%3B%22%3E%3Cimg%20src%3D%22https%3A%2F%2Fcommunity.jmp.com%2Ft5%2Fimage%2Fserverpage%2Fimage-id%2F64646iEC1656B3AE21CCFD%2Fimage-size%2Fmedium%3Fv%3Dv2%26amp%3Bpx%3D400%22%20role%3D%22button%22%20title%3D%22minion_0-1717096857056.png%22%20alt%3D%22minion_0-1717096857056.png%22%20%2F%3E%3C%2Fspan%3E%3C%2FSPAN%3E%3C%2FP%3E%3CP%3E%20%3C%2FP%3E%3CP%3Econtenido%20del%20archivo%20csv%3C%2FP%3E%3CP%3E%22col1%22%2C%22col2%22%2C%22col3%22%3CBR%20%2F%3E%20%22a%20B%20C%22%3CBR%20%2F%3E%20%22a%22%2C%22b%22%2C%22%22%3CBR%20%2F%3E%20%22a%22%2C%22b%22%2C%22e%22%3CBR%20%2F%3E%20%22a%22%2C%22b%22%2C%22%22%3C%2FP%3E%3CP%3E%20%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-LABS%20id%3D%22lingo-labs-762060%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3E%3CLINGO-LABEL%3EAutomatizaci%C3%B3n%20y%20secuencias%20de%20comandos%3C%2FLINGO-LABEL%3E%3C%2FLINGO-LABS%3E%3CLINGO-SUB%20id%3D%22lingo-sub-762069%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3ERe%3A%20JMP18%3A%20Python%3A%20la%20conversi%C3%B3n%20de%20pandas%20a%20dataTable%20omite%20columnas%20con%20valores%20faltantes%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-762069%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3E%3CP%3EUsando%20el%20p2j(%20)%20anterior%20y%20un%20export.csv%20actualizado%20con%20una%20columna%20de%20datos%20num%C3%A9ricos%20que%20tiene%20un%20valor%20faltante%20como%20primer%20valor%2C%20se%20obtiene%3A%3C%2FP%3E%0A%3CP%3E%20%3C%2FP%3E%0A%3CP%3E%3CSPAN%20class%3D%22lia-inline-image-display-wrapper%20lia-image-align-inline%22%20image-alt%3D%22Paul_Nelson_0-1717118219358.png%22%20style%3D%22width%3A%20400px%3B%22%3E%3Cspan%20class%3D%22lia-inline-image-display-wrapper%22%20image-alt%3D%22Paul_Nelson_0-1717118219358.png%22%20style%3D%22width%3A%20400px%3B%22%3E%3Cimg%20src%3D%22https%3A%2F%2Fcommunity.jmp.com%2Ft5%2Fimage%2Fserverpage%2Fimage-id%2F64647i688228392B6A53E0%2Fimage-size%2Fmedium%3Fv%3Dv2%26amp%3Bpx%3D400%22%20role%3D%22button%22%20title%3D%22Paul_Nelson_0-1717118219358.png%22%20alt%3D%22Paul_Nelson_0-1717118219358.png%22%20%2F%3E%3C%2Fspan%3E%3C%2FSPAN%3E%3C%2FP%3E%0A%3CP%3E%3CSPAN%3E%25%20m%C3%A1s%20~%2FDescargas%2Fexport.csv%3C%2FSPAN%3E%3C%2FP%3E%0A%3CP%3E%3CSPAN%3E%22col1%22%2C%22col2%22%2C%22col3%22%2C%22col4%22%3C%2FSPAN%3E%3C%2FP%3E%0A%3CP%3E%3CSPAN%3E%22a%20B%20C%22%2C%3C%2FSPAN%3E%3C%2FP%3E%0A%3CP%3E%3CSPAN%3E%22a%22%2C%22b%22%2C%22%22%2C2.718%3C%2FSPAN%3E%3C%2FP%3E%0A%3CP%3E%3CSPAN%3E%22a%22%2C%22b%22%2C%22e%22%2C1.618%3C%2FSPAN%3E%3C%2FP%3E%0A%3CP%3E%3CSPAN%3E%22a%22%2C%22b%22%2C%22%22%2C3.14159%3C%2FSPAN%3E%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-SUB%20id%3D%22lingo-sub-762067%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3ERe%3A%20JMP18%3A%20Python%3A%20la%20conversi%C3%B3n%20de%20pandas%20a%20dataTable%20omite%20columnas%20con%20valores%20faltantes%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-762067%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3E%3CP%3EEl%20c%C3%B3digo%20actualizado%20que%20se%20muestra%20aqu%C3%AD%2C%20si%20el%20tipo%20de%20columna%20no%20es%20num%C3%A9rico%20o%20de%20cadena%2C%20verifique%20hasta%20que%20obtengamos%20un%20valor%20que%20no%20falte%20antes%20de%20determinar%20el%20tipo%20de%20columna%20como%20car%C3%A1cter%20o%20num%C3%A9rico.%3C%2FP%3E%0A%3CP%3E%20%3C%2FP%3E%0A%3CPRE%3Eimport%20jmp%0Aimport%20numpy%20as%20np%2C%20pandas%20as%20pd%0Afrom%20pandas.api.types%20import%20is_string_dtype%0Afrom%20pandas.api.types%20import%20is_numeric_dtype%0A%0Adef%20p2j(df)%3A%0A%09dt2%20%3D%20jmp.DataTable('dt2'%2C%20df.shape%5B0%5D)%0A%09%23%20get%20the%20column%20names%20from%20the%20data%20frame%0A%09names%20%3D%20list(df.columns)%0A%09%23%20loop%20across%20the%20columns%20of%20the%20data%20frame%0A%09for%20j%20in%20range(%20df.shape%5B1%5D)%3A%0A%09%09%23%20check%20if%20the%20coulumn%20data%20type%20is%20string%20or%20numeric%0A%09%09if%20is_numeric_dtype(%20df%5B%20names%5Bj%5D%20%5D%20)%3A%0A%09%09%09dt2.new_column(names%5Bj%5D%2C%20jmp.DataType.Numeric)%0A%09%09%09dt2%5Bj%5D%20%3D%20list(df.iloc%5B%3A%2Cj%5D)%09%09%09%23%20populate%20the%20JMP%20column%20with%20data%0A%09%09elif%20is_string_dtype(%20df%5B%20names%5Bj%5D%5D%20)%3A%0A%09%09%09dt2.new_column(names%5Bj%5D%2C%20jmp.DataType.Character)%0A%09%09%09dt2%5Bj%5D%20%3D%20list(df.iloc%5B%3A%2Cj%5D)%09%09%09%23%20populate%20the%20JMP%20column%20with%20data%0A%09%09else%3A%0A%09%09%09col%20%3D%20list()%0A%09%09%09dtype%20%3D%20None%0A%09%09%09for%20i%20in%20range(dt2.nrows)%3A%0A%09%09%09%09cell_value%20%3D%20df.iloc%5Bi%2Cj%5D%0A%09%09%09%09%23%20delay%20setting%20column%20type%20until%20we%20see%20first%20non-null%20value%0A%09%09%09%09if%20pd.isna(%20cell_value%20)%3A%0A%09%09%09%09%09col.append(None)%0A%09%09%09%09else%3A%0A%09%09%09%09%09if%20not%20dtype%3A%0A%09%09%09%09%09%09if%20is_numeric_dtype(%20cell_value%20)%3A%0A%09%09%09%09%09%09%09dtype%20%3D%20jmp.DataType.Numeric%0A%09%09%09%09%09%09else%3A%0A%09%09%09%09%09%09%09%23%20assume%20it's%20character%20-%20could%20be%20datetime64%2C...%20%0A%09%09%09%09%09%09%09dtype%20%3D%20jmp.DataType.Character%0A%09%09%09%09%09col.append(%20cell_value%20)%0A%09%09%09print(col)%0A%09%09%09if%20dtype%3A%20%0A%09%09%09%09dt2.new_column(names%5Bj%5D%2C%20dtype)%0A%09%09%09%09dt2%5Bj%5D%20%3D%20col%09%09%09%09%23%20populate%20the%20JMP%20column%20with%20data%0A%09%09%0A%09return(dt2)%0A%0Afname%20%3D%20r%22export.csv%22%3B%0Adf%20%3D%20pd.read_csv(jmp.HOME%20%2B%20'Downloads%2F'%20%2B%20fname)%0Aprint(df)%0Aprint(df.dtypes)%0Aprint(%20f'col1%20is%20a%20string%3A%20%7Bis_string_dtype(df%5Bnames%5B0%5D%5D)%7D'%20)%0Aprint(%20f'col2%20is%20a%20string%3A%20%7Bis_string_dtype(df%5Bnames%5B1%5D%5D)%7D'%20)%0Aprint(%20f'col3%20is%20a%20string%3A%20%7Bis_string_dtype(df%5Bnames%5B2%5D%5D)%7D'%20)%0A%0Afilt_csv%20%3D%20p2j(df)%3B%3C%2FPRE%3E%0A%3CP%3EEsto%20da%20los%20resultados%3A%3C%2FP%3E%0A%3CP%3E%20%3C%2FP%3E%0A%3CPRE%3E%2F*%3A%0A%0A%20%20col1%20col2%20col3%0A0%20%20%20%20a%20%20%20%20b%20%20%20%20c%0A1%20%20%20%20a%20%20%20%20b%20%20NaN%0A2%20%20%20%20a%20%20%20%20b%20%20%20%20e%0A3%20%20%20%20a%20%20%20%20b%20%20NaN%0A%0A%0Acol1%20%20%20%20object%0Acol2%20%20%20%20object%0Acol3%20%20%20%20object%0Adtype%3A%20object%0A%0A%0Acol1%20is%20a%20string%3A%20True%0A%0A%0Acol2%20is%20a%20string%3A%20True%0A%0A%0Acol3%20is%20a%20string%3A%20False%0A%0A%0A%5B'c'%2C%20None%2C%20'e'%2C%20None%5D%0A%3C%2FPRE%3E%0A%3CP%3EObserve%20que%20en%20mi%20funci%C3%B3n%20convert%C3%AD%20el%20NaN%20que%20es%20num%C3%A9rico%20en%20Ninguno.%20El%20c%C3%B3digo%20JMP%20DataTable%20aceptar%C3%A1%20Ninguno%20en%20columnas%20num%C3%A9ricas%20o%20en%20columnas%20de%20caracteres%20y%20hace%20lo%20%22correcto%22.%20Poniendo%20un%20NaN%20en%20una%20columna%20num%C3%A9rica%20y%20una%20%22%22%20cadena%20vac%C3%ADa%20en%20una%20columna%20de%20caracteres.%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-SUB%20id%3D%22lingo-sub-762062%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3ERe%3A%20JMP18%3A%20Python%3A%20la%20conversi%C3%B3n%20de%20pandas%20a%20dataTable%20omite%20columnas%20con%20valores%20faltantes%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-762062%22%20slang%3D%22en-US%22%20mode%3D%22CREATE%22%3E%3CP%3ELa%20respuesta%20corta%20es%20que%20%3CFONT%20face%3D%22inherit%22%3Eis_string_dtype(%20col3%20)%20devuelve%20False.%20No%20es%20una%20columna%20de%20cadena%2C%20es%20una%20columna%20de%20objetos%2C%20algunos%20son%20cadenas%20y%20otros%20son%20NaN.%20El%20c%C3%B3digo%20anterior%20trata%20todo%20lo%20que%20no%20es%20una%20cadena%20como%20num%C3%A9rico.%20Los%20NaN%20aparecen%20correctamente%20como%20faltantes%2C%20pero%20los%20valores%20de%20caracteres%20'c'%20y%20'e'%20no%20son%20v%C3%A1lidos%20para%20una%20columna%20num%C3%A9rica%20y%20se%20convierten%20en%20valores%20faltantes.%20Estoy%20buscando%20una%20mejor%20respuesta%20para%20leer%20la%20columna%20de%20pandas%20como%20car%C3%A1cter%2C%20sin%3C%2FFONT%3E%20probar%20%3CFONT%20face%3D%22inherit%22%3Etodos%20los%20valores.%3C%2FFONT%3E%3C%2FP%3E%3C%2FLINGO-BODY%3E
Choose Language Hide Translation Bar
minion
Level II

JMP18: python: conversion from pandas to dataTable skips columns with missing values

import jmp, numpy as np, pandas as pd
import jmputils

def p2j(df):
	dt2 = jmp.DataTable('dt2', df.shape[0])
	# get the column names from the data frame
	names = list(df.columns)
	# loop across the columns of the data frame
	for j in range( df.shape[1]):
		# check if the coulumn data type is string or numeric
		if is_string_dtype(df[names[j] ]):
			dt2.new_column(names[j], jmp.DataType.Character)
		else:
			dt2.new_column(names[j], jmp.DataType.Numeric)
		# populate the JMP column with data
		dt2[j] = list(df.iloc[:,j])
	return(dt2)

fname = r"export.csv";
df = pd.read_csv(fname)
filt_csv = p2j(df);

Using above code to read in a csv file. The p2j function seems to not import col3 correctly as it has missing values

minion_0-1717096857056.png

 

csv file contents

"col1","col2","col3"
"a","b","c"
"a","b",""
"a","b","e"
"a","b",""

 

1 ACCEPTED SOLUTION

Accepted Solutions

Re: JMP18: python: conversion from pandas to dataTable skips columns with missing values

Updated code shown here, if the column type is not numeric or string, check until we get a non-missing value before determining the column type as character or numeric.

 

import jmp
import numpy as np, pandas as pd
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype

def p2j(df):
	dt2 = jmp.DataTable('dt2', df.shape[0])
	# get the column names from the data frame
	names = list(df.columns)
	# loop across the columns of the data frame
	for j in range( df.shape[1]):
		# check if the coulumn data type is string or numeric
		if is_numeric_dtype( df[ names[j] ] ):
			dt2.new_column(names[j], jmp.DataType.Numeric)
			dt2[j] = list(df.iloc[:,j])			# populate the JMP column with data
		elif is_string_dtype( df[ names[j]] ):
			dt2.new_column(names[j], jmp.DataType.Character)
			dt2[j] = list(df.iloc[:,j])			# populate the JMP column with data
		else:
			col = list()
			dtype = None
			for i in range(dt2.nrows):
				cell_value = df.iloc[i,j]
				# delay setting column type until we see first non-null value
				if pd.isna( cell_value ):
					col.append(None)
				else:
					if not dtype:
						if is_numeric_dtype( cell_value ):
							dtype = jmp.DataType.Numeric
						else:
							# assume it's character - could be datetime64,... 
							dtype = jmp.DataType.Character
					col.append( cell_value )
			print(col)
			if dtype: 
				dt2.new_column(names[j], dtype)
				dt2[j] = col				# populate the JMP column with data
		
	return(dt2)

fname = r"export.csv";
df = pd.read_csv(jmp.HOME + 'Downloads/' + fname)
print(df)
print(df.dtypes)
print( f'col1 is a string: {is_string_dtype(df[names[0]])}' )
print( f'col2 is a string: {is_string_dtype(df[names[1]])}' )
print( f'col3 is a string: {is_string_dtype(df[names[2]])}' )

filt_csv = p2j(df);

This gives the results:

 

/*:

  col1 col2 col3
0    a    b    c
1    a    b  NaN
2    a    b    e
3    a    b  NaN


col1    object
col2    object
col3    object
dtype: object


col1 is a string: True


col2 is a string: True


col3 is a string: False


['c', None, 'e', None]

Notice in my function I converted the NaN which is numeric to None.  The JMP DataTable code will accept None either in Numeric columns or Character columns and does the 'right' thing.  By putting a NaN in a numeric column and "" empty string in a Character column.

View solution in original post

3 REPLIES 3

Re: JMP18: python: conversion from pandas to dataTable skips columns with missing values

The short answer is that is_string_dtype( col3 ) returns False.  It is not a string column, it is a column of objects, some are strings some are NaN.  The code above treats everything that is not a string as numeric. The NaN's go in properly as missing, but the character values 'c' and 'e' are invalid for a numeric column and become missing values.  I'm looking for a better answer on reading the pandas column as character, without testing every value.

Re: JMP18: python: conversion from pandas to dataTable skips columns with missing values

Updated code shown here, if the column type is not numeric or string, check until we get a non-missing value before determining the column type as character or numeric.

 

import jmp
import numpy as np, pandas as pd
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype

def p2j(df):
	dt2 = jmp.DataTable('dt2', df.shape[0])
	# get the column names from the data frame
	names = list(df.columns)
	# loop across the columns of the data frame
	for j in range( df.shape[1]):
		# check if the coulumn data type is string or numeric
		if is_numeric_dtype( df[ names[j] ] ):
			dt2.new_column(names[j], jmp.DataType.Numeric)
			dt2[j] = list(df.iloc[:,j])			# populate the JMP column with data
		elif is_string_dtype( df[ names[j]] ):
			dt2.new_column(names[j], jmp.DataType.Character)
			dt2[j] = list(df.iloc[:,j])			# populate the JMP column with data
		else:
			col = list()
			dtype = None
			for i in range(dt2.nrows):
				cell_value = df.iloc[i,j]
				# delay setting column type until we see first non-null value
				if pd.isna( cell_value ):
					col.append(None)
				else:
					if not dtype:
						if is_numeric_dtype( cell_value ):
							dtype = jmp.DataType.Numeric
						else:
							# assume it's character - could be datetime64,... 
							dtype = jmp.DataType.Character
					col.append( cell_value )
			print(col)
			if dtype: 
				dt2.new_column(names[j], dtype)
				dt2[j] = col				# populate the JMP column with data
		
	return(dt2)

fname = r"export.csv";
df = pd.read_csv(jmp.HOME + 'Downloads/' + fname)
print(df)
print(df.dtypes)
print( f'col1 is a string: {is_string_dtype(df[names[0]])}' )
print( f'col2 is a string: {is_string_dtype(df[names[1]])}' )
print( f'col3 is a string: {is_string_dtype(df[names[2]])}' )

filt_csv = p2j(df);

This gives the results:

 

/*:

  col1 col2 col3
0    a    b    c
1    a    b  NaN
2    a    b    e
3    a    b  NaN


col1    object
col2    object
col3    object
dtype: object


col1 is a string: True


col2 is a string: True


col3 is a string: False


['c', None, 'e', None]

Notice in my function I converted the NaN which is numeric to None.  The JMP DataTable code will accept None either in Numeric columns or Character columns and does the 'right' thing.  By putting a NaN in a numeric column and "" empty string in a Character column.

Re: JMP18: python: conversion from pandas to dataTable skips columns with missing values

Using the above p2j( ) and an updated export.csv with the a column of numeric data having a missing value as the first value gives:

 

Paul_Nelson_0-1717118219358.png

% more ~/Downloads/export.csv

"col1","col2","col3","col4"

"a","b","c",

"a","b","",2.718

"a","b","e",1.618

"a","b","",3.14159

Recommended Articles