lala
Level VIII

Question for the experts: how do I fix garbled characters when downloading data from this website?

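I used the script below to download data from this website, but the result is garbled, even though I specified the character encoding.

Thanks!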

u = "http://basic.10jqka.com.cn/600000/equity.html";
d1 = Open( u, HTML Table( 3, Column Names( 1 ), Data Starts( 2 ) ), Charset( "gb2312" ) );
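
To see what "garbled" means at the byte level, here is a minimal Python sketch (Python rather than JSL, purely as an illustration outside JMP). It encodes some sample Chinese text as GB2312 and then decodes the same bytes with the right and wrong charsets. The sample string is made up and is not taken from the 10jqka page.

```python
# Mojibake demo: bytes produced with one encoding, decoded with another.
text = "股本结构"               # hypothetical sample text, not from the site
raw = text.encode("gb2312")    # the bytes a GB2312 page would contain

# Decoding with the matching codec recovers the original text.
good = raw.decode("gb2312")

# Decoding as UTF-8 hits invalid byte sequences; errors="replace" inserts U+FFFD.
bad_utf8 = raw.decode("utf-8", errors="replace")

# Decoding as Latin-1 never fails, but maps each byte to an unrelated Western character.
bad_latin1 = raw.decode("latin-1")

print(good)
print(bad_utf8)
print(bad_latin1)
```

Only the matching codec round-trips the text; every other choice produces some form of garbage.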

[Screenshots: 2021-11-28_10-28-26.png, 2021-11-28_10-31-56.png]

14 replies
Craige_Hales
Super User


Try something like this:

u="https://www.w3schools.com/html/html_tables.asp";
blobdata = loadtextfile(u,blob);
//blobdata=blobpeek(blobdata,0,47939)||hextoblob("20CDE2")||blobpeek(blobdata,47939);// insert 外 https://en.wikipedia.org/wiki/GB_2312
chardata = blobtochar(blobdata,"gb2312");
filename = savetextfile("$temp/deleteme.html",chardata);
d1=Open(filename,HTML Table(1));

Download the data as a blob.

Convert the blob to character data using the GB2312 encoding.

Save the character data; it will be UTF-8 encoded on disk.

Open the saved file.
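
The same four steps can be sketched in Python (an illustration only, not what JMP does internally). `fetch_bytes` and `gb2312_to_utf8_file` are hypothetical helper names; the demo uses made-up page bytes instead of a real download, and step 4 (opening the table) is left to JMP or an HTML parser.

```python
import tempfile
import urllib.request
from pathlib import Path

def fetch_bytes(url: str) -> bytes:
    """Step 1: download the page as raw bytes, with no character decoding."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def gb2312_to_utf8_file(raw: bytes, path: Path) -> Path:
    """Steps 2 and 3: decode GB2312 bytes to text, then save that text as UTF-8."""
    text = raw.decode("gb2312")               # step 2: bytes -> characters
    path.write_text(text, encoding="utf-8")   # step 3: characters -> UTF-8 file
    return path

# Offline demo: made-up page bytes stand in for fetch_bytes(u).
page = "<table><tr><td>股本</td></tr></table>".encode("gb2312")
out = gb2312_to_utf8_file(page, Path(tempfile.gettempdir()) / "deleteme.html")
print(out.read_text(encoding="utf-8"))
```

If a page contains characters outside GB2312, decoding with `"gbk"` is worth trying, since GBK is a superset of GB2312.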

 

The Blob Peek offset in the commented-out line was chosen very carefully to put the character at a specific place in the page.
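
What the commented-out line does can be sketched in Python: it splices a space plus the two GB2312 bytes of 外 (hex `CD E2`, the same bytes as in `hextoblob("20CDE2")`) into the downloaded bytes, so that after decoding you can check that a Chinese character comes through intact. The page fragment and offset below are hypothetical stand-ins for the real page and offset 47939.

```python
def splice_bytes(blob: bytes, offset: int, insert: bytes) -> bytes:
    """Return blob with `insert` placed at byte `offset`
    (the same idea as blobpeek(blob,0,offset) || insert || blobpeek(blob,offset))."""
    return blob[:offset] + insert + blob[offset:]

marker = b" " + "外".encode("gb2312")   # space + GB2312 bytes CD E2
page = b"<td>Germany</td>"              # hypothetical ASCII page fragment
offset = page.index(b"</td>")           # hypothetical offset: just before </td>
patched = splice_bytes(page, offset, marker)
print(patched.decode("gb2312"))
```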

 

Craige
lala
Level VIII


Thank you!

I tried modifying my script the same way, but I can't download the data like this.

[Screenshot: 2021-11-28_130536.png]

lala
Level VIII


u="http://basic.10jqka.com.cn/600000/equity.html";

txt = loadtextfile(u,blob);

It doesn't work. Thanks!

Craige_Hales
Super User


There might be a translation problem.

I think you are saying it did not work.

I am guessing it needs a user name and password.

If I am wrong, you'll need to give me more information.

Here's how you can supply a username and password with basic authentication.

You might need to use https instead of http.

//u = "http://basic.10jqka.com.cn/600000/equity.html";
u = "https://www.w3schools.com/html/html_tables.asp";

blobdata = New HTTP Request(
	URL( u ),
	Method( "GET" ),
	Username( "ross" ),
	Password( "Abc123" )
) << Send( "blob" );
	
chardata = Blob To Char( blobdata, "gb2312" );
filename = Save Text File( "$temp/deleteme.html", chardata );
d1 = Open( filename, HTML Table( 1 ) );
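
For reference, HTTP Basic authentication just adds an `Authorization: Basic <base64 of user:password>` header. Here is a minimal Python sketch using only the standard library and the same placeholder credentials; `basic_auth_request` is a hypothetical helper, and nothing is sent over the network.

```python
import base64
import urllib.request

def basic_auth_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"}, method="GET")

req = basic_auth_request("https://www.w3schools.com/html/html_tables.asp", "ross", "Abc123")
print(req.get_header("Authorization"))
# Sending it: urllib.request.urlopen(req).read() returns the body as bytes,
# which would still need .decode("gb2312") for a GB2312 page.
```
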
Craige
lala
Level VIII


Thank you for your help!

This is a free website; it does not require a login.

Here are screenshots of its page source:

[Screenshots: 2021-11-28_183000.png, 2021-11-28_182829.png]

lala
Level VIII


Strangely, when I inspect this site in Chrome with F12 (DevTools), the text is already garbled.

[Screenshot: 2021-11-28_183551.png]

lala
Level VIII


I hope my posts translate correctly even though they contain English letters.

Thanks to all the experts in the community!

蘇71
Level II


Actually, Craige_Hales has already worked out the answer. The code below is based on his, with a slight modification.

u="http://basic.10jqka.com.cn/600000/equity.html";
blobdata = loadtextfile(u,blob);
chardata = blobtochar(blobdata,"gb2312");
filename = savetextfile("$temp/deleteme.html",chardata);
d1=Open(filename,HTML Table(3));
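
`HTML Table(3)` picks the third table on the page. If you ever need the same step outside JMP, here is a minimal Python sketch using only the standard library's `html.parser`. It assumes tables are counted in document order starting at 1 (my assumption about how `HTML Table(n)` counts) and it does not handle nested tables; `TableCollector` and `nth_table` are hypothetical names.

```python
from html.parser import HTMLParser

class TableCollector(HTMLParser):
    """Collect each <table> as a list of rows; each row is a list of cell texts.
    Simplification: nested tables are not handled."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>, in document order
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def _flush_cell(self):
        # Finish the current cell (also covers omitted </td> tags).
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _flush_row(self):
        # Finish the current row (also covers omitted </tr> tags).
        self._flush_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._flush_row()
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._flush_row()
            self._row = []
        elif tag in ("td", "th"):
            self._flush_cell()
            if self._row is not None:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

def nth_table(html_text: str, n: int) -> list:
    """Return the n-th table (1-based, document order) as a list of rows."""
    parser = TableCollector()
    parser.feed(html_text)
    parser.close()
    return parser.tables[n - 1]
```

Feed it text that has already been decoded (for example with `raw.decode("gb2312")`), not raw bytes.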

Alternatively, select File > Internet Open, set Open As to Web Page instead of Data, and then choose the data table you need; the text also displays correctly that way.

[Screenshot: 截屏2021-11-28 下午10.50.25.png]

Craige_Hales
Super User


I'm not sure.

Some of the time I get "Forbidden" as the result. The server may be deciding to forbid access because the request has no cookie, and perhaps no User-Agent header.

I think the web site may be checking to see if an interactive browser with JavaScript enabled is in use.

JMP does not run JavaScript on web pages.
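
If missing headers are what triggers "Forbidden", a request can send browser-like headers. Here is a minimal Python sketch with the standard library; the User-Agent string and cookie value are placeholders, not values known to work for this site, and `browser_like_request` is a hypothetical helper. Like JMP, this does not run JavaScript, so if the site checks for that, headers alone will not help.

```python
import urllib.request

# Placeholders only: not known-good values for this site.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
COOKIE = "name=value"

def browser_like_request(url: str) -> urllib.request.Request:
    """Build a GET request that sends User-Agent and Cookie headers like a browser would."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": BROWSER_UA, "Cookie": COOKIE},
        method="GET",
    )

req = browser_like_request("http://basic.10jqka.com.cn/600000/equity.html")
print(req.header_items())  # nothing is sent until urllib.request.urlopen(req)
```

Before relying on this, check the site's terms of use.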

 

You might want to (1) check their acceptable use policy and (2) see whether they offer a REST API for accessing the data.

Screen-scraping this site will be hard to maintain.

 

Craige