ron_horne
Super User (Alumni)

Extracting a section of a webpage

Dear Members of the community,

I am trying to extract a section of text from a long and messy string. I would like to extract the description part of a YouTube video.

 

For example, in this video: https://youtu.be/yvoddqG-lm8, I would like to extract just the description part:

 

“Mia Stephens shows how to perform basic statistical analyses in JMP. She covers using Distribution to analyze data one variable at a time. Using Fit Y by X for analyses involving two variables, and using Fit Model for analyses involving more than two variables. She also reviews tools for summarizing and graphing data. This video is part three is a series on learning the basics of using JMP to make the most of your JMP 30-day free trial or your new JMP license. JMP Academic Ambassador Mia Stephens demonstrates how to navigate the JMP menus and data tables, import data into JMP, summarize and graph data and perform basic statistical analyses. This demo uses JMP 11, which will be available in September. See what's coming in JMP 11: http://www.jmp.com/software/preview-j...”

 

I managed to get the whole web page source as a string using the following command:

page = open("https://youtu.be/yvoddqG-lm8");

I have noticed that the description I am looking for appears several times in the page, in particular after these strings:

\\!"description\\!":{\\!"runs\\!":[{\\!"text\\!":\\!"

Or: \\!"description\\!":{\\!"simpleText\\!":\\!"

 

Any suggestions?

 

Am I on the right track in downloading the whole page source, or is there a way to extract just the section I am looking for directly?

Thank you.

11 REPLIES
Craige_Hales
Super User

Re: Extracting a section of a webpage - missing something

@LNitz - The YouTube API might help. The information on this page may not work exactly as shown, because the YouTube HTML being downloaded appears to have changed over the last year or so.

I think you want to create a JMP data table, add the columns you need, and either use a for-loop or column formulas to populate the rows in the table.
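
As a rough illustration of the loop-and-populate idea, here is a minimal sketch in Python rather than JSL. It assumes the page text contains one of the two markers quoted in the question, in unescaped JSON form. The sample strings, the `SAMPLE_PAGES` dictionary, and the function names are hypothetical stand-ins, and the real YouTube markup may differ or change. Each loop iteration produces one row, which is what a row in a JMP data table would hold.

```python
import json
import re

# Hypothetical stand-ins for downloaded page text; real YouTube markup may differ.
SAMPLE_PAGES = {
    "video-a": 'junk {"description":{"simpleText":"Mia Stephens shows how to '
               'perform basic statistical analyses in JMP.\\nPart three."}} junk',
    "video-b": 'junk {"description":{"runs":[{"text":"See what is coming."}]}} junk',
    "video-c": 'junk {"title":"no description here"} junk',
}

# Match either marker from the question, then capture the JSON string literal
# that follows it. The capture honors backslash escapes such as \" and \n.
# For the "runs" form, only the first run is captured.
DESCRIPTION_RE = re.compile(
    r'"description":\{(?:"simpleText":|"runs":\[\{"text":)("(?:[^"\\]|\\.)*")'
)


def extract_description(page_text):
    """Return the decoded description, or None if no marker is found."""
    match = DESCRIPTION_RE.search(page_text)
    if match is None:
        return None
    # json.loads turns the quoted literal into a plain string,
    # decoding escapes like \n along the way.
    return json.loads(match.group(1))


def build_rows(pages):
    """Loop over the pages and collect one row (a dict) per video."""
    rows = []
    for video_id, text in pages.items():
        rows.append({"video": video_id, "description": extract_description(text)})
    return rows


rows = build_rows(SAMPLE_PAGES)
for row in rows:
    print(row)
```

Returning `None` when no marker is found keeps a missing description visible as an empty cell instead of stopping the loop, which matters if the page layout has changed for some videos.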

 

Craige
nascif_jmp
Level VI

Re: Extracting a section of a webpage

If you don't mind adding new tools to your toolbox (and want to avoid the complexity of parsing HTML with regular expressions), you can take advantage of JMP's Python Bridge (here is a helper to use it with Anaconda's distribution) and use a package that was built to handle HTML parsing, such as BeautifulSoup.

This package has tons of documentation and examples out there, so once you figure out how to transfer the data between JMP and Python (details here), you can focus on what you want to extract, and then what to do with it in JMP.
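
To give a feel for parser-based extraction, here is a minimal sketch that uses Python's standard-library `html.parser` as a stand-in for BeautifulSoup, so it runs without installing anything. It assumes the page exposes the description in a `<meta name="description">` tag. That assumption, the sample HTML, and the class and function names are all hypothetical and should be checked against the actual page source.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Remember the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags matter, and only the first match is kept.
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == "description":
            self.description = attributes.get("content")


# Hypothetical sample page; the real markup may be organized differently.
SAMPLE_HTML = """<html><head>
<title>Sample video page</title>
<meta name="description" content="Mia Stephens shows how to perform basic statistical analyses in JMP.">
</head><body><p>Other content</p></body></html>"""


def extract_meta_description(html_text):
    """Return the meta description, or None if the tag is absent."""
    parser = MetaDescriptionParser()
    parser.feed(html_text)
    parser.close()
    return parser.description


description = extract_meta_description(SAMPLE_HTML)
print(description)
```

With BeautifulSoup installed, the same lookup would be roughly `BeautifulSoup(html_text, "html.parser").find("meta", attrs={"name": "description"})`, followed by reading the tag's `content` attribute.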