cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Browse apps to extend the software in the new JMP Marketplace
Choose Language Hide Translation Bar
Craige_Hales
Super User
YouTube API

Extracting a section of a webpagehas apparently got some stale info; the web scraping from Feb 2020 no longer seems to work in Jul 2021. Well, that's expected because there is no reason why YouTube can't change the HTML that formats the web page. @LNitz prompted me to take another look and show how to use a more official API to get the data and how to store that data into a data table.

The comments in the JSL will point you to a page where you can get an API key. Google/YouTube use the key to restrict the number of requests you can make. Building this code apparently used about 500/10,000 of my daily quota. I've run it a lot. https://console.cloud.google.com/apis/dashboard .

The original question was about getting descriptions of YouTube videos into a data table. The answer downloaded the (~500Kbyte) web page HTML and pattern matched for keywords...that seem to have changed, so the matching no longer works. The API should be more robust. It returns JSON data, which is way easier to parse than HTML. Smaller too. The problem now becomes reading API docs to figure out how it works. Fortunately most questions can be googled.

Video links, descriptions, thumbnails.Video links, descriptions, thumbnails.

JSL, attached, also adds the blue links to the ID column.

The JSL uses several APIs:

  • channels - find my channel and my channel's upload list ID.
  • playlistItems - the upload list is a playlist. Get the video IDs from that play list.
  • videos - get information about a video using its ID.

YouTube video IDs are the cryptic blue link values in the picture. The JSL is a bit silly because the information being stored in the data table was available from the playlistitems, but you might be starting with video IDs that are not in a play list.

There's a lot more to the API; pretty much anything you can do from the web interface (upload, delete, etc) can be done through the API.

 

Edit: @bryan_boone  posted more info in Integrating Google Drive with JMP using HTTP Request  -- related to the keys you can get from the google console.

Last Modified: Sep 18, 2022 9:35 PM
Comments
Craige_Hales
Super User

You can also search by channel name, rather than ID:

Veritasium uses 1Veritasium for the actual channel name. There is a re-direct easter egg at the other one.Veritasium uses 1Veritasium for the actual channel name. There is a re-direct easter egg at the other one.

 

ron_horne
Super User (Alumni)

thank you very much! @Craige_Hales ,

This is very useful. I have built on this to make a list of playlists of a channel and a list of videos in these playlists.

This task can become more complected when there are more than 50 videos in a playlist, more than 50 playlists, some videos are private, lack a description or all sorts of spanners in the works of YouTube.

My script addresses some of these issues and more in an attempt to make it robust.

 

 


//create a JMP data table with all playlists of https://www.youtube.com/c/JMPSoftwareFromSAS
// this script produces 2 data tables:
// 1) A list of playlists on a channel.
// 2) A list of videos from all the playlists.

Names Default To Here( 1 );
// in order to use API tools from google registration is required and an API key is needed. 
// in order to get an api key.
// go to https://console.cloud.google.com/ 
// create new project (YouTube data api V3) 
// create crdentials - YouTube data api V3 - Public data.

// Define Google API Key directly or in a separate file 
apikey = "aaabbbcccdddeeefffggghhhiiijjjkkklllmmmnnnoooppp"; // directly
 
 
// Define channel id for which to get the list of Playlists - get it from Youtube
// Open ("https://www.youtube.com/c/JMPSoftwareFromSAS");// JMP Software From SAS
// if channel id is not explicit search it at https://commentpicker.com/youtube-channel-id.php 
Channelid = "UCPOQfD32PLMVdTg-38OwjCA";


// set up a data table for the list of playlists 
dtplaylists = New Table( "JMP Software From SAS - list of playlists",
	New Column( "Playlist name", Character, "Nominal" ),
	New Column( "Playlist description", Character, "Nominal" ),
	New Column( "Playlist link", Character, "Nominal" ),
	New Column( "Playlistid", Character, "Nominal" ), 
	
);
 

// this youtube api query brings a list of playlists from the channel using channelid and apikey
// if there are more than 50 playlists this needs to be in a loop
jsonplaylistslist = Parse JSON(
	Open( "https://youtube.googleapis.com/youtube/v3/playlists?part=snippet&channelId=" || Channelid || "&maxResults=50&key=" || apikey )
);

nplaylists = jsonplaylistslist["pageInfo"]["totalResults"];
Show( nplaylists );

resultsperpage = N Items( jsonplaylistslist["items"] );
Show( resultsperpage );

// extract the title, description and id of each list into the data table.
For( playlistspage = 1, playlistspage <= Ceiling( nplaylists / 50 ), playlistspage++, 

	For( iplaylists = 1, iplaylists <= resultsperpage, iplaylists++,
		dtplaylists << add rows( 1 );
		dtplaylists:Playlist name[N Rows()] = jsonplaylistslist["items"][iplaylists]["snippet"]["title"];
		dtplaylists:Playlist description[N Rows()] = jsonplaylistslist["items"][iplaylists]["snippet"]["description"];
		dtplaylists:Playlistid[N Rows()] = jsonplaylistslist["items"][iplaylists]["id"];
		dtplaylists:Playlist link[N Rows()] = "https://www.youtube.com/playlist?list=" || jsonplaylistslist["items"][iplaylists]["id"];
		playlistid = jsonplaylistslist["items"][iplaylists]["id"];
	);

// inspect whether the list is complete or has a next page with more playlists
	jsonplaylistslistkeys = jsonplaylistslist << get keys;
	// if no nextPageToken exists - break the loop
	If( Contains( jsonplaylistslistkeys, "nextPageToken" ) == 0,
		Break()
	);
	// if nextPageToken is present get the next page
	nextpagetoken = jsonplaylistslist["nextPageToken"];
	jsonplaylistslist = Parse JSON(
		Open(
			"https://youtube.googleapis.com/youtube/v3/playlists?part=snippet&channelId=" || Channelid || "&maxResults=50&pageToken=" ||
			nextpagetoken || "&key=" || apikey
		)
	);
	// set the number of internal loops according to the number of items.
	resultsperpage = N Items( jsonplaylistslist["items"] );
	Show( resultsperpage );

);


// set up a data table to host the videos in all play lists
dt = New Table( "JMP Software From SAS - YouTube Playlist",
	Set Cell Height( 100 ), // make cells taller to show description and picture
	New Column( "Videoname", Character, "Nominal" ),
// new column for individual video tags
	New Column( "Tags", Character, "Nominal", Set Display Width( 150 ) ),
	New Column( "Playlist name", Character, "Nominal" ),
	New Column( "Playlist description", Character, "Nominal" ),
	New Column( "playlist link",
		Character,
		"Nominal",
		Set Property(
			"Event Handler",
			Event Handler(
				Click(JSL Quote(Function( {thisTable, thisColumn, iRow}, { },
Web( Char( thisTable:thisColumn[ iRow ] ) ); // open a web page
);)				),
				Tip(JSL Quote(Function( {thisTable, thisColumn, iRow}, // 3 parameters
{ }, "Open " || Char( thisTable:thisColumn[ iRow ] ) || " in your browser."; // return the tool tip string
); )				),
				Color(JSL Quote(Function( {thisTable, thisColumn, iRow}, // 3 parameters
{ },
RGBColor("link"););
)				)
			)
		), 

	),
	New Column( "id",
		Character,
		"Nominal",
		Set Property(
			"Event Handler",
			Event Handler( // use the column properties GUI to create this code
				Click(JSL Quote(Function( {thisTable, thisColumn, ivids},
					// add the youtube bit to the link...most of this was generated by the GUI
					Web( Char( "https://youtu.be/"||thisTable:thisColumn[ ivids ] ) ); 
				);	)				),
				Tip(JSL Quote(Function( {thisTable, thisColumn, ivids}, 
					"Open " || Char( thisTable:thisColumn[ ivids ] ) || " in your browser."; 
				); )				),
				Color(JSL Quote(Function( {thisTable, thisColumn, ivids},
					RGBColor("link");
				);	)				)
			)
		), 
	), 

// a column for the description text  
	New Column( "Description", character, "Nominal", Set Display Width( 350 ) ), 

// a column for image link
	New Column( "Image", Character, "Nominal", Set Display Width( 150 ) ), 

);

// Here is where we start scraping the data per video

// this first youtube api query is to get the channel's uploads playlist ID
// Note: if there are less than 51 such videos this will suffice. otherwise a loop is required.
jsonChannel = Parse JSON(
	Open( "https://youtube.googleapis.com/youtube/v3/channels?part=snippet,contentDetails,statistics&id=" || Channelid || "&key=" || apikey )
);

// extract channel description for later use
channel_description = jsonChannel["items"][1]["snippet"]["description"];
Show( channel_description );

For( ilist = 1, ilist <= nplaylists, ilist++, 
// get the playlists one by one
	currentplaylist = dtplaylists:Playlist id[ilist];

// this next api query gets the video IDs (and a lot of other info, which will be
// re-obtained below. You might have some other way to get video IDs, not from a playlist.)
	jsonPlayList = Parse JSON(
		Open( "https://www.googleapis.com/youtube/v3/playlistItems?part=snippet&maxResults=50&playlistId=" || currentplaylist || "&key=" || apikey )
	);

//extract plylist length
	playlistlenghth = jsonPlayList["pageInfo"]["totalResults"];
	Show( playlistlenghth );

	For( playlistpage = 1, playlistpage <= Ceiling( playlistlenghth / 50 ), playlistpage++, 
	
// fill up the table
		nvids = N Items( jsonPlayList["items"] );
		Show( playlistpage, nvids );
	
// i presents video number
		For( i = 1, i <= nvids, i += 1,
			dt << add rows( 1 );
			dt:Playlist name[N Row()] = dtplaylists:Playlist name[ilist];
			dt:Playlist description[N Row()] = dtplaylists:Playlist description[ilist];
			dt:playlist link[N Row()] = dtplaylists:Playlist link[ilist];
			dt:id[N Row()] = jsonPlayList["items"][i]["snippet"]["resourceId"]["videoId"];
			dt:Videoname[N Row()] = videotitle = jsonPlayList["items"][i]["snippet"]["title"];
			Show( videotitle );
// string together the url for the api to get info about the video id on this row
			json = Open( "https://www.googleapis.com/youtube/v3/videos?id=" || dt:id[N Row()] || "&key=" || apikey || "&part=snippet" );

// convert the JSON string to an associative array we can index with keys to extract parts of interest
			jsonvideo = Parse JSON( json ); // make associative array

			//extract the description itself and set it in the data table - Private vidos have no description...
			If( jsonvideo["items"] != {},
				If( Contains( jsonvideo["items"][1]["snippet"], "description" ) > 0,
					dt:Description[N Row()] = jsonvideo["items"][1]["snippet"]["description"],
					dt:Description[N Row()] = ""
				)
			); 

// use the 	default / medium / high / maxres / medium / standard 	 thumb nail url to fetch a pic via open()
			dt:Image[N Row()] = "https://youtu.be/" || jsonPlayList["items"][i]["snippet"]["resourceId"]["videoId"];

// extract video tags
			If( jsonvideo["items"] != {},
				If( Contains( jsonvideo["items"][1]["snippet"], "tags" ) > 0,
					dt:Tags[N Row()] = Substitute( Char( jsonvideo["items"][1]["snippet"]["tags"] ), "{\!"", "#", ", ", ", #", "}", "", "\!"", "" ),
					dt:Tags[N Row()] = ""
				)
			);
			
			Wait( 0 ); 

		);



// inspect whether the list is complete or has a next page with more videos.
		playlistkeys = jsonPlayList << get keys;
		If( Contains( playlistkeys, "nextPageToken" ) == 0,
			Break()
		);
	
		nextPage = jsonPlayList["nextPageToken"];
		jsonPlayList = Parse JSON(
			Open(
				"https://www.googleapis.com/youtube/v3/playlistItems?part=snippet&maxResults=50&pageToken=" || nextPage || "&playlistId=" ||
				currentplaylist || "&key=" || apikey
			)
		);

	);

);

dt << Optimize Display;


Craige_Hales
Super User

Thanks @ron_horne  !