ron_horne
Super User (Alumni)

Cookie Consent issue

Hi All,

When I try to read the HTML of a web page with Open(), I get only the cookie consent text rather than the page content.

For example, from the UK, opening the JMP homepage returns this:

 


//:*/
Page = open("https://www.jmp.com/");

/*:

String( "<html><hea... 2588 total characters ...></html>

" ) assigned.

//:*/
print (Page);
/*:

"<html><head><script type=\!"text/javascript\!">
function callAjax(url, fallbackUrl, callback){
    var xmlhttp = new XMLHttpRequest();
    xmlhttp.onreadystatechange = function(){
        if (xmlhttp.readyState === 4 && xmlhttp.status === 200){
            callback(xmlhttp.responseText);
        }
    }
    xmlhttp.open(\!"GET\!", url, true);
    xmlhttp.timeout = 3000; // time in milliseconds
    xmlhttp.ontimeout = function() {
        console.log(\!"countrycode xhr request timed out\!");
        window.location.replace(fallbackUrl);
    }
    xmlhttp.send();
}

function findInLocales(locales, cookielocale) {
    var i = 0;
    var found = \!"\!";
    while (i < locales.length && !found) {
        if (locales[i].substring(2) === '_' + cookielocale.toLowerCase()) {
            found = locales[i];
        }
        i++;
    }
    return found;
}

function geoResetURL(locales) {
    var windowReplacement = window.location.href;
    var regex = /\/[a-zA-Z][a-zA-Z]_[a-zA-Z][a-zA-Z]\//;
    var regextest =  /[a-zA-Z][a-zA-Z]_[a-zA-Z][a-zA-Z]/;
    windowReplacement = windowReplacement.replace('.geo.', '.');
    var match = document.cookie.match(new RegExp('(^| )usr_locale=([^;]+)'));
    var cookielocale = '';
    if (match != null && match.length > 2) {cookielocale = match[2];}
    if (!cookielocale.match(regextest)) {
        var result;
        var code = Math.random().toString(36).substring(2, 15) + Math.random().toString(36).substring(2, 15);
        callAjax( '/services/countrycode?'+code, windowReplacement,
                function (result) {
                    var jsonresult = result.replace(/[()]/g, '');
                    cookielocale = findInLocales(locales, (JSON.parse(jsonresult)).address.country_code);
                    if (locales.indexOf(cookielocale) != -1) {
                        windowReplacement = windowReplacement.replace(regex, '/' + cookielocale + '/');
                    }
                  window.location.replace(windowReplacement);
                });
    } else {
        if (locales.indexOf(cookielocale) != -1) {
            windowReplacement = windowReplacement.replace(regex, '/' + cookielocale + '/'); 
        }
        window.location.replace(windowReplacement);
    }
}

(function () { 
var locales = ['en_us','en_be','de_at','zh_cn','en_dk','fr_fr','de_de','it_it','ja_jp','ko_kr','en_nl','en_ch','en_gb','en_ca','fr_ca','zh_tw','pt_br','es_mx','es_es','en_au','en_hk','en_my','en_ph','en_in','en_sg','es_ar','es_cl','es_co','es_pe']; 
geoResetURL(locales); 
  }) ();</script></head><body>
<p>Redirecting ...</p></body></html>

"

 

Many thanks for any suggestions in the right direction.

ron

jthi
Super User

Re: Cookie Consent issue

Open Developer Tools in your browser (F12), select the Network tab, and go to jmp.com. Then look through the domains and files listed there and try to find the request that returns the actual page content:

jthi_0-1643556661155.png

jthi_1-1643556692355.png

Now try opening that one.

 

This approach might work or at least get you closer, depending on the website and what you want to get from there.

-Jarmo
ron_horne
Super User (Alumni)

Re: Cookie Consent issue

Thank you @jthi,

I had a look in the developer tools but haven't yet managed to turn it into a command.

I managed to get the HTML of http://jmp.com/ with the following steps:

1) File > Internet Open > Web page.

2) Open jmp.com as a web page.

ron_horne_0-1643637187111.png

3) Accept or decline the cookies interactively.

ron_horne_1-1643637303759.png

4) Run the following statement:

page = Open( "https://www.jmp.com/en_gb/home.html" );

Notes: the cookie approval lasts only within the same JMP session. After that, the preferences are not remembered.

This method is useful in my case since I only fetch information from one domain in each session, so approving manually is not too difficult. If the script crossed domains while fetching data, it would be nice to automate this somehow.

 

This method also worked for the web page I was actually fetching data from, so it is reliable to that extent.

jthi
Super User

Re: Cookie Consent issue (accepted solution)

Are you trying to open the page in JMP or to scrape the website? I'm not too familiar with web scraping, but if you are trying to scrape, you can try something like this:

https://stackoverflow.com/questions/57171353/scraping-a-webpage-using-python-beautiful-soup-that-req... Then in JMP use New HTTP Request with fields or possibly Cookie (check the Scripting Index for these).

It could even be that a GET request with New HTTP Request() is enough without any extra fields:

Names Default To Here( 1 );
request = New HTTP Request( URL( "https://www.jmp.com/en_us/home.html" ), Method( "Get" ) );
data = request << Send;
-Jarmo
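
A minimal sketch building on the request above, in case extra fields turn out to be needed. It assumes the Headers() option of New HTTP Request can carry a Cookie line. The usr_locale cookie name is taken from the redirect script printed in the question, and the value en_us is only an illustrative guess. Whether the server honors that cookie for a non-browser client is untested. The check at the end looks for the "Redirecting ..." marker from the stub in the question, to tell the stub apart from real page content.

```jsl
Names Default To Here( 1 );

// Locale-specific URL from the reply above.
url = "https://www.jmp.com/en_us/home.html";

request = New HTTP Request(
	URL( url ),
	Method( "GET" ),
	// Assumption: cookie name comes from the redirect script in the question;
	// the value and its effect on the server response are not verified.
	Headers( {"Cookie: usr_locale=en_us"} )
);
data = request << Send;

// Distinguish the JavaScript redirect stub from actual page content.
If(
	!Is String( data ),
		Print( "No text was returned." ),
	Contains( data, "Redirecting ..." ) > 0,
		Print( "Got the redirect stub, not the page content." ),
	Print( "Got page content: " || Char( Length( data ) ) || " characters." )
);
```

If the plain GET already returns the page, the Headers() line can simply be dropped.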
ron_horne
Super User (Alumni)

Re: Cookie Consent issue

Thank you @jthi, this works perfectly.

At this point I do want to scrape the page.