Discussions

lala · Sep 3, 2024 08:12 PM

I want to automatically capture the content of the web page and get the real URL.

For example the

its essence is only a picture real website https://g.gtimg.cn/music/photo_new/T053XD001001eyb6g4JwXRW.png

There is no content for this image address in the page source code.But it's a different address every day

I have installed the following libraries in JMP python

jmputils.jpip('install', 'selenium')
jmputils.jpip('install', 'requests')

Thanks Experts!

Craige_Hales · Sep 3, 2024 10:02 PM

if you need to get text from a picture, that's called OCR (optical character recognition.) I've never played with it, but it looks like pytesseract might handle it; it claims to support many languages.

Craige

lala · Sep 3, 2024 10:19 PM

I have solved the picture OCR.

The point is that the address of this picture is different every day and I don't want to get it manually.

See Craige Expert's original blog.But it's too complicated. I don't understand.

So I want to automatically get the real address of the picture for this website.

Thank Craige!

lala · Sep 3, 2024 10:21 PM

Now I want to figure out how to combine JMP 18 python to automatically get a different address of this image every day to download and recognize JSL.

lala · Sep 3, 2024 7:37 PM

ChatGPT

I know even less about python.Thanks!

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

# Set the path to your ChromeDriver
service = Service(executable_path="path/to/chromedriver")

# Initialize the browser
options = webdriver.ChromeOptions()
options.add_argument('--headless')  # Run in headless mode to avoid opening a browser window
browser = webdriver.Chrome(service=service, options=options)

# Visit the target webpage
browser.get("https://xvfr.com/60s.html")

# Wait for the page to load
time.sleep(5)  # Wait 5 seconds for the page to load completely; you can use WebDriverWait for better handling

# Get all network requests
logs = browser.get_log('performance')

# Parse network requests to find the image URL
for log in logs:
    if 'https://g.gtimg.cn/music/photo_new' in str(log):
        print("Found image URL:", str(log))

# Close the browser
browser.quit()

Discussions

How can use python to manipulate selenium in JMP 18?

Re: How can use python to manipulate selenium in JMP 18?

Re: How can use python to manipulate selenium in JMP 18?

Re: How can use python to manipulate selenium in JMP 18?

Re: How can use python to manipulate selenium in JMP 18?

Recommended Articles