Pro Scraping with Playwright

August 5, 2025

What are the chances of still finding a remote job today?

In theory, they are not zero, at least according to https://remoteok.com/ or https://www.levels.fyi/t/software-engineer/locations/warsaw-metropolitan-area

But I wanted to keep checking to be ready for bronversations:

From the previous post and tinkering we have:

But the website was updated and it stopped working.

The numbers I'm interested in are no longer hardcoded in the page HTML (presumably they are filled in by JavaScript after the page loads), so bs4 alone won't work.

Playwright

Time to tinker with playwright.

I asked Windsurf to help me with a better architecture, as per this md

uv add "playwright-stealth==1.0.0"
uv sync
uv run python -m playwright install
playwright install
uv pip show playwright-stealth
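
With the browsers installed, a minimal sketch of reading a JavaScript-rendered number could look like the one below. The helper names, the `url` and the `selector` arguments are assumptions for illustration (they are not taken from the Job-Trends repo), and only plain Playwright calls are used, without any stealth plugin. Heavily protected sites may still block it.

```python
import re
from typing import Optional


def parse_offer_count(text: str) -> Optional[int]:
    """Return the first number in text, ignoring space/comma/dot separators."""
    match = re.search(r"\d[\d\s.,]*", text)
    if match is None:
        return None
    # Drop thousands separators such as spaces, commas or dots.
    digits = re.sub(r"\D", "", match.group())
    return int(digits) if digits else None


def fetch_rendered_count(url: str, selector: str, timeout_ms: int = 15000) -> Optional[int]:
    """Open the page in headless Chromium and parse the text of `selector`.

    `url` and `selector` are hypothetical placeholders: inspect the real
    page to find the element that holds the number.
    """
    # Imported lazily so parse_offer_count works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded")
            # Wait until the client-side code has rendered the element.
            page.wait_for_selector(selector, timeout=timeout_ms)
            text = page.inner_text(selector)
        finally:
            browser.close()
    return parse_offer_count(text)
```

Keeping the parsing in its own function means it can be checked without a browser, for example `parse_offer_count("12 345 ofert")`. It is meant for whole-number counts only, so a decimal like "3.5" would be read as 35.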

Selenium

I used Selenium ages ago to automate some work.

It was not QA work, but process automation.

It did the trick for me: downloading some data and then processing it with an R script that we had.


Conclusions

Apparently, people can make websites very hard to scrape.

I did not manage to get those numbers via Playwright.

But hey… I put them manually into SQLite:

#https://github.com/JAlcocerT/Job-Trends
git clone git@github.com:JAlcocerT/Job-Trends.git

cd Job-Trends
#uv sync
uv run manual_entry.py #it.pracuj.pl
make plot-matplotlib
#uv run plot_matplotlib.py
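
For reference, a manual entry step like this can be sketched with the standard library alone. The table name `job_offers`, its columns and the function names are assumptions for illustration, not necessarily what `manual_entry.py` in the repo does.

```python
import sqlite3
from datetime import date
from typing import List, Optional, Tuple


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) job_offers table if it does not exist yet."""
    with conn:
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS job_offers (
                day    TEXT    NOT NULL,
                source TEXT    NOT NULL,
                offers INTEGER NOT NULL,
                PRIMARY KEY (day, source)
            )
            """
        )


def record_offer_count(
    conn: sqlite3.Connection, source: str, offers: int, day: Optional[date] = None
) -> None:
    """Store one manually read number; re-entering the same day overwrites it."""
    day = day or date.today()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO job_offers (day, source, offers) VALUES (?, ?, ?)",
            (day.isoformat(), source, offers),
        )


def load_series(conn: sqlite3.Connection, source: str) -> List[Tuple[str, int]]:
    """Return (day, offers) pairs for one source, oldest first, ready to plot."""
    rows = conn.execute(
        "SELECT day, offers FROM job_offers WHERE source = ? ORDER BY day",
        (source,),
    )
    return rows.fetchall()
```

Using `(day, source)` as the primary key keeps one value per site per day, so correcting a typo is just a matter of entering the number again.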

And now we get…

chafa "./matplotlib_job_offers_plot_06-08-2025.png"

Job Market Trend | Matplotlib Job-Trends

# See what changed
git status

# Stage everything (or specify files instead of .)
git add .

# Commit with a message
git commit -m "new data added"

# Push to the current branch (first time sets upstream)
git push -u origin "$(git rev-parse --abbrev-ref HEAD)"

That's a matplotlib chart this time, instead of a plotly one.

Because I've learnt, via the animations tinkering here, that matplotlib can be really cool.

And the job market has not been bad lately…