[Web Audits] Improving a Blog
Tl;DR
Yes, its me.
git clone git@gitlab.com:fossengineer1/fossengineerpapermod.gitAnd it was time to stop those ads.
https://diy.jalcocertech.comhttps://webaudit.jalcocertech.com/- to which im adding all the learnings of these improvements herehttps://ebooks.jalcocertech.com/
#https://github.com/JAlcocerT/poc_webs_magnet
make audit URL=https://ebooks.jalcocertech.com/ #https://example.com
#This runs the CLI suite through ./audit-master.sh with the default fast behavior, where link checks are skipped.Then,
#Also added:
make audit-full URL=https://ebooks.jalcocertech.com/ #https://example.com
make audit-full URL=https://fossengineer.com
make audit-full URL=https://spanishflowacademy.com/
make audit-full URL=https://notenoughcylinders.com/
make audit-full-fast URL=https://jmodels.net/
# Or manually: LINK_CHECK_METHOD=lychee-docker make audit-full URL=https://jmodels.net/
make audit-full URL=https://beyondajourney.lol/
LINK_CHECK_TIMEOUT_SECONDS=0 make audit-full URL=https://clinicadentalrociomontero.commake audit-links-deep URL=https://clinicadentalrociomontero.com #https://fossengineer.comThis runs: LINK_CHECK_TIMEOUT_SECONDS=0 ./linkcheck-audit.sh "https://fossengineer.com" docker
- Added lead generator
make lead-from-report TIMESTAMP=20260604-192953 DOMAIN=genkinfy.com
It creates a first-draft manual lead from matching report artifacts.
- Fixed Lighthouse summary bug
Mobile-only runs now show desktop as skipped, not 0/100.
Validation passed:
bash -n lighthouse-audit.sh audit-master.sh linkcheck-audit.sh node –check scripts/lead-from-report.mjs npm run build LH_FORM_FACTOR=mobile ./lighthouse-audit.sh https://example.com
seo-audit
npx skills add https://github.com/coreyhaines31/marketingskills --skill seo-auditAn excuse forme to try the free web audit engine: https://webaudit.jalcocertech.com/
Intro
I made a review on the OSS ‘portfolio’.
https://developers.openai.com/showcase https://developers.openai.com/blog/designing-delightful-frontends-with-gpt-5-4
Could foss/ still be HUGO and look as sleek?
dig fossengineer.com any
#ping fossengineer.comNot with codex cli this time, but directly with IDE:
git init && git add . && git commit -m "Initial commit: better ai docs" && gh repo create fossengineer-hugo --private --source=. --remote=origin --pushHUGO Vibe Coded Re-build
To offer this via fossengineer, I can try: these are next steps
- As is (later removing adsense)
- The local KB offer
- The webpage offer
- Finally remodel the webpage to avoid the shape of a blog, as is more a news/docs one - vibe coding ofc.
With that remota data (yaml) fetching from the home-lab repo.
Posts
Using the foss-post skill goes fine for me
Which im improving each time at ./agents to also spin up containers and test the real setup
Then, that docker compose goes to home-lab which is the place that feeds the post.
The kind of thing i use now to test websites for people i Just met: https://spanishflowacademy.com/
If these audits show a problem you might need:
To generate nice images with the latest openAI T2I:
# set -a
# . ./.env
# set +a
uv run --with openai --with pillow \
"${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/image_gen.py" \
generate-batch \
--input tmp/imagegen/playwright-symphony-komodo-cover-prompts.jsonl \
--out-dir output/imagegen/covers \
--concurrency 3This model is so good, that I could not avoid using it to improve my digital assets for consulting.jalcocertech.com
Videos
https://jalcocert.github.io/JAlcocerT/oss-automatic-docs-and-tech-video/#about-foss---repo-to-video
Whats stoppin me from shipping JAlcocerTech videos of around 3 min, done with hyperframe, commented with some TTS and screen recorded via…playwright?
A new skill?
Comments
Im very happy with the Commento setup:
Conclusions
I was getting making this much complex than it is.
The direction is not to go UI
Agents live in the CLI and you better do the same (or pay the price of the abstraction)
So all that Web CMS stuff: Strapi, directus…?
This was how i Tinkered last year actually.
But I said no more to custom CMS integrations.
Actually: Not doing more (DFY) websites for others :')
Again: CLI power :)
You can still get my feedback / time to unstuck if you are blocked while building one:
Better JAlcocerTech Web Audits
Ive been using codex CLI to combine the learnings around fossengineer SEO improvements and add it to
git clone /poc_webs_magnet
cd ./poc_webs_magnetApplying webaudits to this blog
Im keeping the CLI cool way for myself :)
Ive also used this chance to add a reference to the main page to some cool static pages ive been doing:
#check cloudflare statically deployed projects
cd ./poc #they are all here- https://pm-pdm-checks.pages.dev with some questions for your PM/PdMs
- postmortems as I built here
- the static trip
- the solar checks, the rotating earth…
- the no bs finances, even a btc good looking graph
To fossengineer
To webaudits
FAQ
Detecting Broken Links
We can do so via linkchecker, also via Lychee: https://github.com/lycheeverse/lychee
How to use Lychee for broken link detection 🚀
I like Lychee for static sites because it checks links from Markdown, HTML, and already deployed websites.
For a quick check of the live GitHub Pages entry page:
lychee --verbose --no-progress https://jalcocert.github.io/JAlcocerTMy first run against the deployed entry page found:
471 Total
464 Unique
469 OK
2 Errors
1 RedirectThe two broken links were missing favicon assets:
https://jalcocert.github.io/JAlcocerT/favicon.svg
https://jalcocert.github.io/JAlcocerT/favicon-dark.svgI fixed those by publishing the existing JAT.svg logo as static/favicon.svg and static/favicon-dark.svg, which Hugo serves from the site root.
When I tried the full sitemap-based external scan, lychee started checking outbound links from every deployed page.
That surfaced rate-limit/backoff warnings from hosts linked in the blog, including github.com and whois.jalcocertech.com. The whois.jalcocertech.com link comes from older blog content, not from GitHub Pages hosting itself.
For the full deployed Hugo site, use the sitemap as the input list:
curl -sS https://jalcocert.github.io/JAlcocerT/sitemap.xml \
| rg -o '<loc>[^<]+' \
| sed 's#<loc>##' \
> /tmp/jalcocert-urls.txt
GITHUB_TOKEN=your_token_here \
lychee --cache --no-progress --max-concurrency 16 --max-retries 0 --files-from /tmp/jalcocert-urls.txtThe token helps when the site has many GitHub links. If a third-party host keeps rate-limiting automated checks, add it to .lycheeignore or use --exclude.
For a Hugo workflow, first build the site and then scan the generated HTML:
hugo
lychee --root-dir public "public/**/*.html"If some URLs are expected to fail, redirect, throttle bots, or require auth, keep them out of the noise with a .lycheeignore file:
# .lycheeignore
https://example.com/private-area
https://www.linkedin.com/.*And for GitHub Actions, I placed the workflow at .github/workflows/links.yml:
name: Links
on:
push:
branches: ["main"]
workflow_dispatch:
schedule:
- cron: "21 4 * * 1"
permissions:
contents: read
jobs:
lychee:
runs-on: ubuntu-latest
env:
SITE_URL: https://jalcocert.github.io/JAlcocerT
steps:
- uses: actions/checkout@v4
- name: Build sitemap URL list
run: |
curl -fsSL "${SITE_URL}/sitemap.xml" \
| grep -o '<loc>[^<]*' \
| sed 's#<loc>##' \
> jalcocert-urls.txt
- uses: lycheeverse/lychee-action@v2
continue-on-error: true
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
args: --cache --no-progress --max-concurrency 16 --max-retries 0 --files-from jalcocert-urls.txtScanning SiteMaps
Sitemap URL: https://jalcocert.github.io/JAlcocerT/sitemap.xml
RSS feed for latest blog posts: https://jalcocert.github.io/JAlcocerT/blog/index.xml
Curl latest 10 posts:
sudo snap install ripgrep --classic
curl -sS https://jalcocert.github.io/JAlcocerT/blog/index.xml \
| rg -o '<item><title>[^<]+</title><link>[^<]+</link><pubDate>[^<]+</pubDate>' \
| sed -E 's#<item><title>([^<]+)</title><link>([^<]+)</link><pubDate>([^<]+)</pubDate>#- \1 | \3 | \2#' \
| head -n 10Current top result from that command is:
- Vacations with Python + ADK | Tue, 02 Jun 2026 09:20:21 +0100 | https://JAlcocerT.github.io/JAlcocerT/py-vacations/
To curl all URLs from the sitemap:
curl -sS https://jalcocert.github.io/JAlcocerT/sitemap.xml \
| rg -o '<loc>[^<]+' \
| sed 's#<loc>##'Robots and Agents
Have done so: https://github.com/JAlcocerT/JAlcocerT/actions/runs/26818896547
I mean, have a sitemap and rss feed ready for agents:
Sitemap URL: https://jalcocert.github.io/JAlcocerT/sitemap.xml
RSS feed for latest blog posts: https://jalcocert.github.io/JAlcocerT/blog/index.xml
Curl latest 10 posts:
sudo snap install ripgrep --classic
curl -sS https://jalcocert.github.io/JAlcocerT/blog/index.xml \
| rg -o '<item><title>[^<]+</title><link>[^<]+</link><pubDate>[^<]+</pubDate>' \
| sed -E 's#<item><title>([^<]+)</title><link>([^<]+)</link><pubDate>([^<]+)</pubDate>#- \1 | \3 | \2#' \
| head -n 10Current top result from that command is:
- Vacations with Python + ADK | Tue, 02 Jun 2026 09:20:21 +0100 | https://JAlcocerT.github.io/JAlcocerT/py-vacations/
To curl all URLs from the sitemap:
curl -sS https://jalcocert.github.io/JAlcocerT/sitemap.xml \
| rg -o '<loc>[^<]+' \
| sed 's#<loc>##'For crawlers, I added a simple robots.txt:
User-agent: *
Allow: /
Sitemap: https://jalcocert.github.io/JAlcocerT/sitemap.xmlIn Hugo, this lives at:
static/robots.txtAnd after deployment it should be available at:
https://jalcocert.github.io/JAlcocerT/robots.txtThis file does not secure anything.
It is just a crawler hint.
The useful part here is the sitemap line, because it gives search engines and polite crawlers the canonical map of the site.
For AI agents, the trendy file is llms.txt.
The emerging convention is to publish a Markdown file at the site root:
https://jalcocert.github.io/JAlcocerT/llms.txtSo I added:
static/llms.txtThink of it as a curated, LLM-friendly site index: what this site is about, which pages matter, and where agents should start.
Some projects also publish llms-full.txt, which contains a larger Markdown dump of key content for agents that want more context in one request.
I would treat llms.txt as a low-cost experiment, not as guaranteed SEO magic.
It can help agents and AI coding tools understand a site faster, but classic signals still matter: sitemap, RSS, clean HTML, canonical URLs, metadata, and pages that answer specific questions.