Customizing a Web for Health Care
- Competitors / The initial site:
- Chosen Domain:
psikolognevinkeskin.info
Currently paying 80 USD a MONTH for that website (and…marketplace)
Analyzing the Initial Status
Broken links? LinkChecker ⏬
- Use LinkChecker with the GHCR Container Image
- It's x86 only
# docker run --rm -it -u $(id -u):$(id -g) ghcr.io/linkchecker/linkchecker:latest --verbose https://www.psikolognevinkeskin.com/
podman run --rm -it ghcr.io/linkchecker/linkchecker:latest --verbose https://www.psikolognevinkeskin.com/ > linkchecker_psyc.txt
Resulting in:
That's it. 53 links in 53 URLs checked. 5 warnings found. 0 errors found.
Stopped checking at 2024-10-19 07:34:09+000 (12 seconds)
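To see what those 5 warnings actually are without rereading the whole report, a quick filter over the saved output works. A minimal sketch in Python, assuming LinkChecker's plain-text report marks the relevant lines with the word "Warning" (linkchecker_psyc.txt is the file written above):

# Filter the LinkChecker text report for warning lines
# (assumption: the plain-text report contains the word "Warning" on those lines)
with open("linkchecker_psyc.txt", encoding="utf-8") as f:
    for line in f:
        if "Warning" in line:
            print(line.rstrip())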
FireCrawl Setup
So, 53 URLs on the site; sounds like the right moment to use FireCrawl and get the content.
FireCrawl with Python 📌
Full code is in my repo, here.
You will need the FireCrawl API key and code like the snippet below, which scrapes a single URL.
We don't need the crawling capabilities, as the site is a single-pager.
from firecrawl import FirecrawlApp
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Get the API key from the environment variable
api_key = os.getenv('FIRECRAWL_API_KEY')
# app = FirecrawlApp(api_key='fc-yourapi')  # hard-coded alternative
if api_key is None:
    raise ValueError("API key not found in environment variables")

# Initialize the FirecrawlApp with the API key
app = FirecrawlApp(api_key=api_key)

# URL to scrape
url = 'https://www.psikolognevinkeskin.com'

# Scrape the data
scraped_data = app.scrape_url(url)

# Write the output to a file using UTF-8 encoding
with open("output_nevin2.txt", "a", encoding="utf-8") as f:
    f.write(str(scraped_data) + "\n")
The magic happened, and now we have JSON-like web information saved in a .txt file.
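If you want to feed that dump into any JSON tooling, it first needs converting back from a str(dict) dump into real JSON. A minimal sketch, assuming each line in output_nevin2.txt is a plain Python-dict repr produced by str(scraped_data) above (adjust if your firecrawl-py version returns a different shape):

import ast
import json

# Parse each line of the str(dict) dump back into a Python dict
# (assumption: every non-empty line is a literal-eval-able dict repr)
records = []
with open("output_nevin2.txt", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if line:
            records.append(ast.literal_eval(line))

# Re-save as proper, pretty-printed JSON for easier inspection
with open("output_nevin2.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)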
Tools for easier JSON 📌
In the Big Data PySpark post we got to use jsonformatter, but there are more options:
https://github.com/josdejong/jsoneditor - A web-based tool to view, edit, format, and validate JSON
https://github.com/AykutSarac/jsoncrack.com - ✨ Innovative and open-source visualization application that transforms various data formats, such as JSON, YAML, XML, CSV and more, into interactive graphs.
But actually, FireCrawl provides markdown, ready for LLMs.
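So instead of juggling JSON, you can just keep the markdown. A minimal sketch, continuing from the scraping script above and assuming the result is a dict exposing a 'markdown' key (the key name and the output file name are my assumptions; check your firecrawl-py version):

# Pull the markdown out of the scrape result and save it as a .md file
# (assumption: scraped_data is a dict with a 'markdown' key)
markdown_content = scraped_data.get('markdown', '')
with open("output_nevin2.md", "w", encoding="utf-8") as f:
    f.write(markdown_content)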
Tools for easier markdown 📌
And time to translate. Yes, the original site is in Turkish, and initially I went the googletrans route, but that package is outdated and I got conflicts with httpx.
Time to try deep_translator. The test went fine.
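For reference, the test was roughly along these lines; a minimal sketch with a placeholder Turkish sentence:

from deep_translator import GoogleTranslator

# Translate a short Turkish snippet to English (the sample text is just a placeholder)
sample_text = "Merhaba, hoş geldiniz."
translated = GoogleTranslator(source='tr', target='en').translate(sample_text)
print(translated)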
Proposed Themes
So I proposed this as a landing single-pager, or this as something more advanced, with a blog and a portfolio as well.
Both are MIT licensed, and both are probably overkill, coming from a single-pager.
Testing Proposed Astro Themes 📌
git clone https://github.com/withastro/astro
cd ./astro/examples/portfolio
npm install
npm run dev
Everything worked, so I created this repo for the project.
npm run build
npm install -g serve #serve with npm
serve -s dist #http://localhost:3000
And I used Cloudflare Pages together with GitHub for the demo deployment.
The result was available after a few minutes here: https://morita-web.pages.dev/