Customizing a Web for Health Care

September 29, 2024

Currently paying 80 USD a month for that website (and…marketplace).

DocPlanner Costs

ℹ️
Time to use FireCrawl as a scraping tool to get the content.

Analyzing the Initial Status

Broken links? LinkChecker ⏬
# docker run --rm -it -u $(id -u):$(id -g) ghcr.io/linkchecker/linkchecker:latest --verbose https://www.psikolognevinkeskin.com/

podman run --rm -it ghcr.io/linkchecker/linkchecker:latest --verbose https://www.psikolognevinkeskin.com/ > linkchecker_psyc.txt

Which resulted in:

That's it. 53 links in 53 URLs checked. 5 warnings found. 0 errors found.
Stopped checking at 2024-10-19 07:34:09+000 (12 seconds)
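
If the check gets rerun later (for example, after moving to the new site), the summary line can be pulled out of linkchecker_psyc.txt instead of being read by eye. This is a minimal sketch, assuming the summary keeps the exact wording shown above; `parse_linkchecker_summary` is a hypothetical helper, not part of LinkChecker.

```python
import re

# Pattern for the summary line LinkChecker printed at the end of the run above.
SUMMARY_RE = re.compile(
    r"(?P<links>\d+) links? in (?P<urls>\d+) URLs? checked\. "
    r"(?P<warnings>\d+) warnings? found\. "
    r"(?P<errors>\d+) errors? found\."
)


def parse_linkchecker_summary(report_text):
    """Return the link/URL/warning/error counts as a dict, or None if absent."""
    match = SUMMARY_RE.search(report_text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


# The summary line copied from the report above
sample = "That's it. 53 links in 53 URLs checked. 5 warnings found. 0 errors found."
summary = parse_linkchecker_summary(sample)
print(summary)
```

For the real report, the text would come from reading linkchecker_psyc.txt with UTF-8 encoding and passing its content to the same function.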

FireCrawl Setup

So, 53 URLs on the site. Sounds like the moment to use FireCrawl and get the content.

FireCrawl with Python 📌

Full code is in my repo, here.

You will need a FireCrawl API key and code like the one below, which scrapes a single URL.

We don't need the crawling capabilities, as the site is a single pager.

from firecrawl import FirecrawlApp
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Get the API key from environment variable
api_key = os.getenv('FIRECRAWL_API_KEY')
# Alternative (not recommended): hardcode the key, e.g. FirecrawlApp(api_key='fc-yourapi')

if api_key is None:
    raise ValueError("API key not found in environment variables")

# Initialize the FirecrawlApp with the API key
app = FirecrawlApp(api_key=api_key)

# URL to scrape
url = 'https://www.psikolognevinkeskin.com'

# Scrape the data
scraped_data = app.scrape_url(url)

# Write the output to a file using UTF-8 encoding
with open("output_nevin2.txt", "a", encoding="utf-8") as f:
    f.write(str(scraped_data) + "\n")

The magic happened, and now we have JSON-like web information saved in a txt file.
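
One caveat: the file is written with `str(scraped_data)`, so each line is a Python repr (single quotes, `None`, `True`) rather than strict JSON, and JSON tools may reject it. The sketch below turns such lines into real JSON. It assumes `scrape_url` returned a plain dict, which depends on the SDK version; `repr_lines_to_json` and the sample record are hypothetical.

```python
import ast
import json


def repr_lines_to_json(lines):
    """Convert lines holding Python dict reprs into pretty-printed JSON strings."""
    documents = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # literal_eval only parses Python literals; it never executes code
        record = ast.literal_eval(line)
        documents.append(json.dumps(record, ensure_ascii=False, indent=2))
    return documents


# Stand-in for one line of output_nevin2.txt (hypothetical content)
sample_line = str({"markdown": "# Merhaba", "metadata": {"title": "Örnek", "statusCode": 200}})
pretty = repr_lines_to_json([sample_line])
print(pretty[0])
```

With the real file, the lines would come from opening output_nevin2.txt with UTF-8 encoding and iterating over it. Writing `json.dumps(...)` instead of `str(...)` in the scraping script would avoid this step entirely.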

Tools for easier JSON 📌

In the Big Data PySpark post we got to use jsonformatter, but there are more.

But actually, FireCrawl provides markdown, ready for LLMs.

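Getting at that markdown could look like the sketch below. The shape of the scrape result is an assumption here: depending on the SDK version it may be a dict with a `markdown` key or an object with a `markdown` attribute, so the hypothetical helper `extract_markdown` accepts both. The sample data is made up.

```python
def extract_markdown(scraped):
    """Return the markdown text from a scrape result, or None if missing.

    Accepts either a plain dict or an object exposing a `markdown` attribute,
    because the exact return type is an assumption in this sketch.
    """
    if isinstance(scraped, dict):
        return scraped.get("markdown")
    return getattr(scraped, "markdown", None)


# Hypothetical result, shaped like what the scrape is assumed to return
fake_result = {"markdown": "# Psikolog\n\nHakkımda", "metadata": {"title": "Ana Sayfa"}}
markdown = extract_markdown(fake_result)
print(markdown.splitlines()[0])
```

In the scraping script, `extract_markdown(scraped_data)` would then be written to a .md file instead of dumping the whole response.
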
Tools for easier markdown 📌

And time to translate. Yes, the original site is in Turkish. Initially I went the googletranslation way, but the package is outdated and I got dependency conflicts with httpx.

Time to try deep_translator. The test went fine.
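
A minimal sketch of how the translation step could look, not necessarily the code in the repo. It assumes the scraped markdown is translated from Turkish (`tr`) to English (`en`) with deep_translator's `GoogleTranslator`, and it splits the text into chunks first because, as far as I know, each request is limited to roughly 5,000 characters. The helpers `split_into_chunks`, `translate_markdown` and `make_deep_translator` are hypothetical names.

```python
def split_into_chunks(text, max_chars=4500):
    """Split text on blank lines into chunks no longer than max_chars.

    A single paragraph longer than max_chars is kept whole (not split further).
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def translate_markdown(text, translate, max_chars=4500):
    """Translate text chunk by chunk with the given callable and rejoin it."""
    pieces = split_into_chunks(text, max_chars)
    return "\n\n".join(translate(piece) for piece in pieces)


def make_deep_translator(source="tr", target="en"):
    """Build a translate callable backed by deep_translator (needs network)."""
    from deep_translator import GoogleTranslator  # pip install deep-translator
    return GoogleTranslator(source=source, target=target).translate


# Offline demo: str.upper stands in for the real translator
sample = "Merhaba\n\nHizmetler\n\nIletisim"
print(split_into_chunks(sample, max_chars=20))
print(translate_markdown(sample, str.upper, max_chars=20))
```

For the real thing, `translate_markdown(markdown_text, make_deep_translator())` would send each chunk to the translation backend. Passing the translator in as a callable keeps the chunking logic testable without network access.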

ℹ️
FireCrawl can be integrated with: CrewAI, LangChain, Flowise, DifyAI, Zapier…

Proposed Themes

Both are probably overkill, coming from a single pager.

So I proposed this as a single-page landing, or this as something more advanced with a blog as well (portfolio). Both are MIT licensed.

Testing Proposed Astro Themes 📌
git clone https://github.com/withastro/astro
cd astro/examples/portfolio

npm install
npm run dev

Everything worked, so I created this repo for the project.

npm run build
npm install -g serve #serve with npm

serve -s dist #http://localhost:3000

And I used Cloudflare Pages together with GitHub for the demo deployment.

The result was available after a few minutes, here: https://morita-web.pages.dev/