[Recap] on AI Audio. Ideating Speech rAIter micro SaaS

September 12, 2025

TL;DR

Doing TTS/S2T with streamlit and the st speechraiter was fun.

Now it's time to build something similar with FastAPI or Flask.

You can get started with this kind of project like so.

Plot twist: I ended up achieving a different scope with this one :)

+++ Fireflies MoM

Intro

Recently I heard about myminutes.ai, a tool to summarize meetings (with speaker recognition!)

As you can imagine, there are more tools to help on meetings…

Tools like this one should help you get proper MoM (minutes of meetings)

What can we build around audio/speech?

Can you imagine practicing job interviews in front of an AI?

Some kind of way to LandThatJob

Or practicing that very important presentation with a SpeechPractice service.

The Speech Rater Stack

Previously I made a Streamlit PoC version of the speech rater:

Choosing from Streamlit to (Flask vs FastAPI) 📌
I went with FastAPI, as per Gemini and this md doc.

So I went forward with Cursor and FastAPI:

Fast API

And quickly made that simple UI, connected to OpenAI TTS/S2T capabilities.

With this architecture (see the mermaid diagram)

As you can see in the video, where I show how it works while doing BiP (build in public), this already helps with my quick YouTube video creation.

⚠️
Audio recording does not work on phones, probably due to permissions.

Hey, what about the speech rater stuff?

Simple md Editor

Later on, I added simple markdown editing capabilities (there were a few candidates):

  1. Monaco Editor (VS Code Editor) - RECOMMENDED (and this is the one Cursor went for, enough for a quick edit) ⭐
     Pros: Full VS Code experience, syntax highlighting, IntelliSense, built-in markdown preview
     Cons: Larger bundle size (~2MB)
     Best for: Professional editing experience

Monaco Editor inside a FastAPI powered audio WebAPP

  2. CodeMirror 6 - LIGHTWEIGHT ⭐
     Pros: Lightweight, fast, good markdown support, customizable
     Cons: Fewer features than Monaco
     Best for: Balanced performance and features

  3. SimpleMDE (Markdown Editor) - SIMPLE
     Pros: Very lightweight, live preview, easy to use
     Cons: Less advanced features
     Best for: Simple editing needs

  4. Toast UI Editor - MODERN
     Pros: WYSIWYG + markdown, good mobile support
     Cons: Medium bundle size
     Best for: User-friendly editing

ℹ️
A wysiwyg markdown editor post is coming soon

Thanks to the implemented monaco editor, we can just quickly tweak the content of the transcript before saving the .md

The FastAPI Speech Rater

I finally wanted to combine FastAPI (BE) x SQLite for simple user management x a cool Astro SSG theme

because…

How could I not try an Astro theme…

MIT | Idol is an elegant landing page template for micro SaaS products built with AstroJS & Skeleton CSS

git clone https://github.com/LaB-CH3/Astro-idol
#npm run dev -- --host 0.0.0.0 --port 4321 #http://192.168.1.11:4321/

After asking Cursor to connect the Astro theme with FastAPI and make login possible via SQLite…

FastAPI x signup integrated with astro

This happened:

# Start both servers
make dev-full

#make docker-dev-build
make docker-dev-up  # Start both servers in containers: FastAPI and Astro working together!

#cd /home/jalcocert/Desktop/py-speech-rater/fastapi-speech-rater && sqlite3 ./users.db ".schema users"
# Check all users
sqlite3 ./users.db "SELECT id, email, first_name, last_name, created_at FROM users;"

# Check specific user by email
sqlite3 ./users.db "SELECT * FROM users WHERE email = 'test@example.com';"

# Count total users
sqlite3 ./users.db "SELECT COUNT(*) FROM users;"

#sqlite3 ./users.db
#.tables
#SELECT id, email, first_name, last_name, created_at FROM users;

I'm still impressed by how great SQLite is!
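The same users table can be driven from Python's built-in sqlite3 module. Below is a minimal sketch, not the project's actual code: the columns id, email, first_name, last_name and created_at mirror the queries above, while password_hash, password_salt, the PBKDF2 settings and the helper names create_user and verify_user are assumptions made for illustration.

```python
import hashlib
import hmac
import os
import sqlite3

# Columns id/email/first_name/last_name/created_at mirror the queries in the post;
# password_hash and password_salt are assumed additions for this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    email TEXT NOT NULL UNIQUE,
    first_name TEXT,
    last_name TEXT,
    password_hash BLOB NOT NULL,
    password_salt BLOB NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 from the standard library; the iteration count is an illustrative choice.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)


def create_user(conn, email, first_name, last_name, password) -> int:
    # Hypothetical signup helper: stores a salted hash, never the plain password.
    salt = os.urandom(16)
    cur = conn.execute(
        "INSERT INTO users (email, first_name, last_name, password_hash, password_salt)"
        " VALUES (?, ?, ?, ?, ?)",
        (email, first_name, last_name, hash_password(password, salt), salt),
    )
    conn.commit()
    return cur.lastrowid


def verify_user(conn, email, password) -> bool:
    # Hypothetical login check: recompute the hash with the stored salt.
    row = conn.execute(
        "SELECT password_hash, password_salt FROM users WHERE email = ?", (email,)
    ).fetchone()
    if row is None:
        return False
    stored_hash, salt = row
    # Constant-time comparison to avoid leaking timing information.
    return hmac.compare_digest(stored_hash, hash_password(password, salt))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # swap for "./users.db" to persist
    conn.executescript(SCHEMA)
    create_user(conn, "test@example.com", "Test", "User", "s3cret")
    print(verify_user(conn, "test@example.com", "s3cret"))
    print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

Parameterized queries (the ? placeholders) keep user input out of the SQL string, which matters as soon as a signup form feeds this table.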

And I also got to try ChartDB and DBGate.

docker run -e OPENAI_API_KEY=<YOUR_OPEN_AI_KEY> -p 8080:80 ghcr.io/chartdb/chartdb:latest
# Export the schema of a SQLite database (e.g. ./users.db or ./stock_cache.db):
# sqlite3 <database_file_path> ".dump" > <output_file_path>
# Example:
#sqlite3 ./stock_cache.db ".dump" > schema_export.sql && cat schema_export.sql


The setup even works in containers thanks to this compose file

Fast API x Astro Connected

⚠️
This is a quick sample setup with a lot of auth to-dos left, like HttpOnly cookie setup
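As an illustration of that cookie to-do, here is a small standard-library sketch of a hardened session cookie. The cookie name session, the one-hour lifetime and the helper name build_session_cookie are assumptions, not what the project currently does.

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str, max_age: int = 3600) -> str:
    """Return the value for a Set-Cookie header carrying the session id."""
    cookie = SimpleCookie()
    cookie["session"] = session_id  # "session" is a hypothetical cookie name
    morsel = cookie["session"]
    morsel["httponly"] = True    # not readable from JavaScript
    morsel["secure"] = True      # only sent over HTTPS
    morsel["samesite"] = "Lax"   # limits cross-site sending
    morsel["max-age"] = max_age  # lifetime in seconds
    morsel["path"] = "/"
    return morsel.OutputString()


if __name__ == "__main__":
    print(build_session_cookie("abc123"))
```

Inside FastAPI the same attributes can be set with `response.set_cookie(..., httponly=True, secure=True, samesite="lax")`, so this helper is mainly useful to see which flags are involved.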

Conclusions

That’s it: this gave me a new, cooler YouTube workflow (for the audio part)

This simple FastAPI recorder and transcript web app already helps me.

I got to see how FastAPI x Astro x SQLite work together

Now I can try to do those YT tech videos I wanted to do this year.

Just recording with OBS, cutting quickly with Kdenlive and recording my audio with Audacity.

Then it gets uploaded into this new py-speech-rater and we get the voice via Onyx, thanks to OpenAI S2T & TTS :)
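For reference, the TTS step boils down to one HTTP call. The sketch below uses only the standard library and is not the app's actual code: the model name tts-1 and the helper names build_tts_request and synthesize_to_file are assumptions, while the onyx voice is the one mentioned above.

```python
import json
import urllib.request

OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_tts_request(text: str, api_key: str, voice: str = "onyx",
                      model: str = "tts-1") -> urllib.request.Request:
    # Build (but do not send) the POST request for OpenAI's speech endpoint.
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        OPENAI_SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text: str, api_key: str, out_path: str) -> None:
    # Sends the request; needs a valid API key and network access.
    request = build_tts_request(text, api_key)
    with urllib.request.urlopen(request) as resp, open(out_path, "wb") as f:
        f.write(resp.read())  # audio bytes as returned by the API
```

Splitting the request building from the sending keeps the first part testable without spending API credits.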

ℹ️
So now my YT workflow is: OBS -> Audacity -> FastAPI with OpenAI -> Kdenlive -> YT

It's just faster than doing the same audio work via the CLI.

What can be next from here?

Considering that FastAPI and Astro can talk to each other perfectly…

Making admin panels / dashboards / data apps (displaying via Chart.js) with this stack does not seem that complicated anymore…

FastAPI x Astro x ChartJS

And a panel like that to rank user engagement with the app does not seem like a bad idea at all :)
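A rough sketch of what the backend side of such a panel could return: a dict in the shape Chart.js expects for a bar chart. The event list, the label text and the function name engagement_chart_config are hypothetical; the app does not track events like this yet.

```python
from collections import Counter


def engagement_chart_config(events: list[str], top_n: int = 5) -> dict:
    """Rank users by number of logged actions and shape the result for Chart.js.

    `events` is a hypothetical list with one user email per logged action.
    """
    ranked = Counter(events).most_common(top_n)  # highest count first
    return {
        "type": "bar",
        "data": {
            "labels": [email for email, _ in ranked],
            "datasets": [
                {"label": "Actions per user", "data": [count for _, count in ranked]}
            ],
        },
    }


if __name__ == "__main__":
    sample = ["ana@example.com", "bob@example.com", "ana@example.com"]
    print(engagement_chart_config(sample))
```

A FastAPI endpoint could return this dict as JSON, and the Astro page would pass it to `new Chart(ctx, config)`.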

See the ./fastapi-speech-rater folder that contains those. And the related tech doc with the system’s architecture

ℹ️
The SSG + FastAPI + OpenAI speech rater part will come in another post. This has been prep work. Maybe a 'serverless' ChatGPT clone?

I don't see any reason not to ship micro SaaS products faster, like:

  1. Preparing Interviews with AI

I saw something interesting at interviewsby.ai, where you upload your resume for feedback

  2. Preparing/Rating Speech with AI…

All these would need is one of those MIT Astro micro SaaS themes + proper email validation (logtoJS/FirebaseAuth + csr?) + whatever backend logic via FastAPI/pb/any other


FAQ

How to get started and build a Speech Rater with AI?

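git init
git branch -m main                         # rename the default branch to main
git config user.name                       # show the currently configured name (if any)
git config --global user.name "JAlcocerT"  # set the name globally
git config --global user.name              # confirm the global name was set
git add .
git commit -m "Initial commit: Python Speech Rater project with OpenAI TTS/S2T"

#sudo apt install gh
gh auth login                              # authenticate the GitHub CLI
gh repo create reponame --private --source=. --remote=origin --push  # create the remote repo and push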

Tools for Meetings

Both Otter.ai and Fireflies.ai support recording audio on Android apps and provide full access to transcripts and related meeting content via their web applications.

  • Otter.ai Android app lets you record meetings, voice memos, and in-person conversations with real-time transcription. All recordings and transcripts automatically sync to Otter’s cloud and can be accessed, searched, edited, and shared on Otter’s web app. Users can pause/resume recordings and rename conversations.

It supports live transcription during recordings and exports in multiple formats like TXT, DOCX, and PDF.

  • Fireflies.ai Android app also records audio and meetings directly from mobile. Users can upload audio/video files from the app for transcription and AI meeting analysis. The transcripts, summaries, and audio files are fully accessible in the Fireflies web dashboard, with features like searching transcripts, creating soundbites, and interacting with their AI assistant AskFred from both mobile and web.

It syncs audio and transcript content across devices seamlessly.

ℹ️
I decided to give Fireflies a try, as it seems to support n8n integration: https://guide.fireflies.ai/articles/4758387081-learn-about-n8n-x-fireflies-integration

Fireflies AI has n8n integration

So it convinced me to try the 7-day free trial.