[Recap] on AI Audio. Ideating Speech rAIter micro SaaS
TL;DR
Doing TTS/S2T with streamlit and the st speechraiter was fun.
Now it's time to build something similar with FastAPI or Flask.
You can get started with this kind of project like so.
Plot twist: I ended up achieving a different scope with this one :)
+++ Fireflies MoM
Intro
Recently I heard about myminutes.ai, a tool to summarize meetings (with speaker recognition!)
As you can imagine, there are more tools to help on meetings…
Tools like this one should help you get proper MoM (minutes of meetings)
What can we build around audio/speech?
Can you imagine practicing job interviews in front of an AI?
Some kind of way to LandThatJob
Or practicing that very important presentation with a SpeechPractice service.
The Speech Rater Stack
Previously I made a speech rater PoC streamlit version:

Choosing from Streamlit to (Flask vs FastAPI)
I just went forward with Cursor and FastAPI:
And quickly made that simple UI, connected to OpenAI TTS/S2T capabilities.
With this architecture (see the mermaid diagram).
As you can see in the video, where I show how it works while building in public (BiP), this already helps with my quick YouTube video creation.
Hey, what about the speech rater stuff?
Simple md Editor
Later on, I added simple markdown editing capabilities (there were a few candidates):
- Monaco Editor (VS Code Editor) - RECOMMENDED (this is the one Cursor went for, enough for a quick edit). Pros: full VS Code experience, syntax highlighting, IntelliSense, built-in markdown preview. Cons: larger bundle size (~2MB). Best for: professional editing experience.
- CodeMirror 6 - LIGHTWEIGHT. Pros: lightweight, fast, good markdown support, customizable. Cons: fewer features than Monaco. Best for: balanced performance and features.
- SimpleMDE (Markdown Editor) - SIMPLE. Pros: very lightweight, live preview, easy to use. Cons: fewer advanced features. Best for: simple editing needs.
- Toast UI Editor - MODERN. Pros: WYSIWYG + markdown, good mobile support. Cons: medium bundle size. Best for: user-friendly editing.
Thanks to the Monaco editor integration, we can quickly tweak the content of the transcript before saving the .md
The FastAPI Speech Rater
I finally wanted to combine FastAPI (BE) x SQLite for simple user management x a cool Astro SSG theme
because…
How could I not try an Astro theme…
MIT | Idol is an elegant landing page template for micro SaaS products built with AstroJS & Skeleton CSS
git clone https://github.com/LaB-CH3/Astro-idol
#npm run dev -- --host 0.0.0.0 --port 4321 #http://192.168.1.11:4321/
After asking Cursor to connect the Astro theme with FastAPI and make login possible via SQLite…
This happened:
# Start both servers
make dev-full
#make docker-dev-build
make docker-dev-up # Start both servers in containers FAST and ASTRO working together!!!
#cd /home/jalcocert/Desktop/py-speech-rater/fastapi-speech-rater && sqlite3 ./users.db ".schema users"
# Check all users
sqlite3 ./users.db "SELECT id, email, first_name, last_name, created_at FROM users;"
# Check specific user by email
sqlite3 ./users.db "SELECT * FROM users WHERE email = 'test@example.com';"
# Count total users
sqlite3 ./users.db "SELECT COUNT(*) FROM users;"
#sqlite3 ./users.db
#.tables
#SELECT id, email, first_name, last_name, created_at FROM users;
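The login side can also be poked at from Python with the standard-library sqlite3 module. This is only a minimal sketch: the id, email, first_name, last_name and created_at columns come from the queries above, while the password_hash and salt columns and all function names are assumptions for illustration, not the schema Cursor actually generated.

```python
import hashlib
import hmac
import os
import sqlite3

# Assumed schema: password_hash and salt are hypothetical columns;
# the other columns mirror the ones used in the CLI queries above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    email TEXT NOT NULL UNIQUE,
    first_name TEXT,
    last_name TEXT,
    password_hash BLOB NOT NULL,
    salt BLOB NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
)
"""


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256 (standard library)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register_user(conn, email, password, first_name="", last_name="") -> int:
    """Insert a user with a freshly salted password hash and return its id."""
    salt = os.urandom(16)
    cur = conn.execute(
        "INSERT INTO users (email, first_name, last_name, password_hash, salt) "
        "VALUES (?, ?, ?, ?, ?)",
        (email, first_name, last_name, hash_password(password, salt), salt),
    )
    conn.commit()
    return cur.lastrowid


def verify_login(conn, email, password) -> bool:
    """Return True only if the email exists and the password matches."""
    row = conn.execute(
        "SELECT password_hash, salt FROM users WHERE email = ?", (email,)
    ).fetchone()
    if row is None:
        return False
    stored_hash, salt = row
    # Constant-time comparison to avoid leaking timing information
    return hmac.compare_digest(stored_hash, hash_password(password, salt))


def count_users(conn) -> int:
    """Same as the `SELECT COUNT(*) FROM users;` CLI check."""
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # swap for ./users.db to hit a real file
    conn.execute(SCHEMA)
    register_user(conn, "test@example.com", "s3cret", "Test", "User")
    print(verify_login(conn, "test@example.com", "s3cret"))
    print(count_users(conn))
```

Parameterized queries (the `?` placeholders) keep the email and password out of the SQL string, which matters once the values come from a login form.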
I'm still impressed by SQLite's greatness!
And I also got to try ChartDB and DbGate.
docker run -e OPENAI_API_KEY=<YOUR_OPEN_AI_KEY> -p 8080:80 ghcr.io/chartdb/chartdb:latest
# sqlite3 <database_file_path> #sqlite3 ./users.db #sqlite3 ./stock_cache.db
# .dump > <output_file_path>
#example
#sqlite3 ./stock_cache.db ".dump" > schema_export.sql && cat schema_export.sql
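The same export can be done without the sqlite3 CLI, since Python's sqlite3 connections expose `iterdump()`. A small sketch, where the function names `dump_sql` and `export_dump` are made up for this example:

```python
import sqlite3


def dump_sql(conn: sqlite3.Connection) -> str:
    """Return the database as SQL text, similar to the CLI `.dump` command."""
    return "\n".join(conn.iterdump()) + "\n"


def export_dump(db_path: str, output_path: str) -> None:
    """Write the SQL dump of db_path into output_path."""
    conn = sqlite3.connect(db_path)
    try:
        sql = dump_sql(conn)
    finally:
        conn.close()
    with open(output_path, "w", encoding="utf-8") as fh:
        fh.write(sql)


if __name__ == "__main__":
    # In-memory demo; with a real file you would call
    # export_dump("./users.db", "schema_export.sql") instead.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    demo.execute("INSERT INTO users (email) VALUES ('test@example.com')")
    print(dump_sql(demo))
```

The resulting .sql file is what ChartDB-style tools can take as input when they ask for a schema export.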
The setup even works in containers thanks to this compose file
Conclusions
That’s it: this gave me a new, cooler YouTube workflow (for the audio part)
This simple FastAPI recorder and transcript web app already helps me.
I got to see how FastAPI x Astro x SQLite work together
Now I can try to do those YouTube tech videos I wanted to do this year.
Just recording with OBS, cutting quickly with Kdenlive and recording my audio with Audacity.
Then it gets uploaded into this new py-speech-rater
and we get the voice via Onyx thanks to OpenAI S2T & TTS :)
It's just faster than doing the same audio via CLI:
- https://github.com/JAlcocerT/DataInMotion/blob/main/OpenAI-Audio/openai-tts.py
- https://github.com/JAlcocerT/Streamlit-MultiChat/tree/main/Z_Tests/OpenAI
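For context, the CLI route boils down to one HTTP call against OpenAI's speech endpoint. The sketch below is not the code from the linked scripts: it uses only the standard library, and the `tts-1` model name, the helper names and the output path are assumptions for illustration (the `onyx` voice is the one mentioned above).

```python
import json
import os
import urllib.request

OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_tts_request(text, api_key, voice="onyx", model="tts-1"):
    """Build (but do not send) the POST request for text-to-speech."""
    payload = json.dumps({"model": model, "voice": voice, "input": text})
    return urllib.request.Request(
        OPENAI_SPEECH_URL,
        data=payload.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, output_path="speech.mp3"):
    """Send the request and save the returned audio bytes to disk."""
    api_key = os.environ["OPENAI_API_KEY"]  # must be set in the environment
    request = build_tts_request(text, api_key)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(output_path, "wb") as fh:
        fh.write(audio)
    return output_path


if __name__ == "__main__":
    # Only performs a network call when OPENAI_API_KEY is available
    if "OPENAI_API_KEY" in os.environ:
        print(synthesize("Hello from the speech rater"))
```

Keeping the request building separate from the network call makes it easy to reuse the same function behind a FastAPI endpoint later.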
What can be next from here?
Considering that FastAPI and Astro can talk to each other perfectly…
Making admin panels / dashboards / data apps (displaying via Chart.js) with this stack does not seem that complicated anymore…
And one panel like that to rank user engagement with the app does not seem a bad idea at all :)
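As a rough idea of what such an engagement ranking could look like on the backend side: the sketch below assumes a hypothetical `recordings` table linked to `users` (neither the table nor the function name exists in the project as far as this post shows) and returns a dict in the labels/datasets shape that Chart.js expects for a bar chart.

```python
import json
import sqlite3


def engagement_chart_data(conn, limit=10):
    """Rank users by number of recordings, shaped for a Chart.js bar chart."""
    rows = conn.execute(
        """
        SELECT u.email, COUNT(r.id) AS n
        FROM users u
        LEFT JOIN recordings r ON r.user_id = u.id
        GROUP BY u.id
        ORDER BY n DESC, u.email ASC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    return {
        "labels": [email for email, _ in rows],
        "datasets": [
            {"label": "Recordings per user", "data": [n for _, n in rows]}
        ],
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables, only for the demo
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("CREATE TABLE recordings (id INTEGER PRIMARY KEY, user_id INTEGER)")
    conn.executemany("INSERT INTO users (email) VALUES (?)",
                     [("a@example.com",), ("b@example.com",)])
    conn.executemany("INSERT INTO recordings (user_id) VALUES (?)",
                     [(2,), (2,), (1,)])
    print(json.dumps(engagement_chart_data(conn), indent=2))
```

A FastAPI route would only need to return this dict as JSON, and the Astro page can pass it straight into the chart's `data` option.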
See the ./fastapi-speech-rater folder that contains those, and the related tech doc with the system’s architecture.
I don't see any reason not to ship micro SaaS faster, like:
- Preparing Interviews with AI
I saw something interesting at interviewsby.ai, where you upload your resume for feedback
- Preparing/Rating Speech with AI…
All these would need is one of those MIT Astro micro SaaS themes + proper email validation (logtoJS/FirebaseAuth + csr?) + whatever backend logic via FastAPI/pb/any other
FAQ
How to get started and build a Speech Rater with AI?
git init
git branch -m main
git config user.name
git config --global user.name "JAlcocerT"
git config --global user.name
git add .
git commit -m "Initial commit: Python Speech Rater project with OpenAI TTS/S2T"
#sudo apt install gh
gh auth login
gh repo create reponame --private --source=. --remote=origin --push
Tools for Meetings
Both Otter.ai and Fireflies.ai support recording audio on Android apps and provide full access to transcripts and related meeting content via their web applications.
- Otter.ai Android app lets you record meetings, voice memos, and in-person conversations with real-time transcription. All recordings and transcripts automatically sync to Otter’s cloud and can be accessed, searched, edited, and shared on Otter’s web app. Users can pause/resume recordings and rename conversations.
It supports live transcription during recordings and exports in multiple formats like TXT, DOCX, and PDF.
- Fireflies.ai Android app also records audio and meetings directly from mobile. Users can upload audio/video files from the app for transcription and AI meeting analysis. The transcripts, summaries, and audio files are fully accessible in the Fireflies web dashboard, with features like searching transcripts, creating soundbites, and interacting with their AI assistant AskFred from both mobile and web.
It syncs audio and transcript content across devices seamlessly.
So that convinced me to try both on their 7-day free trials: