My [AI] YouTube Workflow
It all started with the DJI OA5 Pro
It takes ~1h48min (~41.5 GB) of video at 2.7K/RS/Standard/60fps to fill the internal memory.
…And continued with the x300 as a video creation center…
If you need help summarizing YT video content, you can try:
My Initial Workflow
OBS + OpenAI [Whisper]
- OBS to record the video, then create the audio with the OpenAI API
- API key required - https://platform.openai.com/api-keys
OpenAI API Audio creation - Example 📌
import openai
#https://platform.openai.com/api-keys
#export OPENAI_API_KEY='sk-...' # on Linux/Mac
#cd ai-openai
#python3 texttoaudio.py
##################### NETDATA OLLAMA YT ###################################
#After watching this video, you will know how to monitor your server while performing AI workloads, all locally, this time thanks to Netdata.
#Before moving on, remember that you will need Docker installed on your machine, together with Portainer, to follow along. I will leave the commands below if you need to set those up.
#Also, make sure you have some local AI project to test Netdata with. For example, you can follow along with the Ollama installation and pull some free model, like orca-mini.
#Now, let's just log in to Portainer and create a new Stack. And yes, you will find this information in the video description.
#And today, we just need one artifact, the docker-compose, that will configure and spin up our Netdata instance thanks to the already existing Docker container. So yeah, today no Python scripts, no Docker builds.
#Let's just create our stack with Portainer and wait for the service to be deployed. The idea is that we will have historical monitoring data for our laptops when we are running LLMs locally. The same process applies if you want to run them on big servers and monitor those.
#And now, the fun part. Let's use Ollama, which we already covered how to install locally, make it use Mistral, and ask the model something. You can see how the increase in temperature is being registered.
#Let me know in the comments which workloads you can't wait to monitor with Netdata. Consider giving a like to the video if it was helpful! And stay tuned for more.
# Video chapters (Mermaid timeline):
# timeline
#     title Monitoring AI Infrastructure with NetData
#     Pre-Requisites : Get Docker 🐋
#                    : Install Portainer
#                    : Have Ollama Ready with a local LLM
#     SelfHosting Netdata : Using the Docker-Compose Stack
#     NetData + Ollama : Overview to Netdata
#                      : Running prompts with Ollama
#                      : Checking System Load and Temps 🔥
client = openai.OpenAI()

speech_file_path = "Netdata10.mp3"
response = client.audio.speech.create(
    model="tts-1",
    voice="onyx",
    input="Let me know in the comments which workloads you can't wait to monitor with Netdata. Consider giving a like to the video if it was helpful! And stay tuned for more."
)
response.stream_to_file(speech_file_path)
Updating My YT Video WF [AI Powered]
- OBS Studio to record, while I record myself commenting on the video
- Then that .mp4 gets a transcript, which is passed to the OpenAI API to generate an AI voice (see the sketch below)
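A minimal sketch of that extraction step, assuming ffmpeg is installed (file names are placeholders): pulling a compact mono audio track out of the OBS recording keeps the upload to the transcription API small.
# extract_audio.py - hypothetical helper, run before transcribing
import subprocess

subprocess.run([
    "ffmpeg", "-i", "recording.mp4",  # the OBS capture
    "-vn",                            # drop the video stream
    "-ac", "1", "-ar", "16000",       # mono, 16 kHz is plenty for speech
    "audio.wav",
], check=True)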
A friend gave me this cool idea [AI & Audio Transcription] 📌
It uses the Whisper model to perform the audio-to-text conversion/transcription:
## 1: Open an audio file, transcribe it, and save the transcript as text
# Fill in your API key and your own file path
from openai import OpenAI

# Initialize the OpenAI client with your API key
client = OpenAI(api_key='')

# Path to your audio file
audio_file_path = r'C:\Users\diazc\OneDrive\Escritorio\speech-analyzer-app\audio_test_api.wav'

# Open the audio file
with open(audio_file_path, "rb") as audio_file:
    # Transcribe the audio using the Whisper model
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

# Print the transcription text
print(transcription.text)

# Extract the transcription text
transcription_text = transcription.text

# Step 2: Save the transcription to a .txt file
output_file_path = 'audio.txt'
with open(output_file_path, 'w') as text_file:
    text_file.write(transcription_text)

# Step 3: Print confirmation
print(f"Transcription saved to {output_file_path}")
## 2: Open the text file and send it to ChatGPT for analysis
import openai

# Initialize the OpenAI client with your API key
openai.api_key = ''

# Path to the file containing the speech text (audio.txt)
file_path = r'C:\Users\diazc\OneDrive\Escritorio\speech-analyzer-app\audio.txt'

# Step 1: Read the content of the file
with open(file_path, 'r') as file:
    file_content = file.read()

# Step 2: Create the prompt for the model using the content from the file
analysis_prompt = f"You are an expert in public speaking. Analyze this speech: {file_content}"

# Step 3: Send the prompt to the model for analysis
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": analysis_prompt}
    ]
)

# Step 4: Extract the analysis from the response
analysis = response.choices[0].message.content

# Step 5: Print the analysis
print("Speech Analysis: ")
print(analysis)
It is quite clear that combining those scripts with the new audio input feature in Streamlit can create something cool.
#streamlit run test_audioinput.py
#streamlit run test_audioinput.py --server.address=0.0.0.0
streamlit run test_audioinput.py --server.address=0.0.0.0 --server.port=8501
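A hypothetical sketch of what test_audioinput.py could contain, assuming Streamlit >= 1.40 (for st.audio_input) and OPENAI_API_KEY set in the environment:
# test_audioinput.py - minimal record-and-transcribe sketch
import streamlit as st
from openai import OpenAI

st.title("Record and Transcribe")

# Browser-side recorder; returns an UploadedFile-like object
audio = st.audio_input("Record a voice note")

if audio is not None:
    st.audio(audio)  # play back the recording
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=("note.wav", audio.getvalue()),  # upload as (filename, bytes)
    )
    st.write(transcription.text)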
And maybe a future…podcast creator with AI and the AstroPod Theme?
Quick Vlogs as Code
- Using ffmpeg or Python MoviePy
You can also silence the original audio or include another track as background music (re-encoding is needed for those); see the sketch below.
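A minimal MoviePy sketch (1.x API; file names are placeholders) that swaps the vlog's audio for a quieter background track, re-encoding the video in the process:
# background_music.py - hypothetical example with MoviePy 1.x
from moviepy.editor import VideoFileClip, AudioFileClip

clip = VideoFileClip("vlog.mp4")
# Lower the music volume and trim it to the clip's length
music = AudioFileClip("background.mp3").volumex(0.3).subclip(0, clip.duration)

final = clip.set_audio(music)  # replaces (silences) the original audio
final.write_videofile("vlog_with_music.mp4")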
Data Driven Videos with Streamlit
- Find interesting data → create an animation with Streamlit → record it with OBS → upload to YouTube (animation sketch below)
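For the animation step, a tiny sketch you could screen-record with OBS (the data here is random, as a stand-in): redraw a chart in a loop through an st.empty() placeholder.
# test_animation.py - hypothetical example; run with: streamlit run test_animation.py
import time

import numpy as np
import pandas as pd
import streamlit as st

placeholder = st.empty()
data = pd.DataFrame(np.random.randn(60, 2), columns=["views", "likes"]).cumsum()

for i in range(1, len(data) + 1):
    placeholder.line_chart(data.iloc[:i])  # reveal one point per "frame"
    time.sleep(0.1)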
Videos with RemotionJS
Other interesting AI Audio stuff
Whishper 📌
Whishper is an open-source, 100% local audio transcription and subtitling suite with a full-featured web UI.
https://github.com/pluja/whishper
https://whishper.net/guides/install/
# Get the script
curl -fsSL -o get-whishper.sh https://raw.githubusercontent.com/pluja/whishper/main/get-whishper.sh
# Run it
bash get-whishper.sh
Whisper 📌
- https://github.com/sindresorhus/awesome-whisper : Awesome list for Whisper, an open-source AI-powered speech recognition system developed by OpenAI

To experiment in a container, this Docker Compose stack spins up a plain Python box you can install Whisper into:
version: '3'
services:
  whisper:
    image: python:3.11-slim
    container_name: ai-whisper
    command: tail -f /dev/null
    volumes:
      - ai_whisper:/app
    working_dir: /app # Set the working directory to /app
    ports:
      - "7865:7865"
volumes:
  ai_whisper:
pip install -U openai-whisper
#pip install git+https://github.com/openai/whisper.git
#pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
sudo apt update && sudo apt install ffmpeg
Using Whisper:
whisper --help
whisper audio.flac audio.mp3 audio.wav --model medium #--model large
#whisper japanese.wav --language Japanese # transcribe non-English audio
#whisper japanese.wav --language Japanese --task translate # translate speech to English
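Whisper also has a Python API that mirrors the CLI above; a minimal sketch:
# transcribe.py
import whisper

model = whisper.load_model("medium")    # or "large"
result = model.transcribe("audio.mp3")  # language is auto-detected
print(result["text"])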
Meeper is your secretary for any in-browser conference. 📌
- https://chromewebstore.google.com/detail/meeper-transcribe-summari/pollmehpbdljnolojdajljeoejeglcfi
- https://github.com/serg-plusplus/meeper
python3 -m venv meepervenv
source meepervenv/bin/activate
git clone git@github.com:serg-plusplus/meeper.git && cd meeper
python -m pip install -r requirements.txt
chmod +x cygwin_cibuildwheel_build.sh
./cygwin_cibuildwheel_build.sh
#deactivate
FAQ
Teleprompter for YouTube Videos
flatpak install flathub com.cuperino.qprompt
flatpak run com.cuperino.qprompt
- Or grab a specific QPrompt-Teleprompter release:
wget https://github.com/Cuperino/QPrompt-Teleprompter/releases/download/v1.1.6/qprompt-v1.1.6-51788eb-linux-gcc-x86_64.AppImage
chmod +x qprompt-v1.1.6-51788eb-linux-gcc-x86_64.AppImage
./qprompt-v1.1.6-51788eb-linux-gcc-x86_64.AppImage
Social Media Automation
Social media posting can be simplified with a scheduling tool:
- Postiz-App - https://github.com/gitroomhq/postiz-app
📨 The ultimate social media scheduling tool, with a bunch of AI 🤖 and Apache 2.0!