LLMs
Generative AI, and particularly large language models (LLMs), are being adopted everywhere.
You can run LLMs on your personal computer, or on bigger servers just for you or for whoever you want to give access.
To see the most popular commercial AI tools: https://theresanaiforthat.com/most-saved/
Interfaces
- Others: LibreChat, AutoGen + AutoGen Studio https://microsoft.github.io/autogen/blog/2023/12/01/AutoGenStudio/, Quivr (which has great docs) or LocalGPT.
- https://github.com/khoj-ai/khoj
VectorDBs
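When you use embedding models to give LLMs context about your files, the files are split into chunks, each chunk is turned into a vector (an embedding), and those vectors are stored in a vector database. When you ask a question, the database returns the chunks whose vectors are most similar to the question's vector.
There are many vector DBs that you can use on Linux.
All of this tech works on Linux, and with just a CPU if you don't have a GPU handy.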
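To make the idea concrete, here is a minimal Python sketch of what a vector DB does at its core: store (text, vector) pairs and return the entries closest to a query by cosine similarity. The TinyVectorStore class and the three-number vectors are made up for illustration; a real setup would get vectors from an embedding model and use an actual vector DB with proper indexing.

```python
import math

# Hypothetical minimal in-memory "vector DB" for illustration only.
# Real embeddings come from an embedding model; these vectors are made up.

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text, vector):
        self.items.append((text, vector))

    def query(self, vector, k=1):
        """Return the k stored texts most similar to the query vector."""
        scored = [(cosine_similarity(vector, v), t) for t, v in self.items]
        scored.sort(reverse=True)  # highest similarity first
        return [t for _, t in scored[:k]]


store = TinyVectorStore()
store.add("invoice from March", [0.9, 0.1, 0.0])
store.add("holiday photos", [0.0, 0.2, 0.9])
store.add("tax return 2023", [0.8, 0.3, 0.1])

print(store.query([1.0, 0.2, 0.0], k=2))  # ['invoice from March', 'tax return 2023']
```

A real vector DB does the same kind of lookup, but with approximate nearest-neighbour indexes so it stays fast with millions of chunks.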
FAQ
How do I monitor hardware while using LLMs?
You can set up Netdata with Docker very quickly.
It gives you insights into the workload and temperatures of the hardware where you run LLMs.
Can I use LLMs to code?
Yes, there are many free ways to replace GitHub Copilot:
- Tabby
- Llama Coder, a VS Code extension
- Others: Bito, Codeium, or Adrenaline
And to power my notes?
- Logseq + Ollama
- The plugin: https://github.com/omagdy7/ollama-logseq
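What is MoE?
Mixture of Experts (MoE) is an approach in machine learning where a model consists of numerous sub-models (referred to as “experts”). A small gating network, or router, decides which experts should handle each input, so only a few of them run at a time. The main idea is to route different inputs to the most relevant experts, so the model can have many parameters in total while using only a fraction of them per input, which makes it more efficient.
The intuition is that experts specialize: for example, some might be better at understanding technical jargon, while others might excel at creative writing or conversational language. In practice, in LLMs like Mixtral the routing happens per token at each layer, and the experts do not necessarily map to such human-readable topics.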
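A toy Python sketch of top-k routing, under made-up assumptions: four tiny hypothetical experts, a hand-written gate_scores function standing in for the learned router, and k=2. Only the two highest-scoring experts run, and their outputs are mixed with softmax weights computed over just those two. This is similar in spirit to how Mixtral picks 2 of its 8 experts per token, but it is not Mixtral's actual code.

```python
import math

# Toy top-k Mixture-of-Experts routing. Experts and gate are hypothetical.

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical experts: each maps a number to a number.
experts = [
    lambda x: x * 2,   # expert 0
    lambda x: x + 10,  # expert 1
    lambda x: -x,      # expert 2
    lambda x: x ** 2,  # expert 3
]

def gate_scores(x):
    """Hypothetical router; in a real model this is a learned linear layer."""
    return [0.5 * x, 1.0, -x, 0.1 * x]

def moe_forward(x, k=2):
    scores = gate_scores(x)
    # Pick the k experts with the highest router scores.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize the weights over the chosen experts only.
    weights = softmax([scores[i] for i in top])
    # Only the chosen experts are evaluated; the rest are skipped entirely.
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top

out, chosen = moe_forward(3.0)
print(chosen)  # [0, 1]
```

The unused experts never run for that input, which is where the efficiency comes from: total parameters can be large while the compute per input stays small.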
GPT-4 is widely reported to be an MoE, although OpenAI has not confirmed its architecture. An open example is Mixtral, which you can download from Hugging Face (HF) and run for free locally with Ollama:
ollama run mixtral:8x7b #https://mistral.ai/news/mixtral-of-experts/
You can also try SOLAR 10.7B, a dense (non-MoE) model, to compare against:
ollama run solar:10.7b #https://ollama.ai/library/solar/tags
You can also run it in Google Colab: https://www.youtube.com/watch?v=ZyFlySElG1U
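If you want to call the model from your own scripts instead of the terminal, Ollama also exposes a local HTTP API. The sketch below assumes the Ollama server is running on its default port 11434 and uses the non-streaming /api/generate endpoint with Python's standard library only; the helper names build_generate_request and generate are hypothetical.

```python
import json
import urllib.request

# Assumes a local Ollama server on its default port, with the model already pulled.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model, prompt):
    """Build a POST request for Ollama's /api/generate endpoint (non-streaming)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(model, prompt, timeout=300):
    """Send the request and return the generated text from the 'response' field."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

For example, generate("mixtral:8x7b", "Explain Mixture of Experts in one sentence.") should return the answer as a string once the model is downloaded. Setting "stream" to False asks for one complete JSON reply instead of a stream of partial chunks, which keeps the parsing trivial.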
Choosing the Right Model
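Quantization
Quantization stores a model's weights at lower precision (for example, 4-bit integers instead of 16-bit floats), so the model needs less memory and can run on smaller hardware, at the cost of some accuracy.
- GPTQ, a post-training quantization method from recent research, offers minimal accuracy loss compared to previous techniques, typically at 4 bits per weight. It's most efficient on NVIDIA GPUs when the model fits entirely in VRAM.
- GGML, a machine learning library by Georgi Gerganov (who also developed llama.cpp, originally to run local LLMs on Macs), is designed mainly for CPU inference and performs best on Apple Silicon or Intel hardware. Its model file format has since been superseded by GGUF.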
Thanks: https://aituts.com/local-llms/#Which_Quantization
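To show what quantization does mechanically, here is a minimal Python sketch of the simplest scheme: symmetric (absmax) round-to-nearest quantization. Each weight is divided by a shared scale, rounded to an integer, and multiplied back when used. This is deliberately NOT GPTQ or the GGML/GGUF formats, which use smarter rounding and per-block scales; the weights are made-up numbers.

```python
# Symmetric absmax round-to-nearest quantization (a basic scheme, not GPTQ or GGUF).

def quantize_absmax(weights, bits=8):
    qmax = 2 ** (bits - 1) - 1               # 127 for 8 bits, 7 for 4 bits
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:
        scale = 1.0                           # all-zero weights: any scale works
    q = [round(w / scale) for w in weights]  # small integers stored instead of floats
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for use at inference time."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.88, -0.31]
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)
print(q)  # [42, -127, 5, 88, -31]
print(max(abs(a - b) for a, b in zip(weights, restored)))  # tiny rounding error
```

With bits=4 the same weights map to integers in the range -7..7, so the memory drops further but the rounding error grows; methods like GPTQ exist to keep that error from hurting the model's output.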
Which LLMs are trending now?
You can always check the LLM leaderboards:
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
With Elo ratings based on pairwise human votes: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
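The Elo idea behind the arena leaderboard is simple: when users prefer one model's answer over another's, the winner gains rating and the loser loses the same amount, more so for an upset. A minimal Python sketch of the classic Elo update follows; K=32 is a common textbook value and an assumption here, and the leaderboard's exact method and parameters may differ.

```python
# Classic Elo update after one pairwise comparison (K=32 is an assumed value).

def expected_score(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, score_a, k=32):
    """score_a is 1 if A won, 0 if A lost, 0.5 for a tie. Returns new (r_a, r_b)."""
    e_a = expected_score(r_a, r_b)
    delta = k * (score_a - e_a)  # positive if A did better than expected
    return r_a + delta, r_b - delta

# Two models start at 1000 and a user prefers model A's answer.
print(elo_update(1000, 1000, 1))  # (1016.0, 984.0)
```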
Examples (you can also use them with GPT4All or TextGenWebUI):
- https://huggingface.co/TheBloke/Llama-2-13B-Chat-fp16/tree/main
- https://huggingface.co/docs/transformers/main/model_doc/mpt
- And this one you can train and use commercially: https://www.mosaicml.com/training
You can also check these repositories: https://github.com/sindresorhus/awesome-chatgpt and https://github.com/f/awesome-chatgpt-prompts
What about image generation?
You can find image models on Hugging Face:
Stable Diffusion: Quick Setup -> https://github.com/AbdBarho/stable-diffusion-webui-docker/wiki/Setup (Thanks to Jim Garage https://www.youtube.com/watch?v=5XHSV56hsJM)
- https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main
- https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/tree/main
- https://huggingface.co/stabilityai/stable-diffusion-2-base
- https://github.com/AUTOMATIC1111/stable-diffusion-webui or https://github.com/vladmandic/automatic
Fooocus: https://www.youtube.com/watch?v=zIhODzEVZqg https://github.com/lllyasviel/Fooocus
Speech to IMG?!: https://www.youtube.com/watch?v=IAc-G-enRII
What are other people building? https://civitai.com/
Voice?
For general ideas, check: https://github.com/sindresorhus/awesome-whisper
Hugging Face also already hosts interesting projects.
- ecoute (OpenAI API needed)
- Meeper (OpenAI API needed)
- Bark
- Whisper: https://github.com/openai/whisper
Linux Desktop App:
flatpak install flathub net.mkiol.SpeechNote
flatpak run net.mkiol.SpeechNote
Other Interesting AI Tools
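What is RAG?
RAG, which stands for “Retrieval-Augmented Generation”, is a methodology used in building advanced natural language processing (NLP) systems, particularly those based on large language models (LLMs). Before the model answers, the system retrieves relevant passages from an external source (for example, a vector DB holding your documents) and adds them to the prompt, so the answer can draw on that material instead of only on what the model memorized during training.
RAG is particularly useful for tasks that require understanding context, generating coherent responses, and incorporating up-to-date or specific factual information, such as question-answering systems or chatbots.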
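A minimal Python sketch of the RAG flow: retrieve the most relevant chunks, then paste them into the prompt as context. To keep it self-contained, retrieval here is plain word overlap, a stand-in for the embedding search a vector DB would do; the documents and helper names are made up. The resulting prompt is what you would send to an LLM, for example a local one served by Ollama.

```python
import re

# Minimal RAG sketch. Word-overlap retrieval stands in for embedding search.

def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, chunks, k=2):
    """Rank chunks by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, chunks, k=2):
    """Augment the question with the retrieved chunks as context."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )

docs = [
    "The office wifi password is stored in the IT handbook.",
    "Mixtral 8x7B routes each token to 2 of its 8 experts.",
    "Lunch is served at noon in the cafeteria.",
]
print(build_prompt("How many experts does Mixtral use per token?", docs, k=1))
```

Only the Mixtral chunk ends up in the prompt, so the model answers from your data rather than guessing; swapping retrieve for a real vector DB query is the main change in a production setup.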
Do I need to know programming to use LLMs?
You don't have to be a developer to use LLMs.
Mostly we will be using frameworks that abstract away the real code behind the scenes.
That said, it definitely helps to be familiar with Python, or at least to know how to manage Python dependencies, if you want to try cutting-edge free AI projects.
Prompting
Where to host in the cloud?
If you need serious GPU power, you can rent GPU instances from https://www.runpod.io/gpu-instance/pricing and similar services.
Using Hugging Face for LLMs
- https://huggingface.co/spaces
- https://www.youtube.com/watch?v=_Ua6065p-Cw
You might be wondering
What is the difference between RNNs, CNNs and Transformers? Awesome question to start! 🚀
- RNN (Recurrent Neural Network):
Typical Use: RNNs are typically used for sequential data where the order and context of the data points are important. They are well-suited for time-series data, natural language processing (NLP), speech recognition, and other tasks where data points are interdependent. Characteristics: RNNs have a memory-like feature that captures information about what has been calculated so far, essentially allowing them to have a sense of ’time’ or sequence. This makes them ideal for processing sequences of data like sentences or time series.
- CNN (Convolutional Neural Network):
Typical Use: CNNs are predominantly used for image and video recognition, image classification, object detection, and similar tasks that require the model to recognize patterns in spatial data. They are also used in some NLP tasks, although to a lesser extent than RNNs and Transformers. Characteristics: CNNs use convolutional layers, pooling layers, and fully connected layers. The convolutional layers automatically and adaptively learn spatial hierarchies of features from input images. This makes them particularly good at tasks like image recognition where understanding spatial hierarchy in pixels is crucial.
- Transformers are another type of neural network architecture, built around self-attention mechanisms. These mechanisms allow the model to weigh the importance of different parts of the input data differently.
Unlike recurrent networks that process data one step at a time (like RNNs and LSTMs), transformers can process all positions of a sequence in parallel, making them highly efficient to train for tasks like language modeling and translation.
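A toy Python contrast between the two styles, with made-up weights and 2-dimensional vectors: the simplified RNN must walk the sequence step by step because each hidden state depends on the previous one, while self-attention computes every position's output from all inputs at once. The attention here is scaled dot-product attention without the learned query/key/value projections a real transformer uses, and the RNN uses scalar weights instead of weight matrices.

```python
import math

# Toy RNN vs. self-attention. All weights are made-up illustration values.

def rnn(inputs, w_x=0.5, w_h=0.8):
    """Simplified RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1}), elementwise."""
    h = [0.0, 0.0]
    for x in inputs:  # must loop in order: each step needs the previous h
        h = [math.tanh(w_x * xi + w_h * hi) for xi, hi in zip(x, h)]
    return h

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(inputs):
    """Scaled dot-product attention with queries = keys = values = inputs."""
    d = len(inputs[0])
    outputs = []
    for q in inputs:  # positions are independent, so this loop could run in parallel
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in inputs]
        weights = softmax(scores)  # how much this position attends to each position
        outputs.append([sum(w * v[j] for w, v in zip(weights, inputs)) for j in range(d)])
    return outputs

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(rnn(seq))             # one final hidden state summarizing the sequence
print(self_attention(seq))  # one context-aware vector per input position
```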
How to use HF?
On Hugging Face you will find really cool open-source AI projects to try out:
- Image to Code: https://huggingface.co/spaces/HuggingFaceM4/screenshot2html
- Audio to text: https://huggingface.co/spaces/sanchit-gandhi/whisper-jax