LLMs on Linux
Generative AI and particularly LLMs are taking over.
You can get LLMs running on your personal computer or on big servers, just for yourself or for anyone you want to give access to.
For the most popular commercial tools, see https://theresanaiforthat.com/most-saved/
Interfaces
- Others: LibreChat, AutoGen + AutoGen Studio (https://microsoft.github.io/autogen/blog/2023/12/01/AutoGenStudio/), Quivr (with great docs), or LocalGPT.
- https://github.com/khoj-ai/khoj
VectorDBs
When you are using embedding models to give LLMs context about your files, this is where that knowledge goes.
There are many vector DBs that you can use on Linux.
All of this tech works on Linux with just a CPU, if you don't have a GPU handy.
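For example, here is a minimal sketch of that flow using Chroma as one local vector DB option (the collection name and documents below are made up for illustration):

```python
# Minimal sketch: storing document embeddings in a local vector DB (Chroma here,
# as one example). Collection name and documents are made up for illustration.
import chromadb

client = chromadb.Client()                      # in-memory instance; runs fine on CPU
collection = client.create_collection("my_notes")

# Chroma embeds the text with its default embedding model before storing it.
collection.add(
    documents=["Linux uses the ext4 filesystem by default on many distros.",
               "Ollama can run quantized LLMs locally."],
    ids=["doc1", "doc2"],
)

# Later, an LLM frontend can query for the chunks most similar to the user's question.
results = collection.query(query_texts=["How do I run a local LLM?"], n_results=1)
print(results["documents"])
```

Whatever vector DB you pick, the pattern is the same: embed, store, then query by similarity.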
FAQ
Can I use LLMs to Code?
Yes, there are many free alternatives to GitHub Copilot:
- Tabby
- Llama Coder, as a VS Code extension
- Others: Bito, Codeium, or Adrenaline
Choosing the Right Model
LLM Quantization
- GPTQ quantization, a state-of-the-art method featured in research papers, offers minimal performance loss compared to previous techniques. It’s most efficient on NVIDIA GPUs when the model fits entirely in VRAM.
- GGML, a machine learning library by Georgi Gerganov (who also developed llama.cpp for running LLMs locally), performs best on Apple or Intel hardware. See the sketch after this list for loading such a quantized model.
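As a hedged sketch of what the GGML/GGUF route looks like in practice (the model path below is a placeholder for whatever quantized file you have downloaded, and llama-cpp-python is assumed as the runtime):

```python
# Hedged sketch: running a GGML/GGUF-quantized model on CPU with llama-cpp-python.
# The model path is a placeholder; point it at any quantized file you have downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-13b-chat.Q4_K_M.gguf",  # placeholder filename
    n_ctx=2048,        # context window
    n_gpu_layers=0,    # 0 = pure CPU; raise it if you have VRAM to offload layers
)

output = llm("Q: What is quantization in one sentence? A:", max_tokens=64)
print(output["choices"][0]["text"])
```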
Which LLMs are Trending?
You can always check the LLM leaderboards:
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
With ELO Rating: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
Examples (you can also use them with GPT4All or TextGenWebUI; see the loading sketch after this list):
- https://huggingface.co/TheBloke/Llama-2-13B-Chat-fp16/tree/main
- https://huggingface.co/docs/transformers/main/model_doc/mpt
- And this one you can train and use commercially: https://www.mosaicml.com/training
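As a hedged sketch, loading one of the Hugging Face checkpoints above with the transformers library looks roughly like this (a 13B model needs plenty of RAM/VRAM, so swap in a smaller model if needed; the accelerate package is assumed for device_map="auto"):

```python
# Hedged sketch: loading a Hugging Face checkpoint with the transformers library.
# Large models need a lot of RAM/VRAM; smaller or quantized variants are friendlier.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="TheBloke/Llama-2-13B-Chat-fp16",  # one of the repos linked above
    device_map="auto",                        # falls back to CPU if no GPU is found
)

print(generator("Linux is great for running LLMs because", max_new_tokens=40)[0]["generated_text"])
```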
You can also check these repositories: https://github.com/sindresorhus/awesome-chatgpt and https://github.com/f/awesome-chatgpt-prompts
Where to host in the Cloud?
If you need big GPU power, you can always try https://www.runpod.io/gpu-instance/pricing and similar services.
Using HuggingFace for LLMs
- https://huggingface.co/spaces
- https://www.youtube.com/watch?v=_Ua6065p-Cw
You might be wondering…
Which Neural Network are we using?
- RNN (Recurrent Neural Network)
- Typical Use: RNNs are used for sequential data where the context and order of data points are crucial. Common applications include time-series data analysis, natural language processing (NLP), and speech recognition.
- Characteristics:
- RNNs possess a memory-like feature, allowing them to retain information about previous computations.
- This memory feature enables RNNs to process sequences of data effectively, providing them with a sense of 'time' or sequence order.
- CNN (Convolutional Neural Network)
- Typical Use: CNNs are primarily used for tasks involving spatial data, such as image and video recognition, image classification, and object detection. They are also utilized in some NLP tasks.
- Characteristics:
- CNNs consist of convolutional layers, pooling layers, and fully connected layers.
- The convolutional layers learn spatial hierarchies of features adaptively from input images, making CNNs particularly effective in tasks that require pattern recognition in spatial data.
- Transformers
- Typical Use: Transformers are used in tasks that involve processing entire sequences of data simultaneously, such as language modeling and translation.
- Characteristics:
- Unlike traditional neural networks that process data sequentially, transformers can handle entire sequences at once.
- This architecture allows transformers to weigh the importance of different parts of the input data differently, enhancing efficiency and effectiveness in handling complex sequential tasks. (A minimal sketch of all three layer types follows this list.)
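As a hedged, toy-sized PyTorch sketch of these three layer types (all sizes are arbitrary and only illustrate what kind of data each layer consumes):

```python
# Hedged sketch: the three architectures above, reduced to one layer each in PyTorch.
# Sizes are arbitrary; this only illustrates what kind of data each layer consumes.
import torch
import torch.nn as nn

x_seq = torch.randn(1, 5, 8)        # a sequence: 1 batch, 5 time steps, 8 features
x_img = torch.randn(1, 3, 32, 32)   # an image: 1 batch, 3 channels, 32x32 pixels

# RNN: keeps a hidden state across time steps (its "memory" of the sequence so far).
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
seq_out, h_n = rnn(x_seq)

# CNN: convolutional filters scan the image and learn spatial patterns.
cnn = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
img_out = cnn(x_img)

# Transformer-style self-attention: every position attends to every other position
# at once, weighting the parts of the sequence that matter most.
attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
attn_out, attn_weights = attn(x_seq, x_seq, x_seq)

print(seq_out.shape, img_out.shape, attn_out.shape)
```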
Mixture of Experts (MoE) is an approach in machine learning where a model consists of numerous sub-models (referred to as “experts”). Each expert specializes in handling different types of data or tasks. The main idea is to route different inputs to the most relevant experts to handle specific tasks more efficiently and effectively.
More about MoE LLMs
For example, some experts might be better at understanding technical jargon, while others might excel at creative writing or conversational language.
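As a hedged toy sketch of that routing idea (real MoE layers such as Mixtral's route per token inside transformer blocks; here the 'experts' are plain linear layers and all sizes are made up):

```python
# Hedged toy sketch of Mixture-of-Experts routing: a gate scores all experts per
# input and only the top-k experts actually run. Sizes and expert count are made up.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=16, num_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)   # learns which expert fits which input
        self.top_k = top_k

    def forward(self, x):                          # x: (batch, dim)
        scores = self.gate(x)                      # (batch, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):             # run only the selected experts
            for b in range(x.size(0)):
                expert = self.experts[idx[b, slot]]
                out[b] += weights[b, slot] * expert(x[b])
        return out

moe = TinyMoE()
print(moe(torch.randn(3, 16)).shape)   # torch.Size([3, 16])
```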
GPT-4 is reportedly an MoE model. Another example is Mixtral, which you can download from HF and run for free locally with Ollama:
ollama run mixtral:8x7b #https://mistral.ai/news/mixtral-of-experts/
You can also try Solar 10.7B to compare against these MoEs:
ollama run solar:10.7b #https://ollama.ai/library/solar/tags
You can also run it in Google Colab: https://www.youtube.com/watch?v=ZyFlySElG1U
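Besides the interactive `ollama run`, Ollama serves a local HTTP API (port 11434 by default), so you can script against the same models; a minimal hedged sketch:

```python
# Hedged sketch: querying a locally running Ollama server (default port 11434).
# Assumes you have already pulled the model, e.g. `ollama pull mixtral:8x7b`.
import json
import urllib.request

payload = {
    "model": "mixtral:8x7b",
    "prompt": "Explain Mixture of Experts in one sentence.",
    "stream": False,                # ask for a single JSON response instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```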
What is RAG?
RAG, which stands for “Retrieval-Augmented Generation”, is a methodology used in the development of advanced natural language processing (NLP) systems, particularly in the context of large language models (LLMs).
RAG is particularly useful for tasks that require a blend of understanding context, generating coherent responses, and incorporating up-to-date or specific factual information, such as in question-answering systems or chatbots.
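As a hedged end-to-end sketch of that idea, combining a local vector DB with a model served by Ollama (the documents, collection name, model, and prompt template below are all made up for illustration):

```python
# Hedged sketch of RAG: retrieve relevant chunks from a vector DB, then stuff them
# into the prompt so the LLM can ground its answer. All names/content are made up.
import json
import urllib.request
import chromadb

# 1. Index some documents (normally your own files, chunked beforehand).
client = chromadb.Client()
docs = client.create_collection("kb")
docs.add(
    documents=["Our backup job runs nightly at 02:00 via a systemd timer.",
               "Logs are rotated weekly with logrotate."],
    ids=["kb1", "kb2"],
)

# 2. Retrieve the chunks most similar to the user's question.
question = "When do backups run?"
hits = docs.query(query_texts=[question], n_results=1)
context = "\n".join(hits["documents"][0])

# 3. Generate an answer grounded in the retrieved context (here via a local Ollama server).
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({"model": "mixtral:8x7b", "prompt": prompt, "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```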