LLMs on Linux

Generative AI, and large language models (LLMs) in particular, are being adopted everywhere.

You can get LLMs running on your personal computer or on big servers, just for you or for whoever you want to give access.

To see which commercial AI tools are currently the most popular: https://theresanaiforthat.com/most-saved/

Interfaces

VectorDBs

When you use embedding models to give LLMs context about your files, a vector database is where that knowledge goes: it stores the embeddings and finds the ones closest to a query.

There are many vector DBs that you can use on Linux.
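To make the idea concrete, here is a minimal sketch of what a vector DB does: store one vector per text and return the stored texts closest to a query. The `embed` function and the `TinyVectorStore` class are hypothetical stand-ins invented for illustration; a real setup would use an actual embedding model and one of the vector DBs mentioned above.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (an assumption, not a real model): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (vector, text) pairs in a list."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((embed(text), text))

    def search(self, query, k=1):
        # Rank every stored text by similarity to the query vector.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = TinyVectorStore()
store.add("ollama runs large language models locally")
store.add("nginx is a web server and reverse proxy")
store.add("postgres is a relational database")

top = store.search("how do I run language models locally", k=1)
print(top)
```

Real vector DBs do the same thing with dense vectors and approximate nearest-neighbor indexes, so the search stays fast with millions of entries.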

All of this tech works on Linux, and on CPU alone if you don't have a GPU handy (just expect slower responses).


FAQ

Can I use LLMs to Code?

Yes, there are many ways to replace GitHub Copilot for free:

Choosing the Right Model

LLM Quantization
  • GPTQ is a post-training quantization method, introduced in a research paper, that compresses model weights with minimal performance loss compared to previous techniques. It’s most efficient on NVIDIA GPUs when the model fits entirely in VRAM.
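  • GGML, a machine learning library by Georgi Gerganov (who also developed llama.cpp, which started as a way to run local LLMs on Mac), is aimed at CPU inference and performs best on Apple or Intel hardware. Newer llama.cpp releases use the GGUF file format, the successor to the original GGML format.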
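To see what quantization means in practice, here is a minimal sketch of the simplest possible scheme: symmetric round-to-nearest int8 with a single scale per tensor. This is NOT the GPTQ algorithm nor the GGML block formats, which are considerably more refined; it only illustrates the trade of precision for size. The function names and sample weights are made up for the example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest integer loses at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight now fits in one byte instead of four (float32), at the cost of a small rounding error. Methods such as GPTQ go further by choosing the rounding so that the layer's output, not each individual weight, stays close to the original.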

Thanks: https://aituts.com/local-llms/#Which_Quantization

Which LLMs are Trending?

You can always check the LLM leaderboards.

Where to host in the Cloud?

If you need big GPU power, you can always try https://www.runpod.io/gpu-instance/pricing and similar services.

Using HuggingFace for LLMs

You might be Wondering

Which Neural Network are we using?
  1. RNN (Recurrent Neural Network)
  • Typical Use: RNNs are used for sequential data where the context and order of data points are crucial. Common applications include time-series data analysis, natural language processing (NLP), and speech recognition.
  • Characteristics:
    • RNNs possess a memory-like feature, allowing them to retain information about previous computations.
    • This memory feature enables RNNs to process sequences of data effectively, providing them with a sense of ’time’ or sequence order.
  2. CNN (Convolutional Neural Network)
  • Typical Use: CNNs are primarily used for tasks involving spatial data, such as image and video recognition, image classification, and object detection. They are also utilized in some NLP tasks.
  • Characteristics:
    • CNNs consist of convolutional layers, pooling layers, and fully connected layers.
    • The convolutional layers learn spatial hierarchies of features adaptively from input images, making CNNs particularly effective in tasks that require pattern recognition in spatial data.
  3. Transformers
  • Typical Use: Transformers are used in tasks that involve processing entire sequences of data simultaneously, such as language modeling and translation.
  • Characteristics:
    • Unlike RNNs, which process data one step at a time, transformers can handle entire sequences at once.
    • Their self-attention mechanism lets every position weigh the importance of every other part of the input, enhancing efficiency and effectiveness in handling complex sequential tasks.
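The "weighing" that transformers do can be sketched as scaled dot-product attention. This is a minimal pure-Python version under simplifying assumptions: the toy token vectors are made up, and the learned query/key/value projection matrices that a real transformer applies first are left out.

```python
import math


def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence at once."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this position against every position in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy "token" vectors (assumption: real models use learned embeddings).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)

for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence. No row depends on a previous step having finished, which is why this can be computed in parallel, unlike an RNN.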

Mixture of Experts (MoE) is an approach in machine learning where a model consists of numerous sub-models (referred to as “experts”). Each expert specializes in handling different types of data or tasks. The main idea is to have a small router send each input only to the most relevant experts, so that just a fraction of the model does work for any given input.

More about MoE LLMs

For example, some experts might be better at understanding technical jargon, while others might excel at creative writing or conversational language.
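The routing idea can be sketched in a few lines. This is a toy illustration, not how any particular model is implemented: the three "experts" are trivial functions and the router weights are hypothetical numbers, whereas in a real MoE LLM both are learned neural network layers. Picking the top 2 experts mirrors the 2-of-8 routing described in the Mixtral announcement.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Toy experts (assumption: real experts are feed-forward network blocks).
def expert_double(x):
    return [2 * v for v in x]


def expert_negate(x):
    return [-v for v in x]


def expert_identity(x):
    return list(x)


EXPERTS = [expert_double, expert_negate, expert_identity]

# Hypothetical router weights: one row per expert.
ROUTER = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]


def moe_forward(x, top_k=2):
    """Send x to the top_k highest-scoring experts and mix their outputs."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in ROUTER]
    chosen = sorted(range(len(EXPERTS)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in chosen])  # renormalize over chosen experts
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        expert_out = EXPERTS[i](x)  # only the chosen experts are evaluated
        out = [o + gate * y for o, y in zip(out, expert_out)]
    return out, chosen


output, chosen = moe_forward([3.0, 1.0])
print(chosen, output)
```

The third expert is never called for this input. That is the efficiency argument for MoE: the model can have many parameters in total while each input only pays for a few of them.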

GPT-4 is widely reported (though not officially confirmed) to be an MoE. Mixtral is an openly available one, which you can download from HF and run for free locally with Ollama:

ollama run mixtral:8x7b #https://mistral.ai/news/mixtral-of-experts/

You can also try Solar 10.7B, which is a dense model rather than an MoE, to compare against Mixtral:

ollama run solar:10.7b #https://ollama.ai/library/solar/tags

You can also run it in Google Colab: https://www.youtube.com/watch?v=ZyFlySElG1U

What is RAG?

RAG, which stands for “Retrieval-Augmented Generation”, is a methodology used in natural language processing (NLP) systems built on large language models (LLMs): before the model answers, relevant documents are retrieved from an external source and added to the prompt.

RAG is particularly useful for tasks that require a blend of understanding context, generating coherent responses, and incorporating up-to-date or specific factual information, such as in question-answering systems or chatbots.
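The retrieve-then-generate flow can be sketched as follows. Everything here is an illustrative assumption: the documents are invented, retrieval is plain word overlap instead of embedding similarity, and the final call to an LLM is left out, since the prompt would simply be passed to whichever model you run.

```python
def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace("?", "").replace(".", "").split())


# Hypothetical knowledge base, standing in for your own files.
DOCUMENTS = [
    "The command ollama run mixtral:8x7b runs Mixtral locally.",
    "Cloud providers rent GPU instances for heavy workloads.",
    "GPTQ is a quantization method for LLM weights.",
]


def retrieve(question, documents, k=1):
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Augment the question with retrieved context before generation."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt("Which command runs Mixtral locally?", DOCUMENTS)
print(prompt)
```

The model never needs to have seen your documents during training. It only has to read the retrieved context in the prompt, which is how RAG keeps answers specific and up to date.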