LLMs on Linux
Generative AI and particularly LLMs are taking over.
You can get LLMs running on your personal computer or on big servers, just for yourself or for anyone you want to give access to.
For the most popular commercial tools, see https://theresanaiforthat.com/most-saved/
Interfaces
- Others: LibreChat, AutoGen + AutoGen Studio (https://microsoft.github.io/autogen/blog/2023/12/01/AutoGenStudio/), Quivr (with great docs), or LocalGPT.
- https://github.com/khoj-ai/khoj
VectorDBs
When you are using embedding models to give LLMs context about your files, this is where that knowledge goes.
There are many vector DBs that you can use on Linux.
All of this tech works on Linux with just a CPU, if you don't have a GPU handy.
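For example, here is a minimal sketch of that flow using Chroma as one local vector DB option (the collection name and documents below are made up for illustration):

```python
# Minimal sketch: storing document embeddings in a local vector DB (Chroma here,
# as one example). Collection name and documents are made up for illustration.
import chromadb

client = chromadb.Client()                      # in-memory instance; runs fine on CPU
collection = client.create_collection("my_notes")

# Chroma embeds the text with its default embedding model before storing it.
collection.add(
    documents=["Linux uses the ext4 filesystem by default on many distros.",
               "Ollama can run quantized LLMs locally."],
    ids=["doc1", "doc2"],
)

# Later, an LLM frontend can query for the chunks most similar to the user's question.
results = collection.query(query_texts=["How do I run a local LLM?"], n_results=1)
print(results["documents"])
```

Whatever vector DB you pick, the pattern is the same: embed, store, then query by similarity.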
FAQ
Can I use LLMs to Code?
Yes, there are many free alternatives to GitHub Copilot:
- Tabby
- Llama Coder, as a VS Code extension
- Others: Bito, Codeium, or Adrenaline
Choosing the Right Model
LLM Quantization
- GPTQ quantization, a state-of-the-art method featured in research papers, offers minimal performance loss compared to previous techniques. It’s most efficient on NVIDIA GPUs when the model fits entirely in VRAM.
- GGML, a machine learning library by Georgi Gerganov (who also developed llama.cpp for running LLMs locally), performs best on Apple or Intel hardware. See the sketch after this list for loading such a quantized model.
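As a hedged sketch of what the GGML/GGUF route looks like in practice (the model path below is a placeholder for whatever quantized file you have downloaded, and llama-cpp-python is assumed as the runtime):

```python
# Hedged sketch: running a GGML/GGUF-quantized model on CPU with llama-cpp-python.
# The model path is a placeholder; point it at any quantized file you have downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-13b-chat.Q4_K_M.gguf",  # placeholder filename
    n_ctx=2048,        # context window
    n_gpu_layers=0,    # 0 = pure CPU; raise it if you have VRAM to offload layers
)

output = llm("Q: What is quantization in one sentence? A:", max_tokens=64)
print(output["choices"][0]["text"])
```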
Which LLMs are Trending?
You can always check the LLM leaderboards:
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
With ELO Rating: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
Examples (you can also use them with GPT4All or TextGenWebUI; see the loading sketch after this list):
- https://huggingface.co/TheBloke/Llama-2-13B-Chat-fp16/tree/main
- https://huggingface.co/docs/transformers/main/model_doc/mpt
- And this one you can train and use commercially: https://www.mosaicml.com/training
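As a hedged sketch, loading one of the Hugging Face checkpoints above with the transformers library looks roughly like this (a 13B model needs plenty of RAM/VRAM, so swap in a smaller model if needed; the accelerate package is assumed for device_map="auto"):

```python
# Hedged sketch: loading a Hugging Face checkpoint with the transformers library.
# Large models need a lot of RAM/VRAM; smaller or quantized variants are friendlier.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="TheBloke/Llama-2-13B-Chat-fp16",  # one of the repos linked above
    device_map="auto",                        # falls back to CPU if no GPU is found
)

print(generator("Linux is great for running LLMs because", max_new_tokens=40)[0]["generated_text"])
```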
You can also check these repositories: https://github.com/sindresorhus/awesome-chatgpt and https://github.com/f/awesome-chatgpt-prompts
Where to host in the Cloud?
If you need big GPU power, you can always try https://www.runpod.io/gpu-instance/pricing and similar services.
Using HuggingFace for LLMs
- https://huggingface.co/spaces
- https://www.youtube.com/watch?v=_Ua6065p-Cw
You might be wondering…
Which Neural Network are we using?
- RNN (Recurrent Neural Network)
- Typical Use: RNNs are used for sequential data where the context and order of data points are crucial. Common applications include time-series data analysis, natural language processing (NLP), and speech recognition.
- Characteristics:
- RNNs possess a memory-like feature, allowing them to retain information about previous computations.
- This memory feature enables RNNs to process sequences of data effectively, providing them with a sense of 'time' or sequence order.
- CNN (Convolutional Neural Network)
- Typical Use: CNNs are primarily used for tasks involving spatial data, such as image and video recognition, image classification, and object detection. They are also utilized in some NLP tasks.
- Characteristics:
- CNNs consist of convolutional layers, pooling layers, and fully connected layers.
- The convolutional layers learn spatial hierarchies of features adaptively from input images, making CNNs particularly effective in tasks that require pattern recognition in spatial data.
- Transformers
- Typical Use: Transformers are used in tasks that involve processing entire sequences of data simultaneously, such as language modeling and translation.
- Characteristics:
- Unlike traditional neural networks that process data sequentially, transformers can handle entire sequences at once.
- This architecture allows transformers to weigh the importance of different parts of the input data differently, enhancing efficiency and effectiveness in handling complex sequential tasks. (A minimal sketch of all three layer types follows this list.)
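As a hedged, toy-sized PyTorch sketch of these three layer types (all sizes are arbitrary and only illustrate what kind of data each layer consumes):

```python
# Hedged sketch: the three architectures above, reduced to one layer each in PyTorch.
# Sizes are arbitrary; this only illustrates what kind of data each layer consumes.
import torch
import torch.nn as nn

x_seq = torch.randn(1, 5, 8)        # a sequence: 1 batch, 5 time steps, 8 features
x_img = torch.randn(1, 3, 32, 32)   # an image: 1 batch, 3 channels, 32x32 pixels

# RNN: keeps a hidden state across time steps (its "memory" of the sequence so far).
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
seq_out, h_n = rnn(x_seq)

# CNN: convolutional filters scan the image and learn spatial patterns.
cnn = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
img_out = cnn(x_img)

# Transformer-style self-attention: every position attends to every other position
# at once, weighting the parts of the sequence that matter most.
attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
attn_out, attn_weights = attn(x_seq, x_seq, x_seq)

print(seq_out.shape, img_out.shape, attn_out.shape)
```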
Mixture of Experts (MoE) is an approach in machine learning where a model consists of numerous sub-models (referred to as “experts”). Each expert specializes in handling different types of data or tasks. The main idea is to route different inputs to the most relevant experts to handle specific tasks more efficiently and effectively.
More about MoE LLMs
For example, some experts might be better at understanding technical jargon, while others might excel at creative writing or conversational language.
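As a hedged toy sketch of that routing idea (real MoE layers such as Mixtral's route per token inside transformer blocks; here the 'experts' are plain linear layers and all sizes are made up):

```python
# Hedged toy sketch of Mixture-of-Experts routing: a gate scores all experts per
# input and only the top-k experts actually run. Sizes and expert count are made up.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=16, num_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)   # learns which expert fits which input
        self.top_k = top_k

    def forward(self, x):                          # x: (batch, dim)
        scores = self.gate(x)                      # (batch, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):             # run only the selected experts
            for b in range(x.size(0)):
                expert = self.experts[idx[b, slot]]
                out[b] += weights[b, slot] * expert(x[b])
        return out

moe = TinyMoE()
print(moe(torch.randn(3, 16)).shape)   # torch.Size([3, 16])
```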
GPT-4 is reportedly an MoE model. Another example is Mixtral, which you can download from HF and run for free locally with Ollama:
ollama run mixtral:8x7b #https://mistral.ai/news/mixtral-of-experts/
You can also try Solar 10.7B to compare against these MoEs:
ollama run solar:10.7b #https://ollama.ai/library/solar/tags
You can also run it in Google Colab: https://www.youtube.com/watch?v=ZyFlySElG1U
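Besides the interactive `ollama run`, Ollama serves a local HTTP API (port 11434 by default), so you can script against the same models; a minimal hedged sketch:

```python
# Hedged sketch: querying a locally running Ollama server (default port 11434).
# Assumes you have already pulled the model, e.g. `ollama pull mixtral:8x7b`.
import json
import urllib.request

payload = {
    "model": "mixtral:8x7b",
    "prompt": "Explain Mixture of Experts in one sentence.",
    "stream": False,                # ask for a single JSON response instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```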
What is RAG?
RAG, which stands for “Retrieval-Augmented Generation”, is a methodology used in the development of advanced natural language processing (NLP) systems, particularly in the context of large language models (LLMs).
RAG is particularly useful for tasks that require a blend of understanding context, generating coherent responses, and incorporating up-to-date or specific factual information, such as in question-answering systems or chatbots.
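As a hedged end-to-end sketch of that idea, combining a local vector DB with a model served by Ollama (the documents, collection name, model, and prompt template below are all made up for illustration):

```python
# Hedged sketch of RAG: retrieve relevant chunks from a vector DB, then stuff them
# into the prompt so the LLM can ground its answer. All names/content are made up.
import json
import urllib.request
import chromadb

# 1. Index some documents (normally your own files, chunked beforehand).
client = chromadb.Client()
docs = client.create_collection("kb")
docs.add(
    documents=["Our backup job runs nightly at 02:00 via a systemd timer.",
               "Logs are rotated weekly with logrotate."],
    ids=["kb1", "kb2"],
)

# 2. Retrieve the chunks most similar to the user's question.
question = "When do backups run?"
hits = docs.query(query_texts=[question], n_results=1)
context = "\n".join(hits["documents"][0])

# 3. Generate an answer grounded in the retrieved context (here via a local Ollama server).
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({"model": "mixtral:8x7b", "prompt": prompt, "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```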