My Favourite ways to RAG
If you are already familiar with Python:
And have been tinkering with ways to chat with data context…

These should be some familiar RAG frameworks so far:
We will also see some trendy AI tools that work with RAGs as well:
LangChain
Web Scraping
LangChain can also help us to chat with website content:
In this case, with Ollama and ChromaDB.
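To make the flow concrete, here is a minimal, dependency-free sketch of what happens when you chat with a web page: extract the text, chunk it, and retrieve the chunk closest to the question. It is not LangChain's, Ollama's, or ChromaDB's API. The bag-of-words counts stand in for real embeddings, the list of chunks stands in for the vector store, and the sample page and all function names are made up for illustration.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text, size=40, overlap=10):
    """Split text into overlapping word windows (assumes overlap < size)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


# Made-up sample page, standing in for scraped website content.
page = """<html><head><style>p {color: red}</style></head><body>
<h1>Ollama</h1><p>Ollama runs large language models locally.</p>
<h1>ChromaDB</h1><p>ChromaDB stores embeddings for similarity search.</p>
</body></html>"""

chunks = chunk_words(html_to_text(page), size=8, overlap=2)
top = retrieve("Where are embeddings stored?", chunks, k=1)
print(top[0])
```

In a real setup the retrieved chunks would be pasted into the prompt sent to the model served by Ollama.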
With Persistent ChromaDB and MD
- app.py
- index.html
CSV and PDF
Whatever office work you do, you will almost certainly see two kinds of files: spreadsheets (CSVs) and PDFs.
Thanks to LangChain, we can ask about the information contained in both kinds of files:
For PDFs you can do:
If you are interested, you can edit PDFs with:
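For the CSV case, the simplest idea (independent of any framework) is to render a slice of the table as text and place it in the prompt. The sketch below assumes that approach; the function names and the sample data are hypothetical, and the resulting prompt would still need to be sent to a model.

```python
import csv
import io


def csv_to_context(csv_text, max_rows=20):
    """Render the header and the first rows of a CSV as plain text for a prompt."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for i, row in enumerate(reader):
        if i >= max_rows:
            break
        lines.append(", ".join(f"{key}={value}" for key, value in row.items()))
    columns = ", ".join(reader.fieldnames or [])
    return f"Columns: {columns}\n" + "\n".join(lines)


def build_prompt(question, context):
    """Combine the tabular context and the user question into one prompt."""
    return (
        "Answer the question using only the data below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Made-up sample data for illustration.
sample_csv = "city,price\nMadrid,250000\nWarsaw,180000\n"
prompt = build_prompt("Which city is cheaper?", csv_to_context(sample_csv))
print(prompt)
```

For large files, `max_rows` keeps the prompt small; anything beyond that usually calls for retrieval or an agent that runs code over the data.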
Database
Further into data analytics, you will want to chat directly with the content of your databases:
This can be very valuable for real estate applications as seen here
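A rough sketch of the idea, assuming a text-to-SQL flow: show the schema to the model, let it write a query, and run only `SELECT` statements. It uses an in-memory SQLite database; `fake_llm` is a hypothetical stand-in for a real model call, and the `listings` table is invented sample data.

```python
import sqlite3


def describe_schema(conn):
    """Return the CREATE statements of all tables, to be shown to the model."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def ask_database(conn, question, generate_sql):
    """Ask the model for a query, allow only SELECT, and run it."""
    prompt = f"Schema:\n{describe_schema(conn)}\n\nWrite one SQLite SELECT for: {question}"
    sql = generate_sql(prompt).strip().rstrip(";")
    if not sql.lower().startswith("select"):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


def fake_llm(prompt):
    # Hypothetical stand-in for a real model call; always returns the same query.
    return "SELECT city, AVG(price) FROM listings GROUP BY city ORDER BY city;"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (city TEXT, price REAL)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [("Madrid", 200000.0), ("Madrid", 300000.0), ("Warsaw", 180000.0)],
)
result = ask_database(conn, "What is the average price per city?", fake_llm)
print(result)
```

The `SELECT`-only check is a deliberately crude guard; with a real database, a read-only user is the safer choice.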
LangChain Agents
But that's a topic beyond RAG, with Agents:

LlamaIndex
A competitor to LangChain in the RAG space is LlamaIndex.
I also learnt a lot with the chat over .md files with LlamaIndex + Mem0.
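Before .md files can be indexed, they need to be cut into chunks, and headings are a natural place to cut. The sketch below is a plain-Python illustration of that step, not LlamaIndex's or Mem0's API; `split_markdown` is a hypothetical helper, and it does not handle `#` lines inside fenced code blocks.

```python
import re


def split_markdown(md_text):
    """Split Markdown into (heading, body) sections at ATX headings (#, ##, ...)."""
    sections = []
    heading, body = None, []
    for line in md_text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # Close the section collected so far and start a new one.
            sections.append((heading, "\n".join(body).strip()))
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    sections.append((heading, "\n".join(body).strip()))
    # Drop an empty preamble (text before the first heading).
    return [(h, b) for h, b in sections if h is not None or b]


# Made-up sample notes for illustration.
notes = "# Setup\nInstall Ollama.\n\n## Models\nPull a model first.\n"
for heading, body in split_markdown(notes):
    print(heading, "->", body)
```

Keeping the heading next to each chunk gives the retriever some context about where the text came from.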
PandasAI
Conclusions
Remember that there are already some alternatives to RAG, like MCP or KBLM.
For now, my favourite one is still LangChain for its various use cases:
Concepts / AI Tools that are veeery trendy, also as seen here
Reranking models for RAG, as can be done with LocalAI!
Summarization Techniques: https://python.langchain.com/v0.1/docs/use_cases/summarization/
Hypothetical Documents Embeddings: https://python.langchain.com/v0.1/docs/use_cases/query_analysis/techniques/hyde/
MultiVector Retrieval
ReAct framework
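Of the concepts listed above, reranking is the easiest to show in a few lines: a cheap first stage fetches candidates, then a more careful scorer reorders them. This is only a sketch of that two-stage shape. `toy_score` is a hypothetical stand-in for a real reranking model (for example, one served by LocalAI), and the documents are invented.

```python
def first_stage(query, docs, n=3):
    """Cheap recall step: rank by the number of words shared with the query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:n]


def rerank(query, candidates, score_pair, k=1):
    """Precision step: re-score each (query, candidate) pair and keep the best k."""
    scored = [(score_pair(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def toy_score(query, doc):
    # Hypothetical stand-in for a reranking model:
    # the fraction of the document's words that also appear in the query.
    q = set(query.lower().split())
    words = doc.lower().split()
    return sum(w in q for w in words) / len(words)


# Made-up documents for illustration.
docs = [
    "chroma stores vectors",
    "ollama runs models locally on your own machine without any cloud",
    "ollama runs models",
    "mlflow tracks experiments",
]
query = "ollama runs models locally"
candidates = first_stage(query, docs)
best = rerank(query, candidates, toy_score)
print(candidates[0])
print(best[0])
```

The point is the change in order between the two stages: the first stage favours raw overlap, while the reranker is free to use a different, usually more expensive, notion of relevance.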
MLflow
LangChain, LlamaIndex, OpenAI… can all be used together with MLflow! https://mlflow.org/docs/latest/llms/
What for?
It helps us see how our LLMs are working in production.
From these simple prompts: https://github.com/JAlcocerT/Streamlit-MultiChat/blob/main/Z_Tests/OpenAI/openai_mermaid.py
To this one: https://github.com/JAlcocerT/Streamlit-MultiChat/blob/main/Z_Tests/OpenAI/openai_t2t-o1mini.py
To….
import mlflow
mlflow.set_tracking_uri(uri="http://<host>:<port>")
# pip install mlflow==2.21.3
mlflow server --host 127.0.0.1 --port 8080
from openai import OpenAI
import mlflow
client = OpenAI(api_key="<YOUR_API_KEY>")
# Set MLflow tracking URI
mlflow.set_tracking_uri("<YOUR_TRACKING_URI>")
# Example of loading and using the prompt
prompt = mlflow.load_prompt("prompts:/RealEstate/1")
response = client.chat.completions.create(
    messages=[{
        "role": "user",
        "content": prompt.format(),
    }],
    model="gpt-4o-mini",
)
print(response.choices[0].message.content)
MLflow Tracing provides LLM observability for various GenAI libraries such as OpenAI, LangChain, LlamaIndex, DSPy, AutoGen, and more.
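To enable auto-tracing, call mlflow.xyz.autolog() before running your models, where xyz is the name of the library you use (for example, mlflow.openai.autolog() or mlflow.langchain.autolog()).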
Refer to the documentation for customization and manual instrumentation.
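To get a feel for what a trace contains, here is a toy, standard-library-only decorator that records the inputs, output, status, and latency of each call. It is not MLflow's API: `traced`, `TRACES`, and `answer` are hypothetical names, and the in-memory list stands in for a tracking server.

```python
import functools
import time

TRACES = []  # in-memory stand-in for a tracking server


def traced(func):
    """Record inputs, output or error, and latency of each call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        span = {"name": func.__name__, "inputs": {"args": args, "kwargs": kwargs}}
        start = time.perf_counter()
        try:
            span["output"] = func(*args, **kwargs)
            span["status"] = "OK"
            return span["output"]
        except Exception as exc:
            span["status"] = "ERROR"
            span["error"] = repr(exc)
            raise
        finally:
            # Runs on success and on failure, so every call leaves a span.
            span["latency_s"] = time.perf_counter() - start
            TRACES.append(span)

    return wrapper


@traced
def answer(question):
    # Hypothetical placeholder for a real LLM call.
    return f"(model answer to: {question})"


answer("What is RAG?")
print(TRACES[0]["name"], TRACES[0]["status"])
```

Auto-tracing does this kind of wrapping for you across the calls a library makes, which is why it has to be enabled before the model runs.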
LangGraph BigTool
LangGraph is a Python library for building stateful, multi-agent systems and complex conversational workflows.
MIT | Build resilient language agents as graphs.
The LangGraph library enables agent orchestration — offering customizable architectures, long-term memory, and human-in-the-loop to reliably handle complex tasks.
Build LangGraph agents with large numbers of tools
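The core idea behind handling a large number of tools is to avoid showing all of them to the model and to retrieve only the relevant few per request. The sketch below illustrates that idea with keyword overlap in plain Python. It is not the LangGraph or BigTool API; `ToolRegistry` and the three tools are hypothetical, and a real setup would more likely search over embeddings of the tool descriptions.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToolRegistry:
    """Keep many tools and return only the few that match a request."""

    def __init__(self):
        self._tools = {}

    def register(self, func):
        self._tools[func.__name__] = func
        return func

    def search(self, query, k=2):
        """Rank tools by word overlap between the query and name + docstring."""
        q = tokens(query)
        scored = []
        for name, func in self._tools.items():
            overlap = len(q & tokens(f"{name} {func.__doc__ or ''}"))
            if overlap:
                scored.append((overlap, name))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [self._tools[name] for _, name in scored[:k]]


registry = ToolRegistry()


@registry.register
def add(a, b):
    """Add two numbers."""
    return a + b


@registry.register
def multiply(a, b):
    """Multiply two numbers."""
    return a * b


@registry.register
def word_count(text):
    """Count the words in a text."""
    return len(text.split())


selected = registry.search("multiply these numbers", k=1)
print(selected[0].__name__, selected[0](3, 4))
```

Only the selected tools would then be bound to the agent for that turn, which keeps the prompt short even with hundreds of registered tools.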
LangFlow
Langflow’s primary strength lies in its visual, low-code environment for building AI applications, especially those leveraging LLMs and LangChain.
docker run -it --rm -p 7860:7860 langflowai/langflow:latest
LangFuse
An equivalent to LangSmith for observability, but MIT-licensed and self-hostable.
FAQ
MLflow on Databricks: Review how MLflow is integrated into Databricks for tracking machine learning experiments, managing models, and deploying them.
Understand concepts like runs, experiments, and the model registry.
AI Keys
Lately I have been using:
- https://claude.ai/
- https://console.anthropic.com/workbench/
- https://console.groq.com/keys
- https://platform.openai.com/api-keys
GEN AI Techniques
- Fundamentals of Neural Networks: Understand the architecture and training of deep neural networks.
- Generative Adversarial Networks (GANs): Basic understanding of how GANs work for generating synthetic data or other creative outputs.
- Variational Autoencoders (VAEs): Another type of generative model.
- Transformer Networks: Deep dive into the architecture of Transformers, which are the foundation for many state-of-the-art NLP and generative models (e.g., BERT, GPT).
- Large Language Models (LLMs): Understand the capabilities and limitations of LLMs and how they can be applied to HR-related tasks.
See how to run LangGraph or MLflow
AI Apps I'm Self-Hosting

Groq YT Summarizer
docker pull ghcr.io/jalcocert/phidata:yt-groq #:v1.1 #:latest
MultiChat
docker pull ghcr.io/jalcocert/streamlit-multichat:latest #:v1.1 #:latest
Local Deep Researcher
MIT | Fully local web research and report writing assistant
- https://github.com/JAlcocerT/Docker/tree/main/AI_Gen/Ollama
- https://fossengineer.com/selfhosting-llms-ollama/
docker run -d --name ollama -p 11434:11434 -v ollama_data:/root/.ollama ollama/ollama
docker exec -it ollama ollama --version
docker exec -it ollama sh
ollama pull deepseek-r1:8b
LLM_PROVIDER=ollama
OLLAMA_BASE_URL="http://localhost:11434" # Ollama service endpoint, defaults to `http://localhost:11434`
LOCAL_LLM=model # the model to use, defaults to `llama3.2` if not set
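As a sketch of how those variables could be consumed from Python, the snippet below reads `OLLAMA_BASE_URL` and `LOCAL_LLM` with the defaults stated above and builds a non-streaming request for Ollama's `/api/generate` endpoint using only the standard library. The helper names are hypothetical, and this is not how Local Deep Researcher itself is implemented. `strip_think` removes the `<think>...</think>` blocks that reasoning models such as DeepSeek-R1 emit.

```python
import json
import os
import re
import urllib.request


def load_settings(env=os.environ):
    """Read the Ollama variables, falling back to their documented defaults."""
    return {
        "base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/"),
        "model": env.get("LOCAL_LLM", "llama3.2"),
    }


def build_request(settings, prompt):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": settings["model"], "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{settings['base_url']}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def strip_think(text):
    """Drop <think>...</think> blocks and surrounding whitespace."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


def generate(prompt, settings=None):
    """Send the prompt to a running Ollama server and return the cleaned answer."""
    request = build_request(settings or load_settings(), prompt)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.loads(response.read().decode("utf-8"))
    return strip_think(body["response"])


# Building the request needs no server; generate() does.
demo_settings = load_settings({"LOCAL_LLM": "deepseek-r1:8b"})
demo_request = build_request(demo_settings, "Summarize RAG in one sentence.")
print(demo_request.full_url)
```

With the container from the commands above running, `generate("Summarize RAG in one sentence.")` would return the model's answer without its reasoning trace.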
Video Summarized 📌
The video explores the new fully open source reasoning model, DeepSeek-R1, which represents a new scaling paradigm for Large Language Models (LLMs). The model is trained using a combination of fine-tuning and reinforcement learning, and its training strategy is described in detail. The video also demonstrates the capabilities of the model, including its ability to reason and generate comprehensive summaries.
The Training Strategy of DeepSeek-R1
DeepSeek-R1 uses a combination of fine-tuning and reinforcement learning to produce a strong reasoning model. The first stage involves fine-tuning a strong base chat model, DeepSeek V3, on thousands of chain-of-thought reasoning examples. The second stage uses reinforcement learning with a rule-based reward function to score the model’s outputs. The model generates 64 different attempts to solve a problem and each one is scored, increasing or decreasing the probability of generating tokens based on the score. This process helps the model discover good reasoning patterns.
Filtering and Fine-Tuning
The model’s outputs are filtered to get high-quality reasoning traces, which are then used for further fine-tuning. This process helps restore general model capabilities while baking in high-quality reasoning. The final stage involves a second round of reinforcement learning with different rewards, including helpfulness and harmlessness.
Results and Distillation
The results show that DeepSeek-R1 is on par with other state-of-the-art reasoning models, including the o-series models from OpenAI. The model is also distilled into smaller models, including a 14 billion parameter model that can run on a laptop.
Playing with DeepSeek-R1
The video demonstrates the capabilities of DeepSeek-R1, including its ability to generate summaries and reason about complex topics. The model is shown to be very expressive, emitting think tokens that provide insight into its thought process. The video also explores the use of JSON mode, which strips away think tokens and provides a cleaner output.
Takeaways
DeepSeek-R1 represents a new scaling paradigm for LLMs, using reinforcement learning to discover good reasoning patterns.
The model's training strategy involves a combination of fine-tuning and reinforcement learning, with filtering and fine-tuning to restore general capabilities.
The model is capable of generating comprehensive summaries and reasoning about complex topics.
The distillation of the model into smaller versions, such as the 14 billion parameter model, makes it possible to run on a laptop.
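The reinforcement-learning step in the summary (sample many attempts, score each with a rule, then make good attempts more likely) can be sketched as follows. Two things here are assumptions and do not come from the summary: the reward rule is invented for illustration, and the scores are normalised within the group of attempts, in the style of group-relative methods such as GRPO. The example uses 4 attempts instead of 64 to stay readable.

```python
import re
import statistics


def rule_based_reward(completion, expected):
    """Toy reward: 1.0 for the right final answer, plus 0.1 for using think tags."""
    reward = 0.0
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        reward += 0.1
    answer = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    if answer == expected:
        reward += 1.0
    return reward


def group_advantages(rewards):
    """Center and scale rewards within one group of attempts."""
    mean = statistics.fmean(rewards)
    spread = statistics.pstdev(rewards)
    if spread == 0:
        # All attempts scored the same, so none is preferred over the others.
        return [0.0 for _ in rewards]
    return [(r - mean) / spread for r in rewards]


# Made-up attempts at the problem "2 + 2 = ?".
attempts = [
    "<think>2 + 2 is 4</think>4",
    "<think>maybe 5</think>5",
    "4",
    "22",
]
rewards = [rule_based_reward(a, "4") for a in attempts]
advantages = group_advantages(rewards)
for attempt, advantage in zip(attempts, advantages):
    print(f"{advantage:+.2f}  {attempt}")
```

Attempts with a positive value would have their tokens made more likely during the update, and those with a negative value less likely; the actual policy update is outside the scope of this sketch.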