My Favourite ways to RAG


April 10, 2025

If you are already familiar with Python:

And have been tinkering with ways to chat with your data as context…

These RAG frameworks should be familiar by now:

[Star History chart: RAG frameworks]

We will also look at some trendy AI tools that work with RAG:

[Star History chart: AI tools]

LangChain

ℹ️
For now, the most popular RAG framework


Web Scraping

LangChain can also help us chat with website content, in this case with Ollama and ChromaDB:
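A minimal sketch of that pipeline, assuming the langchain-community loaders and a local Ollama embedding model (the URL and model names here are illustrative):

```python
# pip install langchain langchain-community chromadb beautifulsoup4
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Scrape the page and split it into overlapping chunks
docs = WebBaseLoader("https://example.com/some-post").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# Embed the chunks locally with Ollama and index them in ChromaDB
db = Chroma.from_documents(chunks, OllamaEmbeddings(model="nomic-embed-text"))

# Retrieve the chunks most relevant to a question
for doc in db.similarity_search("What is this page about?", k=3):
    print(doc.page_content[:200])
```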

With persistent ChromaDB and Markdown files
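Continuing the sketch above, pointing Chroma at a directory on disk makes the store persistent (the path is illustrative):

```python
# Persist the embeddings so they survive restarts
db = Chroma.from_documents(
    chunks,
    OllamaEmbeddings(model="nomic-embed-text"),
    persist_directory="./chroma_db",
)

# Later: reload the store without re-scraping or re-embedding
db = Chroma(
    persist_directory="./chroma_db",
    embedding_function=OllamaEmbeddings(model="nomic-embed-text"),
)
```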


• app.py
• index.html

CSV and PDF

Whatever office work you do, you will most certainly see two kinds of files: spreadsheets (CSVs) and PDFs.

Thanks to LangChain, we can query the information contained in both kinds of files:
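For CSVs, a minimal sketch with the community loader (the file name is illustrative):

```python
from langchain_community.document_loaders import CSVLoader

# One Document per row, ready to embed into the same Chroma store as above
rows = CSVLoader(file_path="sales.csv").load()
print(rows[0].page_content)
```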

For PDFs you can do:
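A sketch with PyPDFLoader, which assumes pypdf is installed (the file name is illustrative):

```python
# pip install pypdf
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# One Document per page; split and embed them like any other text
pages = PyPDFLoader("report.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(pages)
```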

If you are interested, you can edit PDFs with:

Database

Further into data analytics, you will want to chat directly with the content of your databases:
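A hedged sketch with LangChain's SQL helpers over a local SQLite file (the database and model names are illustrative):

```python
from langchain.chains import create_sql_query_chain
from langchain_community.chat_models import ChatOllama
from langchain_community.utilities import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///listings.db")
llm = ChatOllama(model="llama3")

# Turn a natural-language question into SQL for this schema
chain = create_sql_query_chain(llm, db)
query = chain.invoke({"question": "How many listings are under 300000?"})

print(query)          # inspect the generated SQL...
print(db.run(query))  # ...then run it against the database
```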

This can be very valuable for real estate applications, as seen here.

LangChain Agents

But that's a topic beyond RAG, with Agents:
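As a teaser, here is a classic (now legacy) LangChain agent with a single toy tool, just to show the shape of it:

```python
from langchain.agents import AgentType, Tool, initialize_agent
from langchain_community.chat_models import ChatOllama

def word_count(text: str) -> str:
    """Toy tool: count the words in a piece of text."""
    return str(len(text.split()))

agent = initialize_agent(
    tools=[Tool(name="word_count", func=word_count,
                description="Counts the words in a piece of text")],
    llm=ChatOllama(model="llama3"),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
agent.run("How many words are in 'retrieval augmented generation'?")
```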

LlamaIndex

A competitor to LangChain in the RAG space is LlamaIndex.

I also learnt a lot chatting over .md files with LlamaIndex + Mem0.
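A minimal LlamaIndex sketch over a folder of .md files, leaving the Mem0 part aside (the path is illustrative, and the default setup expects an OpenAI key):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load only the Markdown files from the folder
docs = SimpleDirectoryReader("./notes", required_exts=[".md"]).load_data()
index = VectorStoreIndex.from_documents(docs)

# Ask questions against the indexed notes
engine = index.as_query_engine()
print(engine.query("What do my notes say about ChromaDB?"))
```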

PandasAI
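PandasAI lets you chat with a DataFrame in natural language. A hedged sketch, since its API has shifted between major versions (the data is illustrative):

```python
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI

df = pd.DataFrame({"city": ["Madrid", "Krakow"], "price": [300_000, 150_000]})

# Wrap the DataFrame so it can answer questions about itself
sdf = SmartDataframe(df, config={"llm": OpenAI(api_token="<YOUR_API_KEY>")})
print(sdf.chat("Which city has the cheaper listings?"))
```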


Conclusions

Remember that there are already some alternatives to RAG, like MCP or KBLM.

For now, my favourite is still LangChain, given its many use cases.


Concepts and AI tools that are very trendy, also as seen here:

MLflow

LangChain, LlamaIndex, OpenAI… can all be used together with MLflow! https://mlflow.org/docs/latest/llms/

What for?

It helps us see how our LLMs are working in production.

From simple prompts like this one: https://github.com/JAlcocerT/Streamlit-MultiChat/blob/main/Z_Tests/OpenAI/openai_mermaid.py

To this one: https://github.com/JAlcocerT/Streamlit-MultiChat/blob/main/Z_Tests/OpenAI/openai_t2t-o1mini.py

To…

ℹ️
…finally, GenAI observability with MLflow https://github.com/mlflow/mlflow


```bash
pip install mlflow==2.21.3

# Start the tracking server
mlflow server --host 127.0.0.1 --port 8080
```

```python
import mlflow

# Point the client at the server started above
mlflow.set_tracking_uri(uri="http://127.0.0.1:8080")
```


```python
from openai import OpenAI
import mlflow

client = OpenAI(api_key="<YOUR_API_KEY>")

# Set MLflow tracking URI
mlflow.set_tracking_uri("<YOUR_TRACKING_URI>")

# Example of loading and using the prompt from the MLflow Prompt Registry
prompt = mlflow.load_prompt("prompts:/RealEstate/1")
response = client.chat.completions.create(
    messages=[{
        "role": "user",
        "content": prompt.format(),
    }],
    model="gpt-4o-mini",
)

print(response.choices[0].message.content)
```

MLflow Tracing provides LLM observability for various GenAI libraries such as OpenAI, LangChain, LlamaIndex, DSPy, AutoGen, and more.

To enable auto-tracing, call `mlflow.<library>.autolog()` before running your models.
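For example, for the OpenAI flavour (a sketch; assumes a recent MLflow version):

```python
import mlflow

mlflow.set_experiment("genai-tracing")
mlflow.openai.autolog()  # every OpenAI call from here on is traced in the MLflow UI
```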

Refer to the documentation for customization and manual instrumentation.


LangGraph BigTool

LangGraph is a Python library for building stateful, multi-agent systems and complex conversational workflows.

MIT | Build resilient language agents as graphs.

The LangGraph library enables agent orchestration, offering customizable architectures, long-term memory, and human-in-the-loop to reliably handle complex tasks.

ℹ️
It provides a more programmatic and flexible way to define the interactions and state transitions between multiple agents or steps in a sophisticated AI application.

BigTool itself lets you build LangGraph agents with large numbers of tools.
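Setting BigTool aside, a minimal StateGraph sketch (pure LangGraph, with a stand-in instead of a real LLM call) to show the flavour:

```python
from typing import TypedDict

from langgraph.graph import END, START, StateGraph

class State(TypedDict):
    question: str
    answer: str

def answer_node(state: State) -> dict:
    # Stand-in for an LLM call; returns the state keys it wants to update
    return {"answer": f"You asked: {state['question']}"}

graph = StateGraph(State)
graph.add_node("answer", answer_node)
graph.add_edge(START, "answer")
graph.add_edge("answer", END)

app = graph.compile()
print(app.invoke({"question": "What is LangGraph?"}))
```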

LangFlow

Langflow’s primary strength lies in its visual, low-code environment for building AI applications, especially those leveraging LLMs and LangChain.

ℹ️
It simplifies the creation of complex multi-agent/RAG LangChain workflows via its UI:

```bash
docker run -it --rm -p 7860:7860 langflowai/langflow:latest
```

LangFlow Store

LangFuse

An equivalent to LangSmith for observability, but MIT-licensed and self-hostable.


FAQ

MLflow on Databricks: Review how MLflow is integrated into Databricks for tracking machine learning experiments, managing models, and deploying them.

Understand concepts like runs, experiments, and the model registry.
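A minimal sketch of those concepts with plain MLflow (on Databricks the tracking URI comes preconfigured):

```python
import mlflow

mlflow.set_experiment("demo-experiment")

# Each start_run creates a run inside the experiment;
# params and metrics show up in the MLflow UI and feed the model registry workflow
with mlflow.start_run():
    mlflow.log_param("model", "gpt-4o-mini")
    mlflow.log_metric("latency_s", 1.23)
```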

AI Keys

Lately I have been using:

GenAI Techniques

• Fundamentals of Neural Networks: Understand the architecture and training of deep neural networks.
• Generative Adversarial Networks (GANs): Basic understanding of how GANs work for generating synthetic data or other creative outputs.
• Variational Autoencoders (VAEs): Another type of generative model.
• Transformer Networks: Deep dive into the architecture of Transformers, which are the foundation for many state-of-the-art NLP and generative models (e.g., BERT, GPT).
• Large Language Models (LLMs): Understand the capabilities and limitations of LLMs and how they can be applied to HR-related tasks.

See how to run LangGraph or MLflow.

AI Apps I'm Self-Hosting

Groq YT Summarizer

```bash
docker pull ghcr.io/jalcocert/phidata:yt-groq  #:v1.1  #:latest
```

MultiChat

```bash
docker pull ghcr.io/jalcocert/streamlit-multichat:latest  #:v1.1
```

Local Deep Researcher

MIT | Fully local web research and report writing assistant

    
```bash
# Run Ollama as a container
docker run -d --name ollama -p 11434:11434 -v ollama_data:/root/.ollama ollama/ollama

docker exec -it ollama ollama --version
docker exec -it ollama sh   # open a shell inside the container...

ollama pull deepseek-r1:8b  # ...and pull the reasoning model
```

```bash
# Environment variables for Local Deep Researcher
LLM_PROVIDER=ollama
OLLAMA_BASE_URL="http://localhost:11434" # Ollama service endpoint, defaults to `http://localhost:11434`
LOCAL_LLM=deepseek-r1:8b # the model to use, defaults to `llama3.2` if not set
```

Video Summarized 📌

The video explores the new fully open-source reasoning model, DeepSeek-R1, which represents a new scaling paradigm for Large Language Models (LLMs). The model is trained using a combination of fine-tuning and reinforcement learning, and its training strategy is described in detail. The video also demonstrates the capabilities of the model, including its ability to reason and generate comprehensive summaries.

The Training Strategy of DeepSeek-R1

DeepSeek-R1 uses a combination of fine-tuning and reinforcement learning to produce a strong reasoning model. The first stage involves fine-tuning a strong base chat model, DeepSeek V3, on thousands of chain-of-thought reasoning examples. The second stage uses reinforcement learning with a rule-based reward function to score the model’s outputs. The model generates 64 different attempts to solve a problem and scores each one, increasing or decreasing the probability of generating tokens based on the score. This process helps the model discover good reasoning patterns.

Filtering and Fine-Tuning

The model’s outputs are filtered to get high-quality reasoning traces, which are then used for further fine-tuning. This process helps restore general model capabilities while baking in high-quality reasoning. The final stage involves a second round of reinforcement learning with different rewards, including helpfulness and harmlessness.

Results and Distillation

The results show that DeepSeek-R1 is on par with other state-of-the-art reasoning models, including the o-series models from OpenAI. The model is also distilled into smaller models, including a 14-billion-parameter model that can run on a laptop.

Playing with DeepSeek-R1

The video demonstrates the capabilities of DeepSeek-R1, including its ability to generate summaries and reason about complex topics. The model is shown to be very expressive, emitting think tokens that provide insight into its thought process. The video also explores the use of JSON mode, which strips away think tokens and provides a cleaner output.

Takeaways

• DeepSeek-R1 represents a new scaling paradigm for LLMs, using reinforcement learning to discover good reasoning patterns.
• The model's training strategy involves a combination of fine-tuning and reinforcement learning, with filtering and fine-tuning to restore general capabilities.
• The model is capable of generating comprehensive summaries and reasoning about complex topics.
• The distillation of the model into smaller versions, such as the 14-billion-parameter model, makes it possible to run on a laptop.