AI Tools for CLI
Gemini
Gemini models are great: https://ai.google.dev/gemini-api/docs/models#gemini-2.5-flash
Especially their very large context window of 1M tokens.
And as of now, they are at the top of the LLM Leaderboard: https://lmarena.ai/leaderboard
Querying Gemini is as simple as getting your API key and doing:
#source .env
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=${GEMINI_API_KEY}" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [
      {
        "parts": [
          {
            "text": "Explain how AI works in a few words"
          }
        ]
      }
    ]
  }'
You can also make these queries to Gemini with Python.
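Here is a minimal Python sketch of the same request as the curl call above, using the requests library; the endpoint, payload, and GEMINI_API_KEY variable mirror the shell example, and the candidates/content/parts path is the usual shape of the REST response:
import os
import requests

# Same endpoint and payload as the curl example above
api_key = os.environ["GEMINI_API_KEY"]
url = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"gemini-2.0-flash:generateContent?key={api_key}"
)
payload = {
    "contents": [
        {"parts": [{"text": "Explain how AI works in a few words"}]}
    ]
}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()

# The reply text lives under candidates -> content -> parts
data = response.json()
print(data["candidates"][0]["content"]["parts"][0]["text"])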
Gemini CLI
npx https://github.com/google-gemini/gemini-cli #one time install
#npm install -g @google/gemini-cli #global install
gemini
gemini --prompt "are you able to be queried with non interactive mode?" -m gemini-2.5-flash --yolo --debug
The flags used for Gemini are:
- --yolo: automatically accepts all actions
- --debug: shows debug information
- --model: allows you to specify the model to use
Gemini CLI via API Key https://aistudio.google.com/app/apikey
I had to log in via an API key, as I could not do it via regular Google auth.
You can do it as described in this issue:
- https://console.cloud.google.com/welcome?hl=en&inv=1&invt=Ab1Jlw&project=starlit-advice-464112-j9
- https://console.developers.google.com/apis/api/cloudaicompanion.googleapis.com
- https://console.cloud.google.com/apis/dashboard
#source .env && npx https://github.com/google-gemini/gemini-cli && gemini
export GOOGLE_CLOUD_PROJECT="xxxx" && gemini
Be aware of the costs: https://aistudio.google.com/app/usage
Gemini CLI x MCP
A very interesting feature is the MCP integration https://github.com/google-gemini/gemini-cli/blob/main/docs/cli/configuration.md
As seen here, we can use MCP tools with Windsurf, like Context7.
But how to add it to Gemini CLI?
gemini
/mcp
You will just need a .gemini/settings.json
as per the docs
{
  "theme": "Default", // Keep any existing settings
  "mcpServers": {
    "context7": {
      "url": "https://mcp.context7.com/sse"
    }
  }
}
Once logged in again to Gemini CLI, you will get access to the MCP tools:
See how it works:
Gemini CLI x Git-MCP
What else can be connected to Gemini via MCP?
Pretty much anything: https://github.com/punkpeye/awesome-mcp-servers
For example - https://github.com/idosal/git-mcp
Apache v2 | Put an end to code hallucinations! GitMCP is a free, open-source, remote MCP server for any GitHub project
You can add it to Windsurf by: https://gitmcp.io/openai/codex or https://gitmcp.io/google-gemini/gemini-cli
{
  "mcpServers": {
    "gitmcp": {
      "serverUrl": "https://gitmcp.io/google-gemini/gemini-cli"
    },
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    }
  }
}
See how you can use MCP with Windsurf:
Gemini CLI x Databases
Logging Gemini CLI
#npx https://github.com/google-gemini/gemini-cli -y && gemini
gemini --debug -p "How do I use the Gemini CLI to log my thinking process?"
Other CLI Tools
These two I have already tested and covered in this post.
Codex CLI
If you like the OpenAI flavour: https://platform.openai.com/docs/guides/tools-image-generation
You can get started as per the OpenAI manual.
#codex
codex --approval-mode full-auto
#codex --provider openai --model o3-mini --quiet --approval-mode full-auto "$(cat ./prompts/codex-tree-stack-components.md)" > ./Outputs_Model/output-codex-tree-stack-components-plan.json #saved the full reply with errors
Remember that quiet mode always requires the question to be passed as a string!
And use your favourite model: https://platform.openai.com/docs/models, which at this point offers a 200k context window.
You could say that at this point o3 > o1 > 4o / GPT 4.1:
codex --provider openai --model gpt-4.1-mini #32k tokens
codex --provider openai --model gpt-4.1-nano #8k tokens
codex --approval-mode full-auto "create the fanciest todo-list app" --provider ollama
#codex
#/model
#codex -m o3-mini
#codex --provider openai --model o3-mini "Please create a data analysis script that cleans data"
codex --provider openai --model o3-mini --quiet --approval-mode full-auto "$(cat ./prompts/codex-tree-stack-components.md)" > ./Outputs_Model/output-codex-tree-stack-components-plan.json #saved the full reply with errors
Remember that using these providers incurs associated costs; see https://platform.openai.com/usage
Claude Code
With Anthropic, you have the majestic Claude 4 Opus - https://www.anthropic.com/claude/opus
npm install -g @anthropic-ai/claude-code
Also with 200K context window.
Claude Task Master
If you are familiar with BRD/PRD and similar project management concepts, Claude Task Master is a tool that can create AI-driven projects in that style.
WARP IDE
AIDER
BAML
- https://marketplace.visualstudio.com/items?itemName=Boundary.baml-extension
- https://docs.boundaryml.com/home
- https://gitmcp.io/BoundaryML/baml - This can be added to Windsurf as an MCP reference about BAML via SSE
- https://gitmcp.io/BoundaryML/baml/chat
ext install Boundary.baml-extension
You might hear about BAML because of its type-safe guarantees for LLMs.
When comparing plain API calls with BAML, you can see several key differences (a sketch of the "without BAML" approach follows this list):
- Type Safety: without BAML you manually define JSON schemas with no compile-time checking; with BAML, types are checked at compile time with Pydantic
- Configuration vs. Code: without BAML, function definitions are mixed with business logic; with BAML, function definitions live in declarative BAML files
- Maintenance: without BAML, parameter changes require code updates in multiple places; with BAML, changes in one BAML file propagate to the generated code
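To make the "without BAML" side concrete, here is a hedged sketch of the manual approach with the OpenAI Python SDK: the expected JSON shape is duplicated in the prompt text and in a Pydantic model, nothing keeps the two in sync, and errors only surface at runtime (the model name and the EnhancementResult schema are illustrative, not taken from any real project):
from openai import OpenAI
from pydantic import BaseModel, ValidationError

class EnhancementResult(BaseModel):  # illustrative schema, not from any real project
    summary: str
    suggestions: list[str]

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Without BAML: the expected JSON shape lives in the prompt string AND in the
# Pydantic model above, and nothing checks that the two stay in sync.
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": 'Reply with JSON: {"summary": str, "suggestions": [str]}'},
        {"role": "user", "content": "Review this README and suggest improvements."},
    ],
)

try:
    result = EnhancementResult.model_validate_json(completion.choices[0].message.content)
    print(result.suggestions)
except ValidationError as err:
    # Typos or wrong field types are only discovered here, at runtime
    print("LLM returned something off-schema:", err)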
“Every system in the world should be able to run LLMs, not just Python.” (Vaibhav Gupta)
uv init
uv add baml-py
#uv sync
Apache v2 | The AI framework that adds the engineering to prompt engineering (Python/TS/Ruby/Java/C#/Rust/Go compatible)
BAML, which stands for “Basically a Made-up Language,” is an open-source AI framework designed to bring traditional software engineering rigor and best practices to the development of applications that utilize Large Language Models (LLMs).
Essentially, it offers a structured and type-safe way to define and manage how you interact with LLMs, moving beyond simple text prompts to a more robust, function-based approach.
More about BAML 📌
BAML helps you be better with LLMs by addressing several common pain points in LLM application development.
Firstly, it transforms raw LLM prompts into defined “functions” with specific input parameters and expected output types.
This “schema engineering” ensures that your LLM outputs are reliable and consistently formatted, significantly reducing parsing errors and the need for complex error handling.
Secondly, it drastically improves iteration speed with built-in IDE tooling and a “playground” that allows you to visualize and test your prompts rapidly, speeding up development cycles and enabling quicker experimentation with different ideas.
Finally, BAML promotes maintainability and scalability by abstracting away the complexities of integrating with various LLM providers, offering features like model rotation, retry policies, and fallbacks, all while generating type-safe client code for multiple programming languages.
The core of BAML is Rust functions :)
Similar to how NumPy drops down to C.
Make sure to add baml as a dependency to your virtual environment:
#uv init
uv add baml-py
#uv add pydantic
#uv add typing-extensions
#uv add baml-py pydantic typing-extensions
uv run baml-cli init #https://docs.boundaryml.com/guide/installation-language/python
uv run baml-cli generate #generates ./baml_client python files
Those will produce *.py files!
BAML Architecture
BAML’s architecture is centered around a clear separation of concerns:
BAML Schema (baml_src/*.baml files): defined by the user, this is where the BAML setup lives
- Define the contract/interface
- Specify what functions exist and what data types they use
- Configure which LLM providers to call
Generated Client Code (baml_client/ folder): generated automatically by BAML and used for reliable type safety
- Handles all the technical details of calling models correctly
- Manages parsing responses, error handling, retry logic
- Provides type-safe interfaces for your business logic
Your Business Logic (like plan_enhancement_baml.py): instead of the typical Python script that calls the OpenAI API
- Only needs to import and use the generated clients
- Focuses purely on solving your business problem
- Remains clean and readable
This architecture means your main Python files can remain focused on business logic, while all the complexity of reliable LLM calling is abstracted away in the generated client code.
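As a rough sketch of what such a business-logic file can look like (the EnhanceDocumentation function and EnhancementInput type are the names used in the workflow diagram further down; your actual names and fields depend on what you declare in baml_src/*.baml):
# plan_enhancement_baml.py -- illustrative sketch, not the canonical implementation
from pathlib import Path

from baml_client import b                       # generated sync client
from baml_client.types import EnhancementInput  # generated Pydantic model (assumed name)


def run_plan_enhancement(doc_path: str) -> str:
    # Pure business logic: read the file, build a typed input, call the BAML function
    doc_text = Path(doc_path).read_text()
    request = EnhancementInput(documentation=doc_text)  # field name is an assumption
    result = b.EnhanceDocumentation(request)            # type-safe call into the LLM
    return str(result)


if __name__ == "__main__":
    print(run_plan_enhancement("README.md"))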
Benefits
- Type Safety: Compile-time checking for LLM interactions
- Maintainability: Change models without changing business code
- Testability: Built-in testing framework
- Separation of Concerns: Business logic separate from LLM interaction details
Workflow: see details
- Define your schema in .baml files
- Run baml generate (or npx @boundaryml/baml generate) to create the client code in ./baml_client
- Import and use the client in your business logic
When you need to change your LLM interface, modify BAML files, re-generate, and the changes propagate
BAML Workflow
The typical development workflow when using BAML follows these steps: https://docs.boundaryml.com/guide/installation-language/python
Define Your Schema in BAML (baml_src/doc_enhancement.baml):
- Create data structures (input/output classes)
- Define functions with their input/output types
- Configure LLM clients and prompt templates
Generate the Client Code:
npx @boundaryml/baml generate #generates the client files (.ts or .py, depending on your generator config)
- This creates Pydantic models and API client code in baml_client/
Write Your Business Logic (plan_enhancement_baml.py):
- Import the generated client
- Handle file I/O, argument parsing, etc.
- Call the BAML functions and process results
Iterate When Needed:
- If you need to change schemas or prompts, modify the BAML file
- Regenerate the client code
- Update your Python script if necessary
graph TD
    %% Main execution flow
    A[When calling plan_enhancement_baml.py] --> B[parse_arguments]
    B --> C[run_plan_enhancement_baml]

    %% File imports and usage flow
    C --> D[Load prompts from files]
    C --> E[Read documentation file]
    C --> F[Create EnhancementInput object]
    F --> G[Call b.EnhanceDocumentation]

    %% BAML component relationships
    G --> H[BAML Sync Client]
    H --> I[OpenAI API]
    I --> H
    H --> G
    G --> J[Process and save results]

    %% File relationships and imports
    subgraph "Files and Imports"
        K[plan_enhancement_baml.py] -.imports.-> L[baml_client/sync_client.py]
        K -.imports.-> M[baml_client/types.py]
        N[doc_enhancement.baml] -.generates.-> L
        N -.generates.-> M
        K -.reads.-> O[prompts/*.md files]
    end

    %% Styling
    classDef script fill:#f9d77e,stroke:#333,stroke-width:1px;
    classDef bamlSchema fill:#a1d8b2,stroke:#333,stroke-width:1px;
    classDef generated fill:#f8b88b,stroke:#333,stroke-width:1px;
    classDef external fill:#bae1ff,stroke:#333,stroke-width:1px;

    class A,B,C,D,E,F,J script;
    class N bamlSchema;
    class L,M,G,H generated;
    class I,O external;
This separation helps maintain a clean architecture where:
- BAML files handle the “what” (data structures and LLM interactions)
- Python code handles the “how” (business logic, file handling, etc.)
Get your code ready:
#git checkout -b baml main
#./baml_src/doc_enhancement.baml
#plan_enhancement_baml.py
npm install -g @boundaryml/baml
#npx @boundaryml/baml --version
#0.201.0
#sudo npm install -g @boundaryml/baml@0.201.0
npx @boundaryml/baml generate
All BAML does under the hood is generate a web request (you will be able to see the raw curl), configurable via client.baml.
There is a baml_client and you can do:
from baml_client import b
#you can now bring in your classes that check types and so on, just like TypeScript does, but in Python, thanks to BAML (type checking in prompts)
And there won't be any dependencies on BAML code once it has been run and the ./baml_client/*.py files (or .ts, whatever) are generated.
So you will just ship the baml_client part!
BAML examples
Semantic streaming feature - a React component that knows how to render itself
There is an LLM Client Call Graph to debug which model is being called
No Internet connection required
GPT-3.5 + BAML > 4o with structured outputs | Function-calling for every model, in your favorite language
We let the LLM speak in whatever form it thinks is best for the reply, and then we take what we want from that reply (which does not have to be JSON, unlike with structured outputs).
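As a toy illustration of that idea (this is just the concept, not BAML internals): let the model answer freely, then pull out and validate only the structured part you care about. The Product model and the regex are made up for this example:
import json
import re

from pydantic import BaseModel


class Product(BaseModel):  # made-up schema for the example
    name: str
    price: float


# The model is free to "speak" however it likes; we only keep the structured part.
reply = (
    "Sure, here are the products I found on the page:\n"
    '[{"name": "Keyboard", "price": 49.9}, {"name": "Mouse", "price": 19.9}]\n'
    "Let me know if you need anything else."
)

# Naive extraction: grab the first JSON array in the reply and validate it.
match = re.search(r"\[.*\]", reply, re.DOTALL)
products = [Product.model_validate(item) for item in json.loads(match.group(0))]
print(products)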
Competitors: PydanticAI, or maybe https://github.com/langchain-ai/langgraph-codeact
langgraph-codeact: this library implements the CodeAct architecture in LangGraph. This is the architecture used by Manus.im.
It implements an alternative to JSON function-calling, which enables solving more complex tasks in fewer steps.
You also have the OSS equivalent: https://github.com/FoundationAgents/OpenManus
This is achieved by making use of the full power of a Turing-complete programming language (such as Python, used here) to combine and transform the outputs of multiple tools.
baml.com/chat
write me a web scraper with Selenium to extract the products on a site
write the BAML code + Python code example
BAML as a DSL
They call BAML a DSL (Domain-Specific Language) because it’s precisely that: a programming language tailored specifically to a particular “domain” of problems.
In this case, the domain is building reliable AI workflows and agents, particularly around prompt engineering for Large Language Models (LLMs).
Here’s why that classification fits and what it means:
- Specialized Focus: Unlike a General-Purpose Language (GPL) like Python, JavaScript, or Java, which are designed to solve a wide range of problems across various domains, a DSL like BAML has a very narrow and specific focus. Its syntax, keywords, and constructs are all designed to express concepts directly relevant to interacting with LLMs – defining prompts, specifying input/output schemas, handling model clients, streaming, retries, and so on.
More about BAML as DSL 📌
- Higher Level of Abstraction for the Domain: BAML allows you to express your intentions for LLM interactions at a higher level of abstraction than you would in a GPL. Instead of writing boilerplate code to serialize JSON schemas into prompts, handle API calls, and parse messy outputs, BAML provides dedicated syntax for these tasks. For example, the function ChatAgent(...) -> StopTool | ReplyTool { client "openai/gpt-4o-mini" prompt #"...#" } syntax is very specific to defining an LLM function, its model, and its prompt structure.
- Improved Readability and Maintainability within the Domain: Because its syntax is specialized, BAML code becomes more readable and understandable for anyone working within the LLM application development domain. It clearly delineates the structure of your prompts and expected outputs, making it easier to maintain and debug your LLM-driven logic over time compared to managing hundreds of complex f-strings in a general-purpose language.
- Generated Code: BAML files are typically compiled or transformed into code in a GPL (like Python, TypeScript, Go, etc.). This means you write your LLM logic in the specialized BAML DSL, and then BAML’s tooling generates the necessary “boilerplate” code in your application’s primary language, which you then integrate into your project. This is a common characteristic of external DSLs.
In essence, BAML is a DSL because it provides a dedicated, purpose-built language to solve a specific problem set (LLM prompt engineering and workflow automation), offering specialized syntax and abstractions that make working within that domain more efficient, reliable, and understandable.
A DSL, like the one Kibana has!
BAML vs Function Calling
This completes our implementation of all three approaches:
- Basic JSON mode (simplest)
- Function calling (schema-defined)
- BAML (type-safe, declarative)
BAML vs Function Calling 📌
Schema Definition (baml_src/doc_enhancement.baml):
- Defines the data structures and function signatures
- Contains class definitions for inputs and outputs
- Specifies the OpenAI client configuration
- Declares prompts and system messages
Generated Client (baml_client/ directory):
- Auto-generated from BAML definitions
- Contains Pydantic models for type checking
- Provides a strongly-typed client interface
- Generated using the npx @boundaryml/baml generate command
Python Implementation (plan_enhancement_baml.py):
- Imports the generated BAML client
- Handles business logic like loading files and parsing arguments
- Makes type-safe API calls using the client
- Processes and formats the results
The type-safe, declarative approach (BAML) offers several significant advantages over just schema-defined approaches (like function calling):
1. Compile-Time Validation vs. Runtime Validation
Function Calling:
- Schema validation happens at runtime
- Errors in schema structure are only discovered when the API is called
- Typos or incorrect field types aren’t caught until execution
BAML:
- Validation happens at compile/generation time
- The code generator catches errors before your application runs
- IDE can provide immediate feedback on type mismatches
2. Language Integration
Function Calling:
- Schema is defined as a JSON structure in your code
- No native language integration with your programming language
- No autocomplete or type hints in your IDE
BAML:
- Generates native language bindings (Pydantic models in Python)
- Full IDE support with autocomplete and type hints
- Seamless integration with the language’s type system
3. Separation of Concerns
Function Calling:
- Schema definition mixed with business logic
- Changes to schema require modifying application code
- Difficult to reuse schemas across different applications
BAML:
- Clear separation between schema and implementation
- Schemas defined in dedicated .baml files
- Easy to reuse schemas across multiple applications
BAML Is and Is not
It’s important to understand what BAML does and doesn’t do for type safety:
What BAML Does:
Type-safe Input/Output Handling:
- Defines data structures in a declarative language
- Generates Pydantic models for runtime type checking
- Validates inputs before sending to LLM and outputs after receiving responses
API Integration Management:
- Generates all the code needed to connect to the LLM provider (OpenAI in our case)
- Handles authentication, request formatting, response parsing
Prompt Engineering:
- Uses your defined templates from BAML files
- Handles variable interpolation in prompts
Response Format Enforcement:
- Sets the appropriate parameters like response_format={"type": "json_object"}
- Processes and validates the response through Pydantic models
What BAML Doesn’t Do:
Magically Make LLM Outputs Type-Safe:
- BAML doesn’t modify your prompts to ensure type safety
- The LLM could still generate invalid responses
- BAML will catch these invalid responses through Pydantic validation
Replace Good Prompt Engineering:
- You still need to write clear prompts that guide the LLM
- BAML provides the structure, but you provide the guidance
The type safety comes from the combination of:
- The schema you defined (which becomes Pydantic models)
- The response format configuration in the API call
- The runtime validation after the response is received
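As a rough illustration of that last layer: this is conceptually what the generated client's runtime validation does with the raw LLM response (the EnhancedDoc class is hypothetical, standing in for a BAML-generated Pydantic model):
from pydantic import BaseModel, ValidationError


class EnhancedDoc(BaseModel):  # hypothetical schema, stands in for a generated model
    title: str
    improvements: list[str]


# Imagine the LLM ignored the schema and returned a string where a list was expected
raw_llm_response = '{"title": "My Doc", "improvements": "should be a list"}'

try:
    doc = EnhancedDoc.model_validate_json(raw_llm_response)
except ValidationError as err:
    # BAML does not stop the LLM from producing this; it catches it here instead
    print("Invalid response from the model:", err)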
BAML’s power is that it generates all the infrastructure to properly request, validate, and process responses according to your defined schema, letting you focus on defining your schema and business logic without writing boilerplate code.
Conclusions
You can see which of these cli tools you have installed globally via npm:
npm list -g --depth=0
Context Engineering
Apparently, this is already an alternative to vibe coding.
https://www.youtube.com/watch?v=uohI3h4kqyg
MIT | Context engineering is the new vibe coding - it’s the way to actually make AI coding assistants work. Claude Code is the best for this so that’s what this repo is centered around, but you can apply this strategy with any AI coding assistant!
11Labs x MCP
Python CLI Tools
FAQ
Other Tools
GPL 3.0 | Open Source AI Calling Transcriptions, Summaries, and Analytics built on OpenAI Whisper
Similar to fireb
AGPL | No Code Web Data Extraction Platform • Turn Websites To APIs & Spreadsheets In Minutes
MIT | An automated document analyzer for Paperless-ngx using OpenAI API, Ollama, Deepseek-r1, Azure and all OpenAI API compatible Services to automatically analyze and tag your documents.
MIT | LangGraph solution template for MCP