π οΈ Π€ΡΠ΅ΠΉΠΌΠ²ΠΎΡΠΊΠΈ ΠΈ ΠΈΠ½ΡΡΡΡΠΌΠ΅Π½ΡΡ
ΠΠΊΠΎΡΠΈΡΡΠ΅ΠΌΠ° LLM
ΠΡΠ±ΠΎΡ ΡΡΠ΅ΠΉΠΌΠ²ΠΎΡΠΊΠ° Π·Π°Π²ΠΈΡΠΈΡ ΠΎΡ Π·Π°Π΄Π°ΡΠΈ: Π±ΡΡΡΡΡΠΉ ΠΏΡΠΎΡΠΎΡΠΈΠΏ, production RAG, code agent ΠΈΠ»ΠΈ self-hosted inference β Ρ ΠΊΠ°ΠΆΠ΄ΠΎΠ³ΠΎ ΡΡΠ΅Π½Π°ΡΠΈΡ ΡΠ²ΠΎΠΉ ΡΡΠ΅ΠΊ.
πΌ 1. Π€ΡΠ΅ΠΉΠΌΠ²ΠΎΡΠΊΠΈ ΠΎΡΠΊΠ΅ΡΡΡΠ°ΡΠΈΠΈ
LangChain
- Π§ΡΠΎ: ΡΠ°ΠΌΡΠΉ ΠΏΠΎΠΏΡΠ»ΡΡΠ½ΡΠΉ ΡΡΠ΅ΠΉΠΌΠ²ΠΎΡΠΊ Π΄Π»Ρ LLM apps
- Π‘ΠΈΠ»ΡΠ½ΡΠ΅ ΡΡΠΎΡΠΎΠ½Ρ: ΠΎΠ³ΡΠΎΠΌΠ½Π°Ρ ΡΠΊΠΎΡΠΈΡΡΠ΅ΠΌΠ°, ΠΈΠ½ΡΠ΅Π³ΡΠ°ΡΠΈΠΈ ΡΠΎ Π²ΡΠ΅ΠΌ
- Π‘Π»Π°Π±ΡΠ΅ ΡΡΠΎΡΠΎΠ½Ρ: over-abstraction, ΠΌΠ΅Π΄Π»Π΅Π½Π½ΡΠΉ, ΡΠ»ΠΎΠΆΠ½ΡΠΉ
- ΠΠΎΠ³Π΄Π° ΠΈΡΠΏΠΎΠ»ΡΠ·ΠΎΠ²Π°ΡΡ: Π±ΡΡΡΡΡΠΉ ΠΏΡΠΎΡΠΎΡΠΈΠΏ, ΠΌΠ½ΠΎΠ³ΠΎ ΠΈΠ½ΡΠ΅Π³ΡΠ°ΡΠΈΠΉ
- ΠΠΎΠ΄ΡΠ»ΠΈ: LangChain (core), LangGraph (agents), LangSmith (observability)
LlamaIndex
- Π§ΡΠΎ: ΡΡΠ΅ΠΉΠΌΠ²ΠΎΡΠΊ Π΄Π»Ρ RAG ΠΈ data-augmented LLM apps
- Π‘ΠΈΠ»ΡΠ½ΡΠ΅ ΡΡΠΎΡΠΎΠ½Ρ: Π»ΡΡΡΠΈΠΉ Π΄Π»Ρ RAG, data connectors
- ΠΠΎΠ³Π΄Π° ΠΈΡΠΏΠΎΠ»ΡΠ·ΠΎΠ²Π°ΡΡ: RAG-ΠΏΡΠΈΠ»ΠΎΠΆΠ΅Π½ΠΈΡ, document QA
Haystack (deepset)
- Π§ΡΠΎ: production-focused NLP/LLM framework
- Π‘ΠΈΠ»ΡΠ½ΡΠ΅ ΡΡΠΎΡΠΎΠ½Ρ: pipelines, clean architecture, RAG
- ΠΠΎΠ³Π΄Π° ΠΈΡΠΏΠΎΠ»ΡΠ·ΠΎΠ²Π°ΡΡ: production RAG, search
π€ 2. Π€ΡΠ΅ΠΉΠΌΠ²ΠΎΡΠΊΠΈ Π°Π³Π΅Π½ΡΠΎΠ²
Hermes vs IDE-Π°Π³Π΅Π½ΡΡ
Hermes β model-agnostic, MCP-native; Cursor/Claude Code β Π·Π°ΡΠΎΡΠ΅Π½Ρ ΠΏΠΎΠ΄ coding Π² IDE. ΠΡΠ±ΠΎΡ Π·Π°Π²ΠΈΡΠΈΡ ΠΎΡ Π·Π°Π΄Π°ΡΠΈ: ΡΠ½ΠΈΠ²Π΅ΡΡΠ°Π»ΡΠ½ΡΠΉ Π°Π³Π΅Π½Ρ vs pair programming.
Hermes Agent (ΡΡΠΎΡ ΠΏΡΠΎΠ΄ΡΠΊΡ)
- Architecture: model-agnostic, tool-first, MCP-native
- Memory: MEMORY.md + USER.md + skills/
- Tools: terminal, browser, file, MCP servers
- Multi-agent: delegate_task Π΄Π»Ρ subagents
- Context: compressor Ρ protect_last_n
Claude Code
- Architecture: CLI agent Π΄Π»Ρ coding
- Model: Claude (Sonnet/Opus)
- Tools: bash, file ops, git, search
- Extended thinking: reasoning ΠΏΠ΅ΡΠ΅Π΄ action
OpenAI Codex
- Architecture: CLI agent, sandboxed execution
- Model: GPT-5 / o3
- Tools: terminal, file ops
Cursor / Windsurf / Continue
- Π§ΡΠΎ: IDE-integrated AI coding assistants
- Architecture: inline completion + chat + agent mode
- Model: Π»ΡΠ±ΠΎΠΉ (Claude, GPT, local)
Aider
- Π§ΡΠΎ: CLI pair programmer
- ΠΡΠΎΠ±Π΅Π½Π½ΠΎΡΡΡ: git-integrated, ΠΊΠ°ΠΆΠ΄ΡΠΉ change = commit
- Tree editing: ΡΠ΅Π΄Π°ΠΊΡΠΈΡΡΠ΅Ρ Π½Π΅ΡΠΊΠΎΠ»ΡΠΊΠΎ ΡΠ°ΠΉΠ»ΠΎΠ² ΠΎΠ΄Π½ΠΎΠ²ΡΠ΅ΠΌΠ΅Π½Π½ΠΎ
β‘ 3. Inference-Π΄Π²ΠΈΠΆΠΊΠΈ
| Engine | Type | Best for |
|---|---|---|
| vLLM | Self-hosted server | Production, throughput |
| TGI | Self-hosted server | HuggingFace ecosystem |
| TensorRT-LLM | NVIDIA optimised | Max performance on H100 |
| SGLang | Self-hosted server | Structured generation |
| llama.cpp | CPU/edge | Local, Mac, Raspberry Pi |
| Ollama | Desktop app | Easy local LLM |
| LM Studio | Desktop GUI | Non-technical users |
π― 4. ΠΠ½ΡΡΡΡΠΌΠ΅Π½ΡΡ Fine-Tuning
LoRA / QLoRA (PEFT)
- LoRA: ΠΎΠ±ΡΡΠ°Π΅ΠΌ ΡΠΎΠ»ΡΠΊΠΎ low-rank Π°Π΄Π°ΠΏΡΠ΅ΡΡ (0.1-1% ΠΏΠ°ΡΠ°ΠΌΠ΅ΡΡΠΎΠ²)
- QLoRA: LoRA + 4-bit quantization Π±Π°Π·ΠΎΠ²ΠΎΠΉ ΠΌΠΎΠ΄Π΅Π»ΠΈ
- ΠΠ±ΡΡΠ΅Π½ΠΈΠ΅ 70B Π½Π° ΠΎΠ΄Π½ΠΎΠΉ A100 (QLoRA)
- ΠΠ½ΡΡΡΡΠΌΠ΅Π½ΡΡ:
peft,axolotl,unsloth
Full Fine-Tuning
- ΠΠ±ΡΡΠ°Π΅ΠΌ Π²ΡΠ΅ ΠΏΠ°ΡΠ°ΠΌΠ΅ΡΡΡ
- ΠΡΠΆΠ΅Π½ ΠΊΠ»Π°ΡΡΠ΅Ρ GPU
- ΠΠ½ΡΡΡΡΠΌΠ΅Π½ΡΡ:
transformers,DeepSpeed,Megatron-LM
DPO / RLHF Training
- TRL (Transformers RL): HuggingFace library Π΄Π»Ρ preference optimization
- Unsloth: ΠΎΠΏΡΠΈΠΌΠΈΠ·ΠΈΡΠΎΠ²Π°Π½Π½ΡΠΉ fine-tuning (2-5Γ Π±ΡΡΡΡΠ΅Π΅)
- Axolotl: config-driven fine-tuning
ποΈ 5. ΠΠ½ΡΡΡΡΠΌΠ΅Π½ΡΡ Π΄Π°Π½Π½ΡΡ
ΠΠ΅ΠΊΡΠΎΡΠ½ΡΠ΅ Ρ ΡΠ°Π½ΠΈΠ»ΠΈΡΠ° β ΠΏΠΎΠ΄ΡΠΎΠ±Π½Π΅Π΅
Qdrant:
from qdrant_client import QdrantClient
client = QdrantClient(host="localhost", port=6333)
client.create_collection(
"documents",
vectors_config={"size": 1536, "distance": "Cosine"}
)
client.upsert("documents", points=[
{"id": 1, "vector": [0.1, ...], "payload": {"text": "..."}}
])
results = client.search("documents", query_vector=[0.1, ...], limit=5)pgvector (PostgreSQL):
CREATE EXTENSION vector;
CREATE TABLE docs (id SERIAL, content TEXT, embedding VECTOR(1536));
CREATE INDEX ON docs USING ivfflat (embedding vector_cosine_ops);
SELECT * FROM docs ORDER BY embedding <=> '[0.1,...]' LIMIT 5;Π Π°Π·ΠΌΠ΅ΡΠΊΠ° / Π°Π½Π½ΠΎΡΠ°ΡΠΈΡ Π΄Π°Π½Π½ΡΡ
- Label Studio: open-source, ΡΠ½ΠΈΠ²Π΅ΡΡΠ°Π»ΡΠ½ΡΠΉ
- Argilla: specialized Π΄Π»Ρ LLM fine-tuning data
- Prodigy: paid, efficient
ποΈ 6. ΠΠ»Π°ΡΡΠΎΡΠΌΡ observability
LangSmith (LangChain)
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls__..."
# ΠΡΠ΅ LLM calls Π°Π²ΡΠΎΠΌΠ°ΡΠΈΡΠ΅ΡΠΊΠΈ ΡΡΠ΅ΠΉΡΡΡΡΡLangfuse (open-source)
- Self-hosted ΠΈΠ»ΠΈ cloud
- Tracing Π΄Π»Ρ Π»ΡΠ±ΠΎΠ³ΠΎ ΡΡΠ΅ΠΉΠΌΠ²ΠΎΡΠΊΠ° (Π½Π΅ ΡΠΎΠ»ΡΠΊΠΎ LangChain)
- Prompt management, A/B testing, eval
Phoenix (Arize)
- Open-source LLM observability
- Tracing + evaluation
- Integration Ρ OpenTelemetry
π¦ 7. Model Hubs
| Hub | Π§ΡΠΎ | ΠΡΠΎΠ±Π΅Π½Π½ΠΎΡΡΡ |
|---|---|---|
| HuggingFace Hub | 1M+ models, datasets | Π‘ΡΠ°Π½Π΄Π°ΡΡ Π΄Π»Ρ open-source |
| Ollama Hub | Pre-quantized models | ΠΠ»Ρ Π»ΠΎΠΊΠ°Π»ΡΠ½ΠΎΠ³ΠΎ Π·Π°ΠΏΡΡΠΊΠ° |
| ModelScope | Alibaba | ΠΠΈΡΠ°ΠΉΡΠΊΠΈΠ΅ ΠΌΠΎΠ΄Π΅Π»ΠΈ |
| OpenRouter | API aggregator | 200+ ΠΌΠΎΠ΄Π΅Π»Π΅ΠΉ ΡΠ΅ΡΠ΅Π· ΠΎΠ΄ΠΈΠ½ API |
π§ 8. Dev Tools
Π£ΠΏΡΠ°Π²Π»Π΅Π½ΠΈΠ΅ ΠΏΡΠΎΠΌΠΏΡΠ°ΠΌΠΈ
- PromptLayer: versioning, A/B testing prompts
- Langfuse: prompt management + observability
- Humanloop: collaborative prompt engineering
Π’Π΅ΡΡΠΈΡΠΎΠ²Π°Π½ΠΈΠ΅
- Promptfoo: CLI Π΄Π»Ρ testing prompts
- DeepEval: pytest Π΄Π»Ρ LLM outputs
- RAGAS: RAG evaluation framework
Sandboxing
- E2B: cloud sandboxes Π΄Π»Ρ code execution
- Daytona: open-source sandbox
- Docker: standard containerization
π 9. Workflow ΡΠ°Π·ΡΠ°Π±ΠΎΡΠΊΠΈ (Π»ΡΡΡΠΈΠ΅ ΠΏΡΠ°ΠΊΡΠΈΠΊΠΈ)
ΠΠΎΠΊΠ°Π»ΡΠ½Π°Ρ ΡΠ°Π·ΡΠ°Π±ΠΎΡΠΊΠ°
1. Ollama ΠΈΠ»ΠΈ LM Studio β Π»ΠΎΠΊΠ°Π»ΡΠ½Π°Ρ ΠΌΠΎΠ΄Π΅Π»Ρ Π΄Π»Ρ iteration
2. Promptfoo β ΡΠ΅ΡΡΠΈΡΠΎΠ²Π°Π½ΠΈΠ΅ ΠΏΡΠΎΠΌΠΏΡΠΎΠ²
3. Langfuse β Π»ΠΎΠΊΠ°Π»ΡΠ½ΡΠΉ tracing
4. Qdrant (Docker) β Π»ΠΎΠΊΠ°Π»ΡΠ½ΡΠΉ vector store
Staging
1. vLLM on GPU instance β production-like model serving
2. Golden dataset eval β regression testing
3. A/B testing framework
4. LangSmith / Langfuse cloud β observability
Production
1. Kubernetes β orchestration
2. vLLM cluster β serving
3. Redis β caching
4. Qdrant β vector store
5. PostgreSQL β metadata
6. Prometheus + Grafana β metrics
7. Langfuse β LLM-specific tracing
π 10. Π¨ΠΏΠ°ΡΠ³Π°Π»ΠΊΠ°: ΡΡΠΎ Π²ΡΠ±ΡΠ°ΡΡ
ΠΠ΅ ΠΏΠ΅ΡΠ΅ΡΡΠ»ΠΎΠΆΠ½ΡΠΉΡΠ΅ ΡΡΠ΅ΠΊ
ΠΠ°ΡΠ½ΠΈΡΠ΅ Ρ ΠΌΠΈΠ½ΠΈΠΌΠ°Π»ΡΠ½ΠΎΠ³ΠΎ Π½Π°Π±ΠΎΡΠ° β LangChain + OpenAI API Π΄Π»Ρ ΠΏΡΠΎΡΠΎΡΠΈΠΏΠ°, Π΄ΠΎΠ±Π°Π²Π»ΡΠΉΡΠ΅ Qdrant, vLLM ΠΈ observability ΠΏΠΎ ΠΌΠ΅ΡΠ΅ ΡΠΎΡΡΠ° Π½Π°Π³ΡΡΠ·ΠΊΠΈ.
ΠΡΠΎΡΠΎΡΠΈΠΏ LLM app Π±ΡΡΡΡΠΎ β LangChain + OpenAI API
RAG ΠΏΡΠΈΠ»ΠΎΠΆΠ΅Π½ΠΈΠ΅ β LlamaIndex + Qdrant + BGE embeddings
Production agent β Hermes / LangGraph + custom tools
Self-hosted model β vLLM on GPU
Fine-tune open model β Unsloth + QLoRA
Local development β Ollama + Promptfoo
Code agent β Claude Code / Aider / OpenCode
Eval pipeline β RAGAS + Langfuse
Monitoring β LangSmith (cloud) / Langfuse (self-hosted)