πŸ› οΈ Π€Ρ€Π΅ΠΉΠΌΠ²ΠΎΡ€ΠΊΠΈ ΠΈ инструмСнты

ЭкосистСма LLM

Π’Ρ‹Π±ΠΎΡ€ Ρ„Ρ€Π΅ΠΉΠΌΠ²ΠΎΡ€ΠΊΠ° зависит ΠΎΡ‚ Π·Π°Π΄Π°Ρ‡ΠΈ: быстрый ΠΏΡ€ΠΎΡ‚ΠΎΡ‚ΠΈΠΏ, production RAG, code agent ΠΈΠ»ΠΈ self-hosted inference β€” Ρƒ ΠΊΠ°ΠΆΠ΄ΠΎΠ³ΠΎ сцСнария свой стСк.

🎼 1. Π€Ρ€Π΅ΠΉΠΌΠ²ΠΎΡ€ΠΊΠΈ оркСстрации

LangChain

  • Π§Ρ‚ΠΎ: самый популярный Ρ„Ρ€Π΅ΠΉΠΌΠ²ΠΎΡ€ΠΊ для LLM apps
  • Π‘ΠΈΠ»ΡŒΠ½Ρ‹Π΅ стороны: огромная экосистСма, ΠΈΠ½Ρ‚Π΅Π³Ρ€Π°Ρ†ΠΈΠΈ со всСм
  • Π‘Π»Π°Π±Ρ‹Π΅ стороны: over-abstraction, ΠΌΠ΅Π΄Π»Π΅Π½Π½Ρ‹ΠΉ, слоТный
  • Когда ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚ΡŒ: быстрый ΠΏΡ€ΠΎΡ‚ΠΎΡ‚ΠΈΠΏ, ΠΌΠ½ΠΎΠ³ΠΎ ΠΈΠ½Ρ‚Π΅Π³Ρ€Π°Ρ†ΠΈΠΉ
  • ΠœΠΎΠ΄ΡƒΠ»ΠΈ: LangChain (core), LangGraph (agents), LangSmith (observability)

LlamaIndex

  • Π§Ρ‚ΠΎ: Ρ„Ρ€Π΅ΠΉΠΌΠ²ΠΎΡ€ΠΊ для RAG ΠΈ data-augmented LLM apps
  • Π‘ΠΈΠ»ΡŒΠ½Ρ‹Π΅ стороны: Π»ΡƒΡ‡ΡˆΠΈΠΉ для RAG, data connectors
  • Когда ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚ΡŒ: RAG-прилоТСния, document QA

Haystack (deepset)

  • Π§Ρ‚ΠΎ: production-focused NLP/LLM framework
  • Π‘ΠΈΠ»ΡŒΠ½Ρ‹Π΅ стороны: pipelines, clean architecture, RAG
  • Когда ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚ΡŒ: production RAG, search

πŸ€– 2. Π€Ρ€Π΅ΠΉΠΌΠ²ΠΎΡ€ΠΊΠΈ Π°Π³Π΅Π½Ρ‚ΠΎΠ²

Hermes vs IDE-Π°Π³Π΅Π½Ρ‚Ρ‹

Hermes β€” model-agnostic, MCP-native; Cursor/Claude Code β€” Π·Π°Ρ‚ΠΎΡ‡Π΅Π½Ρ‹ ΠΏΠΎΠ΄ coding Π² IDE. Π’Ρ‹Π±ΠΎΡ€ зависит ΠΎΡ‚ Π·Π°Π΄Π°Ρ‡ΠΈ: ΡƒΠ½ΠΈΠ²Π΅Ρ€ΡΠ°Π»ΡŒΠ½Ρ‹ΠΉ Π°Π³Π΅Π½Ρ‚ vs pair programming.

Hermes Agent (этот ΠΏΡ€ΠΎΠ΄ΡƒΠΊΡ‚)

  • Architecture: model-agnostic, tool-first, MCP-native
  • Memory: MEMORY.md + USER.md + skills/
  • Tools: terminal, browser, file, MCP servers
  • Multi-agent: delegate_task для subagents
  • Context: compressor с protect_last_n

Claude Code

  • Architecture: CLI agent для coding
  • Model: Claude (Sonnet/Opus)
  • Tools: bash, file ops, git, search
  • Extended thinking: reasoning ΠΏΠ΅Ρ€Π΅Π΄ action

OpenAI Codex

  • Architecture: CLI agent, sandboxed execution
  • Model: GPT-5 / o3
  • Tools: terminal, file ops

Cursor / Windsurf / Continue

  • Π§Ρ‚ΠΎ: IDE-integrated AI coding assistants
  • Architecture: inline completion + chat + agent mode
  • Model: любой (Claude, GPT, local)

Aider

  • Π§Ρ‚ΠΎ: CLI pair programmer
  • ΠžΡΠΎΠ±Π΅Π½Π½ΠΎΡΡ‚ΡŒ: git-integrated, ΠΊΠ°ΠΆΠ΄Ρ‹ΠΉ change = commit
  • Tree editing: Ρ€Π΅Π΄Π°ΠΊΡ‚ΠΈΡ€ΡƒΠ΅Ρ‚ нСсколько Ρ„Π°ΠΉΠ»ΠΎΠ² ΠΎΠ΄Π½ΠΎΠ²Ρ€Π΅ΠΌΠ΅Π½Π½ΠΎ

⚑ 3. Inference-двиТки

EngineTypeBest for
vLLMSelf-hosted serverProduction, throughput
TGISelf-hosted serverHuggingFace ecosystem
TensorRT-LLMNVIDIA optimisedMax performance on H100
SGLangSelf-hosted serverStructured generation
llama.cppCPU/edgeLocal, Mac, Raspberry Pi
OllamaDesktop appEasy local LLM
LM StudioDesktop GUINon-technical users

🎯 4. Π˜Π½ΡΡ‚Ρ€ΡƒΠΌΠ΅Π½Ρ‚Ρ‹ Fine-Tuning

LoRA / QLoRA (PEFT)

  • LoRA: ΠΎΠ±ΡƒΡ‡Π°Π΅ΠΌ Ρ‚ΠΎΠ»ΡŒΠΊΠΎ low-rank Π°Π΄Π°ΠΏΡ‚Π΅Ρ€Ρ‹ (0.1-1% ΠΏΠ°Ρ€Π°ΠΌΠ΅Ρ‚Ρ€ΠΎΠ²)
  • QLoRA: LoRA + 4-bit quantization Π±Π°Π·ΠΎΠ²ΠΎΠΉ ΠΌΠΎΠ΄Π΅Π»ΠΈ
  • ΠžΠ±ΡƒΡ‡Π΅Π½ΠΈΠ΅ 70B Π½Π° ΠΎΠ΄Π½ΠΎΠΉ A100 (QLoRA)
  • Π˜Π½ΡΡ‚Ρ€ΡƒΠΌΠ΅Π½Ρ‚Ρ‹: peft, axolotl, unsloth

Full Fine-Tuning

  • ΠžΠ±ΡƒΡ‡Π°Π΅ΠΌ всС ΠΏΠ°Ρ€Π°ΠΌΠ΅Ρ‚Ρ€Ρ‹
  • НуТСн кластСр GPU
  • Π˜Π½ΡΡ‚Ρ€ΡƒΠΌΠ΅Π½Ρ‚Ρ‹: transformers, DeepSpeed, Megatron-LM

DPO / RLHF Training

  • TRL (Transformers RL): HuggingFace library для preference optimization
  • Unsloth: ΠΎΠΏΡ‚ΠΈΠΌΠΈΠ·ΠΈΡ€ΠΎΠ²Π°Π½Π½Ρ‹ΠΉ fine-tuning (2-5Γ— быстрСС)
  • Axolotl: config-driven fine-tuning

πŸ—„οΈ 5. Π˜Π½ΡΡ‚Ρ€ΡƒΠΌΠ΅Π½Ρ‚Ρ‹ Π΄Π°Π½Π½Ρ‹Ρ…

Π’Π΅ΠΊΡ‚ΠΎΡ€Π½Ρ‹Π΅ Ρ…Ρ€Π°Π½ΠΈΠ»ΠΈΡ‰Π° β€” ΠΏΠΎΠ΄Ρ€ΠΎΠ±Π½Π΅Π΅

Qdrant:

from qdrant_client import QdrantClient
 
client = QdrantClient(host="localhost", port=6333)
client.create_collection(
    "documents",
    vectors_config={"size": 1536, "distance": "Cosine"}
)
client.upsert("documents", points=[
    {"id": 1, "vector": [0.1, ...], "payload": {"text": "..."}}
])
results = client.search("documents", query_vector=[0.1, ...], limit=5)

pgvector (PostgreSQL):

CREATE EXTENSION vector;
CREATE TABLE docs (id SERIAL, content TEXT, embedding VECTOR(1536));
CREATE INDEX ON docs USING ivfflat (embedding vector_cosine_ops);
SELECT * FROM docs ORDER BY embedding <=> '[0.1,...]' LIMIT 5;

Π Π°Π·ΠΌΠ΅Ρ‚ΠΊΠ° / аннотация Π΄Π°Π½Π½Ρ‹Ρ…

  • Label Studio: open-source, ΡƒΠ½ΠΈΠ²Π΅Ρ€ΡΠ°Π»ΡŒΠ½Ρ‹ΠΉ
  • Argilla: specialized для LLM fine-tuning data
  • Prodigy: paid, efficient

πŸ‘οΈ 6. ΠŸΠ»Π°Ρ‚Ρ„ΠΎΡ€ΠΌΡ‹ observability

LangSmith (LangChain)

import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls__..."
# ВсС LLM calls автоматичСски трСйсятся

Langfuse (open-source)

  • Self-hosted ΠΈΠ»ΠΈ cloud
  • Tracing для любого Ρ„Ρ€Π΅ΠΉΠΌΠ²ΠΎΡ€ΠΊΠ° (Π½Π΅ Ρ‚ΠΎΠ»ΡŒΠΊΠΎ LangChain)
  • Prompt management, A/B testing, eval

Phoenix (Arize)

  • Open-source LLM observability
  • Tracing + evaluation
  • Integration с OpenTelemetry

πŸ“¦ 7. Model Hubs

HubΠ§Ρ‚ΠΎΠžΡΠΎΠ±Π΅Π½Π½ΠΎΡΡ‚ΡŒ
HuggingFace Hub1M+ models, datasetsΠ‘Ρ‚Π°Π½Π΄Π°Ρ€Ρ‚ для open-source
Ollama HubPre-quantized modelsДля локального запуска
ModelScopeAlibabaΠšΠΈΡ‚Π°ΠΉΡΠΊΠΈΠ΅ ΠΌΠΎΠ΄Π΅Π»ΠΈ
OpenRouterAPI aggregator200+ ΠΌΠΎΠ΄Π΅Π»Π΅ΠΉ Ρ‡Π΅Ρ€Π΅Π· ΠΎΠ΄ΠΈΠ½ API

πŸ”§ 8. Dev Tools

Π£ΠΏΡ€Π°Π²Π»Π΅Π½ΠΈΠ΅ ΠΏΡ€ΠΎΠΌΠΏΡ‚Π°ΠΌΠΈ

  • PromptLayer: versioning, A/B testing prompts
  • Langfuse: prompt management + observability
  • Humanloop: collaborative prompt engineering

ВСстированиС

  • Promptfoo: CLI для testing prompts
  • DeepEval: pytest для LLM outputs
  • RAGAS: RAG evaluation framework

Sandboxing

  • E2B: cloud sandboxes для code execution
  • Daytona: open-source sandbox
  • Docker: standard containerization

πŸ”„ 9. Workflow Ρ€Π°Π·Ρ€Π°Π±ΠΎΡ‚ΠΊΠΈ (Π»ΡƒΡ‡ΡˆΠΈΠ΅ ΠΏΡ€Π°ΠΊΡ‚ΠΈΠΊΠΈ)

Π›ΠΎΠΊΠ°Π»ΡŒΠ½Π°Ρ Ρ€Π°Π·Ρ€Π°Π±ΠΎΡ‚ΠΊΠ°

1. Ollama ΠΈΠ»ΠΈ LM Studio β€” локальная модСль для iteration
2. Promptfoo β€” тСстированиС ΠΏΡ€ΠΎΠΌΠΏΡ‚ΠΎΠ²
3. Langfuse β€” Π»ΠΎΠΊΠ°Π»ΡŒΠ½Ρ‹ΠΉ tracing
4. Qdrant (Docker) β€” Π»ΠΎΠΊΠ°Π»ΡŒΠ½Ρ‹ΠΉ vector store

Staging

1. vLLM on GPU instance β€” production-like model serving
2. Golden dataset eval β€” regression testing
3. A/B testing framework
4. LangSmith / Langfuse cloud β€” observability

Production

1. Kubernetes β€” orchestration
2. vLLM cluster β€” serving
3. Redis β€” caching
4. Qdrant β€” vector store
5. PostgreSQL β€” metadata
6. Prometheus + Grafana β€” metrics
7. Langfuse β€” LLM-specific tracing

πŸ“‹ 10. Π¨ΠΏΠ°Ρ€Π³Π°Π»ΠΊΠ°: Ρ‡Ρ‚ΠΎ Π²Ρ‹Π±Ρ€Π°Ρ‚ΡŒ

НС пСрСуслоТняйтС стСк

НачнитС с минимального Π½Π°Π±ΠΎΡ€Π° β€” LangChain + OpenAI API для ΠΏΡ€ΠΎΡ‚ΠΎΡ‚ΠΈΠΏΠ°, добавляйтС Qdrant, vLLM ΠΈ observability ΠΏΠΎ ΠΌΠ΅Ρ€Π΅ роста Π½Π°Π³Ρ€ΡƒΠ·ΠΊΠΈ.

ΠŸΡ€ΠΎΡ‚ΠΎΡ‚ΠΈΠΏ LLM app быстро β†’ LangChain + OpenAI API
RAG ΠΏΡ€ΠΈΠ»ΠΎΠΆΠ΅Π½ΠΈΠ΅ β†’ LlamaIndex + Qdrant + BGE embeddings
Production agent β†’ Hermes / LangGraph + custom tools
Self-hosted model β†’ vLLM on GPU
Fine-tune open model β†’ Unsloth + QLoRA
Local development β†’ Ollama + Promptfoo
Code agent β†’ Claude Code / Aider / OpenCode
Eval pipeline β†’ RAGAS + Langfuse
Monitoring β†’ LangSmith (cloud) / Langfuse (self-hosted)