ANI

Building Local AI Systems: Qwen3.6 + MCPs

# Introducing MCP

Every developer setup built around a local AI model eventually hits the same wall. The model works. It reasons well, writes solid code, and answers hard questions. But it cannot act on anything. It cannot query your database, open a GitHub issue, or call your internal API. You are left writing custom Python wrappers for every tool you need, hard-coding the glue between model output and tool execution, and maintaining those wrappers every time an API changes.

The Model Context Protocol (MCP) was built to solve exactly this. It is an open standard from Anthropic: a universal, pluggable protocol for connecting tools to AI. Define a tool once as an MCP server. Any MCP-compatible client, with any model and any framework, can discover and call it with zero custom per-model integration code.

Qwen3.6-35B-A3B is among the most capable local models for this kind of work right now. It has a 262,144-token context window and a Mixture of Experts (MoE) architecture that activates only 3B of its 35B parameters per forward pass (which is why it runs at usable speed on hardware that would struggle with a dense 35B model, although all 35B parameters still have to be held in memory). It was also explicitly trained and evaluated on MCP-based agentic tasks.

This article builds a local GitHub developer assistant: an agent that reads a repository's open issues, searches for the relevant code, writes a fix, and creates a pull request. The whole thing runs on your own hardware, through MCP servers, with no cloud inference dependency.

# Understanding Qwen3.6-35B-A3B

Understanding the architecture matters here because it directly explains which hardware you need and why the model behaves the way it does on agentic tasks.

The name encodes the key facts: 35B is the total parameter count, and A3B means 3B parameters are activated per forward pass. It is an MoE model with 256 experts per layer, and each token is routed to 8 routed experts plus 1 shared expert. You get the knowledge capacity of a 35B model at roughly the compute cost of a 3B model. That trade-off is why it suits hardware that would buckle under a dense 35B model.

The hidden layout is where Qwen3.6 diverges most from other MoE models. The 40-layer stack follows a 3:1 ratio of Gated DeltaNet layers to Gated Attention layers. DeltaNet is a linear attention mechanism; it processes sequences more efficiently than full quadratic attention, especially at long context lengths. The interleaved full Gated Attention layers provide the deep relational reasoning that linear attention alone misses. For an agent working through a 500-file repository, that combination matters: efficient processing of long inputs paired with precise reasoning at the layers that need it.
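The 3:1 ratio can be made concrete with a short sketch. The layer names and the exact ordering inside each group of four (three Gated DeltaNet layers followed by one Gated Attention layer) are assumptions made for illustration; only the 40-layer depth and the 3:1 ratio come from the text above.

```python
from collections import Counter

NUM_LAYERS = 40  # stack depth stated in the article

# Assumed ordering within each group of four layers (illustrative only)
PATTERN = ["gated_deltanet", "gated_deltanet", "gated_deltanet", "gated_attention"]


def layer_types(num_layers: int = NUM_LAYERS) -> list[str]:
    """Repeat the four-layer pattern until the stack is full."""
    return [PATTERN[i % len(PATTERN)] for i in range(num_layers)]


layers = layer_types()
counts = Counter(layers)  # 30 linear-attention layers, 10 full-attention layers
```

Under that assumption, only a quarter of the layers pay the quadratic attention cost, which is where the long-context efficiency comes from.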

The context window is 262,144 tokens natively, extensible to 1,010,000 with YaRN scaling. For agent work, context length is not a luxury feature; it is a functional constraint. An agent that reads source files, keeps a tool-call history, tracks a multi-step plan, and injects tool results back into the context needs real headroom. Many 7B and 13B models top out at 8k or 32k tokens. Running out of context mid-task means the agent loses its history and starts hallucinating tool results.
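One way to treat context as a constraint rather than a surprise is to track a budget as content is added. The sketch below is a minimal illustration; the `ContextBudget` class, the four-characters-per-token heuristic, and the output reserve are all assumptions, and a real agent would count tokens with the model's tokenizer.

```python
from dataclasses import dataclass

CONTEXT_WINDOW = 262_144  # native window stated in the article
CHARS_PER_TOKEN = 4       # rough heuristic (assumption); use a real tokenizer in practice


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


@dataclass
class ContextBudget:
    """Hypothetical helper that tracks how much of the window is used."""
    limit: int = CONTEXT_WINDOW
    reserve_for_output: int = 4096  # room kept free for the model's reply
    used: int = 0

    def add(self, text: str) -> bool:
        """Record text if it fits; return False when it would overflow."""
        cost = estimate_tokens(text)
        if self.used + cost > self.limit - self.reserve_for_output:
            return False
        self.used += cost
        return True

    @property
    def remaining(self) -> int:
        return self.limit - self.reserve_for_output - self.used
```

An agent loop could call `add()` before injecting each tool result and summarize or truncate when it returns `False`, instead of discovering the overflow as a failed request.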

Qwen3.6 was explicitly trained and evaluated on MCP-based agentic benchmarks. Two key features came out of that training:

Agentic coding. Frontend workflows and repository-level reasoning: the model handles multi-file tasks by reasoning coherently across all the files involved, not just by editing a single file in isolation.
Thinking preservation. A preserve_thinking flag retains the reasoning traces from earlier turns of a multi-turn conversation. If the agent reasons through a plan in turn one and then makes tool calls in turns two through five, preserve_thinking=True keeps the turn-one reasoning available in the KV cache. Each later turn benefits from that earlier reasoning without paying the cost of regenerating it.
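The effect of thinking preservation on the conversation history can be sketched on the client side. This is an illustration of the idea only: `prepare_history` is a hypothetical helper, and in a real deployment the chat template applies the flag on the server. It also assumes reasoning arrives inline in `<think>...</think>` tags.

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def prepare_history(messages: list[dict], preserve_thinking: bool) -> list[dict]:
    """Return the history to send on the next turn.

    With preserve_thinking=False, reasoning blocks are removed from earlier
    assistant turns; with True, they are kept so later turns can build on them.
    """
    prepared = []
    for msg in messages:
        if msg["role"] == "assistant" and not preserve_thinking:
            cleaned = THINK_RE.sub("", msg.get("content", "")).strip()
            msg = {**msg, "content": cleaned}
        prepared.append(msg)
    return prepared


history = [
    {"role": "user", "content": "Fix the login bug."},
    {"role": "assistant", "content": "<think>Plan: read auth.py first.</think>Reading auth.py."},
]
kept = prepare_history(history, preserve_thinking=True)
dropped = prepare_history(history, preserve_thinking=False)
```

With the flag on, the plan from turn one is still part of the prompt prefix on turn two, which is also what lets a prefix cache reuse it.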

# System Requirements

There are three practical deployment paths, and which one you use depends entirely on your hardware.

GPU deployment (recommended for production agent workloads). Qwen3.6-35B-A3B in bfloat16 needs roughly 70 GB of VRAM. At Q4 quantization, it fits in roughly 20-24 GB. A single RTX 4090 (24 GB) handles Q4. Two RTX 3090s with tensor parallelism handle Q4 as well. An A100 80 GB handles the full bfloat16 model.
CPU/hybrid with KTransformers. KTransformers is the accessible path for developers without a 24 GB GPU. It loads the compute-heavy layers onto the GPU when one is available and runs the rest on the CPU. With 64 GB of system RAM, you can run Qwen3.6-35B-A3B in a usable (if slow) configuration. Response latency will be 30–120 seconds per turn depending on your CPU, which works for an agent doing background repository analysis but not for interactive coding sessions.
Smaller models for tutorial experimentation. Every MCP integration pattern in this article is the same regardless of model size. If you want to follow along without the hardware for the full 35B model, use Qwen/Qwen2.5-7B-Instruct through Ollama (ollama pull qwen2.5:7b) or a Qwen3-8B model. The serving API is the same, the code is the same, and you can swap in the 35B model when your hardware allows.
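The VRAM figures above follow from simple arithmetic on the parameter count. The sketch below computes the weights-only footprint; the `weight_memory_gb` helper is illustrative. The gap between the 17.5 GB result and the 20-24 GB quoted for Q4 is presumably quantization metadata, activations, and KV cache, which this estimate deliberately leaves out.

```python
def weight_memory_gb(total_params_billions: float, bits_per_param: float) -> float:
    """Weights-only memory footprint in GB (1 GB = 1e9 bytes)."""
    total_bytes = total_params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


bf16_gb = weight_memory_gb(35, 16)  # 70.0 GB, matching the bfloat16 figure above
q4_gb = weight_memory_gb(35, 4)     # 17.5 GB before runtime overhead
```

Note that the calculation uses all 35B parameters, not the 3B active ones: MoE routing reduces compute per token, but every expert still has to be resident in memory (or offloaded, as in the CPU/hybrid path).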

Software requirements:

# Python 3.11+ required
python --version

python -m venv qwen-mcp-env
source qwen-mcp-env/bin/activate    # macOS / Linux
qwen-mcp-env\Scripts\activate       # Windows

# Core packages
pip install \
  "openai>=1.30.0" \
  "qwen-agent>=0.0.10" \
  "mcp>=1.0.0" \
  "httpx>=0.27.0"

# Serving framework -- choose one
pip install "vllm>=0.19.0"       # NVIDIA GPU
pip install "sglang>=0.5.10"     # NVIDIA GPU (faster prefill for long context)
pip install "ktransformers"      # CPU/hybrid

# Node.js 18+ is required for pre-built MCP servers installed via npx
node --version

# Serving Qwen3.6 Locally with an OpenAI-Compatible API

Before wiring up any MCP servers, you need a working inference server. Both SGLang and vLLM expose the OpenAI-compatible API that the MCP integration layer talks to: the same API surface, just pointed at localhost instead of api.openai.com.

// SGLang (Recommended for Long-Context Agent Workloads)

# Install SGLang with full dependencies
pip install "sglang[all]>=0.5.10"

# Serve Qwen3.6-35B-A3B with reasoning and tool-call parsers enabled.
# --reasoning-parser qwen3 correctly handles the <think>...</think> blocks.
# --tool-call-parser qwen3_coder routes tool call outputs to the right format.
# Prefix caching is critical for agent workloads -- it enables KV cache reuse
#   across turns, which is what makes preserve_thinking efficient in practice.
#   SGLang's RadixAttention prefix cache is on by default, so no extra flag is needed.

python -m sglang.launch_server \
    --model-path Qwen/Qwen3.6-35B-A3B \
    --host 0.0.0.0 \
    --port 30000 \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen3_coder \
    --tp 2    # tensor parallel across 2 GPUs; remove if using single GPU

// vLLM

pip install "vllm>=0.19.0"

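# vLLM equivalent with the same critical flags.
# --enable-auto-tool-choice is required for tool_choice="auto" requests.
vllm serve Qwen/Qwen3.6-35B-A3B \
    --host 0.0.0.0 \
    --port 8000 \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder \
    --enable-prefix-caching \
    --tensor-parallel-size 2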

// Smaller Model with Ollama

ollama pull qwen2.5:7b
ollama serve
# Ollama's API is OpenAI-compatible at http://localhost:11434/v1 (default port)

Once the server is running, verify it before continuing:

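# Health check -- should return {"status": "ok"} or similar
# (port 30000 matches the SGLang command above; use 8000 for vLLM)
curl http://localhost:30000/health

# Test the chat completions endpoint with a simple query
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3.6-35B-A3B",
    "messages": [{"role": "user", "content": "Reply with: ready"}],
    "max_tokens": 10
  }'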

If you get a JSON response with a choices array, the server is ready. Do not move on to the MCP setup until this works. Every integration failure you hit later is easier to debug when you know the serving layer is solid.
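The same check can be scripted in Python with only the standard library, which is handy as a preflight step in the agent itself. This is a sketch under assumptions: `build_chat_request` and `server_is_ready` are illustrative helper names, and the base URL in the usage comment assumes the SGLang port used earlier.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str) -> urllib.request.Request:
    """Build a minimal chat-completions request for an OpenAI-compatible server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "Reply with: ready"}],
        "max_tokens": 10,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def server_is_ready(base_url: str, model: str, timeout: float = 30.0) -> bool:
    """Return True when the server answers with a non-empty choices array."""
    request = build_chat_request(base_url, model)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.load(response)
    except (OSError, ValueError):
        # Connection errors, HTTP errors, and malformed JSON all count as "not ready"
        return False
    return bool(body.get("choices"))


# Usage (assumes the SGLang server from the previous step):
#   server_is_ready("http://localhost:30000/v1", "Qwen/Qwen3.6-35B-A3B")
```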

# Understanding MCP and Why It Changes Agent Architecture

Before writing any agent code, it helps to understand what MCP actually does at the protocol level, because that understanding prevents a class of bugs that come from treating MCP as just another function-calling API.

MCP is a JSON-RPC 2.0 protocol that runs over stdio or HTTP transports. When an MCP client connects to a server, the first thing it does after the initialization handshake is call tools/list to discover which tools the server exposes. Each tool comes back with a name, a description, and an input schema defined in JSON Schema. The model reads this schema. It is the contract between the model and the tool.

When the model wants to call a tool, it emits a structured tool-call object. The MCP client, not the model, actually makes the call by sending a tools/call request to the server. The server handles execution and returns the result. The client injects that result into the conversation as a tool role message. The model reads the result and decides on the next step.

This separation matters. The model decides what to call and with which arguments. The client owns execution. The server owns the actual work. Your code never hands a tool implementation to the model; you simply tell the client which servers are available.
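The division of labor can be shown with a toy loop that needs no model and no MCP server. Everything here is a stand-in: `fake_model` plays the LLM, `SERVER_TOOLS` plays the server, and `run_loop` plays the client. The message shapes are simplified for readability and are not the exact wire format.

```python
from typing import Callable

# Stand-in for an MCP server: tool name -> implementation
SERVER_TOOLS: dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
}


def fake_model(messages: list[dict]) -> dict:
    """Stand-in for the LLM: request one tool call, then answer from its result."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "assistant", "content": f"The answer is {last['content']}."}
    return {
        "role": "assistant",
        "content": "",
        "tool_calls": [{"id": "call_1", "name": "add", "arguments": {"a": 2, "b": 3}}],
    }


def run_loop(user_text: str, max_turns: int = 5) -> list[dict]:
    """Client role: ask the model, execute requested tools, inject the results."""
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_turns):
        reply = fake_model(messages)           # the model decides what to call
        messages.append(reply)
        calls = reply.get("tool_calls")
        if not calls:
            return messages                    # final answer, loop ends
        for call in calls:
            # the client executes the call against the "server"
            result = SERVER_TOOLS[call["name"]](**call["arguments"])
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result}
            )
    return messages


history = run_loop("What is 2 + 3?")
```

The model function never touches `SERVER_TOOLS`; it only names a tool and its arguments. Swapping in a real model and a real MCP session changes the two stand-ins, not the shape of the loop.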

There are two ways to use MCP with Qwen3.6:

With Qwen-Agent: the official qwen_agent library handles tool discovery, call parsing, result injection, and multi-turn conversation management automatically. Less code, less control. Right for most use cases.
With the MCP Python SDK directly: you manage the agent loop yourself using mcp.ClientSession. More code, full visibility into every message, and complete control over error handling and retry logic. Right for production systems where you need to monitor every step.

This article covers both, starting with Qwen-Agent.

# Building the Local GitHub Developer Assistant

The agent does four things in sequence: it reads open issues from a GitHub repository, finds the relevant code, drafts a fix, and opens a pull request. All locally, through MCP.

// Part 1: Environment and MCP Server Setup

# Set your GitHub personal access token
# Required by the GitHub MCP server for API calls
export GITHUB_TOKEN=ghp_your_token_here

# Pre-built MCP servers install via npx -- no separate install step
# npx handles this on first use when the agent starts the servers
# Verify npx is available:
npx --version

Create the project directory:

mkdir qwen-github-agent
cd qwen-github-agent

// Part 2: Qwen-Agent Implementation

The fastest path to a working agent. Qwen-Agent handles the full loop automatically.

# github_agent_qwenagent.py
# Prerequisites: pip install qwen-agent openai
#   npm / npx must be installed for the MCP servers
#   GITHUB_TOKEN env var must be set
#   Local serving endpoint must be running (see previous section)
#
# How to run:
#   python github_agent_qwenagent.py

import os

from qwen_agent.agents import Assistant

# ── Server configuration ──────────────────────────────────────────────────────

# Point at your local serving endpoint.
# Change model_server to match whichever server you started
# (ports follow the serve commands in the previous section; 11434 is Ollama's default):
#   SGLang:  http://localhost:30000/v1
#   vLLM:    http://localhost:8000/v1
#   Ollama:  http://localhost:11434/v1
LLM_CONFIG = {
    "model":     "Qwen/Qwen3.6-35B-A3B",
    "model_server": "http://localhost:30000/v1",   # SGLang endpoint from the serve command
    "api_key":   "EMPTY",           # Local servers do not require a real key

    # Thinking mode sampling params (from the official model card best practices)
    "generate_cfg": {
        "temperature":       0.6,
        "top_p":             0.95,
        "top_k":             20,
        "min_p":             0.0,
        "thought_in_history": True,   # This is the preserve_thinking flag in Qwen-Agent
    },
}

# ── MCP server configuration ──────────────────────────────────────────────────
# Each server key names the server; the value is the stdio launch command.
# Qwen-Agent starts each server as a subprocess and manages the MCP sessions.

MCP_SERVERS = {
    "mcpServers": {
        "filesystem": {
            "command": "npx",
            "args": [
                "-y",
                "@modelcontextprotocol/server-filesystem",
                # Grant the agent access to the current working directory
                # In production, restrict to the specific repository path
                "."
            ]
        },
        "github": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
            "env": {
                # The GitHub MCP server reads GITHUB_PERSONAL_ACCESS_TOKEN for API
                # authentication. Read the token from the environment here;
                # shell-style ${VAR} strings are not expanded inside a Python dict.
                "GITHUB_PERSONAL_ACCESS_TOKEN": os.environ.get("GITHUB_TOKEN", "")
            }
        },
    }
}

# ── System prompt ─────────────────────────────────────────────────────────────
SYSTEM_PROMPT = """You are a senior software engineer with full access to a GitHub repository
via MCP tools.

When given a repository and task:
1. List open issues to understand what needs fixing
2. Use filesystem tools to read relevant source files and tests
3. Identify the root cause based on the code and the issue description
4. Write a targeted fix -- minimal changes, no refactoring unrelated to the bug
5. Create a pull request with a clear title and description referencing the issue

Always explain your reasoning at each step. Think through edge cases before writing code.
If you are uncertain about a file's purpose, read it before modifying it."""

# ── Agent setup ───────────────────────────────────────────────────────────────

agent = Assistant(
    llm=LLM_CONFIG,
    name="GitHub Developer Assistant",
    description="Reads issues, fixes bugs, opens pull requests -- locally via MCP.",
    system_message=SYSTEM_PROMPT,
    function_list=[MCP_SERVERS],   # Qwen-Agent takes the mcpServers config as a function_list entry
)

# ── Run the agent ─────────────────────────────────────────────────────────────

def run_agent(task: str):
    """
    Run the agent on a task description and stream the output.
    The agent will make tool calls automatically; Qwen-Agent handles
    the full loop including tool execution and result injection.
    """
    messages = [{"role": "user", "content": task}]

    print(f"Task: {task}\n{'─' * 70}")

    # Qwen-Agent's run() is a generator that yields intermediate steps
    # Each yielded message shows a tool call, a tool result, or the final answer
    for response in agent.run(messages=messages):
        # response is a list of messages representing the conversation so far
        # The last message contains the most recent output
        last = response[-1]
        role    = last.get("role", "")
        content = last.get("content", "")

        if role == "assistant" and content:
            # Strip and display the thinking block separately for readability
            import re
            thinking = re.search(r"<think>(.*?)</think>", content, re.DOTALL)
            if thinking:
                print(f"[thinking] {thinking.group(1).strip()[:200]}...")
            clean = re.sub(r"<think>.*?</think>", "", content, flags=re.DOTALL).strip()
            if clean:
                print(f"[agent] {clean}")

        elif role == "tool":
            tool_name = last.get("name", "unknown_tool")
            print(f"[tool:{tool_name}] result received")


if __name__ == "__main__":
    run_agent(
        "In the repository myorg/my-api-project, find the open issue about "
        "the login endpoint returning 200 for invalid tokens. Read the relevant "
        "code and tests, fix the bug, and open a pull request."
    )

How to run:

python github_agent_qwenagent.py

// Part 3: Raw MCP SDK Implementation

For teams that need full control over every protocol message, custom error handling, per-tool retry logic, and audit logging of every tool call and result:

# github_agent_raw.py
# Prerequisites: pip install mcp openai httpx
#   GITHUB_TOKEN env var must be set, local server must be running
#
# How to run:
#   python github_agent_raw.py

import asyncio
import json
import os
import re
from openai import AsyncOpenAI
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# ── Local serving client ───────────────────────────────────────────────────────
client = AsyncOpenAI(
    base_url="http://localhost:30000/v1",   # SGLang port from the serve command; use 8000 for vLLM
    api_key="EMPTY",
)

MODEL = "Qwen/Qwen3.6-35B-A3B"

# ── Response processing ───────────────────────────────────────────────────────

def strip_thinking(text: str) -> str:
    """Remove <think>...</think> blocks. Used when we only need the action."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

def extract_thinking(text: str) -> str:
    """Extract the content of the thinking block for logging."""
    m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    return m.group(1).strip() if m else ""

def process_response(response, preserve_thinking: bool = True) -> dict:
    """
    Process a chat completion response from Qwen3.6.

    Handles two output formats:
    1. Tool call via the API's function_call / tool_calls field (when --tool-call-parser is active)
    2. Tool call embedded in the message content as JSON

    Args:
        response:          The OpenAI-compatible completion response
        preserve_thinking: If True, keep thinking content in output for
                           the next turn's KV cache benefit

    Returns:
        dict with thinking, tool_calls, final_answer, has_tool_calls, is_terminal
    """
    choice  = response.choices[0]
    message = choice.message

    # Path 1: Tool calls in the structured field (preferred -- requires tool-call-parser flag)
    if message.tool_calls:
        tool_calls = [
            {
                "name":      tc.function.name,
                "arguments": json.loads(tc.function.arguments),
                "call_id":   tc.id,
            }
            for tc in message.tool_calls
        ]
        thinking = extract_thinking(message.content or "")
        return {
            "thinking":       thinking if preserve_thinking else "",
            "tool_calls":     tool_calls,
            "final_answer":   "",
            "has_tool_calls": True,
            "is_terminal":    False,
        }

    # Path 2: Tool calls embedded in content text (fallback)
    content = message.content or ""
    tag_matches = re.findall(r"<tool_call>(.*?)</tool_call>", content, re.DOTALL)
    tool_calls = []
    for m in tag_matches:
        try:
            tool_calls.append(json.loads(m.strip()))
        except json.JSONDecodeError:
            pass

    thinking     = extract_thinking(content)
    final_answer = re.sub(r"<think>.*?</think>", "", content, flags=re.DOTALL)
    final_answer = re.sub(r"<tool_call>.*?</tool_call>", "", final_answer, flags=re.DOTALL).strip()

    return {
        "thinking":       thinking if preserve_thinking else "",
        "tool_calls":     tool_calls,
        "final_answer":   final_answer,
        "has_tool_calls": len(tool_calls) > 0,
        "is_terminal":    len(tool_calls) == 0 and bool(final_answer),
    }

# ── Core agent loop ───────────────────────────────────────────────────────────

async def run_github_agent(task: str, repo: str, max_turns: int = 20):
    """
    Run the GitHub developer assistant agent.

    Connects to filesystem and GitHub MCP servers, discovers their tools,
    and runs the Qwen3.6 agent loop until the task is complete or max_turns reached.
    """
    # Start both MCP servers and establish sessions
    fs_params = StdioServerParameters(
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem", "."],
    )
    gh_params = StdioServerParameters(
        command="npx",
        args=["-y", "@modelcontextprotocol/server-github"],
        # The GitHub MCP server reads GITHUB_PERSONAL_ACCESS_TOKEN for authentication
        env={**os.environ, "GITHUB_PERSONAL_ACCESS_TOKEN": os.environ.get("GITHUB_TOKEN", "")},
    )

    async with stdio_client(fs_params) as (fs_read, fs_write), \
               ClientSession(fs_read, fs_write) as fs_session, \
               stdio_client(gh_params) as (gh_read, gh_write), \
               ClientSession(gh_read, gh_write) as gh_session:

        # Initialize both sessions
        await fs_session.initialize()
        await gh_session.initialize()

        # Discover all available tools from both servers
        fs_tools_result = await fs_session.list_tools()
        gh_tools_result = await gh_session.list_tools()

        # Build the OpenAI-format tool list for the model
        all_tools = []
        tool_to_session = {}   # Maps tool name to the MCP session that owns it

        for tool in fs_tools_result.tools:
            all_tools.append({
                "type": "function",
                "function": {
                    "name":        tool.name,
                    "description": tool.description,
                    "parameters":  tool.inputSchema,
                }
            })
            tool_to_session[tool.name] = fs_session

        for tool in gh_tools_result.tools:
            all_tools.append({
                "type": "function",
                "function": {
                    "name":        tool.name,
                    "description": tool.description,
                    "parameters":  tool.inputSchema,
                }
            })
            tool_to_session[tool.name] = gh_session

        print(f"Tools available: {len(all_tools)} ({len(fs_tools_result.tools)} filesystem, "
              f"{len(gh_tools_result.tools)} GitHub)")

        # Build conversation history
        system_prompt = f"""You are a senior software engineer with access to the repository {repo}.
Use the available tools to investigate issues, read code, write fixes, and create pull requests.
Think step by step. Read before you modify. Minimal changes only."""

        messages = [
            {"role": "system",  "content": system_prompt},
            {"role": "user",    "content": task},
        ]

        # ── Agent loop ─────────────────────────────────────────────────────────
        for turn in range(max_turns):
            print(f"\n[Turn {turn + 1}]")

            # Call the model
            response = await client.chat.completions.create(
                model=MODEL,
                messages=messages,
                tools=all_tools if all_tools else None,
                tool_choice="auto",
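                # Thinking mode sampling params from the official best practices
                temperature=0.6,
                top_p=0.95,
                max_tokens=4096,
                extra_body={
                    # top_k and min_p are not part of the OpenAI client signature,
                    # so they travel to the local server in extra_body
                    "top_k": 20,
                    "min_p": 0.0,
                    # preserve_thinking keeps reasoning context across turns
                    # for KV cache efficiency on long agent sessions
                    # (sent as a chat-template kwarg; check your server version)
                    "chat_template_kwargs": {"preserve_thinking": True},
                }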
            )

            result = process_response(response, preserve_thinking=True)

            if result["thinking"]:
                print(f"[thinking] {result['thinking'][:200]}...")

            # Terminal state -- agent has produced a final answer
            if result["is_terminal"]:
                print(f"\n[DONE]\n{result['final_answer']}")
                return result["final_answer"]

            # Tool call state -- execute each tool and inject results
            if result["has_tool_calls"]:
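                # Append the assistant's message to history; include tool_calls
                # only when the server returned structured calls
                assistant_msg = {
                    "role":    "assistant",
                    "content": response.choices[0].message.content or "",
                }
                if response.choices[0].message.tool_calls:
                    assistant_msg["tool_calls"] = [
                        tc.model_dump() for tc in response.choices[0].message.tool_calls
                    ]
                messages.append(assistant_msg)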

                for call in result["tool_calls"]:
                    tool_name = call["name"]
                    tool_args = call.get("arguments", {})
                    call_id   = call.get("call_id", "")

                    print(f"[tool] {tool_name}({json.dumps(tool_args)[:80]}...)")

                    session = tool_to_session.get(tool_name)
                    if not session:
                        result_content = f"Error: tool '{tool_name}' not found"
                    else:
                        try:
                            tool_result = await session.call_tool(tool_name, tool_args)
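                            # Join the text parts of the MCP result; non-text parts fall back to str()
                            result_content = "\n".join(
                                getattr(part, "text", str(part)) for part in tool_result.content
                            )
                            # Truncate very long results to protect context budget
                            if len(result_content) > 12000:
                                result_content = result_content[:12000] + "\n...[truncated]"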
                        except Exception as e:
                            result_content = f"Error: {e}"

                    print(f"[result] {result_content[:150]}...")

                    messages.append({
                        "role":        "tool",
                        "content":     result_content,
                        "tool_call_id": call_id,
                        "name":        tool_name,
                    })

        print(f"[WARNING] max_turns ({max_turns}) reached without terminal state")


# ── Entry point ───────────────────────────────────────────────────────────────

if __name__ == "__main__":
    asyncio.run(run_github_agent(
        task=(
            "Find the open issue about the login endpoint returning 200 for invalid tokens. "
            "Read src/auth.py and tests/test_auth.py to understand the bug. "
            "Fix the verify_token function and open a pull request with your changes."
        ),
        repo="myorg/my-api-project",
    ))

How to run:

python github_agent_raw.py

The raw SDK approach gives you what Qwen-Agent abstracts away: you can see every tool call, every result, and every message added to the conversation history. The tool_to_session routing dictionary is the key mechanism; it maps each tool name to the MCP session that owns it, so the agent can call any tool on any connected server without knowing which server provides it.
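One thing a plain name-to-session dictionary does not handle is two servers exposing a tool with the same name; the later server silently overwrites the earlier one. The sketch below shows one possible way to deal with that by prefixing the colliding name. The `build_routes` helper, the `server__tool` naming scheme, and the tool names are all assumptions for illustration, and strings stand in for real sessions.

```python
def build_routes(servers: dict[str, list[str]]) -> dict[str, str]:
    """Map each exposed tool name to its owning server.

    The first server to register a name keeps it; a later server with the
    same tool name is registered under "<server>__<tool>" instead.
    """
    routes: dict[str, str] = {}
    for server_name, tool_names in servers.items():
        for tool in tool_names:
            key = tool if tool not in routes else f"{server_name}__{tool}"
            routes[key] = server_name
    return routes


routes = build_routes({
    "filesystem": ["read_file", "search"],
    "github": ["create_pull_request", "search"],
})
```

If you adopt a scheme like this, the prefixed name is what goes into the tool list sent to the model, and the original name is what goes into the tools/call request to the server.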

# Writing a Custom MCP Server

The pre-built MCP servers cover the filesystem and GitHub. When you need something that does not exist yet (querying an internal database, wrapping a CI/CD API, running code analysis tools), you write your own MCP server. Here is a complete code_quality server that exposes ruff and pytest as MCP tools.

# code_quality_server.py
# A custom MCP server exposing code quality tools to Qwen3.6.
#
# Prerequisites:
#   pip install mcp ruff pytest pytest-json-report
#
# How to run standalone (for testing):
#   python code_quality_server.py
#
# To add to the Qwen-Agent config:
#   "code_quality": {
#       "command": "python",
#       "args": ["/absolute/path/to/code_quality_server.py"]
#   }

import asyncio
import json
import subprocess
import sys
from mcp.server.fastmcp import FastMCP

# FastMCP is a high-level MCP server framework -- reduces boilerplate significantly
mcp = FastMCP("code_quality")


@mcp.tool()
def run_linter(file_path: str, fix: bool = False) -> str:
    """
    Run ruff linter on a Python file and return structured lint results.
    Use this before modifying a file to understand its current quality state,
    and after making changes to verify the fix did not introduce new issues.

    Args:
        file_path: Absolute or relative path to the Python file to lint.
        fix:       If true, automatically fix safe issues in place.

    Returns:
        JSON string with issues list, issue count, and files modified.
    """
    # sys.executable runs ruff from the same environment as this server
    cmd = [sys.executable, "-m", "ruff", "check", file_path, "--output-format=json"]
    if fix:
        cmd.append("--fix")

    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        # ruff returns exit code 1 when issues are found -- not an error
        output = result.stdout or result.stderr

        # Parse ruff's JSON output
        try:
            issues = json.loads(output) if output.strip() else []
        except json.JSONDecodeError:
            issues = []

        formatted = [
            {
                "line":    issue.get("location", {}).get("row", 0),
                "col":     issue.get("location", {}).get("column", 0),
                "code":    issue.get("code", ""),
                "message": issue.get("message", ""),
                "fix_available": issue.get("fix") is not None,
            }
            for issue in issues
            if isinstance(issue, dict)
        ]

        return json.dumps({
            "file":         file_path,
            "issues":       formatted,
            "total_issues": len(formatted),
            "fixed":        "auto-fix applied" if fix else "no auto-fix",
        }, indent=2)

    except subprocess.TimeoutExpired:
        return json.dumps({"error": "Linter timed out after 30s", "file": file_path})
    except FileNotFoundError:
        return json.dumps({"error": "ruff not found -- install with: pip install ruff"})


@mcp.tool()
def run_tests(target: str, verbose: bool = False) -> str:
    """
    Run pytest on a module or directory and return structured pass/fail results.
    Use this after writing a fix to verify the fix makes failing tests pass
    without breaking other tests.

    Args:
        target:  Path to the test file or directory to run (e.g. tests/, tests/test_auth.py)
        verbose: If true, include full pytest output in the result.

    Returns:
        JSON string with pass count, fail count, failure details, and duration.
    """
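    # pytest-json-report (pip install pytest-json-report) writes its report to a
    # file, so point it at a temporary path and read the file back afterwards.
    import os
    import tempfile

    fd, report_path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    cmd = [sys.executable, "-m", "pytest", target, "--json-report",
           f"--json-report-file={report_path}", "-q"]
    if verbose:
        cmd.append("-v")

    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
        with open(report_path, encoding="utf-8") as fh:
            output = fh.read()
        os.remove(report_path)

        # Parse pytest-json-report output if available
        try:
            report = json.loads(output)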
            summary  = report.get("summary", {})
            failures = [
                {
                    "test":    t["nodeid"],
                    "message": t.get("call", {}).get("longrepr", "")[:500],
                }
                for t in report.get("tests", [])
                if t.get("outcome") == "failed"
            ]
            return json.dumps({
                "target":   target,
                "passed":   summary.get("passed", 0),
                "failed":   summary.get("failed", 0),
                "errors":   summary.get("error", 0),
                "total":    summary.get("total", 0),
                "duration": summary.get("duration", 0),
                "failures": failures,
                "stdout":   result.stdout[:2000] if verbose else "",
            }, indent=2)

        except json.JSONDecodeError:
            # Fallback: return raw output if JSON report not available
            return json.dumps({
                "target":  target,
                "stdout":  result.stdout[:3000],
                "stderr":  result.stderr[:1000],
                "exit_code": result.returncode,
            })

    except subprocess.TimeoutExpired:
        return json.dumps({"error": f"Tests timed out after 120s for target: {target}"})
    except FileNotFoundError:
        return json.dumps({"error": "pytest not found -- install with: pip install pytest"})


if __name__ == "__main__":
    mcp.run(transport="stdio")
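
An agent loop that consumes run_tests has to handle three result shapes: the structured report, the raw-output fallback, and an error object. A minimal sketch of that check follows. The helper name fix_verified is hypothetical and not part of the server above, and the sample payloads are hand-written for illustration rather than captured from a real run.

```python
import json


def fix_verified(tool_result: str) -> bool:
    """Return True only when a run_tests result clearly shows a passing run.

    Hypothetical client-side helper; the field names mirror the JSON that
    run_tests returns above.
    """
    data = json.loads(tool_result)

    # Error shape: timeout or missing executable.
    if "error" in data:
        return False

    # Fallback shape: no JSON report was available, so rely on the exit code.
    if "exit_code" in data:
        return data["exit_code"] == 0

    # Structured shape: require at least one pass and no failures or errors.
    return (
        data.get("passed", 0) > 0
        and data.get("failed", 0) == 0
        and data.get("errors", 0) == 0
    )


# Hand-written sample payloads (illustrative, not real pytest output).
structured_ok = json.dumps({"target": "tests/", "passed": 12, "failed": 0, "errors": 0})
structured_bad = json.dumps({"target": "tests/", "passed": 11, "failed": 1, "errors": 0})
fallback_ok = json.dumps({"target": "tests/", "stdout": "", "stderr": "", "exit_code": 0})
timed_out = json.dumps({"error": "Tests timed out after 120s for target: tests/"})

print(fix_verified(structured_ok))   # True
print(fix_verified(structured_bad))  # False
print(fix_verified(fallback_ok))     # True
print(fix_verified(timed_out))       # False
```

Requiring at least one passing test in the structured case guards against an empty test selection being read as success.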

Add it to either agent's server configuration:

# In Qwen-Agent MCP_SERVERS dict:
"code_quality": {
    "command": "python",
    "args": ["/absolute/path/to/code_quality_server.py"]
}

# In the raw SDK, add a third StdioServerParameters:
cq_params = StdioServerParameters(
    command="python",
    args=["/absolute/path/to/code_quality_server.py"],
)

Test the server on its own before connecting the agent:

# Test the server in MCP inspector mode
npx @modelcontextprotocol/inspector python code_quality_server.py
# Opens a browser UI where you can call run_linter and run_tests directly

# Tuning Thinking Mode and Thinking Preservation

The thinking-mode decision affects latency enough that it should be treated as an explicit architectural choice, not an afterthought.

In thinking mode, Qwen3.6 generates a chain-of-thought reasoning trace inside <think>...</think> tags before producing its action. For a 5-step task, that trace adds 1,000 to 5,000 tokens per turn, depending on task complexity. Those tokens take time to generate and consume context budget.
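
To make that overhead concrete, the arithmetic can be worked through directly. The sketch below uses the 1,000 to 5,000 tokens-per-turn range and the 262,144-token window quoted in this article. The decode speed of 50 tokens per second is an assumed figure for illustration, not a measurement.

```python
CONTEXT_WINDOW = 262_144       # Qwen3.6-35B-A3B context window (from the article)
STEPS = 5                      # turns in the example task
TRACE_TOKENS = (1_000, 5_000)  # thinking tokens added per turn (low, high)
DECODE_TOK_PER_SEC = 50        # ASSUMPTION: hypothetical decode speed


def thinking_overhead(steps, tokens_per_turn, tok_per_sec, window):
    """Total thinking tokens, decode time, and share of the context window."""
    total = steps * tokens_per_turn
    return {
        "tokens": total,
        "seconds": total / tok_per_sec,
        "context_share": total / window,
    }


low = thinking_overhead(STEPS, TRACE_TOKENS[0], DECODE_TOK_PER_SEC, CONTEXT_WINDOW)
high = thinking_overhead(STEPS, TRACE_TOKENS[1], DECODE_TOK_PER_SEC, CONTEXT_WINDOW)

print(f"thinking tokens: {low['tokens']:,} to {high['tokens']:,}")
# thinking tokens: 5,000 to 25,000
print(f"extra decode time: {low['seconds']:.0f}s to {high['seconds']:.0f}s")
# extra decode time: 100s to 500s
print(f"context share if traces are kept: {low['context_share']:.1%} to {high['context_share']:.1%}")
# context share if traces are kept: 1.9% to 9.5%
```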

Where that cost is worth paying:

Planning steps where the agent decides what to do next.
Debugging sessions where the problem is ambiguous.
Multi-file refactoring where the agent needs to reason about side effects across files.

The reasoning trace catches errors before they become tool calls with wrong arguments. Where it is not worth paying: tool-calling loops where each step is obvious, such as list directory → read file → write file → commit. The model does not need to think hard about these steps. Non-thinking mode is faster and produces output of equivalent quality for them.

Switch modes per request, not globally:

# Thinking mode (planning, debugging, complex multi-file tasks)
THINKING_PARAMS = {
    "temperature": 0.6,
    "top_p":       0.95,
    "top_k":       20,
    "min_p":       0.0,
}

# Non-thinking mode (mechanical loops, fast status checks)
# Pass enable_thinking=False in the chat template, or use system prompt:
# Add "/no_think" to the system prompt to suppress thinking mode.
NON_THINKING_PARAMS = {
    "temperature": 0.7,
    "top_p":       0.8,
    "top_k":       20,
    "min_p":       0.0,
}
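
One way to apply per-request switching is to pick the parameter set from the kind of step the agent is about to run. The sketch below assumes an OpenAI-compatible endpoint that accepts chat_template_kwargs in the request body, as vLLM and SGLang do. The step categories, the build_request helper, and the model name string are illustrative assumptions, and the two parameter dictionaries are repeated from above so the block runs on its own.

```python
THINKING_PARAMS = {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0}
NON_THINKING_PARAMS = {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0}

# ASSUMPTION: hypothetical step labels for an agent loop.
THINKING_STEPS = {"plan", "debug", "refactor"}


def build_request(messages, step_kind, model="Qwen/Qwen3.6-35B-A3B"):
    """Build a chat-completion payload with the mode chosen per request."""
    think = step_kind in THINKING_STEPS
    params = THINKING_PARAMS if think else NON_THINKING_PARAMS
    return {
        "model": model,  # ASSUMPTION: use whatever name your server registered
        "messages": messages,
        **params,
        # Toggles the chat template's thinking mode for this request only.
        "chat_template_kwargs": {"enable_thinking": think},
    }


plan_req = build_request([{"role": "user", "content": "Plan the fix."}], "plan")
read_req = build_request([{"role": "user", "content": "Read auth.py."}], "read_file")

print(plan_req["temperature"], plan_req["chat_template_kwargs"])
# 0.6 {'enable_thinking': True}
print(read_req["temperature"], read_req["chat_template_kwargs"])
# 0.7 {'enable_thinking': False}
```

Any step label outside the thinking set falls through to non-thinking mode, which keeps mechanical tool loops fast by default.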

The preserve_thinking flag, a Qwen3.6-specific capability that retains reasoning context across turns, directly affects inference efficiency when prefix caching is active. Here is why it matters in practice: in a 10-turn agent session, each turn shares a prefix of the conversation history. When preserve_thinking=True, the full reasoning trace from previous turns stays in the history. The server-side KV cache sees a shared prefix across turns and avoids recomputing it. Effective tokens-per-second throughput over long sessions is noticeably higher than without it, especially when the serving stack has prefix caching enabled, such as SGLang's radix cache (on by default) or vLLM's --enable-prefix-caching.
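
The prefix-reuse argument can be illustrated with a toy model. The sketch below is not how any serving engine is implemented: it treats tokens as opaque tuples, assumes the server keeps the KV cache for exactly the last sequence it processed, and uses made-up lengths (50 user tokens, 1,000 thinking tokens, and 100 answer tokens per turn). It counts how many prompt tokens fall outside the cached prefix on each turn.

```python
def common_prefix_len(a, b):
    """Length of the shared leading run of two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def simulate(turns, preserve_thinking, user_len=50, think_len=1000, answer_len=100):
    """Return (prompt tokens re-prefilled, final history length) for a session."""
    cached = []     # tokens whose KV entries the toy server still holds
    history = []    # context the client resends each turn
    recomputed = 0  # prompt tokens that miss the cache and must be prefilled

    for t in range(turns):
        user = [(t, "user", i) for i in range(user_len)]
        think = [(t, "think", i) for i in range(think_len)]
        answer = [(t, "answer", i) for i in range(answer_len)]

        prompt = history + user
        recomputed += len(prompt) - common_prefix_len(cached, prompt)

        # The server generated think + answer, so its cache covers all of it.
        cached = prompt + think + answer
        # The client keeps or strips the thinking trace before the next turn.
        history = prompt + (think if preserve_thinking else []) + answer

    return recomputed, len(history)


print(simulate(10, preserve_thinking=True))   # (500, 11500)
print(simulate(10, preserve_thinking=False))  # (1400, 1500)
```

In this toy model, keeping the trace cuts re-prefilled tokens from 1,400 to 500 over 10 turns, at the price of a history that grows to 11,500 tokens instead of 1,500. Real engines cache at block granularity and evict under memory pressure, so actual numbers will differ.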

Rule of thumb: use preserve_thinking=True for agent sessions that will run for more than 5 turns. Use preserve_thinking=False (or non-thinking mode) for single-turn queries and short pipelines where the overhead is wasted.

# Conclusion

The MoE architecture of Qwen3.6-35B-A3B gives you the quality of a 35B model at the activation cost of a 3B one. Its 262k context window gives you room to hold an entire code review session in context. Its explicit training on MCP-based benchmarks means it knows how to use tools appropriately, not just how to call them.

MCP provides the connective tissue. Define a tool once as an MCP server. Every Qwen3.6 session and every other MCP-compatible model can then discover and call it without custom glue code. The GitHub and filesystem servers in this article are two of hundreds of prebuilt servers in the MCP ecosystem. The custom code_quality server shows the pattern for anything that does not exist yet.

The GitHub developer assistant in this article is one application of the pattern. The same architecture (a local model, MCP tools, and an agent loop) works for a research assistant that searches academic databases and writes literature reviews, a DevOps agent that reads CloudWatch logs and opens incident tickets, or a data pipeline agent that reads SQL schemas, writes transformation code, and validates the results. The MCP ecosystem is growing quickly. The local model capability is already here.

Shittu Olumide is a software engineer and technical writer passionate about leveraging cutting-edge technologies to craft compelling narratives, with a keen eye for detail and a knack for simplifying complex concepts. You can also find Shittu on Twitter.
