
How Cursor Actually Indexes Your Codebase

When development environments (IDEs) are paired with coding agents, you have probably seen code suggestions and edits that are surprisingly accurate and relevant.

This level of quality and precision comes from agents being grounded in a deep understanding of your codebase.

Take Cursor as an example. In the Indexing & Docs tab, you can see a section showing that Cursor has already “ingested” and indexed your project's codebase:

Indexing & Docs section in the Cursor Settings tab | Image by author

So how is a comprehensive understanding of a codebase built in the first place?

At its core, the answer is retrieval-augmented generation (RAG), a concept many readers are probably already familiar with. Like most RAG-based systems, these tools rely on semantic search as a key capability.

Rather than organizing knowledge purely as raw text, the codebase is indexed and retrieved based on meaning.

This allows natural-language queries to fetch the most relevant code, which coding agents can then use to reason, modify, and generate responses more effectively.

In this article, we explore the RAG pipeline in Cursor that lets coding agents do their work with contextual awareness of the codebase.

Contents

(1) Exploring the Codebase RAG Pipeline
(2) Keeping the Codebase Index Up to Date
(3) Wrapping Up


(1) Exploring the Codebase RAG Pipeline

Let's walk through the steps in Cursor's RAG pipeline for indexing and contextualizing codebases:

Step 1 – Chunking

In most RAG pipelines, we first have to handle data loading, text preprocessing, and document parsing from multiple sources.

However, when working with a codebase, much of this effort can be avoided. Source code is already well structured and cleanly organized within a project repo, which lets us skip the usual document parsing and move straight to chunking.

In this context, the goal of chunking is to break code into meaningful, semantically coherent units (e.g., functions, classes, and logical code blocks) rather than splitting the code text arbitrarily.

Semantic code chunking ensures that each chunk captures the essence of a particular section of code, leading to more accurate retrieval and more useful generation downstream.

To make this more concrete, let's look at how code chunking works. Consider the following example Python script (don't worry about what the code does; the focus here is on its structure):

After applying code chunking, the script is cleanly divided into four structurally meaningful and coherent chunks:

As you can see, the chunks are meaningful and contextually relevant because they respect code semantics. In other words, chunking avoids splitting code in the middle of a logical block unless size constraints require it.

In practice, this means chunk boundaries tend to fall between functions rather than inside them, and between statements rather than mid-line.

For the example above, I used Chonkie, a lightweight open-source chunking library that includes a chunker built specifically for code. It provides a simple and practical way to implement code chunking, alongside the many other chunking strategies it offers.


[Optional Reading] Under the Hood of Code Chunking

The code chunking above is not accidental, nor is it achieved by naively splitting code using character counts or regular expressions.

It begins with an understanding of the code's syntax. The process typically starts by using a source code parser (such as tree-sitter) to convert the raw code into an abstract syntax tree (AST).

An abstract syntax tree is a tree-shaped representation of code that captures its structure rather than its literal text. Instead of seeing code as a string of characters, the system now sees it as logical units such as functions, classes, methods, and blocks.

Consider the following line of Python code:

x = a + b

Rather than being treated as plain text, the code is converted into a conceptual structure like this:

Assignment
├── Variable(x)
└── BinaryExpression(+)
    ├── Variable(a)
    └── Variable(b)

This structural understanding is what makes effective code chunking possible.
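As a quick illustration, Python's built-in ast module produces a comparable tree for the same line. The node names it reports (Assign, BinOp, Name) are those of Python's own parser rather than the generic labels in the sketch above, and nothing here implies that Cursor uses this module.

```python
import ast

tree = ast.parse("x = a + b")
assign = tree.body[0]            # the single top-level statement

target = assign.targets[0]       # left-hand side of the assignment
expr = assign.value              # right-hand side: a binary expression

summary = {
    "statement": type(assign).__name__,
    "target": target.id,
    "expression": type(expr).__name__,
    "operator": type(expr.op).__name__,
    "operands": [expr.left.id, expr.right.id],
}
print(summary)
```

The dictionary mirrors the shape of the tree: one assignment node whose children are the target variable and a binary expression with two operands.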

Each meaningful code construct, such as a function, block, or statement, is represented as a node in the syntax tree.

Sample diagram of a simple syntax tree | Image by author

Instead of operating on raw text, chunking works directly on the syntax tree.

The chunker traverses these nodes and groups adjacent ones together until a token limit is reached, producing chunks that are semantically coherent and bounded in size.
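The grouping idea can be sketched in a few lines. This is a minimal approximation under stated assumptions, not how Chonkie or Cursor implement it: it uses Python's ast module instead of tree-sitter, a character budget (the hypothetical max_chars) as a stand-in for a token limit, and it only looks at top-level nodes, so comments between them and decorators are dropped.

```python
import ast

def chunk_source(source: str, max_chars: int = 200) -> list[str]:
    """Group adjacent top-level AST nodes into chunks of at most max_chars.

    A single node larger than the budget is kept whole rather than split.
    """
    lines = source.splitlines()
    chunks, current, current_len = [], [], 0
    for node in ast.parse(source).body:
        # lineno and end_lineno are 1-based and inclusive
        text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
        if current and current_len + len(text) > max_chars:
            chunks.append("\n".join(current))   # close the chunk at a node boundary
            current, current_len = [], 0
        current.append(text)
        current_len += len(text)
    if current:
        chunks.append("\n".join(current))
    return chunks

SAMPLE = '''import math

def area(r):
    return math.pi * r * r

def perimeter(r):
    return 2 * math.pi * r
'''

chunks = chunk_source(SAMPLE, max_chars=60)
for i, chunk in enumerate(chunks, start=1):
    print(f"--- chunk {i} ---\n{chunk}")
```

Because boundaries are only ever placed between nodes, every chunk is itself valid Python, which is the property the paragraph above describes.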

Here is a slightly more complex piece of code and its corresponding syntax tree:

while b != 0:
    if a > b:
        a := a - b
    else:
        b := b - a
return a

Example of an abstract syntax tree | Image used under Creative Commons

Step 2 – Generating Embeddings and Metadata

Once the chunks are prepared, an embedding model is used to generate a vector representation (embedding) for each code chunk.

These embeddings capture the semantic meaning of the code, so that user queries and generation prompts can be matched with semantically related code even when exact keywords do not overlap.

This significantly improves retrieval quality for tasks such as code understanding, refactoring, and debugging.

Beyond generating embeddings, another important step is enriching each chunk with relevant metadata.

For example, metadata such as the file path and the corresponding line range of each chunk is stored alongside its embedding vector.

This metadata not only provides important context about where a chunk comes from, but also enables metadata-based keyword filtering during retrieval.
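To make the pairing of embedding and metadata concrete, here is a small sketch. The ChunkRecord type, its field names, and the two-dimensional placeholder vectors are all hypothetical; the article only states that a file path and line range are kept next to each embedding.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChunkRecord:
    file_path: str                 # where the chunk lives in the repo
    start_line: int                # 1-based, inclusive
    end_line: int                  # 1-based, inclusive
    embedding: tuple[float, ...]   # placeholder vector, not a real model output

def filter_by_path(records, prefix):
    """Keep only records whose file path starts with the given prefix."""
    return [r for r in records if r.file_path.startswith(prefix)]

records = [
    ChunkRecord("src/payments/invoice_processor.py", 1, 40, (0.1, 0.9)),
    ChunkRecord("src/payments/refunds.py", 10, 55, (0.2, 0.8)),
    ChunkRecord("tests/test_invoice.py", 1, 30, (0.7, 0.3)),
]

payments_only = filter_by_path(records, "src/payments/")
print([r.file_path for r in payments_only])
```

Filtering on metadata like this narrows the candidate set before or after the vector comparison, without ever needing the chunk text itself.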


Step 3 – Enhancing Data Privacy

As with any RAG-based system, data privacy is a primary concern. This naturally raises the question of whether file paths themselves may contain sensitive information.

In practice, file and directory names often reveal more than expected, such as internal project structures, product codenames, client identifiers, or ownership boundaries within the codebase.

As a result, file paths are treated as sensitive metadata and require careful handling.

To address this, Cursor applies file path obfuscation (also known as path masking) on the client side before any data is transmitted. Each component of the path, split on / and ., is masked using a secret key and a small fixed nonce.

This approach hides the actual file and folder names while preserving enough of the directory structure to support effective retrieval and filtering.

For example, src/payments/invoice_processor.py may be transformed into a9f3/x72k/qp1m8d.f4.
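The sketch below shows the general shape of component-wise masking. It is an assumption-heavy illustration: the article does not specify the cipher, so HMAC-SHA256 truncated to six hex characters is swapped in here, and KEY, NONCE, mask_component, and mask_path are hypothetical names. HMAC is one-way, so a client using this exact sketch would need to keep its own masked-to-real lookup table, whereas the article describes paths that the client can decrypt.

```python
import hashlib
import hmac
import re

KEY = b"example-secret-key"    # hypothetical client-side secret
NONCE = b"\x00\x01"            # hypothetical small fixed nonce

def mask_component(component: str, key: bytes, nonce: bytes, length: int = 6) -> str:
    """Deterministically mask one path component (stand-in for the real scheme)."""
    digest = hmac.new(key, nonce + component.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]

def mask_path(path: str, key: bytes, nonce: bytes) -> str:
    # Split on "/" and "." but keep the separators so the structure survives
    parts = re.split(r"([/.])", path)
    masked = []
    for part in parts:
        if part in ("/", ".") or part == "":
            masked.append(part)
        else:
            masked.append(mask_component(part, key, nonce))
    return "".join(masked)

masked = mask_path("src/payments/invoice_processor.py", KEY, NONCE)
other = mask_path("src/payments/refunds.py", KEY, NONCE)
print(masked)
print(other)
```

Because each component is masked independently and deterministically, two files in the same folder still share a masked directory prefix, which is what keeps directory-level filtering possible.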

Note: Users can control which parts of their codebase are shared with Cursor by using a .cursorignore file. Cursor makes a best effort to prevent the listed content from being transmitted or referenced in LLM requests.


Step 4 – Storing Embeddings

Once generated, the chunk embeddings (and their corresponding metadata) are stored in a vector database, Turbopuffer, which is optimized for fast semantic search across millions of code chunks.

Turbopuffer is a serverless, high-performance search engine that combines vector and full-text search and is backed by low-cost object storage.

To speed up re-indexing, embeddings are also cached in AWS and keyed by the hash of each chunk, allowing unchanged code to be reused across subsequent indexing runs.
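The caching idea can be sketched with an in-memory dictionary. This is a simplification under stated assumptions: the real cache lives in AWS, while the EmbeddingCache class, the choice of SHA-256, and the fake_embed placeholder are hypothetical stand-ins used only to show the keying scheme.

```python
import hashlib

class EmbeddingCache:
    """Cache embeddings by the hash of the chunk text."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store = {}
        self.misses = 0          # how many times the model actually had to run

    def get(self, chunk_text: str):
        key = hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(chunk_text)
        return self._store[key]

def fake_embed(text: str):
    # Placeholder for a real embedding model: text length and vowel count
    return (float(len(text)), float(sum(text.count(v) for v in "aeiou")))

cache = EmbeddingCache(fake_embed)
first_run = [cache.get(c) for c in ["def a(): pass", "def b(): pass"]]
# Second indexing run: the first chunk is unchanged, the second was edited
second_run = [cache.get(c) for c in ["def a(): pass", "def b(): return 1"]]
print(cache.misses)
```

Only the edited chunk triggers a new embedding call on the second run; the unchanged chunk is served from the cache.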

From a data privacy perspective, it is important to note that only embeddings and metadata are stored in the cloud. This means our original source code stays on our local machine and is never stored on Cursor servers or in Turbopuffer.


Step 5 – Running Semantic Search

When we submit a query to Cursor, it is first converted into a vector using the same embedding model that generated the chunk embeddings. This ensures that both queries and code chunks live in the same semantic space.

From the perspective of semantic search, the process goes as follows:

  1. Cursor compares the query embedding against the code embeddings in the vector database to identify the most semantically similar code chunks.
  2. These candidate chunks are returned by Turbopuffer in ranked order based on their similarity scores.
  3. Since raw source code is never stored in the cloud or in the vector database, the search results contain only metadata, specifically the obfuscated file paths and the corresponding line ranges.
  4. By resolving the decrypted file paths and line ranges from that metadata, the local client retrieves the actual code chunks from the local codebase.
  5. The retrieved code chunks, in their original text form, are then provided as context alongside the query to the LLM to generate a context-aware response.
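The five steps above can be sketched end to end with toy data. Everything here is illustrative: the two-dimensional vectors stand in for real embeddings, the in-memory index list stands in for Turbopuffer, paths are shown unmasked for brevity, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def search(query_vec, index, top_k=2):
    """Steps 1-3: rank by similarity and return (score, metadata) pairs only."""
    scored = [(cosine(query_vec, vec), meta) for vec, meta in index]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]

def resolve_locally(meta, local_files):
    """Step 4: the client reads the real lines from its own copy of the code."""
    lines = local_files[meta["path"]].splitlines()
    return "\n".join(lines[meta["start"] - 1 : meta["end"]])

# Source text exists only on the client side
local_files = {
    "billing.py": "def total(items):\n    return sum(items)\n\ndef tax(x):\n    return x * 0.2",
}
# The remote index holds vectors and metadata, never code
index = [
    ((0.9, 0.1), {"path": "billing.py", "start": 1, "end": 2}),
    ((0.1, 0.9), {"path": "billing.py", "start": 4, "end": 5}),
]

hits = search((1.0, 0.0), index, top_k=1)
context = resolve_locally(hits[0][1], local_files)   # Step 5: passed to the LLM
print(context)
```

Note that the search function never sees source text; the code only reappears once the client maps the returned path and line range back onto local files.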

As part of hybrid search (semantic + keyword), the coding agent can also use tools such as grep and ripgrep to find code snippets based on exact string matches.
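For contrast with the embedding-based path, exact matching needs no model at all. The helper below is a hypothetical, in-memory approximation of what a grep-style tool returns; it is not how grep or ripgrep are implemented or invoked.

```python
import re

def grep_lines(files, pattern):
    """Return (path, line_number, line) for every line matching the regex."""
    regex = re.compile(pattern)
    return [
        (path, number, line)
        for path, text in sorted(files.items())
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]

files = {
    "billing.py": "def total(items):\n    return sum(items)\n",
    "report.py": "from billing import total\nprint(total([1, 2]))\n",
}

matches = grep_lines(files, r"\btotal\(")
for path, number, line in matches:
    print(f"{path}:{number}: {line}")
```

Keyword search like this finds every literal occurrence of an identifier, which complements semantic search when the exact name is already known.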

OpenCode is a popular open-source coding agent framework available in the terminal, in IDEs, and as a desktop app.

Unlike Cursor, it works directly on the codebase using text search, file matching, and LSP-based navigation rather than embedding-based semantic search.

As a result, OpenCode provides strong structural awareness but lacks the deeper semantic retrieval capabilities found in Cursor.

As a reminder, our original source code is not stored on Cursor servers or in Turbopuffer.

However, when answering a query, Cursor still needs to pass the original code chunks to the coding agent so that it can produce an accurate response.

This is because the chunk embeddings cannot be used to directly reconstruct the original code.

Plain-text code is retrieved only at inference time, and only for the specific files and lines needed. Outside of this short-lived runtime, the codebase is not stored or persisted remotely.


(2) Keeping the Codebase Index Up to Date

Overview

Our codebase changes rapidly as we accept agent-generated edits or make code changes by hand.

To keep semantic retrieval accurate, Cursor automatically synchronizes the code index through periodic checks, typically every five minutes.

During each sync, the system securely detects changes and refreshes only the affected files by removing outdated embeddings and generating new ones.

In addition, files are processed in batches to optimize performance and minimize disruption to our development workflow.

Using Merkle Trees

So how does Cursor make this work so seamlessly? It scans the opened folder and computes a Merkle tree of file hashes, which allows the system to efficiently detect and track changes across the codebase.

Alright, so what is a Merkle tree?

It is a data structure that works like a system of digital fingerprints, allowing changes across a large set of files to be tracked efficiently.

Each code file is converted into a short fingerprint, and these fingerprints are combined hierarchically into a single top-level fingerprint that represents the entire folder.

When a file changes, only its own fingerprint and a small number of related fingerprints need to be updated.

Illustration of a Merkle tree | Image used under Creative Commons

The Merkle tree of the codebase is synced to the Cursor server, which periodically checks for fingerprint mismatches to identify what has changed.

As a result, it can pinpoint which files were modified and update only those files during index synchronization, keeping the process fast and efficient.
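A small sketch makes the fingerprint hierarchy tangible. The details are assumptions for illustration only: folders are modeled as nested dictionaries, SHA-256 is used as the hash, and a folder's hash is derived from its children's names and hashes; the article does not describe Cursor's exact construction.

```python
import hashlib

def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle(tree):
    """Hash a nested dict: str values are file contents, dict values are folders.

    Returns (root_hash, {path: hash}) covering every file and folder.
    """
    hashes = {}

    def visit(node, path):
        if isinstance(node, str):
            digest = sha(node.encode("utf-8"))            # leaf: hash of file content
        else:
            children = []
            for name, child in sorted(node.items()):
                children.append(name + ":" + visit(child, path + "/" + name))
            digest = sha("\n".join(children).encode("utf-8"))  # folder: hash of child hashes
        hashes[path] = digest
        return digest

    return visit(tree, "."), hashes

def changed_paths(old, new):
    """Paths whose hash differs between two {path: hash} maps."""
    return sorted(p for p in new if old.get(p) != new[p])

before = {"src": {"a.py": "x = 1", "b.py": "y = 2"}, "docs": {"readme.md": "hi"}}
after = {"src": {"a.py": "x = 1", "b.py": "y = 3"}, "docs": {"readme.md": "hi"}}

root_before, hashes_before = merkle(before)
root_after, hashes_after = merkle(after)
changed = changed_paths(hashes_before, hashes_after)
print(changed)
```

Editing one file changes only that file's hash and the hashes of its ancestor folders; the docs subtree keeps the same fingerprint, so a comparison can skip it entirely.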

Handling Different File Types

Here is how Cursor handles different categories of files as part of the indexing process:

  • New files: Automatically added to the index
  • Modified files: Old embeddings removed, new ones created
  • Deleted files: Promptly removed from the index
  • Large/complex files: May be skipped for performance
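The four cases above map naturally onto a comparison of two hash snapshots. This is a hedged sketch: the plan_index_updates function and the max_bytes threshold are hypothetical, since the article does not say what size or complexity causes a file to be skipped.

```python
def plan_index_updates(old_hashes, new_hashes, sizes, max_bytes=1_000_000):
    """Classify files into add / reembed / remove / skip from two hash snapshots."""
    plan = {"add": [], "reembed": [], "remove": [], "skip": []}
    for path, digest in sorted(new_hashes.items()):
        if sizes.get(path, 0) > max_bytes:
            plan["skip"].append(path)        # too large: left out for performance
        elif path not in old_hashes:
            plan["add"].append(path)         # new file
        elif old_hashes[path] != digest:
            plan["reembed"].append(path)     # modified: drop old embeddings, create new
    # deleted files: present before, absent now
    plan["remove"] = sorted(p for p in old_hashes if p not in new_hashes)
    return plan

old = {"a.py": "h1", "b.py": "h2", "c.py": "h3"}
new = {"a.py": "h1", "b.py": "h2x", "d.py": "h4", "big.bin": "h5"}
sizes = {"big.bin": 5_000_000}

plan = plan_index_updates(old, new, sizes)
print(plan)
```

Unchanged files such as a.py appear in none of the lists, which is exactly why incremental syncing stays cheap.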

Note: Cursor's codebase indexing starts automatically whenever you open a workspace.


(3) Wrapping Up

In this article, we looked beyond LLM generation to explore the pipeline behind tools like Cursor that builds the right context through RAG.

By chunking code along meaningful boundaries, indexing it efficiently, and continuously refreshing that context as the codebase evolves, coding agents are able to deliver far more relevant and reliable suggestions.
