10 Command-Line Tools Every Data Scientist Should Know

Image by Author

Introduction

Although modern data science is dominated by Jupyter notebooks, pandas, and graphical dashboards, these tools do not always give you the level of control you need. Command-line tools, on the other hand, may not be as intuitive as you would like, but they are powerful, lightweight, and very fast at the specific jobs they were designed for.

In this article, I have tried to strike a balance between usefulness, maturity, and power. You will find some classics that are close to unavoidable, along with modern additions that fill gaps or improve performance. You could call this the 2025 edition of the essential CLI toolbox. For those who are not familiar with CLI tools but want to learn, I have added a bonus list of resources in the conclusion, so scroll all the way down before you start adding these tools to your workflow.

1. curl

curl is my go-to for making HTTP requests such as GET, POST, or PUT; downloading files; and sending or receiving data over protocols such as HTTP or FTP. It is ideal for retrieving data from APIs or downloading datasets, and it slots easily into data-ingestion pipelines to pull JSON, CSV, or other payloads. The best thing about curl is that it comes pre-installed on most Unix systems, so you can start using it right away. However, its syntax (especially around headers, request bodies, and authentication) can be verbose and error-prone. If you work with more complex APIs, you may prefer an easier wrapper or a Python library, but knowing curl is still an essential skill for quick testing and debugging.
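As a rough sketch of the API-retrieval use described above, the snippet below wraps curl in a small helper. The function name `fetch_json` and the URL in the usage note are hypothetical placeholders; the flags themselves are standard curl options.

```shell
# fetch_json is a hypothetical helper name, not part of curl itself.
#   -f       fail on HTTP errors (4xx/5xx) instead of printing the error page
#   -sS      hide the progress bar but still show error messages
#   -L       follow redirects
#   --retry  retry transient failures a few times
#   -H       send a request header
fetch_json() {
  curl -fsSL --retry 3 -H 'Accept: application/json' "$1"
}
```

A call might look like `fetch_json 'https://api.example.com/items' > items.json`, where the URL is a placeholder for whatever endpoint you are testing.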

2. jq

jq is a lightweight JSON processor that lets you query, filter, transform, and pretty-print JSON data. With JSON being the dominant format for APIs, logs, and data exchange, jq is indispensable for extracting and reshaping JSON in pipelines. It works like "pandas for JSON in the shell." Its biggest advantage is that it provides a concise language for dealing with complex JSON, but learning its syntax can take time, and very large JSON files may need extra care with memory management.
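To make the "query, filter, transform" idea concrete, here is a minimal sketch. The records are made up for illustration; the filters use only core jq features (`select`, array construction, and the `@csv` formatter).

```shell
# Sample records are invented for this example.
json='[{"name":"ada","score":91},{"name":"bob","score":67},{"name":"grace","score":88}]'

# Keep records with score >= 80 and print just the names.
# -r prints raw strings without JSON quotes.
top_names=$(printf '%s' "$json" | jq -r '.[] | select(.score >= 80) | .name')
printf '%s\n' "$top_names"

# Reshape each record into a CSV row.
csv_rows=$(printf '%s' "$json" | jq -r '.[] | [.name, .score] | @csv')
printf '%s\n' "$csv_rows"
```

The same filters work unchanged on the output of an API call piped into jq.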

3. csvkit

csvkit is a suite of CSV-centric command-line utilities for converting, filtering, aggregating, joining, and exploring CSV files. You can select and reorder columns, subset rows, combine multiple files, convert from one format to another, and even run SQL-like queries against CSV data. csvkit understands CSV quoting semantics and headers, which makes it safer than generic text-processing utilities for this format. Because it is Python-based, performance can lag on large datasets, and some complex queries may be easier in pandas or SQL. If you prefer speed and efficient memory use, consider the csvtk toolkit.
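The sketch below illustrates column selection, sorting, and row filtering with three csvkit commands (`csvcut`, `csvsort`, `csvgrep`). The sample file is made up and deliberately includes a quoted field containing a comma, the kind of value that trips up naive text splitting.

```shell
# Made-up sample file; the "note" field for Oslo contains a comma inside quotes.
csv_file=$(mktemp)
printf 'city,temp,note\nOslo,4,"cold, windy"\nCairo,31,hot\nLima,19,mild\n' > "$csv_file"

# Select two columns by name, then sort by temperature, highest first.
sorted=$(csvcut -c city,temp "$csv_file" | csvsort -c temp -r)
printf '%s\n' "$sorted"

# Keep only rows whose city matches a regular expression.
matches=$(csvgrep -c city -r '^(Oslo|Lima)$' "$csv_file")
printf '%s\n' "$matches"
```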

4. awk / sed

Link (sed): https://www.gnu.org/software/sed/manual/sed.html
Classic Unix tools like awk and sed remain irreplaceable for text manipulation. awk is powerful for pattern scanning, field-based transformations, and quick aggregations, while sed excels at text substitution, deletion, and transformation. These tools are fast and lightweight, which makes them ideal for quick pipeline work. However, their syntax is not intuitive. As the logic grows, readability suffers, and you may find yourself moving to a scripting language. Also, for nested or hierarchical data (e.g. nested JSON), these tools have limited expressiveness.
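A small sketch of the two roles described above: awk doing a field-based group-and-sum, and sed doing line-level substitutions. The input data is made up for illustration.

```shell
# Made-up input: a region label and a numeric amount per line.
data='east 10
west 5
east 7
west 3'

# awk: sum the second field per value of the first field.
# The order of "for (k in sum)" is unspecified, so sort makes the output stable.
totals=$(printf '%s\n' "$data" \
  | awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' \
  | sort)
printf '%s\n' "$totals"

# sed: strip trailing whitespace, then drop a "status=" prefix.
cleaned=$(printf 'status=OK  \nstatus=FAIL \n' \
  | sed -e 's/[[:space:]]*$//' -e 's/^status=//')
printf '%s\n' "$cleaned"
```

Both one-liners stay readable at this size; once the awk program needs several branches or helper functions, a scripting language is usually the better home for it.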

5. parallel

GNU parallel speeds up workflows by running many processes in parallel. Many data tasks are embarrassingly parallel across chunks of data. Suppose you have to apply the same transformation to hundreds of files: parallel can spread the work across all CPU cores, speed up processing, and handle job control. You should, however, be mindful of I/O bottlenecks and system load, and quoting and escaping can be tricky in complex pipelines. For cluster-scale or distributed workloads, consider resource-aware schedulers (e.g. Spark, Dask, Kubernetes).
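Here is a minimal sketch of the "same transformation over many files" case, assuming GNU parallel (not the unrelated `parallel` shipped with moreutils, which uses a different syntax). The files and the choice of `sort` as the transformation are placeholders.

```shell
# Made-up input files for illustration.
work_dir=$(mktemp -d)
printf '3\n1\n2\n' > "$work_dir/a.txt"
printf '9\n7\n8\n' > "$work_dir/b.txt"

# Sort every .txt file numerically, one job per file, at most 4 jobs at once.
# {} is the input path and {.} is the same path without its extension,
# so a.txt produces a.sorted next to it.
parallel -j 4 'sort -n {} > {.}.sorted' ::: "$work_dir"/*.txt
```

Quoting the whole command keeps the redirection inside each job rather than applying it once to parallel itself, which is one of the quoting pitfalls mentioned above.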

6. ripgrep (rg)

ripgrep (rg) is a fast recursive search tool designed for speed and efficiency. It respects .gitignore automatically and skips hidden and binary files, which makes it significantly faster than traditional grep. It is perfect for quick searches across codebases, log directories, or config files. Because it ignores certain paths by default, you may need to adjust its flags to search everything, and it is not always available by default on every platform.
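The default filtering is easiest to see with a tiny example. This sketch uses a made-up log directory with one regular file and one hidden file, then compares a default search with one that uses the `--hidden` and `--no-ignore` flags.

```shell
# Made-up log directory: one regular file and one dotfile.
log_dir=$(mktemp -d)
printf 'INFO start\nERROR disk full\n' > "$log_dir/app.log"
printf 'ERROR secret failure\n' > "$log_dir/.hidden.log"

# Default search: hidden files are skipped, so only app.log is listed.
# -l prints matching file names instead of matching lines.
default_hits=$(rg -l 'ERROR' "$log_dir")

# --hidden includes dotfiles; --no-ignore also disables ignore-file filtering.
all_hits=$(rg -l --hidden --no-ignore 'ERROR' "$log_dir")

printf 'default:\n%s\nall:\n%s\n' "$default_hits" "$all_hits"
```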

7. datamash

datamash provides numeric, textual, and statistical operations (sum, mean, median, group-by, etc.) directly in the shell, reading from stdin or files. It is lightweight and useful for quick aggregations without launching a heavier tool such as Python or R, which makes it well suited to shell-based ETL or exploratory analysis. But it is not built for very large datasets or complex analytics, where specialized tools do better. Also, grouping on very high-cardinality columns may require a lot of memory.
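A minimal group-by sketch, assuming GNU datamash and made-up tab-separated input (tab is datamash's default field separator):

```shell
# Made-up data: a group label, a tab, then a numeric value.
# -s sorts the input first (grouping expects sorted input), -g 1 groups by column 1,
# then sum and mean are computed over column 2 for each group.
summary=$(printf 'a\t1\na\t3\nb\t10\n' | datamash -s -g 1 sum 2 mean 2)
printf '%s\n' "$summary"
```

Each output line holds the group label followed by the requested statistics, so the result can be piped straight into awk, sort, or another datamash call.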

8. htop

htop is an interactive system monitor and process viewer that gives live insight into CPU, memory, and per-process resource usage. When you run heavy pipelines or model training, htop is very helpful for tracking resource consumption and identifying bottlenecks. It is more user-friendly than the traditional top, but being interactive means it does not fit well into automated scripts. It may also be missing on minimal server setups, and it does not replace specialized performance tools (profilers, metrics dashboards).

9. git

git is a distributed version control system that is essential for tracking changes to code, scripts, and small data assets. For reproducibility, collaboration, branching experiments, and rollback, git is the standard. It integrates with deployment pipelines, CI/CD tools, and notebooks. Its drawback is that it is not meant for versioning large binary data, where Git LFS, DVC, or specialized systems are a better fit. The branching and merging workflow also comes with a learning curve.
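The sketch below walks through the basic track, branch, and inspect cycle on a throwaway repository. The file name, branch name, and identity values are placeholders chosen for illustration.

```shell
# Throwaway repository in a temporary directory.
repo=$(mktemp -d)
git -C "$repo" init -q
printf 'id,value\n1,10\n' > "$repo/data.csv"
git -C "$repo" add data.csv

# Identity is passed inline so the sketch does not depend on global git config.
git -C "$repo" -c user.name='Example User' -c user.email='user@example.com' \
  commit -q -m 'Add sample data'

# Try a change on a separate branch, then inspect what differs from the commit.
git -C "$repo" checkout -q -b experiment
printf '2,20\n' >> "$repo/data.csv"
changes=$(git -C "$repo" diff --stat)
printf '%s\n' "$changes"
```

If the experiment does not work out, `git -C "$repo" checkout -- data.csv` discards the uncommitted change, which is the rollback case mentioned above.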

10. tmux / screen

Terminal multiplexers like tmux and screen let you run multiple terminal sessions in a single window, detach and reattach sessions, and resume work after an SSH disconnect. They are essential if you need to run long experiments or pipelines remotely. While tmux is generally recommended because of its active development and flexibility, its configuration and keybindings can confuse newcomers, and minimal environments may not have it installed by default.
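A minimal sketch of the detach-and-resume workflow with tmux. The session name and the `sleep 60` command are placeholders standing in for a real long-running job.

```shell
session='long_job'   # hypothetical session name

# Start a detached session running a placeholder long task.
tmux new-session -d -s "$session" 'sleep 60'

# Later (for example after reconnecting over SSH) check that it is still alive.
if tmux has-session -t "$session" 2>/dev/null; then
  session_state='running'
else
  session_state='missing'
fi
printf '%s\n' "$session_state"

# Clean up the demo session. For real work you would reattach instead:
#   tmux attach -t long_job
tmux kill-session -t "$session"
```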

Wrapping Up

If you are just getting started, I would recommend mastering the "core four": curl, jq, awk / sed, and git. These are used everywhere. Over time, you will discover domain-specific CLIs such as SQL clients, the DuckDB CLI, or Datasette that slot into your workflow. To learn more, check out the following resources:

  1. Data Science at the Command Line by Jeroen Janssens
  2. The Art of Command Line on GitHub
  3. Mark Pearl's Bash Cheatsheet
  4. Communities such as the Unix and command-line subreddits, which often surface useful tricks and new tools that will expand your toolbox over time.

Kanwal Mehreen is a machine learning engineer and technical writer with a strong interest in data science and the intersection of AI and medicine. She co-authored the ebook "Maximizing Productivity with ChatGPT". As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She has also been recognized as a Teradata Diversity in Tech Scholar, a Mitacs Globalink Research Scholar, and a Harvard WeCode Scholar. Kanwal is an ardent advocate for change and founded FEMCodes to empower women in STEM fields.
