Data Has No Moat! | Towards Data Science

In AI projects, data and its quality have long been recognized as critical to a project's success. Some might even say that projects used to have a single point of failure: data!

The infamous “garbage in, garbage out” was probably the first expression to take the data industry by storm (followed by “data is the new oil”). We all knew that if data was not properly structured and cleaned, the results of any analysis, and of any application built on it, were bound to be inaccurate and misleading.

For that reason, over the years, many studies and researchers have focused on defining the pillars of data quality and the metrics that can be used to assess it.

A 1991 research paper identified a range of data quality dimensions, all closely aligned with the main focus and use of data at the time: structured data. Fast forward to 2020, and the research paper on the Dimensions of Data Quality (DDQ) identified an astonishing number of data quality dimensions (around 65!), a reflection of how much the way data itself is used had changed.

Dimensions of Data Quality: Toward Quality Data by Design, Wang, 1991

However, with the growing hype around deep learning, the idea that data quality no longer mattered lingered in the minds of even the most tech-savvy engineers. The desire to believe that models and engineering alone are enough to deliver strong solutions has been around for a while. Happily for us enthusiastic data practitioners, 2021/2022 marked the rise of Data-Centric AI! This concept is not far from the classic “garbage in, garbage out”. It reinforces the idea that in AI development, if we treat data as the part of the equation that needs tuning, we will get better performance and results. After all, it is not all about hyperparameter tuning.

So why are we hearing rumors again that data has no moat?!

The great power of large language models (LLMs) is their ability to mirror human reasoning. Because they are trained on enormous corpora with the computational power of GPUs, LLMs can produce not just good content, but content that actually matches our tone and way of thinking. Because they do this so well, often with very little context, many have jumped to a bold conclusion:

“Data has no moat.”
“We no longer need proprietary data to differentiate.”
“Just use a better model.”

Does data quality stand a chance against LLMs and AI agents?

In my opinion, absolutely yes! In fact, despite the current belief that data offers no differentiation in the age of LLMs and AI agents, data remains essential. I would even argue that the more capable and responsible agents become, the more critical their dependence on good data is!

So, why is data quality still important?

First, the obvious: garbage in, garbage out. It does not matter how smart your models and agents get if they cannot tell the difference between good and bad data. If bad or low-quality data is fed into a model, you will get incorrect answers and misleading results. LLMs are generative models, which means that, in the end, they simply reproduce the patterns they have encountered. What is more concerning is that the validation mechanisms we once relied on are no longer in place in many use cases, which can lead to misleading outcomes.

In addition, these models have no real-world awareness, much like the generative models that came before them. If something is outdated or biased, they simply will not notice unless they are trained to do so, and that starts with high-quality, validated, and carefully curated data.

In particular, AI agents often rely on tools such as memory or document retrieval to work across tasks, which makes the importance of good data even more obvious. If their knowledge is based on unreliable information, they will not be able to make sound decisions. You will get an answer or an outcome, but that does not mean it is a useful one!
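To make the retrieval point concrete, here is a minimal sketch of a filter that an agent's retrieval step could apply before documents reach the model. The `Document` record, the `TRUSTED_SOURCES` allow-list, and the one-year freshness window are all hypothetical, chosen only for illustration; a real system would need its own notion of trust and freshness.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical allow-list and freshness window (assumptions for illustration).
TRUSTED_SOURCES = {"internal_wiki", "product_docs"}
MAX_AGE = timedelta(days=365)


@dataclass
class Document:
    text: str
    source: str
    last_updated: date


def filter_retrieved(docs, today):
    """Keep only documents from trusted sources that are recent enough."""
    return [
        d for d in docs
        if d.source in TRUSTED_SOURCES and today - d.last_updated <= MAX_AGE
    ]


docs = [
    Document("Current refund policy", "product_docs", date(2025, 3, 1)),
    Document("Old refund policy", "product_docs", date(2019, 1, 1)),
    Document("Forum rumor", "random_forum", date(2025, 4, 1)),
]

kept = filter_retrieved(docs, today=date(2025, 6, 1))
print([d.text for d in kept])  # ['Current refund policy']
```

The agent still produces an answer either way; the filter only changes whether that answer is grounded in something worth trusting.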

Why is data still a moat?

While barriers such as compute infrastructure, storage capacity, and specialized expertise are often cited as key to staying competitive in a future dominated by AI agents and LLM-based applications, access to data remains one of the most frequently cited. Here is why:

  1. Access is power
    In domains with restricted or proprietary data, such as healthcare, legal, enterprise workflows, or user interaction data, AI agents can only be built by those with access to the right data. Without it, even advanced applications will be flying blind.
  2. The public web will not be enough
    Free and abundant public data is running out, not because it is no longer available, but because its quality is declining quickly. High-quality public datasets are increasingly saturated with generated content, and much of what remains sits behind paywalls or API restrictions.
    In addition, major platforms are increasingly closing off access in favor of monetization.
  3. Data poisoning is a new attack vector
    As the adoption of foundation models grows, attacks are shifting from the model code to the training and fine-tuning of the model itself. Why? It is easier to do and harder to detect!
    We are entering a period in which adversaries do not need to break the system; they just need to pollute the data. From subtle misinformation to malicious labeling, data poisoning is a reality that organizations looking to adopt AI agents will need to be ready for. Controlling data origin, pipelines, and integrity is now essential to building trustworthy AI.
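One simple way to check integrity, sketched here under stated assumptions, is to record a cryptographic digest of each dataset file when it is trusted and compare against it before training. The file names, contents, and helper functions below are hypothetical; only `hashlib.sha256` is a real standard-library call. This detects changes after the manifest was built; it does not detect data that was already poisoned at the source.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Record one digest per file at the moment the data is trusted."""
    return {name: sha256_of(content) for name, content in files.items()}


def find_tampered(files: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Names that changed, were added without a record, or went missing."""
    changed = [n for n, c in files.items() if manifest.get(n) != sha256_of(c)]
    missing = [n for n in manifest if n not in files]
    return sorted(changed + missing)


# Hypothetical in-memory "files" standing in for dataset files on disk.
trusted = {
    "train.csv": b"text,label\ngreat product,positive\n",
    "eval.csv": b"text,label\nbroken on arrival,negative\n",
}
manifest = build_manifest(trusted)

# Simulate a poisoning attempt: one label is silently flipped.
received = dict(trusted)
received["train.csv"] = b"text,label\ngreat product,negative\n"

print(find_tampered(received, manifest))  # ['train.csv']
```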

What data strategies make AI trustworthy?

To stay ahead of innovation, we must also rethink how we treat data. Data is no longer just a step in the process but core infrastructure for AI. Building and deploying AI is about code and algorithms, but also about the data lifecycle: how data is collected, filtered and cleaned, protected, and, most importantly, used. So, what strategies can we adopt to make better use of data?

  1. Data management as core infrastructure
    Treat data with the same rigor and priority as you would cloud infrastructure or security. This means centralizing governance, implementing access controls, and ensuring data flows are traceable and auditable. AI-ready organizations design systems where data is an intentional input, not an afterthought.
  2. Active data quality mechanisms
    The quality of your data defines how reliable and performant your agents are! Establish pipelines that detect anomalous or divergent records, enforce labeling standards, and monitor for drift or contamination. Data engineering is foundational to the future of AI. Data needs not only to be collected but, most importantly, curated!
  3. Synthetic data to fill gaps and preserve privacy
    When real data is limited, biased, or privacy-sensitive, synthetic data offers a powerful alternative. From simulation to generative modeling, synthetic data allows you to create high-quality datasets to train models. It is key to unlocking scenarios where ground truth is expensive or restricted.
  4. Defensive design against data poisoning
    Security in AI now starts at the data layer. Implement measures such as source verification, versioning, and real-time validation to guard against poisoning and subtle manipulation, not only for data sources but also for any prompts that enter the system. This is especially important for systems that learn from user input or external data.
  5. Data feedback loops
    Data should not be seen as static in your AI systems. It should be able to evolve and adapt over time! Feedback loops are essential to keep data evolving. When paired with strong quality filters, these loops make your AI solutions smarter and better aligned over time.
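As a rough illustration of how strategies 2 and 5 can work together, the sketch below gates a batch of feedback records behind a few basic quality checks. Everything in it is an assumption made for the example: the record layout, the `ALLOWED_LABELS` standard, and the crude drift signal based on mean text length. Production pipelines typically use dedicated validation and drift-detection tooling.

```python
from collections import Counter
from statistics import mean

# Hypothetical labeling standard (assumption for illustration).
ALLOWED_LABELS = {"positive", "negative"}


def quality_report(records, reference_lengths, drift_tolerance=0.5):
    """Run three basic checks on a batch of {'text', 'label'} records."""
    texts = [r["text"].strip().lower() for r in records]

    # 1. Duplicates after light normalization.
    counts = Counter(texts)
    duplicates = sorted(t for t, n in counts.items() if n > 1)

    # 2. Labels outside the agreed standard.
    bad_labels = [r for r in records if r["label"] not in ALLOWED_LABELS]

    # 3. Crude drift signal: relative change in mean text length
    #    compared with a reference batch.
    ref_mean = mean(reference_lengths)
    new_mean = mean([len(t) for t in texts])
    drift = abs(new_mean - ref_mean) / ref_mean

    return {
        "duplicates": duplicates,
        "bad_labels": bad_labels,
        "drifted": drift > drift_tolerance,
    }


def accept_feedback(records, reference_lengths):
    """Feedback-loop gate: admit a batch only if every check passes."""
    report = quality_report(records, reference_lengths)
    ok = not (report["duplicates"] or report["bad_labels"] or report["drifted"])
    return ok, report


batch = [
    {"text": "Great product", "label": "positive"},
    {"text": "great product ", "label": "positive"},   # near-duplicate
    {"text": "Arrived broken", "label": "negatve"},    # misspelled label
]
ok, report = accept_feedback(batch, reference_lengths=[12, 14, 15])
print(ok)                    # False
print(report["duplicates"])  # ['great product']
```

Rejecting the whole batch is a deliberately strict design choice for the sketch; a real loop might instead quarantine only the offending records for review.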

In short, data is the moat and the future of defensible AI solutions. Data-Centric AI is more important than ever, even if the hype says otherwise. So, should AI be all about the hype? Only the systems that actually reach production can see beyond it.
