Machine Learning

How You Can Enrich the LLM Context to Improve Performance

LLMs are pre-trained on a large corpus of text data; during pre-training, they effectively use the entire Internet. LLMs flourish when they have access to all the relevant information needed to answer users' questions correctly. However, in many cases, we limit our LLM's abilities by not providing it with sufficient information. In this article, I will discuss why you should care about the data you feed your LLM on top of the user prompt, how to retrieve this data, and specific applications of it.

I will also start with something new in my articles: stating the primary goal I want to achieve, and what you should know after reading it. With that done, I will get to the topic:

My goal with this article is to highlight the importance of providing LLMs with appropriate data, and to show how you can enrich your LLM's context to improve its performance.

This article highlights how you can increase the performance of your LLMs by providing them with more data in the context. Image by ChatGPT.

You can also read my articles on analyzing and expanding your LLMs in 3 steps and on building a question-answering system with multimodal LLMs.


Why add additional data to LLMs?

I will start the article by identifying why this is important. LLMs are very data hungry, meaning they require a lot of data to perform well. This is evident in the pre-training corpora, which contain billions of text tokens used to train the LLM.

Andrej Karpathy's tweet about the data used to train LLMs.

However, the idea of using a lot of data also applies to LLMs at inference time (when using the LLM in production). You need to provide the LLM with all the data necessary to respond to the user's request.

In many cases, you inadvertently lower the LLM's performance by not providing the proper information.

For example, imagine you build a question-answering application where users can upload files and chat with them. Naturally, you provide the content of each file so the user can interact with the document; however, suppose you forgot to add the file names of the documents to the conversation context. This will hurt the LLM's performance, for example, if relevant information is present only in the file name, or if the user refers to file names in the conversation. Some specific LLM applications where additional data is helpful are:

  • Filtering on a particular document type
  • Information extraction
  • Keyword search to find the correct documents to feed to the LLM

Throughout the article, I will discuss where you can find such information, strategies for retrieving additional information, and specific applications of this data.

At this stage, I will discuss the data that you may already have in your system. One example is my earlier point about the file question-answering system where you forget to include the file name in the context. Some examples of such readily available metadata are listed below, with a small code sketch after the figure showing how you might add it to the context:

  • File extensions (.pdf, .docx, .xlsx)
  • Folder path (if the user uploaded a folder)
  • Timestamps (for example, required if the user asks about the most recent document)
  • Page numbers (the user may ask the LLM to retrieve some information from page 5)
This figure highlights different metadata types you may have access to: file types, folder paths, timestamps, and page numbers. Image by Google Gemini.
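To make this concrete, here is a minimal sketch of how you might collect this kind of file metadata and prepend it to the document text before it enters the LLM context. The helper name and header format are my own illustrative choices, not tied to any specific framework.

from pathlib import Path
from datetime import datetime

def build_document_context(file_path: str, document_text: str) -> str:
    """Prepend basic file metadata to the document text before it goes into the LLM context."""
    path = Path(file_path)
    modified = datetime.fromtimestamp(path.stat().st_mtime)

    metadata_header = (
        f"File name: {path.name}\n"
        f"File extension: {path.suffix}\n"
        f"Folder path: {path.parent}\n"
        f"Last modified: {modified:%Y-%m-%d %H:%M}\n"
    )
    return f"{metadata_header}\nDocument content:\n{document_text}"

# Usage (assuming the file exists on disk):
# context = build_document_context("reports/q3_taxes.pdf", extracted_pdf_text)
# The resulting string is what you feed to the LLM, so questions about
# file names or recency can now be answered.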

There are numerous other examples of such data that may already be available, or that you can quickly retrieve and add to your LLM context.

The type of data you have will vary greatly from application to application. Most of the examples I give in this article apply to text-based AI, because that is the space I spend the most time in. However, if you work with, for example, vision-based or audio-based AI, I urge you to find similar examples in your own domain.

For vision-based AI, this could be:

  • Location data for where the photo/video was taken
  • The photo/video file name
  • The photo/video creation date

Or for audio-based AI:

  • Metadata about who is speaking when
  • Timestamps for each sentence
  • Location data for where the audio was recorded

My point is, there is a plethora of data available out there; all you need to do is look for it and consider how useful it can be for your application.

In some cases, the data you already have is insufficient, and you want to provide your LLM with more data to help it answer questions correctly. In this case, you need to retrieve additional information. Naturally, since we are in the era of LLMs, we will use LLMs to extract this data.

Extracting information in advance

The easiest way to get additional data is by extracting it before you process any user requests. For text-based AI, this means extracting certain information from the documents during preprocessing. You might extract the document type (legal documents, tax documentation, or manuals) or details contained in the document (dates, names, places, …).

The benefits of extracting information ahead of time are:

  • Speed (in production, you only need to look up the value in your database)
  • You can take advantage of batch processing to reduce costs

Today, extracting this kind of information is simple. You set up an LLM with a prompt for the specific extraction task, and feed the prompt and the text into the LLM. The LLM then processes the text and returns the relevant information. You may also want to evaluate your information-extraction performance, which you can read about in my article on checking 5 million LLM calls with automated evaluations.

You may also want to enumerate all the data points you want to extract, for example the document type, and the dates, names, and places mentioned in each document.

Once you have created this list, you can extract all the metadata and store it in a database.
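As a rough sketch, this pre-extraction step could look like the following, where each document is passed through an extraction prompt once and the results are stored for fast lookup in production. Here, call_llm is an assumed helper wrapping whatever LLM API you use (the same assumption as in the on-demand example later in the article), and sqlite3 simply stands in for your database.

import json
import sqlite3

# Data points we decided to extract in advance.
DATA_POINTS = ["document_type", "dates", "names", "places"]

def extract_metadata(text: str) -> dict:
    """Ask the LLM to extract the predefined data points from one document."""
    data_points = ", ".join(DATA_POINTS)
    prompt = f"""
        Extract the following data points from the text below and return them
        as a single JSON object with one key per data point.

        Data Points: {data_points}
        Text: {text}
    """
    # call_llm is an assumed helper that sends the prompt to your LLM
    # provider and returns the raw text of the response.
    return json.loads(call_llm(prompt))

def index_documents(documents: dict[str, str], db_path: str = "metadata.db") -> None:
    """Extract metadata for every document once and store it for later lookup."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS metadata (doc_id TEXT PRIMARY KEY, payload TEXT)")
    for doc_id, text in documents.items():
        payload = json.dumps(extract_metadata(text))
        conn.execute("INSERT OR REPLACE INTO metadata VALUES (?, ?)", (doc_id, payload))
    conn.commit()
    conn.close()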

However, the main downside of extracting information in advance is that you must decide up front what information to extract. This is difficult in many cases, which is where on-demand information retrieval comes in, covered in the following section.

Extracting information on demand

When you cannot decide in advance what to extract, you can retrieve it on demand. This means setting up a prompt that takes the data point to extract and the text to extract it from. For example:

import json

def retrieve_info(data_point: str, text: str) -> str:
    """Ask the LLM to extract a single data point from the given text."""
    prompt = f"""
        Extract the following data point from the text below and return it in a JSON object.

        Data Point: {data_point}
        Text: {text}

        Example JSON Output: {{"result": "example value"}}
    """
    # call_llm is a helper that sends the prompt to your LLM provider
    # and returns the raw text of the response.
    return json.loads(call_llm(prompt))["result"]

You can then describe this function as a tool your LLM has access to, and the LLM can call it whenever it needs information. In fact, Anthropic has structured their deep research system in a similar way, where one orchestrator agent can spawn subagents to retrieve more information. Note that giving your LLM access to additional tool calls can result in much higher token usage, so you should keep an eye on how much you spend on the LLM's tokens.
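As a sketch, exposing the function as a tool could look like the following. The schema roughly follows the JSON-schema style used by providers such as Anthropic; the exact field names depend on the API you use.

# A minimal tool definition for retrieve_info. The exact schema format
# (e.g. "input_schema" vs. "parameters") depends on your LLM provider.
retrieve_info_tool = {
    "name": "retrieve_info",
    "description": (
        "Extract a single data point from a given text. "
        "Use this when the answer requires information that is not "
        "already present in the conversation context."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "data_point": {
                "type": "string",
                "description": "The piece of information to extract, e.g. 'invoice date'.",
            },
            "text": {
                "type": "string",
                "description": "The text to extract the data point from.",
            },
        },
        "required": ["data_point", "text"],
    },
}

# When the model responds with a tool call, you run retrieve_info(**tool_input)
# and pass the result back to the model in the next message.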

So far, I have discussed why you should use additional data and how to retrieve it. However, to complete this article, I will also cover specific applications where this data improves the performance of the LLM.

Metadata filtering

This figure highlights how metadata filtering works, where you filter out irrelevant documents using their metadata. Image by Google Gemini.

My first example is that you can narrow a search using a metadata filter. Given details such as:

  • File type (PDF, XLSX, DOCX, …)
  • File size
  • File name

you can help your application narrow down which information to retrieve. This could, for example, be the details fed into your LLM context when doing RAG. You can use the additional metadata to filter out irrelevant files.

Suppose the user asked a question concerning Excel documents only. Using RAG to retrieve chunks from files other than Excel documents is then a poor use of the LLM's context window. Instead, you should filter the available chunks to find the Excel documents, and use chunks from those documents to better respond to the user's question. You can learn more about managing the LLM context in my article on building effective agents.
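A minimal sketch of this kind of metadata filtering, assuming each chunk is stored together with a metadata dictionary (the chunk structure here is illustrative, not tied to a specific vector database):

def filter_chunks_by_metadata(chunks: list[dict], **filters) -> list[dict]:
    """Keep only chunks whose metadata matches every given filter, e.g. file_type='.xlsx'."""
    return [
        chunk
        for chunk in chunks
        if all(chunk["metadata"].get(key) == value for key, value in filters.items())
    ]

chunks = [
    {"text": "Revenue per quarter ...", "metadata": {"file_type": ".xlsx", "file_name": "revenue.xlsx"}},
    {"text": "Employment contract ...", "metadata": {"file_type": ".pdf", "file_name": "contract.pdf"}},
]

# The user asked about Excel documents only, so we filter before retrieval
# and only rank/embed the remaining chunks for the LLM context.
excel_chunks = filter_chunks_by_metadata(chunks, file_type=".xlsx")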

Another example is a user asking your AI agent questions about events that occurred after the LLM's training data cutoff. LLMs usually have a training data cutoff well before their release date, because the pre-training data requires careful curation and is collected early in the development process.

This creates a problem when users ask questions about recent history, for example, about the latest events in the news. In this case, the AI agent answering the question requires access to an internet search tool (essentially performing information retrieval against the internet). This is an example of on-demand information extraction.
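A minimal sketch of wiring this up, where web_search is a hypothetical helper (backed by a search API of your choice) exposed to the model as a second tool alongside retrieve_info:

def web_search(query: str) -> str:
    """Hypothetical helper that queries a search API and returns a text summary of the top results."""
    ...

web_search_tool = {
    "name": "web_search",
    "description": (
        "Search the internet for up-to-date information, e.g. recent news "
        "events that happened after the model's training cutoff."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "The search query."}},
        "required": ["query"],
    },
}

# The agent decides at inference time whether the question needs fresh
# information; if so, it calls web_search and grounds its answer in the results.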

Conclusion

In this article, I discussed how you can improve your LLM by providing it with extra data. You can find this data in your existing metadata (file names, file sizes, location data), or you can obtain it with information extraction (document type, details mentioned in the document, etc.). This information often strengthens the LLM's ability to respond to user questions successfully, and in many cases, the lack of this data practically guarantees that the LLM will fail to answer the question correctly.

👉 You can find me here:

🧑💻 Get in touch

🔗 LinkedIn

🐦 X / Twitter

✍️ Medium
