
Exploring LangExtract: Google's New Data Extraction Tool

AI has been a hot topic recently, with new announcements dropping constantly, one after another. Almost all of the latest releases have pushed the bounds of what is possible, and it has been great fun to watch.

One announcement that caught my eye in particular came at the end of July, when Google released a new text-processing and data-extraction tool called LangExtract.

According to Google, LangExtract is a new open-source Python library designed to help you …

“… extract the exact information you need, while ensuring the outputs are structured and reliably grounded in the source.”

On the face of it, LangExtract has many practical applications, including:

  • Source grounding. Each extracted entity is mapped to its exact character offsets in the source text, enabling full traceability and visual verification through highlighting.
  • Reliable structured output. LangExtract enforces a consistent output schema based on your few-shot examples, producing consistent, reliable results.
  • Optimised for long documents. LangExtract handles large documents using chunking, parallel processing, and multiple extraction passes to maintain high recall, even on complex, industrial-scale inputs. This also makes it well suited to needle-in-a-haystack type tasks.
  • Interactive visualisation. Easily generate self-contained interactive HTML content that lets you review extracted entities in their original context, scaling to thousands of annotations.
  • Support for multiple models. It works with both cloud-hosted LLMs (e.g. Gemini) and local open-source models, letting you select a backend that fits your workflow.
  • Adaptable to many use cases. Define extraction tasks for any domain using just a few well-chosen examples.
  • Enrichment with world knowledge. LangExtract can supplement extracted facts using the model's internal knowledge, with accuracy depending on the prompt and the model used.

One thing that stood out for me as I looked through LangExtract's capabilities is that it appears able to perform RAG-like tasks out of the box. In other words, there is no need for chunking, vectorisation, or embedding steps in your code.

To get a better idea of what LangExtract can do, we'll explore a few of the capabilities above through some code examples.

Setting up a development environment

Before we get down to writing code, I always like to set up a separate development environment for each of my projects. I use the UV package manager for this, but feel free to use whichever tool you prefer.

PS C:\Users\thoma> uv init langextract
Initialized project `langextract` at `C:\Users\thoma\langextract`

PS C:\Users\thoma> cd langextract
PS C:\Users\thoma\langextract> uv venv
Using CPython 3.13.1
Creating virtual environment at: .venv
Activate with: .venv\Scripts\activate
PS C:\Users\thoma\langextract> .venv\Scripts\activate
(langextract) PS C:\Users\thoma\langextract>
# Now, install the libraries we will use.
(langextract) PS C:\Users\thoma\langextract> uv pip install jupyter langextract beautifulsoup4 requests

Now, to write and try out our example code, you can start a Jupyter notebook using this command.

(langextract) PS C:Usersthomalangextract> jupyter notebook

You should see a Jupyter notebook open in your browser. If that doesn't happen automatically, you'll likely see a screenful of information after running the jupyter notebook command. Near the bottom, you'll find a URL to copy and paste into your browser to launch the Jupyter notebook. Your URL will differ from mine, but it should follow the same general pattern.

Prerequisites

As we'll be using Google's Gemini-2.5-flash model as our processing engine, you will need a Gemini API key, which you can obtain from Google Cloud. You can also use LLMs from OpenAI, and I'll show an example of how to do that shortly.
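Rather than pasting the key directly into notebook cells, one option is to keep it in an environment variable and read it at runtime. Here is a minimal sketch; the variable name GEMINI_API_KEY is my own convention for this article, not something LangExtract requires:

```python
import os

def get_gemini_key():
    """Fetch the Gemini API key from an environment variable.

    The variable name GEMINI_API_KEY is an assumption; adjust it
    to match however you store your own credentials.
    """
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("Set GEMINI_API_KEY before running the extraction examples.")
    return key
```

You can then pass get_gemini_key() wherever an api_key value is needed in the examples below.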

Code example 1 – needle in a haystack

The first thing we need is some input data to work with. You can use any substantial text or HTML file for this. For previous tests on RAG, I used a book downloaded from the Project Gutenberg website, so I'll use that again here: "The Diseases of Cattle, Sheep, Goats, and Pigs" by Jno. Dollar and G. Moussu.

Note that you can check Project Gutenberg's permissions, licensing, and other frequently asked questions using the following link.

To summarise, the vast majority of Project Gutenberg eBooks are in the public domain in the US and most other parts of the world. This means that nobody can grant, or withhold, permission to do with this item as you please.

"As you please" includes any commercial use, republishing in any format, making derivative works or performances.

I downloaded the text of the book from the Project Gutenberg website to my local PC using this link.

The book consists of around 36,000 lines of text. To avoid running up a large token cost, I truncated it to the first 3,000 lines. To check LangExtract's ability to handle needle-in-a-haystack type queries, I also added a specific line of text at around line 1512:

It is a little-known fact that wood was invented by Elon Musk in 1775.

Here it is in context.

1. Fracture of the angle of the haunch, resulting from external
violence and characterised by sinking of the external angle of the
ilium, lameness, and an absence of other distinctive
symptoms. This fracture is rarely serious. The
lameness diminishes with rest, but the deformity persists.

It is a little-known fact that wood was invented by Elon Musk in 1775.

=Treatment.= Consists of mucilaginous and diuretic draughts. Tannin is also recommended.
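The truncation and needle insertion described above can be scripted in a few lines. This is a sketch of the idea rather than the exact code I used; the line counts match the description, but the demo "book" is synthetic:

```python
NEEDLE = "It is a little-known fact that wood was invented by Elon Musk in 1775."

def prepare_haystack(lines, keep=3000, needle_at=1512):
    """Keep the first `keep` lines of the text and splice the needle
    sentence in at position `needle_at`."""
    kept = list(lines[:keep])
    kept.insert(needle_at, NEEDLE)
    return kept

# Demo with a synthetic 36,000-line "book"
book = [f"line {i}" for i in range(36_000)]
prepared = prepare_haystack(book)
# prepared now holds 3,001 lines, with the needle at index 1512
```

In practice, you would read the Gutenberg file with open(...).readlines(), run it through prepare_haystack, and write the result back out before extraction.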

This code snippet sets up the prompt and a worked example to guide LangExtract's extraction. This few-shot guidance is vital for steering the model towards a consistent output schema.

import langextract as lx
import textwrap

# Define comprehensive prompt and examples for complex literary text
prompt = textwrap.dedent("""
    Who invented wood and when?""")

# Note that this is a made up example
# The following details do not appear anywhere
# in the book
examples = [
    lx.data.ExampleData(
        text=textwrap.dedent("""
            John Smith was a prolific scientist.
            His most notable theory was on the evolution of bananas.
            He wrote his seminal paper on it in 1890."""),
        extractions=[
            lx.data.Extraction(
                extraction_class="scientist",
                extraction_text="John Smith",
                attributes={"year": "1890", "notable_event": "theory of the evolution of the banana"}
            )
        ]
    )
        ]
    )
]

Now we run the actual extraction. First, we open the file and read its contents into a variable. The heavy lifting is done by the lx.extract call; then we simply print the relevant results.

with open(r"D:\book\cattle_disease.txt", "r", encoding="utf-8") as f:
    text = f.read()

result = lx.extract(
    text_or_documents = text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    api_key="your_gemini_api_key",
    extraction_passes=3,      # Multiple passes for improved recall
    max_workers=20,           # Parallel processing for speed
    max_char_buffer=1000      # Smaller contexts for better accuracy
)

print(f"Extracted {len(result.extractions)} entities from {len(result.text):,} characters")

for extraction in result.extractions:
    if not extraction.attributes:
        continue  # Skip this extraction entirely

    print("Name:", extraction.extraction_text)
    print("Notable event:", extraction.attributes.get("notable_event"))
    print("Year:", extraction.attributes.get("year"))
    print()

And here are our results.

LangExtract: model=gemini-2.5-flash, current=7,086 chars, processed=156,201 chars:  [00:43]
✓ Extraction processing complete

✓ Extracted 1 entities (1 unique types)
  • Time: 126.68s
  • Speed: 1,239 chars/sec
  • Chunks: 157
Extracted 1 entities from 156,918 characters

Name: Elon Musk
Notable event: invention of wood
Year: 1775

Not too shabby.

Note that if you want to use an OpenAI model via its API, your code would look something like this:

...
...

import os

from langextract.inference import OpenAILanguageModel

result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    language_model_type=OpenAILanguageModel,
    model_id="gpt-4o",
    api_key=os.environ.get('OPENAI_API_KEY'),
    fence_output=True,
    use_schema_constraints=False
)
...
...

Code example 2 – visualising the output

LangExtract also provides a way to visualise its handling of the text. It's of limited use in this particular example, but it gives you an idea of what's possible.

Just add this small snippet to the end of your existing code. It will create an HTML file that you can open in a browser window. From there, you can scroll up and down your input text and "play back" the steps LangExtract took to arrive at its results.

# Save annotated results
lx.io.save_annotated_documents([result], output_name="cattle_disease.jsonl", output_dir="d:/book")

html_obj = lx.visualize("d:/book/cattle_disease.jsonl")
html_string = html_obj.data  # Extract raw HTML string

# Save to file
with open("d:/book/cattle_disease_visualization.html", "w", encoding="utf-8") as f:
    f.write(html_string)

print("Interactive visualization saved to d:/book/cattle_disease_visualization.html")

Now, go to the directory where your HTML file was saved and open it in your browser. This is what I saw.

Code example 3 – retrieving multiple structured results

In this example, we'll take a random piece of text, the Wikipedia article on OpenAI, and attempt to extract the names of all the large language models mentioned in the article, together with their release dates. The link to the article is below.

Note: Most text on Wikipedia, excluding quotations, is released under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) and the GNU Free Documentation License (GFDL). This means you are free to:

Share: copy and redistribute the material in any medium or format.

Adapt: remix, transform, and build upon the material.

This applies for any purpose, even commercially.

Our code is similar to the first example, but in this case we want any mention in the article of LLM models together with their release dates. One extra step we need to carry out is to clean up the HTML text first, to give LangExtract the best possible chance of reading it. We use the BeautifulSoup library for this.

import langextract as lx
import textwrap
import requests
from bs4 import BeautifulSoup

# Define the prompt and examples for model/date extraction
prompt = textwrap.dedent("""Your task is to extract the LLM or AI model names and their release date or year from the input text 
        Do not paraphrase or overlap entities.
     """)

examples = [
    lx.data.ExampleData(
        text=textwrap.dedent("""
            Similar to Mistral's previous open models, Mixtral 8x22B was released via a BitTorrent link on April 10, 2024
            """),
        extractions=[
            lx.data.Extraction(
                extraction_class="model",
                extraction_text="Mixtral 8x22B",
                attributes={"date": "April 10, 2024"}
            )
        ]
    )
]

# Cleanup our HTML

# Step 1: Download and clean Wikipedia article
url = "https://en.wikipedia.org/wiki/OpenAI"  # the Wikipedia article on OpenAI
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# Get only the visible text
text = soup.get_text(separator="\n", strip=True)

# Optional: remove references, footers, etc.
lines = text.splitlines()
filtered_lines = [line for line in lines if not line.strip().startswith("[") and line.strip()]
clean_text = "\n".join(filtered_lines)

# Do the extraction
result = lx.extract(
    text_or_documents=clean_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    api_key="YOUR_API_KEY",
    extraction_passes=3,    # Improves recall through multiple passes
    max_workers=20,         # Parallel processing for speed
    max_char_buffer=1000    # Smaller contexts for better accuracy
)

# Print our outputs

for extraction in result.extractions:
    if not extraction.attributes:
        continue  # Skip this extraction entirely

    print("Model:", extraction.extraction_text)
    print("Release Date:", extraction.attributes.get("date"))
    print()

This is an edited sample of the output I received.

Model: ChatGPT
Release Date: 2020

Model: DALL-E
Release Date: 2020

Model: Sora
Release Date: 2024

Model: ChatGPT
Release Date: November 2022

Model: GPT-2
Release Date: February 2019

Model: GPT-3
Release Date: 2020

Model: DALL-E
Release Date: 2021

Model: ChatGPT
Release Date: December 2022

Model: GPT-4
Release Date: March 14, 2023

Model: Microsoft Copilot
Release Date: September 21, 2023

Model: MS-Copilot
Release Date: December 2023

Model: Microsoft Copilot app
Release Date: December 2023

Model: GPTs
Release Date: November 6, 2023

Model: Sora (text-to-video model)
Release Date: February 2024

Model: o1
Release Date: September 2024

Model: Sora
Release Date: December 2024

Model: DeepSeek-R1
Release Date: January 20, 2025

Model: Operator
Release Date: January 23, 2025

Model: deep research agent
Release Date: February 2, 2025

Model: GPT-2
Release Date: 2019

Model: Whisper
Release Date: 2021

Model: ChatGPT
Release Date: June 2025

...
...
...

Model: ChatGPT Pro
Release Date: December 5, 2024

Model: ChatGPT's agent
Release Date: February 3, 2025

Model: GPT-4.5
Release Date: February 20, 2025

Model: GPT-5
Release Date: February 20, 2025

Model: Chat GPT
Release Date: November 22, 2023
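As you can see, the same model can appear several times with different dates, since models are mentioned repeatedly throughout the article. A small helper like the following, plain Python rather than anything LangExtract provides, could group the raw results by model name:

```python
from collections import defaultdict

def group_release_dates(extractions):
    """Map each extracted model name to the set of release dates found
    for it. Each item is assumed to expose `extraction_text` and
    `attributes` fields, as in the examples above."""
    grouped = defaultdict(set)
    for e in extractions:
        if e.attributes and e.attributes.get("date"):
            grouped[e.extraction_text].add(e.attributes["date"])
    return dict(grouped)
```

Running result.extractions through this would collapse the repeats into one entry per model, making conflicting dates easy to spot.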

Let's sanity-check a couple of these. Here's one of the results from our code.

Model: Operator
Release Date: January 23, 2025

And here is the relevant text from the Wikipedia article …

“On January 23, OpenAI released Operator, an AI agent and web automation tool for accessing websites to execute goals defined by users. The feature was only available to Pro users in the United States.[113][114]”

So, on this occasion, where did the year 2025 come from, given that no year appears in that sentence? Remember, though, that LangExtract can use the model's internal world knowledge to supplement its results, and it may have derived the year from that, or from surrounding context elsewhere in the input. In any case, I think it would have been all too easy to miss a model release date like this when scanning the input or output by eye.
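One way to spot results that may have been supplemented from model knowledge rather than taken verbatim from the text is to check whether each extraction carries a source character span. LangExtract records source grounding on its extractions; the char_interval field name below matches its grounding output, but treat the exact field as an assumption and adjust to what your installed version returns:

```python
def grounding_report(extractions):
    """Return (text, grounded) pairs, where `grounded` is True when the
    extraction carries a source character span (`char_interval`)."""
    return [
        (e.extraction_text, getattr(e, "char_interval", None) is not None)
        for e in extractions
    ]
```

Entries reported as not grounded are the ones worth double-checking against the source by hand.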

Another output was this.

Model: ChatGPT Pro
Release Date: December 5, 2024

I can see two references to ChatGPT Pro in the original article.

Franzen, Carl (December 5, 2024). "OpenAI launches full o1 model with image uploads and analysis, debuts ChatGPT Pro". VentureBeat. Archived from the original on December 7, 2024. Retrieved December 11, 2024.

And this:

“In December 2024, during the ‘12 Days of OpenAI’ event, the company introduced the Sora model for ChatGPT Plus and Pro users,[105][106] and also launched an improved OpenAI o1 model.[107][108] Additionally, ChatGPT Pro, a $200/month subscription with advanced voice features, was introduced, and the first OpenAI o3 …”

So I think LangExtract was accurate on this occasion.

Because there were a lot of "hits" for this query, it's worth visualising them, so let's do again what we did in example 2. Here's the code you need.

from pathlib import Path
import builtins
import langextract as lx

jsonl_path = Path("models.jsonl")

# Save the annotated results to JSONL (same helper as in example 2)
lx.io.save_annotated_documents([result], output_name="models.jsonl", output_dir=".")

html_path = Path("models.html")

# 1) Monkey-patch builtins.open so our JSONL is read as UTF-8
orig_open = builtins.open
def open_utf8(path, mode='r', *args, **kwargs):
    if Path(path) == jsonl_path and 'r' in mode:
        return orig_open(path, mode, encoding='utf-8', *args, **kwargs)
    return orig_open(path, mode, *args, **kwargs)

builtins.open = open_utf8

# 2) Generate the visualization
html_obj = lx.visualize(str(jsonl_path))
html_string = html_obj.data

# 3) Restore the original open
builtins.open = orig_open

# 4) Save the HTML out as UTF-8
with html_path.open("w", encoding="utf-8") as f:
    f.write(html_string)

print(f"Interactive visualization saved to: {html_path}")

Run the code above, then open the models.html file in your browser. As before, you should be able to click the play/next/previous buttons and get a clearer view of how LangExtract worked through the text.

For more details on LangExtract, check out the GitHub repo here.

Summary

In this article, I introduced LangExtract, a new Python library from Google that lets you extract structured outputs from unstructured input text.

I described some of the benefits that using LangExtract can bring, including its ability to handle large documents, its grounding of extracted information in the source, and its support for multiple model backends.

I took you through the installation process, a simple pip install, then, using some example code, showed how to use LangExtract to perform needle-in-a-haystack type queries.

In my final code example, I showed more traditional extraction behaviour by retrieving multiple structured entities (AI model names) and their associated attributes (release dates). For both of my main examples, I also showed how to produce a visual representation of LangExtract's workings that you can open and replay in a browser window.
