
Can LangExtract Turn Messy Clinical Notes into Structured Data?


LangExtract, a library from Google engineers, makes it easy to convert messy, unstructured text into clean, structured data using LLMs. Users provide a few few-shot examples and a custom schema, and get results based on those. It works with both cloud-hosted models and local LLMs (via Ollama).

A large amount of healthcare information is unstructured, which makes it a good area for a tool like this. Clinical notes are long and full of abbreviations and inconsistencies. Important details such as medications, dosages, and adverse drug reactions (ADRs) are buried in the text. So for this article, I wanted to see whether LangExtract could handle adverse drug reaction detection in clinical notes. Most importantly, does it work? Let's find out. Note that while LangExtract is an open-source project from Google engineers, it is not an officially supported Google product.

Just a quick note: I am only showing how LangExtract works. I am not a doctor, and this is not medical advice.

▶️ Here is the Kaggle notebook to follow along.

Why ADR Extraction Matters

An adverse drug reaction (ADR) is a harmful, unintended effect caused by taking a medication. ADRs can range from mild side effects such as nausea or dizziness to severe outcomes that may require medical attention.

A patient takes medicine for a headache but develops stomach pain: a common adverse drug reaction (ADR) | Image created by the author using ChatGPT

Detecting ADRs quickly is important for patient safety and pharmacovigilance. The challenge is that in clinical notes, ADRs are buried alongside past conditions, lab results, and other context. As a result, detecting them is tricky. Using LLMs to detect ADRs is an ongoing area of research. Some recent work has shown that LLMs are good at raising red flags but are not yet reliable. ADR extraction is therefore a good stress test for LangExtract, since the goal here is to see whether the library can spot adverse reactions among the other entities in clinical notes, such as medications, dosages, severity, and so on.

How LangExtract Works

Before we jump into usage, let's break down LangExtract's workflow. It is a simple three-step process:

  1. Define your extraction task by writing a clear prompt that specifies exactly what you want to extract.
  2. Provide a few high-quality examples to guide the model towards the format and level of detail you expect.
  3. Submit your input text, choose a model, and let LangExtract process it. You can then review the results, visualize them, or pass them directly into a downstream pipeline.

The official GitHub repository of the tool has detailed examples spanning multiple domains, from entity extraction on Shakespeare's Romeo & Juliet to medication identification in clinical notes and structuring radiology reports. Do check them out.

Installation

First we need to install the LangExtract library. It is always a good idea to do this inside a virtual environment to keep your project dependencies isolated.

pip install langextract

Detecting Adverse Drug Reactions in Clinical Notes with LangExtract & Gemini

Let's now get to our use case. For this walkthrough, I will use Google's Gemini 2.5 Flash model. You can also use Gemini Pro for more complex reasoning tasks. You will need to set your API key:

export LANGEXTRACT_API_KEY="your-api-key-here"
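
If you would rather fail early with a clear message when the key is missing, a small check along these lines can help. This is a minimal sketch: `require_api_key` is a hypothetical helper written for this article, not part of LangExtract. Only the variable name `LANGEXTRACT_API_KEY` comes from the export command above.

```python
import os


def require_api_key(env=None, name="LANGEXTRACT_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for this article, not a LangExtract function.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it before running the extraction."
        )
    return key
```

Calling `require_api_key()` at the top of a notebook surfaces a missing key before any model call is made.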


Step 1: Define the extraction task

Let's write a prompt to extract medications, dosages, adverse reactions, and actions taken. We can also ask for severity where it is mentioned.

import textwrap

prompt = textwrap.dedent("""
Extract medication, dosage, adverse reaction, and action taken from the text.
For each adverse reaction, include its severity as an attribute if mentioned.
Use exact text spans from the original text. Do not paraphrase.
Return entities in the order they appear.""")

Note the highlighted medication ibuprofen (400 mg), the adverse reaction (a mild reaction), and the action taken (stopped taking the medication). This is what a valid ADR extraction looks like. | Image by the author

Next, let's provide an example to guide the model towards the correct format:

import textwrap

import langextract as lx

# 1) Define the prompt
prompt = textwrap.dedent("""
Extract condition, medication, dosage, adverse reaction, and action taken from the text.
For each adverse reaction, include its severity as an attribute if mentioned.
Use exact text spans from the original text. Do not paraphrase.
Return entities in the order they appear.""")

# 2) Example 
examples = [
    lx.data.ExampleData(
        text=(
            "After taking ibuprofen 400 mg for a headache, "
            "the patient developed mild stomach pain. "
            "They stopped taking the medicine."
        ),
        extractions=[
            # Listed in order of appearance, as the prompt requests
            lx.data.Extraction(
                extraction_class="medication",
                extraction_text="ibuprofen"
            ),
            lx.data.Extraction(
                extraction_class="dosage",
                extraction_text="400 mg"
            ),
            lx.data.Extraction(
                extraction_class="condition",
                extraction_text="headache"
            ),
            lx.data.Extraction(
                extraction_class="adverse_reaction",
                extraction_text="mild stomach pain",
                attributes={"severity": "mild"}
            ),
            lx.data.Extraction(
                extraction_class="action_taken",
                extraction_text="They stopped taking the medicine"
            )
        ]
    )
]
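
Because the prompt asks for exact text spans, it is worth confirming that every span in the example really is a verbatim quote from the example text. The sketch below does this with plain strings so that it runs without LangExtract installed. `find_missing_spans` is a hypothetical helper, and the spans are copied by hand from the example above.

```python
def find_missing_spans(text, spans):
    """Return the spans that do not occur verbatim in `text`."""
    return [span for span in spans if span not in text]


example_text = (
    "After taking ibuprofen 400 mg for a headache, "
    "the patient developed mild stomach pain. "
    "They stopped taking the medicine."
)
example_spans = [
    "ibuprofen",
    "400 mg",
    "headache",
    "mild stomach pain",
    "They stopped taking the medicine",
]

missing = find_missing_spans(example_text, example_spans)
print(missing)  # an empty list means every span is an exact quote
```

A paraphrased span such as "stopped the medicine" would show up in `missing`, which is a hint to fix the example before sending it to the model.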

Step 2: Provide the input and run the extraction

For the input, I am using a real clinical sentence from the ADE Corpus V2 dataset on Hugging Face.

input_text = (
    "A 27-year-old man who had a history of bronchial asthma, "
    "eosinophilic enteritis, and eosinophilic pneumonia presented with "
    "fever, skin eruptions, cervical lymphadenopathy, hepatosplenomegaly, "
    "atypical lymphocytosis, and eosinophilia two weeks after receiving "
    "trimethoprim (TMP)-sulfamethoxazole (SMX) treatment."
)

Next, let's run LangExtract with the gemini-2.5-flash model.

import os

result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    # Read the key exported earlier from the environment
    api_key=os.environ["LANGEXTRACT_API_KEY"],
)

Step 3: View the results

You can display the extracted entities along with their positions:

print(f"Input: {input_text}\n")
print("Extracted entities:")
for entity in result.extractions:
    position_info = ""
    if entity.char_interval:
        start, end = entity.char_interval.start_pos, entity.char_interval.end_pos
        position_info = f" (pos: {start}-{end})"
    print(f"• {entity.extraction_class.capitalize()}: {entity.extraction_text}{position_info}")

LangExtract correctly identifies the adverse drug reaction without confusing it with the patient's pre-existing conditions, which is a key challenge in this type of task.
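
Since the positions are reported as character offsets, one quick sanity check is to slice the input with them and compare against the extracted text. The sketch below uses stand-in objects that mimic the attributes used in the loop above (`extraction_text`, `char_interval.start_pos`, `char_interval.end_pos`). It assumes `end_pos` is exclusive, like a Python slice. `misaligned_entities` and the sample sentence are made up for illustration.

```python
from types import SimpleNamespace


def misaligned_entities(text, entities):
    """Return entities whose reported span does not match their text."""
    bad = []
    for entity in entities:
        interval = entity.char_interval
        if interval is None:
            continue  # no position reported, nothing to check
        if text[interval.start_pos:interval.end_pos] != entity.extraction_text:
            bad.append(entity)
    return bad


# Stand-ins for extraction results; the sentence is illustrative only.
note = "Rash developed after amoxicillin."
entities = [
    SimpleNamespace(
        extraction_class="adverse_reaction",
        extraction_text="Rash",
        char_interval=SimpleNamespace(start_pos=0, end_pos=4),
    ),
    SimpleNamespace(
        extraction_class="medication",
        extraction_text="amoxicillin",
        char_interval=SimpleNamespace(start_pos=21, end_pos=32),
    ),
]
print(len(misaligned_entities(note, entities)))
```

A mismatch is a prompt to inspect that entity by hand, not necessarily an error, since a library may align spans loosely (for example, ignoring case).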

If you want to visualize the results, you can save them to a .jsonl file. You can then load that .jsonl file by calling the visualization function, which generates the HTML for you.

lx.io.save_annotated_documents(
    [result],
    output_name="adr_extraction.jsonl",
    output_dir="."
)

html_content = lx.visualize("adr_extraction.jsonl")

# Display the HTML content directly (display() is available in notebooks)
display(html_content)
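
Outside a notebook there is no `display()`, so writing the HTML to a file is a handy alternative. The sketch below is one way to do it. `save_html` is a hypothetical helper, and it assumes the visualization result is either a plain string or a notebook display object that carries its markup in a `.data` attribute. The demo uses a plain string as a stand-in for the real output.

```python
import os
import tempfile


def save_html(html_content, path):
    """Write visualization markup to `path`; return the characters written."""
    # Notebook display objects usually carry the markup in `.data`;
    # plain strings are written as they are.
    markup = html_content.data if hasattr(html_content, "data") else html_content
    with open(path, "w", encoding="utf-8") as f:
        f.write(markup)
    return len(markup)


# Demo with a plain string standing in for the visualization output.
demo_path = os.path.join(tempfile.gettempdir(), "adr_extraction_demo.html")
written = save_html("<p>demo</p>", demo_path)
print(written)
```

The resulting file can be opened in any browser or attached to a report.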

Working with longer clinical notes

Real clinical notes are often much longer than the example shown above. For instance, here is an actual note from the ADE-Corpus-V2 dataset, which is distributed under its own license. You can access it on Hugging Face.

Excerpt from a clinical note from the ADE-Corpus-V2 dataset, distributed under its own license | Image by the author

To process longer texts with LangExtract, we keep the same workflow but add three parameters:

extraction_passes runs multiple passes over the text to catch more details and improve recall.

max_workers controls parallel processing so that larger documents can be handled faster.

max_char_buffer splits the text into smaller chunks, which helps the model stay accurate even when the input is very long.

result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    extraction_passes=3,    
    max_workers=20,         
    max_char_buffer=1000   
)

Here is the output. For brevity, I am only showing part of it here.


If you want, you can also pass a document URL directly to the text_or_documents parameter.
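
To make the idea behind max_char_buffer more concrete, here is a minimal sketch of splitting a long note into bounded chunks. It is an illustration only and not LangExtract's own chunking logic: `chunk_text` is a hypothetical function that simply breaks at the last space inside each window so that words are not cut in half.

```python
def chunk_text(text, max_chars):
    """Split `text` into chunks of at most `max_chars` characters.

    Breaks at the last space inside the window when possible.
    Illustration only: this is not LangExtract's chunking logic.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Prefer a break at a space so words stay whole.
            split = text.rfind(" ", start, end)
            if split > start:
                end = split
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        start = end
    return chunks


print(chunk_text("aaa bbb ccc", 5))
```

Smaller chunks give the model less text to keep track of per call, at the cost of more calls, which is where parallel workers help.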


Using LangExtract with local models via Ollama

LangExtract is not limited to cloud APIs. You can also run local models through Ollama. This is especially useful when working with sensitive clinical data that cannot leave your secure environment. You can set up Ollama locally, pull your preferred model, and point LangExtract to it. Complete instructions are available in the official documentation.
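
As a rough sketch of what pointing LangExtract at a local model might look like, the snippet below collects the extra keyword arguments in one place. The parameter names (`model_url`, `fence_output`, `use_schema_constraints`) and the model tag `gemma2:2b` are assumptions based on my reading of the project's README, so verify them against the official documentation for your installed version. `ollama_extract_kwargs` is a hypothetical helper. Port 11434 is Ollama's default.

```python
def ollama_extract_kwargs(
    model_id="gemma2:2b",
    model_url="http://localhost:11434",
):
    """Build keyword arguments for a local Ollama run.

    Parameter names are assumptions to check against the official docs.
    """
    return {
        "model_id": model_id,
        "model_url": model_url,
        "fence_output": False,
        "use_schema_constraints": False,
    }


kwargs = ollama_extract_kwargs()
print(kwargs)

# With Ollama running locally, the call would then look like:
# result = lx.extract(
#     text_or_documents=input_text,
#     prompt_description=prompt,
#     examples=examples,
#     **kwargs,
# )
```

Keeping these settings in one function makes it easy to switch between a hosted model and a local one without touching the rest of the pipeline.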

Conclusion

If you are building an information retrieval system or any application that involves metadata extraction, LangExtract can save you a significant amount of preprocessing effort. In my ADR experiment, LangExtract performed well, correctly identifying medications, dosages, and reactions. What I noticed is that the output depends on the few-shot examples the user provides, which means that while LLMs do the heavy lifting, humans remain an important part of the loop. The results were encouraging, but because clinical data is high-risk, broader and more rigorous testing across diverse datasets is still needed before moving to production use.
