A Developer's Guide to Structured Prompting: Role Personas, Negative Constraints, Structured JSON Outputs, Attentive Reasoning Queries, and Verbalized Sampling

Most developers treat prompting as an afterthought: write something reasonable, look at the output, and iterate if needed. That approach works until reliability becomes critical. As LLMs move into production systems, the difference between a prompt that usually works and one that works consistently becomes an engineering concern. In response, the research community has formalized prompting into a set of well-defined techniques, each designed to address specific failure modes, whether structural, conceptual, or stylistic. These methods work entirely at the prompt layer, requiring no fine-tuning, model changes, or infrastructure upgrades.
This article focuses on five such techniques: role-specific prompting, negative prompting, JSON prompting, Attentive Reasoning Queries (ARQ), and verbalized sampling. Rather than covering the usual basics like zero-shot prompting or plain chain-of-thought, the emphasis here is on what changes when each technique is applied. Each is shown in a side-by-side comparison against a baseline on the same task, highlighting the impact on output quality and explaining the underlying mechanism.
Here, we set up a small interface to interact with the OpenAI API. We securely load the API key at runtime using getpass, initialize the client, and define a lightweight chat wrapper that sends system and user messages to the model (gpt-4o-mini). This keeps our test loop clean and reusable while keeping the focus on the prompt variants themselves.
The helper functions (section and divider) handle the formatting, making it easy to compare the baseline against each advanced technique. If you don't have an API key yet, you can create one from the official OpenAI dashboard.
import json
import os
from getpass import getpass

from openai import OpenAI

os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')
client = OpenAI()

MODEL = "gpt-4o-mini"

def chat(system: str, user: str, **kwargs) -> str:
    """Minimal wrapper around the chat completions endpoint."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        **kwargs,
    )
    return response.choices[0].message.content

def section(title: str) -> None:
    print()
    print("=" * 60)
    print(f" {title}")
    print("=" * 60)

def divider(label: str) -> None:
    print(f"\n── {label} {'─' * (54 - len(label))}")
Language models are trained across a wide variety of domains: security, marketing, legal, engineering, and more. If you don't specify a role, the model averages across all of them, producing answers that are mostly correct but generic. Role-specific prompting corrects this by assigning a persona in the system prompt (e.g., "You are a senior application security researcher"). The persona acts as a filter, steering the model to respond with the language, priorities, and thinking style of that domain.
In this example, both answers point to XSS vulnerabilities and recommend HttpOnly cookies; the basic facts are the same. The difference is how the model frames the problem. The baseline treats localStorage as a storage configuration choice. The role-specific response treats it as an attack surface: it reasons about what an attacker could do if XSS exists, not merely that XSS is theoretically possible. That shift in framing, from "here are the risks" to "here is what an attacker does with those risks," is the persona at work. No new information was provided; the prompt simply changed which part of the model's knowledge received the most weight.


section("TECHNIQUE 1 -- Role-Specific Prompting")
QUESTION = "Our web app stores session tokens in localStorage. Is this a problem?"
baseline_1 = chat(
    system="You are a helpful assistant.",
    user=QUESTION,
)
role_specific = chat(
    system=(
        "You are a senior application security researcher specializing in "
        "web authentication vulnerabilities. You think in terms of attack "
        "surface, threat models, and OWASP guidelines."
    ),
    user=QUESTION,
)
divider("Baseline")
print(baseline_1)
divider("Role-specific (security researcher)")
print(role_specific)
Negative prompting focuses on telling the model what not to do. By default, LLMs follow patterns learned during training and RLHF: friendly openings, analogies, hedging ("it depends"), and closing summaries. While this can make answers feel helpful, it often adds unnecessary noise in technical contexts. Negative prompting works by stripping these defaults. Instead of only defining the desired output, you also rule out unwanted behavior, which narrows the model's output space and leads to more precise answers.
In the output, the difference is immediately apparent. The baseline answer sprawls into a long, structured explanation with analogies, headings, and an unnecessary conclusion. The negatively prompted version delivers the same core information in a much more concise form: direct, focused, and without filler. Nothing important is lost; the prompt simply suppresses the model's tendency to over-explain and pad the answer.


section("TECHNIQUE 2 -- Negative Prompting")
TOPIC = "Explain what a database index is and when you'd use one."
baseline_2 = chat(
    system="You are a helpful assistant.",
    user=TOPIC,
)
negative = chat(
    system=(
        "You are a senior backend engineer writing internal documentation.\n"
        "Rules:\n"
        "- Do NOT use marketing language or filler phrases like 'great question' or 'certainly'.\n"
        "- Do NOT include caveats like 'it depends' without immediately resolving them.\n"
        "- Do NOT use analogies unless they are necessary. If you use one, keep it to one sentence.\n"
        "- Do NOT pad the response -- if you've made the point, stop.\n"
    ),
    user=TOPIC,
)
divider("Baseline")
print(baseline_2)
divider("With negative prompting")
print(negative)
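One practical benefit of negative prompting is that the constraints are mechanically checkable. Below is a small illustrative checker, not part of the original walkthrough, with a phrase list that is an assumption mirroring the rules above; it can serve as a cheap regression test when iterating on the system prompt:

```python
# Hypothetical phrase list mirroring the rules in the negative prompt above.
BANNED_PHRASES = [
    "great question",
    "certainly",
    "it depends",
]

def violations(text: str) -> list[str]:
    """Return every banned phrase that appears in the response."""
    lowered = text.lower()
    return [phrase for phrase in BANNED_PHRASES if phrase in lowered]

# A compliant answer passes cleanly; a chatty one is flagged.
print(violations("A database index is a lookup structure that speeds up reads."))  # -> []
print(violations("Great question! Certainly, it depends on your workload."))
```

This is deliberately crude (the real rule only forbids "it depends" when left unresolved), but a simple substring scan catches most regressions early.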
JSON prompting becomes important when LLM output needs to be consumed by code rather than read by humans. Free-form responses are inconsistent: the structure varies, key information is buried in paragraphs, and minor wording changes break downstream parsing logic. By defining a JSON schema in the prompt, you turn structure into a hard constraint. This not only dictates the output format but also forces the model to organize its reasoning into clearly defined fields such as pros, cons, sentiment, and rating.
In the output, the difference is clear. The baseline response is easy to read but unstructured: the pros, cons, and sentiment are blended into narrative text, making it difficult to parse. The JSON-prompted version, by contrast, returns clean, well-defined fields that can be loaded directly and used in code without post-processing. The same information is now explicit and separated into fields, making the output easier to store, query, and compare at scale.


section("TECHNIQUE 3 -- JSON Prompting")
REVIEW = """
Honestly mixed feelings about this laptop. The display is stunning -- easily the best I've
seen at this price range -- and the keyboard is surprisingly comfortable for long sessions.
Battery life, on the other hand, barely gets me through a 6-hour workday, which is
disappointing. Fan noise under load is also pretty aggressive. For light work it's great,
but I wouldn't recommend it for anyone who needs to run heavy software.
"""
SCHEMA = """
{
  "overall_sentiment": "positive | negative | mixed",
  "rating": <integer from 1 to 5>,
  "pros": ["<string>", ...],
  "cons": ["<string>", ...],
  "recommended_for": "<string>",
  "not_recommended_for": "<string>"
}
"""
baseline_3 = chat(
    system="You are a helpful assistant.",
    user=f"Summarize this product review:\n\n{REVIEW}",
)
json_output = chat(
    system=(
        "You are a product review parser. Extract structured information from reviews.\n"
        "You MUST return only a valid JSON object. No preamble, no explanation, no markdown fences.\n"
        f"The JSON must match this schema exactly:\n{SCHEMA}"
    ),
    user=f"Parse this review:\n\n{REVIEW}",
)
divider("Baseline (free-form)")
print(baseline_3)
divider("JSON prompting (raw output)")
print(json_output)
divider("Parsed & usable in code")
parsed = json.loads(json_output)
print(f"Sentiment : {parsed['overall_sentiment']}")
print(f"Rating : {parsed['rating']}/5")
print(f"Pros : {', '.join(parsed['pros'])}")
print(f"Cons : {', '.join(parsed['cons'])}")
print(f"Recommended for : {parsed['recommended_for']}")
print(f"Avoid if : {parsed['not_recommended_for']}")
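In practice, models occasionally wrap JSON in markdown fences despite the "no fences" instruction, which would make a bare json.loads throw. A defensive parse, sketched below under the assumption that the payload is a single top-level object, strips fences and falls back to extracting the outermost braces:

```python
import json
import re

def parse_json_response(raw: str) -> dict:
    """Best-effort parse of a model response that should be a JSON object."""
    text = raw.strip()
    # Strip ```json ... ``` fences if the model added them anyway.
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, if any.
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match is None:
            raise
        return json.loads(match.group(0))

# Handles both clean and fenced outputs:
print(parse_json_response('{"rating": 4}'))               # -> {'rating': 4}
print(parse_json_response('```json\n{"rating": 4}\n```'))  # -> {'rating': 4}
```

Note that for gpt-4o-mini the chat completions endpoint also accepts response_format={"type": "json_object"}, which enforces syntactically valid JSON at the API level, though the schema itself still has to be described in the prompt.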
Attentive Reasoning Queries (ARQ) build on chain-of-thought but address its biggest weakness: undirected reasoning. In free-form CoT, the model decides what to focus on, which can lead to gaps or irrelevant detours. ARQ replaces this with a fixed set of domain-specific questions that the model must answer in sequence. This ensures that every key consideration is covered, shifting control from the model to the prompt designer. Instead of merely directing how the model thinks, ARQ defines what it must think about.
In the output, the difference shows up as discipline and coverage. The baseline CoT response identifies important issues but drifts into tangents and skips critical analysis in places. The ARQ version, by contrast, addresses every required point, covering the injection risk, error handling, performance at scale, and edge cases in turn. Each question acts as a checkpoint, making the answer more structured, complete, and easy to audit.


section("TECHNIQUE 4 -- Attentive Reasoning Queries (ARQ)")
CODE_TO_REVIEW = """
def get_user(user_id):
    query = f"SELECT * FROM users WHERE id = {user_id}"
    result = db.execute(query)
    return result[0] if result else None
"""
ARQ_QUESTIONS = """
Before giving your final review, answer each of the following questions in order:

Q1 [Security]: Does this code have any injection vulnerabilities?
    If yes, describe the exact attack vector.
Q2 [Error handling]: What happens if db.execute() throws an exception?
    Is that acceptable?
Q3 [Performance]: Does this query retrieve more data than necessary?
    What is the cost at scale?
Q4 [Correctness]: Are there edge cases in the return logic that could
    cause a silent bug downstream?
Q5 [Fix]: Write a corrected version of the function that addresses
    all issues found above.
"""
baseline_cot = chat(
    system="You are a senior software engineer. Think step by step.",
    user=f"Review this Python function:\n\n{CODE_TO_REVIEW}",
)
arq_result = chat(
    system="You are a senior software engineer conducting a security-aware code review.",
    user=f"Review this Python function:\n\n{CODE_TO_REVIEW}\n\n{ARQ_QUESTIONS}",
)
divider("Baseline (free CoT)")
print(baseline_cot)
divider("ARQ (structured reasoning checklist)")
print(arq_result)
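Because the checklist is just a string, it is easy to generate programmatically. The sketch below is a hypothetical helper, not part of the article's code, that builds an ARQ block from (domain, question) pairs so the same scaffold can be reused across review tasks:

```python
def build_arq(questions: list[tuple[str, str]]) -> str:
    """Assemble a numbered ARQ checklist from (domain, question) pairs."""
    lines = ["Before giving your final review, answer each of the following questions in order:"]
    for i, (domain, question) in enumerate(questions, start=1):
        lines.append(f"Q{i} [{domain}]: {question}")
    return "\n".join(lines)

print(build_arq([
    ("Security", "Does this code have any injection vulnerabilities?"),
    ("Fix", "Write a corrected version addressing all issues found above."),
]))
```

Keeping the question bank in data rather than in the prompt string makes it trivial to swap in a different checklist per language or per repository.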
Verbalized sampling addresses a key limitation of LLMs: they tend to return a single, definitive answer even when multiple interpretations are plausible. This happens because alignment training rewards confident, single-answer responses, so the model hides its internal uncertainty. Verbalized sampling corrects this by explicitly asking for multiple hypotheses, each with a confidence level and supporting evidence. Instead of forcing a single response, it surfaces a range of plausible outcomes, all in-context, without requiring any model changes.
In the output, this turns the result from a single label into a structured, distribution-like view. The baseline provides one category with no indication of uncertainty. The verbalized version, however, lists several hypotheses, each with a failure mode and a way to confirm or rule it out. This makes the output far more actionable, useful for decision-making rather than mere classification. The confidence scores are not calibrated probabilities, but they do convey relative likelihood, which is often enough to prioritize and triage.


section("TECHNIQUE 5 -- Verbalized Sampling")
SUPPORT_TICKET = """
Hi, I set up my account last week but I can't log in anymore. I tried resetting
my password but the email never arrives. I also tried a different browser. Nothing works.
"""
baseline_5 = chat(
    system="You are a support ticket classifier. Classify the issue.",
    user=f"Ticket:\n{SUPPORT_TICKET}",
)
verbalized = chat(
    system=(
        "You are a support ticket classifier.\n"
        "For each ticket, generate 3 distinct hypotheses about the root cause. "
        "For each hypothesis:\n"
        " - State the category (Authentication, Email Delivery, Account State, Browser/Client, Other)\n"
        " - Describe the specific failure mode\n"
        " - Assign a confidence score from 0.0 to 1.0\n"
        " - State what additional information would confirm or rule it out\n\n"
        "Order hypotheses by confidence (highest first). "
        "Then provide a recommended first action for the support agent."
    ),
    user=f"Ticket:\n{SUPPORT_TICKET}",
)
divider("Baseline (single answer)")
print(baseline_5)
divider("Verbalized sampling (multiple hypotheses + confidence)")
print(verbalized)
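To act on the hypotheses programmatically, the stated confidence scores can be pulled back out of the free-form response. The regex below is an assumption about the response's wording (it keys on the word "confidence" followed by a number), so treat it as a sketch rather than a robust parser:

```python
import re

def extract_confidences(text: str) -> list[float]:
    """Extract 0.0-1.0 confidence scores stated in a hypothesis list."""
    return [float(m) for m in re.findall(r"[Cc]onfidence[^\d]*([01](?:\.\d+)?)", text)]

sample = (
    "H1: Email delivery failure. Confidence: 0.6\n"
    "H2: Account was deactivated. Confidence: 0.3\n"
)
scores = extract_confidences(sample)
print(scores)                                  # -> [0.6, 0.3]
print(scores == sorted(scores, reverse=True))  # cheap check that hypotheses came highest-first
```

The ordering check is a useful guardrail: if the model stops sorting by confidence, downstream triage logic that assumes "first hypothesis = most likely" silently degrades.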

I am a Civil Engineering graduate (2022) from Jamia Millia Islamia, New Delhi, with a keen interest in Data Science, especially neural networks and their applications in various fields.



