Machine Learning

Designing a Custom Wake Word Assistant

Alexa and Siri have long ruled the countertops of daily life. These assistants have found their way into almost every household, doing jobs from controlling smart-home devices to reciting recipe directions and answering simple questions. When it comes to answering questions, however, in an era when LLMs deliver concise, direct responses, the canned answers of these voice assistants can be disappointing, if not useless. For example, if you ask the Google Assistant how the market responded to Jerome Powell's speech on Aug 22, it will only say that it does not know the answer and offer a few links. That is if you have a Google device with a screen.

More often than not, you just want a quick answer about current events, or you want to know whether your apple tree will survive the winter in Ohio, and the mainstream voice assistants fail to provide a satisfactory response. This motivated me to build my own voice assistant, one that gives me a simple, one-sentence answer based on a live web search.

Photo by Aurps.com on Unsplash

Of the various LLM-powered search engines available, I have been an avid Perplexity user for over a year now, and it has all but replaced Google and Bing as my default search for most purposes. Perplexity, in addition to the live web references that let it give up-to-date, accurate, cited answers, allows users to access its power programmatically through its API. Using this API and combining it with a simple Raspberry Pi, I intended to create a voice assistant that could:

  • Respond to a custom wake word and get ready to answer my question
  • Answer my question with a single, short sentence
  • Go back to listening quietly, without selling my data or serving me unnecessary ads

Assistant Hardware

Photo by Axel Richter on Unsplash

To build our voice assistant, a few important pieces of hardware are needed. The centerpiece of the project is the Raspberry Pi 5, acting as the central processor of our application. For audio input, I chose a simple USB gooseneck microphone. This type of microphone is omnidirectional, making it effective at picking up the wake word from different parts of the room, and its plug-and-play nature keeps setup simple. To give the assistant a voice, a compact but powerful USB speaker provides clear audio output. The main benefit of this speaker is that it uses a single USB cable for both power and the audio signal, which reduces cable clutter.

The block diagram showing the workflow of the custom wake word assistant (Image by the author)

This choice of easily accessible USB peripherals makes the hardware assembly straightforward, allowing us to focus our efforts on the software.

Getting the Prerequisites Ready

To query Perplexity with custom questions and to have a custom wake word for the voice assistant, we need to generate a few API keys. To generate the Perplexity API key, you can register for a Perplexity account, go to the Settings menu, and click "API Keys" to copy a key for use in apps. Generating an API key usually requires a paid plan or a payment method on file, so make sure the account is set up appropriately before proceeding.
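Once you have copied the key, it helps to confirm that your environment picks it up correctly. As a minimal sketch (the helper name below is mine, not from the original script), the key is sent as a standard Bearer token in the HTTP request headers:

```python
import os

def perplexity_auth_headers(env_var: str = "PERPLEXITY_API_KEY") -> dict:
    """Build the HTTP headers for a Perplexity API request.

    The key is read from the environment rather than hard-coded,
    matching the .env approach used later in the script.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"{env_var} is not set; add it to your .env file")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

If the function raises, the key never made it into your environment, which is worth catching before debugging anything else.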

Several platforms offer custom wake word creation, including Picovoice Porcupine and Snowboy; Picovoice Porcupine offers simple in-browser training, testing, and deployment to embedded devices. A new user can create a custom wake word by signing up for a Picovoice Console account, switching to the Porcupine page, and clicking "Train". Make sure to test the wake word before finalizing it, as this confirms reliable detection and few false activations. The wake word I trained and will use is "Hey Krishna".

The Code

The full Python script for this project is available on my GitHub repository. In this section, let us look at the key components of the code to understand how the assistant works.
The script is organized into a few key sections handling the assistant's senses and intelligence, all orchestrated by the main loop.

Configuration and Startup

The first part of the script is dedicated to setup. It handles loading the required API keys and model files, and initializes clients for the services we will use.

# --- 1. Configuration ---
import os
from dotenv import load_dotenv

load_dotenv()
PICOVOICE_ACCESS_KEY = os.environ.get("PICOVOICE_ACCESS_KEY")
PERPLEXITY_API_KEY = os.environ.get("PERPLEXITY_API_KEY")
KEYWORD_PATHS = ["Krishna_raspberry-pi.ppn"]  # My wake word path
MODEL_NAME = "sonar"

This section uses the dotenv library to load your private API keys from a .env file, which is best practice for keeping them out of your source code. It also defines key variables such as the path to your custom wake word model and the Perplexity model we want to query.
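With the configuration loaded, the script can initialize the service clients. A sketch of what that setup plausibly looks like (the Perplexity API is OpenAI-compatible, so pointing the openai client at Perplexity's base URL is a common choice; the exact initialization in the repository may differ):

```python
import pvporcupine
import speech_recognition as sr
from openai import OpenAI

# Perplexity exposes an OpenAI-compatible chat completions endpoint
perplexity_client = OpenAI(
    api_key=PERPLEXITY_API_KEY,
    base_url="https://api.perplexity.ai",
)

# Wake word engine loaded with the custom keyword file
porcupine = pvporcupine.create(
    access_key=PICOVOICE_ACCESS_KEY,
    keyword_paths=KEYWORD_PATHS,
)

# Speech-to-text recognizer used after the wake word fires
recognizer = sr.Recognizer()
```

Note that pvporcupine.create() will raise if the access key or keyword file path is invalid, so this is a natural early failure point.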

Wake Word Detection

An always-on assistant needs to listen continuously for a specific wake word without consuming significant system resources. This is handled by the while True: loop in the main function, using the Picovoice Porcupine engine.

# This is the main loop that runs continuously
while True:
    # Read a small chunk of raw audio data from the microphone
    pcm = audio_stream.read(porcupine.frame_length)
    pcm = struct.unpack_from("h" * porcupine.frame_length, pcm)
    
    # Feed the audio chunk into the Porcupine engine for analysis
    keyword_index = porcupine.process(pcm)

    if keyword_index >= 0:
        # Wake word was detected, proceed to handle the command...
        print("Wake word detected!")

This loop is the heart of the assistant's "always listening" behavior. It continuously reads small frames of raw audio data from the microphone stream. Each frame is passed to the porcupine.process() function. This highly efficient, on-device process compares the audio against the specific acoustic pattern of your custom wake word ("Hey Krishna"). If the pattern is found, porcupine.process() returns a non-negative index, and the script moves to the active stage to listen for the full command.
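The struct.unpack_from call in the loop is what turns the raw bytes from the microphone into the sequence of 16-bit samples Porcupine expects. A self-contained sketch of that conversion with a synthetic frame (a frame length of 4 is purely illustrative; Porcupine's real frame_length is typically 512):

```python
import struct

frame_length = 4                      # illustrative; Porcupine typically uses 512
samples = [0, 1000, -1000, 32767]     # 16-bit signed PCM sample values

# What the audio stream delivers: the samples packed as raw bytes
raw_bytes = struct.pack("h" * frame_length, *samples)

# What the script does with each chunk before porcupine.process()
pcm = struct.unpack_from("h" * frame_length, raw_bytes)
print(list(pcm))  # the original sample values, recovered
```

Each "h" in the format string describes one signed 16-bit integer, so the format repeats once per sample in the frame.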

Speech-to-Text – Converting the User's Question into Text

After the wake word is detected, the assistant needs to listen to and understand the user's question. This is handled by the speech-to-text (STT) section.

# --- This logic is inside the main 'if keyword_index >= 0:' block ---

print("Listening for command...")
frames = []
# Record audio from the stream for a fixed duration (~10 seconds)
for _ in range(0, int(porcupine.sample_rate / porcupine.frame_length * 10)):
    frames.append(audio_stream.read(porcupine.frame_length))

# Convert the raw audio frames into an object the library can use
audio_data = sr.AudioData(b"".join(frames), porcupine.sample_rate, 2)

try:
    # Send the audio data to Google's service for transcription
    command = recognizer.recognize_google(audio_data)
    print(f"You (command): {command}")
except sr.UnknownValueError:
    speak_text("Sorry, I didn't catch that.")

As soon as the wake word is detected, the code records audio from the microphone for approximately 10 seconds, capturing the user's spoken command. It then wraps this raw audio data and sends it to Google's speech recognition service using the speech_recognition library. The service processes the audio and returns the transcribed text, stored in the command variable.
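The loop bound in the recording stage comes from a small calculation: sample rate divided by frame length gives frames per second, multiplied by the desired duration. With typical Porcupine values (16 kHz sample rate, 512-sample frames; check your engine's actual properties):

```python
sample_rate = 16000   # Porcupine's typical sample rate (Hz)
frame_length = 512    # samples read per loop iteration
seconds = 10          # fixed listening window

num_frames = int(sample_rate / frame_length * seconds)
print(num_frames)  # 312 frames cover roughly 10 seconds of audio
```

Because the division does not come out even, 312 frames actually capture about 9.98 seconds, which is close enough for a fixed listening window.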

Finding answers from getting busy

Once the user's command has been transcribed into text, it is sent to the Perplexity API to get a smart, timely answer.

# --- This logic runs if a command was successfully transcribed ---

if command:
    # Define the instructions and context for the AI
    messages = [{"role": "system", "content": "You are an AI assistant. You are located in Twinsburg, Ohio. All answers must be relevant to Cleveland, Ohio unless asked for differently by the user.  You MUST answer all questions in a single and VERY concise sentence."}]
    messages.append({"role": "user", "content": command})
    
    # Send the request to the Perplexity API
    response = perplexity_client.chat.completions.create(
        model=MODEL_NAME, 
        messages=messages
    )
    assistant_response_text = response.choices[0].message.content.strip()
    speak_text(assistant_response_text)

This block is the assistant's "brain". It first builds a messages list, which begins with an important system prompt. This prompt gives the AI its personality and its rules, such as responding in a single sentence and knowing its location in Ohio. The user's command is then appended to this list, and the entire payload is sent to the Perplexity API. The script then extracts the text from the AI's response and passes it to the speak_text function to be read aloud.
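The prompt construction is simple enough to isolate into a helper, which also makes it easy to unit-test. A sketch (the function name and parameter are mine; the system prompt text is taken from the script above):

```python
def build_messages(command: str, location: str = "Twinsburg, Ohio") -> list:
    """Assemble the chat payload: a system prompt with rules, then the user's command."""
    system_prompt = (
        f"You are an AI assistant. You are located in {location}. "
        "All answers must be relevant to Cleveland, Ohio unless asked for "
        "differently by the user. You MUST answer all questions in a single "
        "and VERY concise sentence."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": command},
    ]
```

Keeping the system prompt in one place like this makes it easier to tweak the assistant's rules without touching the request logic.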

Text-to-Speech – Converting the Perplexity Response to Voice

The speak_text function is what gives the assistant its voice.

def speak_text(text_to_speak, lang='en'):
    # Define a function that converts text to speech, default language is English
    
    print(f"Assistant (speaking): {text_to_speak}")
    # Print the text for reference so the user can see what is being spoken
    
    try:
        pygame.mixer.init()
        # Initialize the Pygame mixer module for audio playback
        
        tts = gTTS(text=text_to_speak, lang=lang, slow=False)
        # Create a Google Text-to-Speech (gTTS) object with the provided text and language
        # 'slow=False' makes the speech sound more natural (not slow-paced)
        
        mp3_filename = "response_audio.mp3"
        # Set the filename where the generated speech will be saved
        
        tts.save(mp3_filename)
        # Save the generated speech as an MP3 file
        
        pygame.mixer.music.load(mp3_filename)
        # Load the MP3 file into Pygame's music player for playback
        
        pygame.mixer.music.play()
        # Start playing the speech audio
        
        while pygame.mixer.music.get_busy():
            pygame.time.Clock().tick(10)
        # Keep the program running (by checking if playback is ongoing)
        # This prevents the script from ending before the speech finishes
        # The clock.tick(10) ensures it checks 10 times per second
        
        pygame.mixer.quit()
        # Quit the Pygame mixer once playback is complete to free resources
        
        os.remove(mp3_filename)
        # Delete the temporary MP3 file after playback to clean up
        
    except Exception as e:
        print(f"Error in Text-to-Speech: {e}")
        # Catch and display any errors that occur during the speech generation or playback

This function takes a piece of text, uses the gTTS (Google Text-to-Speech) library to generate a temporary audio file, plays the file through the speaker using the pygame library, waits until playback completes, and then deletes the file. Error handling is included to catch problems during the process.
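One detail worth noting: the function writes response_audio.mp3 into the working directory and only removes it after a successful playback. A slightly safer variant (my suggestion, not from the original script) uses the tempfile module, so the file gets a unique path and is cleaned up even if playback fails:

```python
import os
import tempfile

def temp_mp3_path() -> str:
    """Create a unique temporary file path for the generated speech audio."""
    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)  # gTTS's save() will open the path itself
    return path

# Usage sketch: generate, play, then always clean up
path = temp_mp3_path()
try:
    # tts.save(path) and pygame playback would happen here
    pass
finally:
    if os.path.exists(path):
        os.remove(path)
```

The try/finally guarantees the temporary file is removed even when gTTS or pygame raises mid-way, which prevents stale MP3 files from accumulating on a long-running Pi.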

Testing the Assistant

The video below shows the custom wake word assistant in action. To compare its performance with the Google Assistant, I asked the same question to both Google and the custom assistant.

https://www.youtube.com/watch?v=WCKKQ-Zqi

As you can see, Google resorts to links rather than providing a brief summary of what the user asked for. The custom assistant goes further and gives a concise summary, which is far more helpful.

Conclusion

In this article, we walked through the process of building a fully functional, hands-free AI assistant on a Raspberry Pi. By combining the wake word detection of Picovoice Porcupine with the Perplexity API through Python, we created a simple voice device that helps find information immediately.

The main benefit of this LLM-based approach is its ability to deliver direct, synthesized responses to complex and current questions – a task where the incumbent assistants often fall back on serving search links. Instead of acting as a mere middleman to a search engine, our assistant acts as a true language interface, combining live web results into a single, short response. The future of voice assistants lies in being deeper and more intelligent, and building your own is the best way to explore it.
