Machine Learning

How You Can Make a Voice Assistant Bilingual

Voice assistants such as Alexa, Google Assistant, and Siri handle many everyday tasks for internet users today. For the most part, English is the dominant language these voice assistants work in. However, for a voice assistant to be truly useful, it should be able to understand the user speaking naturally. In many parts of the world, especially in multilingual countries like India, people commonly mix and switch between several languages within a single conversation. A smart assistant should be able to handle this.

Google Assistant offers the option to add a second language, but this works only on certain devices and only for a set of major languages. For example, Google Nest devices do not support Tamil as a second language, a language spoken by over 80 million people. Alexa supports a dual-language mode as long as the pair is among its supported language combinations, and it too covers only a limited set of major languages. Siri has no bilingual capability and allows only one language at a time.

In this article, I will discuss how I made my voice assistant bilingual, with English and Tamil as its two languages. Using this method, the assistant can automatically recognize the language of an incoming query. Using a "confidence score"-based algorithm, the system determines whether English or Tamil was spoken and responds in the corresponding language.

Approaching Multilinguality

To make the assistant bilingual in English and Tamil, there are a few potential solutions. The first would be to train a custom machine learning model from scratch, specifically on Tamil language data, and integrate that model into the Raspberry Pi. While this could provide the highest level of customization, it is a time-consuming and resource-intensive process. Training a model requires a large dataset and significant computing power. In addition, running a heavy custom model could slow down the Raspberry Pi, resulting in a poor user experience.

The fastText Method

A more practical solution is to use an existing pre-trained model that is purpose-built for a specific task. For language identification, fastText is a good option.

fastText is an open-source library from Facebook AI Research designed for efficient text classification and word representation. It ships with pre-trained models that can quickly identify the language of a piece of text from among a large number of languages. Because it is lightweight and highly optimized, it is well suited to running on a constrained device like a Raspberry Pi without introducing noticeable lag. Therefore, the system uses fastText to identify the language spoken by the user.

To use fastText, download the pre-trained language identification model (lid.176.bin) and save it in your project folder. Point MODEL_PATH at the file and load the model.

import fasttext
import speech_recognition as sr

# --- Configuration ---
MODEL_PATH = "./lid.176.bin" # This is the model file you downloaded and unzipped

# --- Main Application Logic ---
print("Loading fastText language identification model...")
try:
    # Load the pre-trained model
    model = fasttext.load_model(MODEL_PATH)
except Exception as e:
    print(f"FATAL ERROR: Could not load the fastText model. Error: {e}")
    exit()

The next step is to transcribe the spoken command, pass the transcription to the model, and get back a prediction. This can be handled by a dedicated function.

def identify_language(text, model):
    # The model.predict() function returns a tuple of labels and probabilities
    predictions = model.predict(text, k=1)
    language_code = predictions[0][0] # e.g., '__label__en'
    return language_code
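As a quick sanity check, the helper can be exercised with a stand-in object in place of the real model. The stub below is my own construction, not part of fastText; it only mimics the shape of fastText's predict() return value (a tuple of labels and probabilities), so no model file is needed to run it:

```python
class FakeLidModel:
    """Stub mimicking fasttext's predict() return shape:
    (labels_tuple, probabilities_tuple)."""
    def predict(self, text, k=1):
        return (("__label__en",), (0.99,))

def identify_language(text, model):
    # Same helper as above, repeated so this snippet runs standalone
    predictions = model.predict(text, k=1)
    return predictions[0][0]

print(identify_language("hello there", FakeLidModel()))  # prints __label__en
```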

recognizer = sr.Recognizer()
microphone = sr.Microphone()

try:
    with microphone as source:
        recognizer.adjust_for_ambient_noise(source, duration=1)
        print("\nPlease speak now...")
        audio = recognizer.listen(source, phrase_time_limit=8)

    print("Transcribing audio...")
    # Get a rough transcription without specifying a language
    transcription = recognizer.recognize_google(audio)
    print(f'Heard: "{transcription}"')

    # Identify the language from the transcribed text
    language = identify_language(transcription, model)

    if language == '__label__en':
        print("\n---> Result: The detected language is English. <---")
    elif language == '__label__ta':
        print("\n---> Result: The detected language is Tamil. <---")
    else:
        print(f"\n---> Result: Detected a different language: {language}")

except sr.UnknownValueError:
    print("Could not understand the audio.")
except sr.RequestError as e:
    print(f"Speech recognition service error; {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

The code block above follows the simplest approach. It uses recognizer.recognize_google(audio) to transcribe the voice command and passes the resulting text to the fastText model to get a language prediction. If the prediction is "__label__en", English was detected; if it is "__label__ta", Tamil was detected.
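fastText's labels always carry a "__label__" prefix followed by a short language code, so a tiny helper (my own, not part of the fastText library) can normalize them if you prefer to compare bare codes:

```python
def label_to_iso(label):
    """Strip fastText's '__label__' prefix, leaving the bare
    language code (e.g. '__label__ta' -> 'ta')."""
    prefix = "__label__"
    return label[len(prefix):] if label.startswith(prefix) else label

print(label_to_iso("__label__en"))  # prints en
print(label_to_iso("__label__ta"))  # prints ta
```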

This method makes reasonable predictions, but there is a catch: the speech_recognition library defaults to transcribing in English. So when I say something in Tamil, it produces the closest-sounding (and meaningless) English words and passes those on to fastText.

For example, when I said "En peyar enna" ("What is my name?" in Tamil), speech_recognition transcribed it as an unrelated English phrase, and fastText consequently predicted the language as English. To overcome this, I could hard-code speech_recognition to transcribe Tamil only, but that would defeat the goal of a truly "smart", "bilingual" assistant. An assistant must identify the language from the query itself, not from a hard-coded setting.


The 'Confidence Score' Method

What we need is a straightforward, data-driven solution, and one is available within the speech_recognition library itself. The recognizer.recognize_google() function is a wrapper around Google's Speech API and can transcribe audio in a large number of languages, including English and Tamil. The key feature of this API is that, along with each transcription, it can return a confidence score: a number between 0 and 1 indicating how sure it is that the transcription is correct.
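To make that concrete, here is the rough shape of the dictionary that recognize_google(..., show_all=True) returns; the transcript and score values below are invented for illustration. Note that the "confidence" key is not always present, which is why the code later falls back to a default:

```python
# Illustrative response shape; transcript and confidence values are made up.
response = {
    "alternative": [
        {"transcript": "what is my name", "confidence": 0.92},
        {"transcript": "what is my game"},  # lower-ranked alternative
    ],
    "final": True,
}

top = response["alternative"][0]         # best-ranked alternative
text = top["transcript"]
confidence = top.get("confidence", 0.0)  # guard against a missing key
print(text, confidence)  # prints: what is my name 0.92
```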

This feature enables a better, more robust way to identify the language. Let's look at the code.

def recognize_with_confidence(recognizer, audio_data):
    """Transcribe the same audio as both Tamil and English, then return
    the (text, language) pair with the higher confidence score."""
    tamil_text = None
    tamil_confidence = 0.0
    english_text = None
    english_confidence = 0.0

    # 1. Attempt to recognize as Tamil and get confidence
    try:
        print("Attempting to transcribe as Tamil...")
        # show_all=True returns a dictionary with transcription alternatives
        response_tamil = recognizer.recognize_google(audio_data, language='ta-IN', show_all=True)
        # We only look at the top alternative
        if response_tamil and 'alternative' in response_tamil:
            top_alternative = response_tamil['alternative'][0]
            tamil_text = top_alternative['transcript']
            if 'confidence' in top_alternative:
                tamil_confidence = top_alternative['confidence']
            else:
                tamil_confidence = 0.8 # Assign a default high confidence if not provided
    except sr.UnknownValueError:
        print("Could not understand audio as Tamil.")
    except sr.RequestError as e:
        print(f"Tamil recognition service error; {e}")

    # 2. Attempt to recognize as English and get confidence
    try:
        print("Attempting to transcribe as English...")
        response_english = recognizer.recognize_google(audio_data, language='en-US', show_all=True)
        if response_english and 'alternative' in response_english:
            top_alternative = response_english['alternative'][0]
            english_text = top_alternative['transcript']
            if 'confidence' in top_alternative:
                english_confidence = top_alternative['confidence']
            else:
                english_confidence = 0.8 # Assign a default high confidence
    except sr.UnknownValueError:
        print("Could not understand audio as English.")
    except sr.RequestError as e:
        print(f"English recognition service error; {e}")

    # 3. Compare confidence scores and return the winner
    print(f"\nConfidence Scores -> Tamil: {tamil_confidence:.2f}, English: {english_confidence:.2f}")
    if tamil_confidence > english_confidence:
        return tamil_text, "Tamil"
    elif english_confidence > tamil_confidence:
        return english_text, "English"
    else:
        # If scores are equal (or both zero), return neither
        return None, None

The logic in this code block is simple. We pass the audio to recognize_google() with show_all=True and get back the full list of transcription alternatives with their scores. First we try transcribing the audio as Tamil and extract the corresponding confidence score. Then we transcribe the same audio as English and get its confidence score from the API. Once we have both, we compare the confidence scores and pick the language with the higher one.

Below is a demonstration of the function in action when I speak in English and when I speak in Tamil.

Screenshot of the terminal output (Tamil). Image by the author.
Screenshot of the terminal output (English). Image by the author.

The outputs above show how the code can reliably determine which language is being spoken, based on the confidence scores.

Putting everything together – the bilingual assistant

The last step is to integrate this function into the Raspberry Pi-based voice assistant code. The complete code can be found in my GitHub. The final step is to test the bilingual assistant by speaking to it in English and in Tamil and seeing how it responds in each language. The videos below show the bilingual assistant in action when asked a question in English and in Tamil.

https://www.youtube.com/watch?v=-Fuo7x3k3kh

https://www.youtube.com/watch?v=A7enyxj9Ofo

Conclusion

In this article, we have seen how to make a voice assistant bilingual. By using confidence scores, the system can determine whether a command was spoken in English or Tamil, allowing it to understand and respond in whichever language the user chose for that particular query. This creates a more natural and seamless experience.

The main benefit of this method is its reliability and extensibility. While this project covers only two languages, the same confidence-comparison logic can easily be extended to three, four, or more languages by adding an API call for each language and comparing all the results. This technique serves as a solid foundation for building more inclusive and accessible AI tools.
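As a sketch of that extension, here is my own generalization of the selection logic to N languages. The per-language recognition call is stubbed out as a plain function (standing in for the recognize_google() calls) so the logic can run on its own:

```python
def pick_language(audio, languages, transcribe):
    """Return (text, language) for whichever language the transcriber is
    most confident about; transcribe(audio, lang) -> (text, confidence)."""
    best_text, best_lang, best_conf = None, None, 0.0
    for lang in languages:
        text, conf = transcribe(audio, lang)
        if text is not None and conf > best_conf:
            best_text, best_lang, best_conf = text, lang, conf
    return best_text, best_lang

# Stub standing in for per-language recognize_google() calls;
# the transcripts and scores are invented for illustration.
def fake_transcribe(audio, lang):
    results = {"ta-IN": ("en peyar enna", 0.91),
               "en-US": ("in pay or", 0.40),
               "hi-IN": (None, 0.0)}
    return results.get(lang, (None, 0.0))

print(pick_language(b"...", ["ta-IN", "en-US", "hi-IN"], fake_transcribe))
# prints: ('en peyar enna', 'ta-IN')
```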

References:

[1] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification

[2] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, FastText.zip: Compressing Text Classification Models
