
Applications of the Fourier Transform in the Analysis of Poetry

Poetry is often seen as a purely artistic form, ranging from the rigid structure of the haiku to the fluid, unconstrained shape of free verse. But can mathematics and data analysis find structure beneath the surface of these works? Rhyme can of course be evaluated, meter can be measured, and word choice can be catalogued, but can something of the author's underlying style be uncovered using analytical techniques alone? As a first experiment in this kind of computational analysis, we will try using a Fourier transform program to search for periodicity in poems. To test our code, we will use two sample poems: "Do not go gentle into that good night" by Dylan Thomas, and "Jabberwocky" by Lewis Carroll.

1. Data Collection

a. Splitting lines and counting words

Before any analysis can be done, the necessary data must be collected. For our purposes, we want, for each line of the poem, the word count, letter count, syllable count, and visual length. First, we need to split the poem itself (supplied as a plain text file) into a list of substrings, one per line. This is easily done in Python with the .split() method; passing the delimiter "\n" splits the file at each newline, returning a list of lines (the full expression is poem.split("\n")). Counting the words is just as easy as splitting the lines, and follows directly from it: iterate over the lines, calling .split() on each one, this time with no delimiter, so that it splits on whitespace, turning each line string into a list of words. The word count of any given line is then obtained with the built-in len() function; since each line has been broken into a list of words, len() returns the number of items in that list, which is the word count.
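As a minimal sketch of these two steps (the helper name `word_counts` and the sample text are illustrative, not from the original program):

```python
def word_counts(poem):
    """Return the number of words on each line of the poem."""
    counts = []
    for line in poem.split("\n"):   # one entry per line
        words = line.split()        # split on runs of whitespace
        counts.append(len(words))
    return counts

# Example with a placeholder two-line snippet:
sample = "Do not go gentle\ninto that good night"
print(word_counts(sample))  # → [4, 4]
```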

b. Letter count

To count the number of letters in each line, all we need to do is take the number of characters in each word; that is, for each word in the line, calling len() gives that word's character count. After iterating over every word in the line, these counts are summed to give the letter count of the line; the code to do this is sum(len(word) for word in words).

c. Visual length

Counting the visual length of each line is simple: assuming a monospace font, the length of a line is just the number of characters (including spaces!) it contains. The visual length is therefore just len(line). However, many fonts are not monospace, especially serif fonts such as Caslon, Garamond, and Georgia; this is a problem because, without knowing the specific font, we cannot compute an exact visual length. While this introduces a margin of error, modeling individual glyph widths would be impractical, so the monospace assumption will be used.
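To make the distinction between the two metrics concrete, here is a small sketch (the helper name and sample line are illustrative): the letter count ignores whitespace, while the visual length, under the monospace assumption, is the full character count of the line, spaces included.

```python
def letter_and_visual_counts(line):
    """Return (letter count, visual length) for one line.

    Letter count sums characters over the words only;
    visual length (monospace assumption) is len(line), spaces included.
    """
    words = line.split()
    num_letters = sum(len(word) for word in words)  # characters in words only
    visual_length = len(line)                       # includes spaces
    return num_letters, visual_length

print(letter_and_visual_counts("rage against the dying"))  # → (19, 22)
```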

d. Syllable count

Finding the syllable count without manually reading each line is the most difficult part of the data collection. To estimate syllables, we will count groups of vowels. Note that in my implementation I defined a function, count_syllables(word), that estimates the syllables in a single word. Given the word, we first convert it to lowercase using word = word.lower() and strip any punctuation it may contain using word = re.sub(r'[^a-z]', '', word). Next, we find all vowel groups, where each run of consecutive vowels is taken to be one syllable, since a syllable is loosely defined as a unit of pronunciation containing one vowel sound. To find each vowel group, we can use a regex matching runs of vowels: syllables = re.findall(r'[aeiouy]+', word). After this line runs, syllables is a list of all the vowel groups in the given word. Finally, every word should have at least one syllable, so even a vowelless word ("cwm", for example) will return one syllable. The function is:

import re

def count_syllables(word):
    """Estimate syllable count in a word using a simple vowel-grouping method."""
    word = word.lower()
    word = re.sub(r'[^a-z]', '', word)  # Remove punctuation
    syllables = re.findall(r'[aeiouy]+', word)  # Find vowel clusters
    return max(1, len(syllables))  # At least one syllable per word

This function returns the syllable estimate for any single word, so to find the syllable count of a full line, we return to the loop from section 1.a and sum over the words: num_syllables = sum(count_syllables(word) for word in words).
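To see the heuristic in action, here is a quick check (the function is repeated from above so the example is self-contained; the sample line is the opening line of the Thomas poem, which in iambic pentameter should have ten syllables):

```python
import re

def count_syllables(word):
    """Estimate syllable count using the vowel-grouping heuristic."""
    word = word.lower()
    word = re.sub(r'[^a-z]', '', word)          # strip punctuation
    syllables = re.findall(r'[aeiouy]+', word)  # vowel clusters
    return max(1, len(syllables))               # at least one per word

print(count_syllables("cwm"))  # → 1 (vowelless word still counts as one)
line = "Do not go gentle into that good night"
print(sum(count_syllables(w) for w in line.split()))  # → 10
```

The heuristic is not perfect (silent e's and diphthongs can fool it), but it lands on the correct total here.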

e. Data collection summary

While the data collection code is not especially optimized, efficiency is not a concern for the small amounts of input data used here; the work is done in a single pass over the lines, which would also serve if the code were applied to much larger bodies of text. The final data collection function is:

def analyze_poem(poem):
    """Analyzes the poem line by line."""
    data = []
    lines = poem.split("\n")

    for line in lines:
        words = line.split()
        num_words = len(words)
        num_letters = sum(len(word) for word in words)
        visual_length = len(line)  # Approximate visual length (monospace)
        num_syllables = sum(count_syllables(word) for word in words)

        data.append({
            "line": line,
            "words": num_words,
            "letters": num_letters,
            "visual_length": visual_length,
            "syllables": num_syllables
        })

    return data

2. Discrete Fourier Transform

INTRODUCTION: This section assumes a working understanding of the Fourier transform; for a brief, accessible introduction, try this article before reading on.

a. The specific algorithm

Before dealing with the specifics of the DFT algorithm used, we need to discuss NumPy's FFT (Fast Fourier Transform) routine. Let N be the number of data points to be transformed: if N is a power of 2, NumPy uses a radix-2 Cooley-Tukey algorithm, recursively splitting the input into its even- and odd-indexed samples. If N is not a power of 2, NumPy uses a mixed-radix approach, in which the input length is factored into primes and FFTs are composed over those factors.
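The even/odd splitting can be illustrated with a textbook radix-2 sketch (this is an educational toy, not NumPy's actual implementation, which is far more optimized; `fft_radix2` is a name I am introducing here):

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return np.asarray(x, dtype=complex)
    even = fft_radix2(x[0::2])  # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    return np.concatenate([even + twiddle * odd,
                           even - twiddle * odd])

x = np.array([1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 0.0, 2.0])
assert np.allclose(fft_radix2(x), np.fft.fft(x))  # matches NumPy's result
```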

b. Applying the DFT

To apply the DFT to the data collected previously, I created a function, fourier_analysis, which takes just the raw data (one metric's values, extracted from the list of dictionaries, one per line) as an argument. Fortunately, because NumPy maps so directly onto the mathematics, the code is simple. First, let N be the number of data points to be transformed; this is just N = len(data). Next, run NumPy's FFT algorithm on the data with np.fft.fft(data), which returns a list of complex coefficients representing the amplitude and phase of each frequency component. Finally, np.abs(fft_result) extracts the magnitude of each coefficient, representing that frequency's strength in the original data. The function returns the magnitudes paired with their corresponding frequencies.

import numpy as np

def fourier_analysis(data):
    """Performs Fourier Transform and returns frequency data."""
    N = len(data)
    fft_result = np.fft.fft(data)  # Compute Fourier Transform
    frequencies = np.fft.fftfreq(N)  # Get frequency bins
    magnitudes = np.abs(fft_result)  # Get magnitude of FFT coefficients

    return list(zip(frequencies, magnitudes))  # Return (freq, magnitude) pairs
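As a quick sanity check on synthetic data (not poem data): a signal that repeats every 4 samples should produce its strongest positive-frequency peak at f = 0.25, i.e. a period of 4.

```python
import numpy as np

def fourier_analysis(data):
    """Performs Fourier Transform and returns (frequency, magnitude) pairs."""
    N = len(data)
    fft_result = np.fft.fft(data)
    frequencies = np.fft.fftfreq(N)
    magnitudes = np.abs(fft_result)
    return list(zip(frequencies, magnitudes))

signal = [5, 9, 5, 1] * 8  # repeats every 4 samples
spectrum = fourier_analysis(signal)
# Strongest peak among the positive frequencies:
peak_freq, peak_mag = max((p for p in spectrum if p[0] > 0), key=lambda p: p[1])
print(peak_freq)  # → 0.25, i.e. a period of 1/0.25 = 4 samples
```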

The complete code can be found here, on GitHub.

3. Case Studies

a. Introduction

Having built all the data collection and Fourier transform machinery, it is finally time to run the program. For the sake of time, the literary analysis done here will be brief, to keep the emphasis on the data analysis. Note that while the Fourier transform returns a frequency spectrum, we want a period spectrum, so the relation T = 1/f will be used to convert frequencies to periods (measured in lines). To compare the strength of different spectra's peaks, we will use a signal-to-noise ratio (SNR) metric. The noise level of a signal is computed as the arithmetic mean of the non-peak magnitudes. To find the SNR, simply take x_peak / P_noise; a higher SNR means a stronger, cleaner peak relative to the rest of the spectrum. SNR is a powerful decision metric for finding poetic structure because it directly compares the signal (i.e., rhythmic, organized patterns) against the background noise (random variation). Unlike variance, which measures overall dispersion, or autocorrelation, which emphasizes specific lags, SNR highlights how prominently a single periodic pattern stands out, which is exactly the case when a pattern corresponds to a metrical identifier.
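A minimal version of this SNR metric, as described above (peak magnitude over the arithmetic mean of the remaining magnitudes; the exact bin selection in the original analysis may differ):

```python
import numpy as np

def snr(magnitudes):
    """Peak magnitude divided by the mean magnitude of all other bins."""
    mags = np.asarray(magnitudes, dtype=float)
    peak = mags.max()
    noise = np.delete(mags, mags.argmax()).mean()  # mean of the rest
    return peak / noise

print(snr([1.0, 2.0, 12.0, 1.0, 2.0]))  # → 12 / 1.5 = 8.0
```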

b. “Do not go gentle into that good night” – Dylan Thomas

This poem has a clear, highly visible structure, making it good test data. Unsurprisingly, the syllable data turns up nothing interesting here (Thomas' poem is written in strict iambic pentameter, so the syllable count barely varies); the word-count data, on the other hand, has the highest SNR of any of the four metrics, 6.086.

Figure 1. Note that this figure and all of the following were produced using Google Sheets.

The spectrum above shows a dominant signal at a period of 4 lines, with little noise at the other periods. Furthermore, its high SNR relative to the letter, syllable, and visual-length spectra is itself interesting: the poem follows an ABA rhyme scheme, which suggests that the word count of each line varies in tandem with the rhyme scheme. Not far behind the word-count SNR are the two next-strongest spectra: letter count at 5.724 and visual length at 5.905. Those two spectra also have their peaks at a period of 4 lines, indicating that they too are tied to the poem's structure.

c. “Jabberwocky” – Lewis Carroll

Carroll's writing is also highly structured, but the results here are not as clean: the word-count spectrum has its maximum at a period of ~5 lines, but with a much lower SNR (3.55), the signal split across three distinct peaks at ~5 lines, 3.11 lines, and 2.54 lines. This split spectrum is shown in Figure 2, and it suggests that there is no single dominant repeating pattern in the word counts Carroll used. In addition, given that the peaks grow as they approach the 2-line period, one conclusion is that Carroll favored a structure of alternating line lengths.

Fig. 2.

This alternating pattern shows up again in the visual-length and letter-count spectra, both of which have secondary peaks at 2.15 lines. The syllable spectrum, however, shown in Figure 3, has its maximum at the 2.15-line period, unlike the word, letter, and visual-length spectra.

Figure 3.

Interestingly, since the poem follows an alternating rhyme scheme, this suggests a relationship between the length of each line and the rhyme pattern itself. One conclusion is that Carroll found it more aesthetically pleasing when rhyming line endings lined up with one another on the page. And this conclusion, that the visual appearance of each line shaped Carroll's writing style, could be drawn without ever reading the text.

4. Conclusion

This analysis of poetry shows that mathematical tools can find hidden structures in writing: patterns that may reflect an author's stylistic tendencies or subconscious decisions. In both cases studied, clear relationships were found between the poems' rhyme schemes and their metric structures (word counts, line lengths, etc.) that are easily overlooked by a conventional reading. While this approach cannot replace traditional literary analysis, it provides a new way to evaluate the formal qualities of writing. The combination of mathematics, computer science, data analytics, and literature is a promising one, and this technique is just one of many that could contribute to fields such as stylometry, authorship attribution, and sentiment analysis.
