Figure A1 Pass bands and reject bands of the filter described in question 7 of chapter 1.
Table A1 Formant frequencies (in Hz) of vocal tracts of 12, 15, and 18 cm. End-corrected formants (L′ = L + 1.2 cm) are shown in parentheses.

        d        ð
d       0.727    0
ð       0.015    0.515

Figure A2 The F2 locus frequency of [d].
References
Asher, R. E. and Kumari, T. C. (1997) Malayalam, London: Routledge.
Best, C. T. (1995) A direct realist perspective on cross-language speech perception. In W. Strange (ed.), Speech Perception and Linguistic Experience: Theoretical and methodological issues in cross-language speech research, Timonium, MD: York Press, 167–200.
Bladon, A. and Lindblom, B. (1981) Modeling the judgment of vowel quality differences. Journal of the Acoustical Society of America, 69, 1414–22.
Bless, D. M. and Abbs, J. H. (1983) Vocal Fold Physiology: Contemporary research and clinical issues, San Diego: College Hill Press.
Bond, Z. S. (1999) Slips of the Ear: Errors in the perception of casual conversation, San Diego: Academic Press.
Bregman, A. S. (1990) Auditory Scene Analysis: The perceptual organization of sound, Cambridge, MA: MIT Press.
Brödel, M. (1946) Three Unpublished Drawings of the Anatomy of the Human Ear, Philadelphia: Saunders.
Campbell, R. (1994) Audiovisual speech: Where, what, when, how? Current Psychology of Cognition, 13, 76–80.
Catford, J. C. (1977) Fundamental Problems in Phonetics, Bloomington: Indiana University Press.
Chiba, T. and Kajiyama, M. (1941) The Vowel: Its nature and structure, Tokyo: Kaiseikan.
Cole, R. A. (1973) Listening for mispronunciations: A measure of what we hear during speech. Perception & Psychophysics, 13, 153–6.
Cooley, J. W., Lewis, P. A. W., and Welch, P. D. (1969) The fast Fourier transform and its applications. IEEE Transactions on Education, 12, 27–34.
Cooper, F. S., Liberman, A. M., and Borst, J. M. (1951) The interconversion of audible and visible patterns as a basis for research in the perception of speech. Proceedings of the National Academy of Sciences, 37, 318–25.
Davis, S. and Mermelstein, P. (1980) Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP 28, 357–66.
Delattre, P. C., Liberman, A. M., and Cooper, F. S. (1955) Acoustic loci and transitional cues for consonants. Journal of the Acoustical Society of America, 27, 769–73.
Egan, J. P. and Hake, H. W. (1950) On the masking pattern of a simple auditory stimulus. Journal of the Acoustical Society of America, 22, 622–30.
Elman, J. L. and McClelland, J. L. (1988) Cognitive penetration of the mechanisms of perception: Compensation for coarticulation of lexically restored phonemes. Journal of Memory and Language, 27, 143–65.
Fant, G. (1960) Acoustic Theory of Speech Production, The Hague: Mouton.
Flanagan, J. L. (1965) Speech Analysis Synthesis and Perception, Berlin: Springer-Verlag.
Flege, J. E. (1995) Second language speech learning: Theory, findings, and problems. In W. Strange (ed.), Speech Perception and Linguistic Experience: Theoretical and methodological issues in cross-language speech research, Timonium, MD: York Press, 167–200.
Forrest, K., Weismer, G., Milenkovic, P., and Dougall, R. N. (1988) Statistical analysis of word-initial voiceless obstruents: Preliminary data. Journal of the Acoustical Society of America, 84, 115–23.
Fry, D. B. (1979) The Physics of Speech, Cambridge: Cambridge University Press.
Fujimura, O. (1962) Analysis of nasal consonants. Journal of the Acoustical Society of America, 34, 1865–75.
Ganong, W. F. (1980) Phonetic categorization in auditory word recognition. Journal of Experimental Psychology: Human Perception and Performance, 6, 110–25.
Green, K. P., Kuhl, P. K., Meltzoff, A. N., and Stevens, E. B. (1991) Integrating speech information across talkers, gender, and sensory modality: Female faces and male voices in the McGurk effect. Perception & Psychophysics, 50, 524–36.
Guion, S. G. (1998) The role of perception in the sound change of velar palatalization. Phonetica, 55, 18–52.
Hagiwara, R. (1995) Acoustic realizations of American /r/ as produced by women and men. UCLA Working Papers in Phonetics, 90, 1–187.
Halle, M. and Stevens, K. N. (1969) On the feature “advanced tongue root.” Quarterly Progress Report, 94, 209–15. Research Laboratory of Electronics, MIT.
Harnsberger, J. D. (2001) The perception of Malayalam nasal consonants by Marathi, Punjabi, Tamil, Oriya, Bengali, and American English listeners: A multidimensional scaling analysis. Journal of Phonetics, 29, 303–27.
Heinz, J. M. and Stevens, K. N. (1961) On the properties of voiceless fricative consonants. Journal of the Acoustical Society of America, 33, 589–96.
Jakobson, R., Fant, G., and Halle, M. (1952) Preliminaries to Speech Analysis, Cambridge, MA: MIT Press.
Jassem, W. (1979) Classification of fricative spectra using statistical discriminant functions. In B. Lindblom and S. Öhman (eds.), Frontiers of Speech Communication Research, New York: Academic Press, 77–91.
Johnson, K. (1990) Contrast and normalization in vowel perception. Journal of Phonetics, 18, 229–54.
Johnson, K. (1992) Acoustic and auditory analysis of Xhosa clicks and pulmonics. UCLA Working Papers in Phonetics, 83, 33–47.
Johnson, K. (2008) Quantitative Methods in Linguistics, Oxford: Wiley-Blackwell.
Johnson, K. and Ralston, J. V. (1994) Automaticity in speech perception: Some speech/nonspeech comparisons. Phonetica, 51(4), 195–209.
Johnson, K., Ladefoged, P., and Lindau, M. (1993) Individual differences in vowel production. Journal of the Acoustical Society of America, 94, 701–14.
Joos, M. (1948) Acoustic phonetics. Language, 23, suppl. 1.
Klatt, D. H. and Klatt, L. (1990) Analysis, synthesis, and perception of voice quality variations among female and male talkers. Journal of the Acoustical Society of America, 87, 820–57.
Kuhl, P. K., Williams, K. A., Lacerda, F., Stevens, K. N., and Lindblom, B. (1992) Linguistic experience alters phonetic perception in infants by 6 months of age. Science, 255, 606–8.
Ladefoged, P. (1996) Elements of Acoustic Phonetics, 2nd edn., Chicago: University of Chicago Press.
Ladefoged, P. and Maddieson, I. (1996) The Sounds of the World’s Languages, Oxford: Blackwell.
Ladefoged, P., DeClerk, J., Lindau, M., and Papcun, G. (1972) An auditory-motor theory of speech production. UCLA Working Papers in Phonetics, 22, 48–75.
Lambacher, S., Martens, W., Nelson, B., and Berman, J. (2001) Identification of English voiceless fricatives by Japanese listeners: The influence of vowel context on sensitivity and response bias. Acoustical Science and Technology, 22, 334–43.
Laver, J. (1980) The Phonetic Description of Voice Quality, Cambridge: Cambridge University Press.
Liberman, A. M., Harris, K. S., Hoffman H. S., and Griffith, B. C. (1957) The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology, 54, 358–68.
Liljencrants, J. and Lindblom, B. (1972) Numerical simulation of vowel quality systems: The role of perceptual contrast. Language, 48, 839–62.
Lindau, M. (1978) Vowel features. Language, 54, 541–63.
Lindau, M. (1979) The feature “expanded.” Journal of Phonetics, 7, 163–76.
Lindau, M. (1984) Phonetic differences in glottalic consonants. Journal of Phonetics, 12, 147–55.
Lindau, M. (1985) The story of /r/. In V. Fromkin (ed.), Phonetic Linguistics: Essays in honor of Peter Ladefoged, Orlando, FL: Academic Press.
Lindblom, B. (1990) Explaining phonetic variation: A sketch of the H&H theory. In W. J. Hardcastle and A. Marchal (eds.), Speech Production and Speech Modeling, Dordrecht: Kluwer, 403–39.
Lindqvist-Gauffin, J. and Sundberg, J. (1976) Acoustic properties of the nasal tract. Phonetica, 33, 161–8.
Lotto, A. J. and Kluender, K. R. (1998) General contrast effects in speech perception: Effect of preceding liquid on stop consonant identification. Perception & Psychophysics, 60, 602–19.
Lubker, J. (1968) An EMG-cinefluorographic investigation of velar function during normal speech production. Cleft Palate Journal, 5, 1–18.
Lyon, R. F. (1982) A computational model of filtering, detection and compression in the cochlea. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1282–5.
Lyons, R. G. (1997) Understanding Digital Signal Processing, Reading, MA: Addison-Wesley.
Maddieson, I. (1984) Patterns of Sounds, Cambridge: Cambridge University Press.
Maeda, S. (1993) Acoustics of vowel nasalization and articulatory shifts in French nasal vowels. In M. K. Huffman and R. A. Krakow (eds.), Phonetics and Phonology, vol. 5: Nasals, nasalization, and the velum, New York: Academic Press, 147–67.
Mann, V. A. (1980) Influence of preceding liquid on stop-consonant perception. Perception & Psychophysics, 28, 407–12.
Marple, L. (1987) Digital Spectral Analysis with Applications, Englewood Cliffs, NJ: Prentice Hall.
McGurk, H. and MacDonald, J. (1976) Hearing lips and seeing voices. Nature, 264, 746–8.
McDonough, J. (1993) The phonological representation of laterals. UCLA Working Papers in Phonetics, 83, 19–32.
McDonough, J. and Ladefoged, P. (1993) Navajo stops. UCLA Working Papers in Phonetics, 84, 151–64.
Miller, G. A. and Nicely, P. E. (1955) An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America, 27, 338–52.
Miller, J. D. (1989) Auditory-perceptual interpretation of the vowel. Journal of the Acoustical Society of America, 85, 2114–34.
Moll, K. L. (1962) Velopharyngeal closure in vowels. Journal of Speech and Hearing Research, 5, 30–7.
Moore, B. C. J. (1982) An Introduction to the Psychology of Hearing, 2nd edn., New York: Academic Press.
Moore, B. C. J. and Glasberg, B. R. (1983) Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. Journal of the Acoustical Society of America, 74, 750–3.
Mrayati, M., Carré, R., and Guérin, B. (1988) Distinctive regions and modes: A new theory of speech production. Speech Communication, 7, 257–86.
O’Shaughnessy, D. (1987) Speech Communication: Human and machine, Reading, MA: Addison-Wesley.
Parzen, E. (1962) On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33, 1065–76.
Pastore, R. E. and Farrington, S. M. (1996) Measuring the difference limen for identification of order of onset for complex auditory stimuli. Perception & Psychophysics, 58(4), 510–26.
Patterson, R. D. (1976) Auditory filter shapes derived from noise stimuli. Journal of the Acoustical Society of America, 59, 640–54.
Perkell, J. (1971) Physiology of speech production: A preliminary study of two suggested revisions of the features specifying vowels. Quarterly Progress Report, 102, 123–39. Research Laboratory of Electronics, MIT.
Petursson, M. (1973) Quelques remarques sur l’aspect articulatoire et acoustique des constrictives intrabuccales Islandaises. Travaux de l’Institut de Phonétique de Strasbourg, 5, 79–99.
Pickles, J. O. (1988) An Introduction to the Physiology of Hearing, 2nd edn., New York: Academic Press.
Pisoni, D. B. (1977) Identification and discrimination of the relative onset time of two-component tones: Implications for voicing perception in stops. Journal of the Acoustical Society of America, 61, 1352–61.
Potter, R. K., Kopp, G. A., and Green, H. (1947) Visible Speech, New York: Van Nostrand.
Qi, Y. (1989) Acoustic features of nasal consonants. Unpublished Ph.D. diss., Ohio State University.
Rand, T. C. (1974) Dichotic release from masking for speech. Journal of the Acoustical Society of America, 55(3), 678–80.
Raphael, L. J. and Bell-Berti, F. (1975) Tongue musculature and the feature of tension in English vowels. Phonetica, 32, 61–73.
Rayleigh, J. W. S. (1896) The Theory of Sound, London: Macmillan; repr. 1945, New York: Dover.
Remez, R. E., Rubin, P. E., Pisoni, D. B., and Carrell, T. D. (1981) Speech perception without traditional speech cues. Science, 212, 947–50.
Repp, B. (1986) Perception of the [m]–[n] distinction in CV syllables. Journal of the Acoustical Society of America, 79, 1987–99.
Rosenblum, L. D., Schmuckler, M. A., and Johnson, J. A. (1997) The McGurk effect in infants. Perception & Psychophysics, 59, 347–57.
Samuel, A. G. (1991) A further examination of the role of attention in the phonemic restoration illusion. Quarterly Journal of Experimental Psychology, 43A, 679–99.
Schroeder, M. R., Atal, B. S., and Hall, J. L. (1979) Objective measure of certain speech signal degradations based on masking properties of human auditory perception. In B. Lindblom and S. Öhman (eds.), Frontiers of Speech Communication Research, London: Academic Press, 217–29.
Sekiyama, K. and Tohkura, Y. (1993) Inter-language differences in the influence of visual cues in speech perception. Journal of Phonetics, 21, 427–44.
Seneff, S. (1988) A joint synchrony/mean-rate model of auditory speech processing. Journal of Phonetics, 16, 55–76.
Shadle, C. (1985) The acoustics of fricative consonants. RLE Technical Report, 506, MIT.
Shadle, C. (1991) The effect of geometry on source mechanisms of fricative consonants. Journal of Phonetics, 19, 409–24.
Shannon, C. E. and Weaver, W. (1949) The Mathematical Theory of Communication, Urbana: University of Illinois.
Shepard, R. N. (1972) Psychological representation of speech sounds. In E. E. David and P. B. Denes (eds.), Human Communication: A unified view. New York: McGraw-Hill, 67–113.
Slaney, M. (1988) Lyon's cochlear model. Apple Technical Report, 13. Apple Corporate Library, Cupertino, CA.
Stevens, K. N. (1972) The quantal nature of speech: Evidence from articulatory-acoustic data. In E. E. David, Jr. and P. B. Denes (eds.), Human Communication: A unified view, New York: McGraw-Hill, 51–66.
Stevens, K. N. (1987) Interaction between acoustic sources and vocal-tract configurations for consonants. Proceedings of the Eleventh International Congress of Phonetic Sciences, 3, 385–9.
Stevens, K. N. (1989) On the quantal nature of speech. Journal of Phonetics, 17, 3–45.
Stevens, K. N. (1999) Acoustic Phonetics, Cambridge, MA: MIT Press.
Stevens, S. S. (1957) Concerning the form of the loudness function. Journal of the Acoustical Society of America, 29, 603–6.
Stockwell, R. P. (1973) Problems in the interpretation of the Great English Vowel Shift. In M. E. Smith (ed.), Studies in Linguistics in Honor of George L. Trager, The Hague: Mouton, 344–62.
Stone, M. (1991) Toward a model of three-dimensional tongue movement. Journal of Phonetics, 19, 309–20.
Straka, G. (1965) Album phonétique, Laval: Les Presses de l’Université Laval.
Syrdal, A. K. and Gopal, H. S. (1986) A perceptual model of vowel recognition based on the auditory representation of American English vowels. Journal of the Acoustical Society of America, 79, 1086–1100.
Terbeek, D. (1977) A cross-language multidimensional scaling study of vowel perception. UCLA Working Papers in Phonetics, 37, 1–271.
Traunmüller, H. (1981) Perceptual dimension of openness in vowels. Journal of the Acoustical Society of America, 69, 1465–75.
Walker, S., Bruce, V., and O’Malley, C. (1995) Facial identity and facial speech processing: Familiar faces and voices in the McGurk effect. Perception & Psychophysics, 57, 1124–33.
Warren, R. M. (1970) Perceptual restoration of missing speech sounds. Science, 167, 392–3.
Wright, J. T. (1986) The behavior of nasalized vowels in the perceptual vowel space. In J. J. Ohala and J. J. Jaeger (eds.), Experimental Phonology, New York: Academic Press, 45–67.
Zwicker, E. (1961) Subdivision of the audible frequency range into critical bands (Frequenzgruppen). Journal of the Acoustical Society of America, 33, 248.
Zwicker, E. (1975) Scaling. In W. D. Keidel and W. D. Neff (eds.), Auditory System: Physiology (CNS), behavioral studies, psychoacoustics, Berlin: Springer-Verlag.
Several types of events in the world produce the sensation of sound. Examples include a door slamming, a violin string being plucked, wind whistling around a corner, and human speech. All these examples, and any others we could think of, involve movement of some sort. And these movements cause pressure fluctuations in the surrounding air (or some other acoustic medium). When pressure fluctuations reach the eardrum, they cause it to move, and the auditory system translates these movements into neural impulses which we experience as sound. Thus, sound is produced when pressure fluctuations impinge upon the eardrum. An acoustic waveform is a record of sound-producing pressure fluctuations over time. (Ladefoged 1996, Fry 1979, and Stevens 1999 provide more detailed discussions of the topics covered in this chapter.)
Pressure fluctuations impinging on the eardrum produce the sensation of sound, but sound can travel across relatively long distances. This is because a sound produced at one place sets up a sound wave that travels through the acoustic medium. A sound wave is a traveling pressure fluctuation that propagates through any medium that is elastic enough to allow molecules to crowd together and move apart. The wave in a lake after you throw in a stone is an example. The impact of the stone is transmitted over a relatively large distance. The water particles don’t travel; the pressure fluctuation does.
A line of people waiting to get into a movie is a useful analogy for a sound wave. When the person at the front of the line moves, a “vacuum” is created between the first person and the next person in the line (the gap between them is increased), so the second person steps forward. Now there is a vacuum between person two and person three, so person three steps forward. Eventually, the last person in the line gets to move; the last person is affected by a movement that occurred at the front of the line, because the pressure fluctuation (the gap in the line) traveled, even though each person in the line moved very little. The analogy is flawed, because in most lines you get to move to the front eventually. For this to be a proper analogy for sound propagation, we would have to imagine that the first person is shoved back into the second person and that this crowding or increase of pressure (like the vacuum) is transmitted down the line.
Figure 1.2 shows a pressure waveform at the location indicated by the asterisk in figure 1.1. The horizontal axis shows the passage of time, the vertical axis the degree of crowdedness (which in a sound wave corresponds to air pressure). At time 3 there is a sudden drop in crowdedness because person two stepped up and left a gap in the line. At time 4 normal crowdedness is restored when person 3 steps up to fill the gap left by person 2. At time 10 there is a sudden increase in crowdedness as person 2 steps back and bumps into person 3. The graph in figure 1.2 is a way of representing the traveling rarefaction and compression waves shown in figure 1.1. Given a uniform acoustic medium, we could reconstruct figure 1.1 from figure 1.2 (though note the discussion in the next paragraph on sound energy dissipation). Graphs like the one shown in figure 1.2 are more typical in acoustic phonetics, because this is the type of view of a sound wave that is produced by a microphone – it shows amplitude fluctuations as they travel past a particular point in space.
Figure 1.1 Wave motion in a line of seven people waiting to get into a show. Time is shown across the top of the graph running from earlier (time 1) to later (time 15) in arbitrary units.
Figure 1.2 A pressure waveform of the wave motion shown in figure 1.1. Time is again shown on the horizontal axis. The vertical axis shows the distance between people.
Sound waves lose energy as they travel through air (or any other acoustic medium), because it takes energy to move the molecules. Perhaps you have noticed a similar phenomenon when you stand in a long line. If the first person steps forward, and then back, only a few people at the front of the line may be affected, because people further down the line have inertia; they will tolerate some change in pressure (distance between people) before they actually move in response to the change. Thus the disturbance at the front of the line may not have any effect on the people at the end of a long line. Also, people tend to fidget, so the difference between movement propagated down the line and inherent fidgeting (the signal-to-noise ratio) may be difficult to detect if the movement is small. The rate of sound dissipation in air is different from the dissipation of a movement in a line, because sound radiates in three dimensions from the sound source (in a sphere). This means that the number of air molecules being moved by the sound wave greatly increases as the wave radiates from the sound source. Thus the amount of energy available to move each molecule on the surface of the sphere decreases as the wave expands out from the sound source; because the surface area of the sphere grows with the square of its radius, the intensity of the sound decreases with the square of the distance from the source (the inverse square law). That is why singers in heavy metal bands put the microphone right up to their lips. They would be drowned out by the general din otherwise. It is also why you should position the microphone close to the speaker’s mouth when you record a sample of speech (although it is important to keep the microphone to the side of the speaker’s lips, to avoid the blowing noises in [p]’s, etc.).
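A minimal numeric sketch of that inverse square relationship; the 1-meter reference distance and the other distances are arbitrary, purely illustrative values:

```python
import math

def relative_intensity(r1, r2):
    """Intensity at distance r2, relative to the intensity at distance r1,
    for a point source radiating spherically (inverse square law)."""
    return (r1 / r2) ** 2

for r2 in (1.0, 2.0, 4.0, 8.0):
    ratio = relative_intensity(1.0, r2)
    print(f"{r2:4.1f} m: {ratio:6.4f} x ({10 * math.log10(ratio):+6.1f} dB)")
```

Each doubling of distance spreads the same energy over four times the spherical surface area, so intensity drops to one quarter, about 6 dB per doubling.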
There are two types of sounds: periodic and aperiodic. Periodic sounds have a pattern that repeats at regular intervals. They come in two types: simple and complex.
Simple periodic waves are also called sine waves: they result from simple harmonic motion, such as the swing of a pendulum. The only time we humans get close to producing simple periodic waves in speech is when we’re very young. Children’s vocal cord vibration comes close to being sinusoidal, and usually women’s vocal cord vibration is more sinusoidal than men’s. Despite the fact that simple periodic waves rarely occur in speech, they are important, because more complex sounds can be described as combinations of sine waves. In order to define a sine wave, one needs to know just three properties. These are illustrated in figures 1.3–1.4.
Figure 1.3 A 100 Hz sine wave with the duration of one cycle (the period) and the peak amplitude labeled.
Figure 1.4 Two sine waves with identical frequency and amplitude, but 90° out of phase.
The first is frequency: the number of times the sinusoidal pattern repeats per unit time (on the horizontal axis). Each repetition of the pattern is called a cycle, and the duration of a cycle is its period. Frequency can be expressed as cycles per second, which, by convention, is called hertz (and abbreviated Hz). So to get the frequency of a sine wave in Hz (cycles per second), you divide one second by the period (the duration of one cycle). That is, frequency in Hz equals 1/T, where T is the period in seconds. For example, the sine wave in figure 1.3 completes one cycle in 0.01 seconds. The number of cycles this wave could complete in one second is 100 (that is, one second divided by the amount of time each cycle takes in seconds, or 1/0.01 = 100). So, this waveform has a frequency of 100 cycles per second (100 Hz).
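The same arithmetic as a short check, using the 0.01-second period from figure 1.3:

```python
# The frequency-period relation from the text: f = 1/T.
period = 0.01              # seconds per cycle, as in figure 1.3
frequency = 1.0 / period   # cycles per second (Hz)
print(frequency)           # 100.0, i.e. a 100 Hz sine wave
```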
The second property of a simple periodic wave is its amplitude: the peak deviation of a pressure fluctuation from normal, atmospheric pressure. In a sound pressure waveform the amplitude of the wave is represented on the vertical axis.
The third property of sine waves is their phase: the timing of the waveform relative to some reference point. You can draw a sine wave by taking amplitude values from a set of right triangles that fit inside a circle (see exercise 4 at the end of this chapter). One time around the circle equals one sine wave on the paper. Thus we can identify locations in a sine wave by degrees of rotation around a circle. This is illustrated in figure 1.4. Both sine waves shown in this figure start at 0° in the sinusoidal cycle. In both, the peak amplitude occurs at 90°, the downward-going (negative-going) zero-crossing at 180°, the negative peak at 270°, and the cycle ends at 360°. But these two sine waves with exactly the same amplitude and frequency may still differ in terms of their relative timing, or phase. In this case they are 90° out of phase.
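A short sketch of the two waves in figure 1.4; the 100 Hz frequency and unit amplitude are assumptions (any values would do), since only the 90° offset distinguishes the two waves:

```python
import math

f, amp = 100.0, 1.0                         # assumed frequency and amplitude
for i in range(11):                         # the first 10 ms, in 1 ms steps
    t = i / 1000.0
    x1 = amp * math.sin(2 * math.pi * f * t)                # 0 degrees
    x2 = amp * math.sin(2 * math.pi * f * t + math.pi / 2)  # 90 degrees ahead
    print(f"t = {t * 1000:4.1f} ms  x1 = {x1:+.3f}  x2 = {x2:+.3f}")
```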
Complex periodic waves are like simple periodic waves in that they involve a repeating waveform pattern and thus have cycles. However, complex periodic waves are composed of at least two sine waves. Consider the wave shown in figure 1.5, for example. Like the simple sine waves shown in figures 1.3 and 1.4, this waveform completes one cycle in 0.01 seconds (i.e. 10 milliseconds). However, it has an additional component that completes ten cycles in this same amount of time. Notice the “ripples” in the waveform. You can count ten small positive peaks in one cycle of the waveform, one for each cycle of the additional frequency component in the complex wave. I produced this example by adding a 100 Hz sine wave and a (lower-amplitude) 1,000 Hz sine wave. So the 1,000 Hz wave combined with the 100 Hz wave produces a complex periodic wave. The rate at which the complex pattern repeats is called the fundamental frequency (abbreviated F0).
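A sketch of how a wave like the one in figure 1.5 can be constructed; the 0.3 relative amplitude for the 1,000 Hz component and the 11,025 Hz sampling rate are assumptions, since the text says only that the second component has lower amplitude:

```python
import math

def complex_wave(t):
    """A 100 Hz sine plus a lower-amplitude (here 0.3) 1,000 Hz sine."""
    return math.sin(2 * math.pi * 100 * t) + 0.3 * math.sin(2 * math.pi * 1000 * t)

fs = 11025                                           # assumed sampling rate (Hz)
samples = [complex_wave(i / fs) for i in range(fs // 100)]  # one 10 ms cycle
print(len(samples))                                  # 110 samples in one cycle
```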
Figure 1.5 A complex periodic wave composed of a 100 Hz sine wave and a 1,000 Hz sine wave. One cycle of the fundamental frequency (F0) is labeled.
Figure 1.6 A complex periodic wave that approximates the “sawtooth” wave shape, and the four lowest sine waves of the set that were combined to produce the complex wave.
Figure 1.6 shows another complex wave (and four of the sine waves that were added together to produce it). This wave shape approximates a sawtooth pattern. Unlike in the previous example, it is not possible to identify the component sine waves by looking at the complex wave pattern. Notice how all four of the component sine waves have positive peaks early in the complex wave’s cycle and negative peaks toward the end of the cycle. These peaks add together to produce a sharp peak early in the cycle and a sharp valley at the end of the cycle, and tend to cancel each other over the rest of the cycle. We can’t see individual peaks corresponding to the cycles of the component waves. Nonetheless, the complex wave was produced by adding together simple components.
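A sketch of this kind of sawtooth approximation, assuming the component amplitudes fall off as 1, 1/2, 1/3, 1/4 (consistent with the component list below) and the same assumed 11,025 Hz sampling rate:

```python
import math

def sawtooth(t, f0=100.0, n_harmonics=4):
    """Sum harmonics of f0 with amplitudes 1, 1/2, 1/3, ... (cf. figure 1.6)."""
    return sum(math.sin(2 * math.pi * n * f0 * t) / n
               for n in range(1, n_harmonics + 1))

fs = 11025                                            # assumed sampling rate (Hz)
wave = [sawtooth(i / fs) for i in range(fs // 100)]   # one cycle of the pattern
```

Adding more harmonics sharpens the peak and valley, bringing the sum closer to a true sawtooth shape.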
Now let’s look at how to represent the frequency components that make up a complex periodic wave. What we’re looking for is a way to show the component sine waves of the complex wave when they are not easily visible in the waveform itself. One way to do this is to list the frequencies and amplitudes of the component sine waves like this:

frequency (Hz)    relative amplitude
100               1
200               0.5
300               0.33
400               0.25
In this discussion I am skipping over a complicated matter. We can describe the amplitudes of sine waves on a number of different measurement scales, relating to the magnitude of the wave, its intensity, or its perceived loudness (see chapter 4 for more discussion of this). In this chapter, I am representing the magnitude of the sound wave in relative terms, so that I don’t have to introduce units of measure for amplitude (instead I have to add this long apology!). So, the 200 Hz component has an amplitude that is one half the magnitude of the 100 Hz component, and so on.
Figure 1.7 shows a graph of these values with frequency on the horizontal axis and amplitude on the vertical axis. The graphical display of component frequencies is the best method for showing the simple periodic components of a complex periodic wave, because complex waves are often composed of so many frequency components that a table is impractical. An amplitude versus frequency plot of the simple sine wave components of a complex wave is called a power spectrum.
Figure 1.7 The frequencies and amplitudes of the simple periodic components of the complex wave shown in figure 1.6 presented in graphic format.
Here’s why it is so important that complex periodic waves can be constructed by adding together sine waves. It is possible to produce an infinite variety of complex wave shapes by combining sine waves that have different frequencies, amplitudes, and phases. A related property of sound waves is that any complex acoustic wave can be analyzed in terms of the sine wave components that could have been used to produce that wave. That is, any complex waveform can be decomposed into a set of sine waves having particular frequencies, amplitudes, and phase relations. This property of sound waves is called Fourier’s theorem, after the nineteenth-century mathematician who discovered it.
In Fourier analysis we take a complex periodic wave having an arbitrary number of components and derive the frequencies, amplitudes, and phases of those components. The result of Fourier analysis is a power spectrum similar to the one shown in figure 1.7. (We ignore the phases of the component waves, because these have only a minor impact on the perception of sound.)
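A minimal sketch of such an analysis, applying the discrete Fourier transform (via numpy) to the assumed two-component wave from the earlier sketch; it recovers the 100 Hz and 1,000 Hz components and their amplitudes:

```python
import numpy as np

fs = 11025                                  # assumed sampling rate (Hz)
t = np.arange(fs) / fs                      # one second of sample times
x = np.sin(2 * np.pi * 100 * t) + 0.3 * np.sin(2 * np.pi * 1000 * t)

amps = np.abs(np.fft.rfft(x)) * 2 / len(x)  # amplitude of each component
# With one second of signal, the FFT bins are 1 Hz apart, so bin k is k Hz.
for f in (100, 1000):
    print(f, round(float(amps[f]), 3))      # -> 100 1.0 and 1000 0.3
```

The printed amplitudes match the ones used to build the wave, which is the sense in which Fourier analysis derives the components that could have produced the signal.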
Aperiodic sounds, unlike simple or complex periodic sounds, do not have a regularly repeating pattern; they have either a random waveform or a pattern that doesn’t repeat. Sound characterized by random pressure fluctuation is called white noise. It sounds something like radio static or wind blowing through trees. Even though white noise is not periodic, it is possible to perform a Fourier analysis on it; however, unlike Fourier analyses of periodic signals composed of only a few sine waves, the spectrum of white noise is not characterized by sharp peaks, but, rather, has equal amplitude for all possible frequency components (the spectrum is flat). Like sine waves, white noise is an abstraction, although many naturally occurring sounds are similar to white noise; for instance, the sound of the wind or fricative speech sounds like [s] or [f].
Figures 1.8 and 1.9 show the acoustic waveform and the power spectrum, respectively, of a sample of white noise. Note that the waveform shown in figure 1.8 is irregular, with no discernible repeating pattern. Note too that the spectrum shown in figure 1.9 is flat across the top. As we will see in chapter 3 (on digital signal processing), a Fourier analysis of a short chunk (called an “analysis window”) of a waveform leads to inaccuracies in the resultant spectrum. That’s why this spectrum has some peaks and valleys even though, according to theory, white noise should have a flat spectrum.
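A sketch of this window-length effect, assuming Gaussian white noise and an arbitrary 256-sample analysis window; the spectrum of a single short window is bumpy, while the average over many windows approaches the flat spectrum theory predicts:

```python
import numpy as np

rng = np.random.default_rng(1)     # seeded so the sketch repeats exactly
win = 256                          # an arbitrary, short analysis window

one = np.abs(np.fft.rfft(rng.standard_normal(win)))        # one window: bumpy
avg = np.mean([np.abs(np.fft.rfft(rng.standard_normal(win)))
               for _ in range(1000)], axis=0)               # average: nearly flat

# Relative variation across frequencies: large for one window, small averaged.
print(round(float(one.std() / one.mean()), 2))
print(round(float(avg.std() / avg.mean()), 2))
```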
Figure 1.8 A 20 ms section of an acoustic waveform of white noise. The amplitude at any given point in time is random.
Figure 1.9 The power spectrum of the white noise shown in figure 1.8.