SPEECH and LANGUAGE PROCESSING
An Introduction to Natural Language Processing,
Select the title for chapter teaching & book notes.
The 2nd Ed. should be available in its finished form in July '07. We'll continue to post new and revised chapters here as they become available. As usual, we welcome your comments. When sending comments, please indicate clearly that you're referring to the new and revised chapters, as in "Bug in Ch 8, 2ed".
Chapter 3: Words and Transducers
This new version of the chapter still focuses on morphology and FSTs, but is expanded in various ways. There are more details about the formal descriptions of finite-state transducers, many bugs are fixed, and two new sections are added relating to words and subwords. The first new section is on word and sentence tokenization, including algorithms for English as well as the maxmatch algorithm for Chinese word segmentation. The second new section is on spelling correction and minimum edit distance, and is an extended version of the edit-distance section from Chapter 5 of the first edition, with clearer figures, for example for explaining the minimum-edit-distance backtrace.

Chapter 4: N-grams (Formerly Chapter 6)
This updated language model chapter has had a complete overhaul. This draft includes more examples, a more complete description of Good-Turing, expanded sections on practical issues like perplexity and evaluation, coverage of language modeling toolkits (including the ARPA format), and an overview of modern methods like interpolated Kneser-Ney.

Chapter 5: Word Classes and Part-of-Speech Tagging (Formerly Chapter 8)
The main change to this revised chapter is a greatly expanded, and hence self-contained, description of bigram and trigram HMM part-of-speech tagging, including Viterbi decoding and deleted interpolation smoothing. Courses that don't include Chapter 7 (speech and HMMs) can now use this chapter to introduce HMM tagging in a self-contained way. Other changes in this chapter include expanded descriptions of unknown word modeling and part-of-speech tagging in other languages, and many bug fixes. Finally, we've moved this chapter earlier in the book and called it Chapter 5; it should be used after the FST chapter (Chapter 3) and the N-gram chapter (Chapter 4).

Chapter 6 (Formerly part of Chapter 7 and Appendix D)
This new chapter presents the Hidden Markov Model in detail, including Forward, Viterbi, and EM. It will eventually also present log-linear models.

Chapter 7: Phonetics (Formerly parts of Chapters 4, 5, and 7)
This chapter is an introduction to articulatory and acoustic phonetics for speech processing, as well as foundational tools like the ARPAbet, wavefile formats, phonetic dictionaries, and PRAAT.

Chapter 8: Speech Synthesis
This is a new chapter on speech synthesis.

Chapter 9: Automatic Speech Recognition (Formerly 7)
This new, significantly expanded speech recognition chapter gives a complete introduction to HMM-based speech recognition, including Gaussian Mixture Model acoustic models and embedded training, as well as overviews of advanced topics like decision-tree clustering for context-dependent phones, n-best lists, lattices, confusion networks, MLLR adaptation, and discriminative training. The current draft is still missing the section on extraction of MFCC features.

Chapter 10: Computational Phonology (Formerly parts of Chapters 4, 5, and 7)
This chapter is a brief introduction to computational phonology, including phonological and morphological learning, finite-state models, OT, and Stochastic OT.

Chapter 11: Formal Grammars of English (Formerly 9)
This chapter still focuses on CFGs for English and includes a revamped and somewhat expanded grammar for the ATIS domain. New and expanded sections cover treebanks (with a focus on the Penn Treebank), searching treebanks with tgrep and tgrep2, heads and head-finding rules, dependency grammars, categorial grammar, and grammars for spoken language processing.

Chapter 12: Parsing with Context-Free Grammars (Formerly 10)
The focus of this chapter is still on parsing with CFGs. It now includes sections on CKY, Earley, and agenda-based (chart) parsing. In addition, there is a new section on partial parsing, with a focus on machine-learning-based base-phrase chunking and the use of IOB tags.

Chapter 16: Semantics (Formerly 14)
This chapter still covers basic notions surrounding meaning representation languages. It now has better coverage of model-theoretic semantics for meaning representations, and a new section on Description Logics and their role as a basis for OWL and its role in the Semantic Web.

Chapter 19: Computational Lexical Semantics (New Chapter; Parts of old Chs. 15, 16 and 17)
The focus of this new chapter is on computing with word meanings. The three main topics are word sense disambiguation, computing relations between words (similarity, hyponymy, etc.), and semantic role labeling. It considerably expands the treatment of these topics.

Chapter 20: Discourse
This rewritten chapter includes a number of updates to the first edition. The anaphora resolution section is updated to include modern log-linear methods, and a section on the more general problem of coreference is also included. The coherence section describes cue-based methods for rhetorical relation and coherence relation extraction. Finally, there is a significant new section on discourse segmentation (including TextTiling).

Chapter 23: Dialog and Conversational Agents (Formerly 19)
This is a completely rewritten version of the dialogue chapter. It includes much more information on modern dialogue systems, including VoiceXML, confirmation and clarification dialogues, the information-state model, Markov decision processes, and other current approaches to dialogue agents.

Chapter 24: Machine Translation
The MT chapter has been extensively rewritten, and a significant new section has been added covering statistical MT, including IBM Model 1, Model 3, and HMM alignment. A new evaluation section covering human evaluation and BLEU has also been added, as well as sections on SYSTRAN and more details on cross-linguistic divergences.