Help:Extension:Wikispeech/Lexicon editor

From Linux Web Expert

Revision as of 03:24, 31 May 2023 by Sebastian Berlin (WMSE)

Wikispeech has a pronunciation lexicon that is used when reading text. Every time a sentence is read, its words are looked up in the lexicon. If a word has a defined pronunciation, that pronunciation is used; otherwise, the pronunciation is guessed from the spelling. A predefined pronunciation is therefore not required, but it helps. It is especially useful where pronunciations are irregular, such as with loan words and proper nouns.
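The lookup-then-guess behaviour described above can be sketched in Python. This is a minimal illustration, not Wikispeech's code: the `LEXICON` dictionary, its sample entry and the `guess_from_spelling` placeholder are all hypothetical, and a real spelling-based guess is far more elaborate than returning the letters.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory lexicon: lower-case word -> IPA transcriptions.
# Wikispeech's real lexicon is stored elsewhere; this only models the idea.
LEXICON: Dict[str, List[str]] = {
    "thing": ["ˈθɪŋ"],
}


def guess_from_spelling(word: str) -> str:
    """Hypothetical stand-in for the spelling-based guess.

    A real system applies letter-to-sound rules; this placeholder returns
    the letters unchanged so that the fallback path is easy to see.
    """
    return word


def pronounce(word: str, guess: Callable[[str], str] = guess_from_spelling) -> str:
    """Use the lexicon pronunciation if one exists, otherwise guess it."""
    entries = LEXICON.get(word.lower())
    if entries:
        return entries[0]
    return guess(word)


print(pronounce("Thing"))  # found in the lexicon
print(pronounce("zorp"))   # not found, so the guess is used
```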

This lexicon can be edited through Special:EditLexicon. With this special page you can look up words in the lexicon and edit their pronunciations. You can also add new pronunciations for words that don't have any. The pronunciation is entered using IPA.

When a pronunciation is added or edited, the change will not be heard right away. This is because utterances are saved for a time so that they can be reused without having to be processed again.
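A minimal sketch of why an edit is not heard at once, assuming a simple time-limited cache keyed on the text. `UtteranceCache`, `speak` and the 60-second lifetime are hypothetical; the only thing taken from this page is that utterances are saved for a time and then reused.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class UtteranceCache:
    """Hypothetical time-limited cache of synthesized utterances."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl      # seconds an utterance is kept (assumed value)
        self.clock = clock  # injectable clock, handy for testing
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, text: str) -> Optional[str]:
        item = self._store.get(text)
        if item is None:
            return None
        stored_at, audio = item
        if self.clock() - stored_at >= self.ttl:
            del self._store[text]  # expired, so the text must be processed again
            return None
        return audio

    def put(self, text: str, audio: str) -> None:
        self._store[text] = (self.clock(), audio)


def speak(text: str, cache: UtteranceCache, synthesize: Callable[[str], str]) -> str:
    """Reuse a cached utterance if there is one, otherwise synthesize it."""
    cached = cache.get(text)
    if cached is not None:
        return cached
    audio = synthesize(text)
    cache.put(text, audio)
    return audio


now = [0.0]
cache = UtteranceCache(ttl=60, clock=lambda: now[0])
print(speak("Hello", cache, lambda t: "old pronunciation"))
print(speak("Hello", cache, lambda t: "new pronunciation"))  # still cached
now[0] = 61.0
print(speak("Hello", cache, lambda t: "new pronunciation"))  # expired
```

In this sketch, a sentence synthesized before a lexicon edit keeps its old pronunciation until its cache entry expires; only then is it processed again with the new pronunciation.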

Edit a word

Select the language, enter the word and click next.

On the next page you will see a field, ID, and at the bottom a list of existing entries matching the word. Select the ID of the entry that you want to edit and click next. If the entry you're looking for doesn't exist, see "Add a word" instead.

The next page has fields to modify the selected entry. Transcription contains the phonetic transcription in IPA; see "Transcriptions". After entering a transcription you can click preview to have it read out. Note that it may take a few seconds to generate the preview. If there is an error during the preview generation, a popup dialogue will appear; see "Transcription preview errors". You can also select whether the entry should be preferred or not. A preferred entry is prioritised over other entries for the same word when the lexicon is used. Click save to save the changes to the lexicon.
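Choosing between several entries for the same word could look like the sketch below. The `Entry` fields mirror the editor form, but the class, `choose_entry` and the rule that the first entry wins when none is preferred are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    # Hypothetical fields mirroring the editor form: ID, transcription, preferred.
    entry_id: int
    transcription: str
    preferred: bool = False


def choose_entry(entries: List[Entry]) -> Optional[Entry]:
    """Return the first preferred entry, or the first entry if none is preferred."""
    if not entries:
        return None
    return next((e for e in entries if e.preferred), entries[0])


read = [Entry(1, "ˈrid"), Entry(2, "ˈrɛd", preferred=True)]
print(choose_entry(read).transcription)  # ˈrɛd
```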

Add a word

Select the language, enter the word and click next.

On the next page you will see a field, ID, and at the bottom a list of existing entries matching the word. Select "New" for the ID and click next.

The next page has fields for the new entry. Transcription contains the phonetic transcription in IPA; see "Transcriptions". After entering a transcription you can click preview to have it read out. Note that it may take a few seconds to generate the preview. If there is an error during the preview generation, a popup dialogue will appear; see "Transcription preview errors". You can also select whether the entry should be preferred or not. A preferred entry is prioritised over other entries for the same word when the lexicon is used. Click save to save the changes to the lexicon.

Transcriptions

Transcriptions tell Wikispeech how a word should be pronounced. In the lexicon editor, transcriptions are entered in IPA.

Entering a transcription can be done using the keyboard if the Universal Language Selector is installed. In that case you will briefly see a keyboard icon next to the text field when you type. Clicking it and selecting "International Phonetic Alphabet - X-SAMPA" or pressing Ctrl+M will switch the input.
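To show what an X-SAMPA input method does, here is a small converter for a subset of standard X-SAMPA symbols, limited to sounds in the English symbol set on this page. It is not the Universal Language Selector's implementation: the mapping table and `xsampa_to_ipa` are illustrative, and tied affricates and diphthongs are left out.

```python
# Subset of standard X-SAMPA -> IPA correspondences for the English set.
XSAMPA_TO_IPA = {
    "3`": "ɝ",
    "S": "ʃ", "Z": "ʒ", "T": "θ", "D": "ð", "N": "ŋ",
    "{": "æ", "V": "ʌ", "E": "ɛ", "I": "ɪ", "U": "ʊ",
    "@": "ə", "Q": "ɒ", "O": "ɔ",
    '"': "ˈ", "%": "ˌ",
}

# Try longer X-SAMPA symbols first so that "3`" is not read as "3" + "`".
_KEYS = sorted(XSAMPA_TO_IPA, key=len, reverse=True)


def xsampa_to_ipa(text: str) -> str:
    """Convert X-SAMPA text to IPA using the subset above."""
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(XSAMPA_TO_IPA[key])
                i += len(key)
                break
        else:
            # No mapping: copy the character. For plain letters such as
            # p, t, k and s, X-SAMPA and this symbol set use the same character.
            out.append(text[i])
            i += 1
    return "".join(out)


print(xsampa_to_ipa('"TIN'))  # ˈθɪŋ
print(xsampa_to_ipa("f3`z"))  # fɝz
```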

You can also copy a transcription from e.g. an article or a Wiktionary entry.

Transcription preview errors

If the transcription you entered is invalid, an error popup will appear and tell you what went wrong.

The error message will be in English regardless of what language you have selected in preferences.

ERROR: failed mapping transcription : found unknown phonemes in transcription...

One or more of the phonemes used were not in the symbol set. A symbol set is a list of phonemes that can be used and varies from language to language. At the end of the message there will be a list of the unknown phonemes inside square brackets, e.g. [a e].
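The check behind this error can be approximated by splitting the transcription into the longest matching symbols from the set and collecting anything left over. This is a sketch under assumptions, not Wikispeech's validator: `find_unknown`, `format_unknown`, the whitespace skipping and the de-duplication are hypothetical, and the symbol set is copied from the English table on this page.

```python
from typing import Iterable, List

TIE = "\u2040"  # CHARACTER TIE, as used in the English symbol set

# English symbol set from the table on this page.
ENGLISH_SYMBOLS = {
    "p", "t", "k", "b", "d", "g", "t" + TIE + "ʃ", "d" + TIE + "ʒ",
    "f", "v", "θ", "ð", "s", "z", "ʃ", "ʒ", "h", "l", "m", "n", "ŋ",
    "r", "w", "j", "ɒ", "ɔ", "u", "i", "æ", "ʌ", "ɛ", "ɪ", "ʊ", "ə", "ɝ",
    "a" + TIE + "ʊ", "ɔ" + TIE + "ɪ", "o" + TIE + "ʊ", "e" + TIE + "ɪ",
    "a" + TIE + "ɪ", ".", "ˈ", "ˌ",
}


def find_unknown(transcription: str, symbols: Iterable[str] = ENGLISH_SYMBOLS) -> List[str]:
    """Return characters not covered by the symbol set, each listed once."""
    symbol_set = set(symbols)
    longest = max(len(s) for s in symbol_set)
    unknown: List[str] = []
    i = 0
    while i < len(transcription):
        if transcription[i].isspace():  # assumption: whitespace is ignored
            i += 1
            continue
        # Greedy longest match, so a tied affricate counts as one phoneme.
        for size in range(longest, 0, -1):
            if transcription[i:i + size] in symbol_set:
                i += size
                break
        else:
            if transcription[i] not in unknown:
                unknown.append(transcription[i])
            i += 1
    return unknown


def format_unknown(unknown: List[str]) -> str:
    """Format the unknown phonemes as a bracketed, space-separated list."""
    return "[" + " ".join(unknown) + "]"


print(format_unknown(find_unknown("ˈpa.keɪ")))  # [a e]
print(find_unknown("ˈkeɪk"))                    # ['e']: the diphthong lacks a tie
```

In the English set, "a" and "e" only occur as the first half of tied diphthongs, so writing a diphthong without the tie character is one way to end up with unknown phonemes.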

Solution

Replace the unknown phonemes with ones from the symbol set. A list of the available phonemes can be found under "Symbol sets".

Symbol sets

English

Phoneme  Example word        Unicode
p        pin                 U+0070
t        tin                 U+0074
k        kin                 U+006B
b        bin                 U+0062
d        din                 U+0064
g        give                U+0067
t⁀ʃ      chin                U+0074 U+2040 U+0283
d⁀ʒ      gin                 U+0064 U+2040 U+0292
f        fin                 U+0066
v        vim                 U+0076
θ        thin                U+03B8
ð        this                U+00F0
s        sin                 U+0073
z        zing                U+007A
ʃ        shin                U+0283
ʒ        measure             U+0292
h        hit                 U+0068
l        long                U+006C
m        mock                U+006D
n        knock               U+006E
ŋ        thing               U+014B
r        wrong               U+0072
w        wasp                U+0077
j        yacht               U+006A
ɒ        pot                 U+0252
ɔ        cause               U+0254
u        lose                U+0075
i        ease                U+0069
æ        pat                 U+00E6
ʌ        cut                 U+028C
ɛ        pet                 U+025B
ɪ        pit                 U+026A
ʊ        put                 U+028A
ə        allow               U+0259
ɝ        furs                U+025D
a⁀ʊ      rouse               U+0061 U+2040 U+028A
ɔ⁀ɪ      noise               U+0254 U+2040 U+026A
o⁀ʊ      nose                U+006F U+2040 U+028A
e⁀ɪ      raise               U+0065 U+2040 U+026A
a⁀ɪ      rise                U+0061 U+2040 U+026A
.        syllable delimiter  U+002E
ˈ        primary stress      U+02C8
ˌ        secondary stress    U+02CC
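Characters that look alike can have different code points, for example the ASCII "g" in this table (U+0067) and the IPA script "ɡ" (U+0261). When a copied transcription is rejected, listing its code points and comparing them with the Unicode column can show which character differs. A minimal helper, with the hypothetical name `code_points`:

```python
def code_points(text: str) -> str:
    """Return the code points of text in the U+XXXX notation used above."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(code_points("t\u2040\u0283"))  # U+0074 U+2040 U+0283
print(code_points("\u0261"))         # U+0261, not the U+0067 listed for g
```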