This is a tutorial for GenWord, a JavaScript-based vocabulary generator for conlangs based on Mark Rosenfelder's Gen. It's designed to randomly create words based on the settings you give it. There are several defaults built in to give you an idea of how it can perform: if you haven't done so already, try choosing one from the drop-down menu towards the bottom of the screen and clicking the Load Predef button.
Before you start, you ought to come up with a general idea of what sounds your language will have, and how you will choose to write them out. This will give you a foundation to work from. For the sake of this tutorial, we will work with a fictional language that sounds a lot like English.
We'll take some of the sections out of order.
Categories are where you define what letters are in your words by assigning them, well, categories. The most basic categories would be consonants and vowels. Open GenWord in another tab and hit the Clear Input button, then put the following in the categories box:
From there, you can define what syllables look like. By default, the box marked Use only one type of syllables should be checked. If it's not, go ahead and check it. It keeps things simple. Now, put this in the Syllables box:
Now go ahead and click the Generate button and see what happens. You'll get something much like this:
Wow, that's a lot of Ws, isn't it? That's because the generator looks at your categories and uses the first letter in a category more often than the next letter, and so on down the line. We can affect this by choosing a new Dropoff setting (Equiprobable eliminates it entirely!), but let's try to avoid that, because natural languages use some sounds more than others. Try changing your categories to this:
That's a more normal distribution of letters in English. Hit Generate again and you'll get something like this:
That still looks funky, especially those Hs at the beginning. Turns out, the H doesn't appear by itself in English all that often. Usually, it gets paired with a T, S or C to form th, sh and ch. How could we produce those combinations with the generator when it only outputs one letter at a time?
The answer lies in the Rewrite Rules box. It's used to reconfigure the output in a variety of ways. Each line is a separate rule in the form X||Y, which tells the generator to replace all instances of X with Y.
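If it helps to see the idea in code, here is a minimal sketch of how a list of X||Y rules could be applied to a word. This is not GenWord's actual source: the function names `parseRules` and `applyRules` are made up for illustration, and the sample rules are only stand-ins.

```javascript
// Hypothetical sketch, NOT GenWord's real code.
// Each rule line has the form "X||Y". X is treated as a regular expression
// pattern, and every match of X is replaced with Y.
function parseRules(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.includes("||"))
    .map((line) => {
      const at = line.indexOf("||");
      return {
        pattern: new RegExp(line.slice(0, at), "g"), // "g" = replace all instances
        replacement: line.slice(at + 2),
      };
    });
}

// Rules run one after another, top to bottom, each on the previous result.
function applyRules(word, rules) {
  return rules.reduce((w, rule) => w.replace(rule.pattern, rule.replacement), word);
}

// Illustrative rules only: capitals stand in for letter pairs.
const rules = parseRules("T||th\nS||sh\nC||ch\nq||qu");
console.log(applyRules("Tiq", rules)); // thiqu
```

The thing to take away is that each rule sees the output of the rule before it, which is why the order of the lines in the box matters.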
First off, let's change up the Categories a bit. We can use capital letters to represent the th, sh and ch pairs.
Now, try putting these rules in:
Didja notice I added an extra rule to change "q" to "qu"? English never has a bare Q, except with a few words borrowed from other languages. Go ahead and hit Generate and see what happens:
I underlined a couple of examples where our rules have affected the output!
There are other facets of English we can incorporate into our generator if we start playing with the other syllable boxes. Uncheck that Use only one type of syllables box and a few things change. The Syllables box is now named Word-initial syllables (and will now be used to generate the first syllable of multisyllabic words), and three new boxes have appeared. Let's add some syllable types to them.
The box for single-syllable words is pretty straightforward: it's only used when the generator wants to make a one-syllable word, like "I", "you", or "or". I changed the order of the syllable types from the Word-initial syllables box because syllable types get chosen in a way similar to the letters in a category: the ones on top are picked more often than the others. You can smooth this out by checking the Slow syllable dropoff box.
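To get a feel for why the top of a list wins so often, here is one possible way a dropoff picker could work. I'm not claiming this is how GenWord does it internally; `pickWithDropoff`, `tally`, and the 0.3 dropoff value are all assumptions made up for this sketch.

```javascript
// Hypothetical sketch, NOT GenWord's real code.
// Start at the front of the list. At each item, stop and take it with
// probability `dropoff`; otherwise move to the next item, wrapping around
// at the end. A dropoff of 0 stands in for an equiprobable setting.
function pickWithDropoff(items, dropoff, random = Math.random) {
  if (items.length === 0) throw new Error("nothing to pick from");
  if (dropoff <= 0) {
    return items[Math.floor(random() * items.length)];
  }
  let i = 0;
  while (random() >= dropoff) {
    i = (i + 1) % items.length;
  }
  return items[i];
}

// Count how often each item turns up over many picks.
function tally(items, dropoff, trials = 10000) {
  const counts = {};
  for (const item of items) counts[item] = 0;
  for (let n = 0; n < trials; n++) counts[pickWithDropoff(items, dropoff)]++;
  return counts;
}

console.log(tally(["t", "n", "s", "r", "w"], 0.3)); // front-loaded counts
console.log(tally(["t", "n", "s", "r", "w"], 0));   // roughly even counts
```

Run it a few times and compare the two tallies: with a dropoff, whatever you list first dominates, which is the same reason the order of letters in a category and of syllable types in a box matters.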
The next box is only used for syllables in the middle of a word, like the "la" in "syllables". I've dropped the CVC type because it tends not to happen there, and because it makes ugly syllables in the middle of words.
This box is only used for the final syllable in words, like "nal" in "final" or "ble" in "syllable".
Now click Generate and see what happens.
Looking better, but I've underlined one glaring non-English cluster there. We can fix that with a new rewrite rule. But be careful: the rules are used in order from top to bottom. Look at this:
In that one, the second rule would never get used, because the first rule would change any "SS" to "shsh" beforehand.
That will work correctly. But we can do even better!
Either of those rules by itself will do the job of the two rules we used before. They make use of regular expressions, a compact notation for describing patterns of text rather than fixed strings. I can't do a full tutorial on them (Google "javascript regular expression patterns" if you're interested), but I'll briefly explain the two I just used.
SS? means S, optionally followed by another S. The "?" makes the item before it optional. It will match "S" and "SS", but it will only match the first two Ss in "SSS". That's ok, because our syllable rules never allow three consonants in a row.
S+ means one or more Ss in a row. The "+" means "match the item before me as many times as possible, but at least once". So it can match "S", "SS", "SSS", all the way up to "SSSSSSSSSSSSSSSS" and beyond!
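You can try the ordering problem and both patterns in plain JavaScript, outside of GenWord. The `rewrite` helper and the sample word "kaSSo" are made up for this sketch; the rules are written as pattern/replacement pairs instead of X||Y lines.

```javascript
// Plain JavaScript, independent of GenWord.
// Apply [pattern, replacement] pairs in order, replacing all matches each time.
function rewrite(word, pairs) {
  let out = word;
  for (const [pattern, replacement] of pairs) {
    out = out.replace(new RegExp(pattern, "g"), replacement);
  }
  return out;
}

const word = "kaSSo";

// Single-S rule first: both Ss are rewritten before the SS rule ever runs.
console.log(rewrite(word, [["S", "sh"], ["SS", "sh"]])); // kashsho

// Double-S rule first: the pair is handled, then any leftover single S.
console.log(rewrite(word, [["SS", "sh"], ["S", "sh"]])); // kasho

// One rule with an optional second S.
console.log(rewrite(word, [["SS?", "sh"]])); // kasho

// One rule meaning "one or more S".
console.log(rewrite(word, [["S+", "sh"]])); // kasho

// The two patterns only differ on runs of three or more:
console.log(rewrite("SSS", [["SS?", "sh"]])); // shsh (matches "SS", then "S")
console.log(rewrite("SSS", [["S+", "sh"]]));  // sh
```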
But let's go back to our last generated output. It looks a bit English-y, but not really. It's more like Latin or Italian or something. That's because we're not using the full inventory of English. In particular, English is pretty odd in that it allows tons of consonant clusters. Look at this word: strengths. That's three consonant sounds, a vowel, and three more consonant sounds! That's crazy!
We can solve this problem by adding more categories and syllable types. S and TH are voiceless fricatives, T is a voiceless stop, R is an approximant or liquid, and NG is a nasal stop. (Those are technical terms, and you'll have to start getting used to them if you're going to make languages, so try not to be afraid and just go with me here, all right?)
I went ahead and changed up the following boxes to try and make use of these distinctions. I tried to make the category names have some relation to their contents. See if you can follow along:
Notice I added a new wrinkle in the word-initial syllables box: sSLV. Everything in GenWord is case-sensitive, so s and S aren't treated the same. S will be replaced with a letter from the S category, but s won't match any category, so the generator will just output it without changing it at all. Check out the first underlined word below for an example:
I underlined some of the new syllables we're getting from this change-up. Plaonth and strideot seem particularly English-like! We could probably do better by changing the other syllable boxes, too.
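The case-sensitivity trick behind sSLV can be sketched like this. Again, this is an illustration and not GenWord's code: `expandSyllable` is a made-up name, the category contents are invented, and the picker always takes the first letter so the result is predictable.

```javascript
// Hypothetical sketch, NOT GenWord's real code.
// Walk through a syllable type one symbol at a time. If the symbol names a
// category, swap in a letter from that category; if not, output it as-is.
function expandSyllable(pattern, categories, pick) {
  let out = "";
  for (const symbol of pattern) {
    const letters = categories[symbol];
    out += letters ? pick(letters) : symbol;
  }
  return out;
}

// Invented categories for illustration only.
const categories = { S: "ptk", L: "rl", V: "aeiou" };

// A stand-in picker that always takes the first letter of a category.
const pickFirst = (letters) => letters[0];

// Lowercase "s" is not a category name, so it passes straight through.
console.log(expandSyllable("sSLV", categories, pickFirst)); // spra
```

Swap `pickFirst` for a random picker and the same syllable type starts giving you the whole family of s-plus-cluster syllables.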
I'm going to leave off here, but there's so much more you can do. Here are a few ideas you can try on your own to test your skills:
And as a final note: always remember to have fun!
The following is a quick summary of the rewrite rules of the Kartaran Predef. It's based on Ancient Kartara, my first conlang. Many of the rules use JavaScript regular expressions.
The line above looks for a run of the same monophthong vowel. If the run contains three or more in a row, it replaces it with a run of only two.
The above looks for a run of two or more of the same diphthong vowel in a row, and replaces it with only one.
The above looks for a string of three or more vowels in a row and reduces it to the first two.
The above is made up of several rules about the letter h. They are designed to preserve h only when it's at the start or end of a word, or when it's a part of a penultimate syllable when the word ends with a vowel.
The rules above change the diphthongs to their two-letter symbols.
A ĭ before an i gets reduced to just i.
A ĭ before a retroflex consonant turns that consonant into a non-retroflex consonant.
If a monophthong is followed by an i, and they aren't at the start of a word, they get turned into a diphthong.
Any doubled consonants are reduced to one.
If a stop is followed by r, remove the stop.
If a nasal and a fricative occur together, change the second to match the first's place of articulation. If a k is followed by a nasal, remove the nasal.
If a stop and a fricative occur together, change the second to match the first's place of articulation.
Change certain difficult fricative/stop pairs into easier-to-pronounce pairs.
If retroflex and dental/alveolar occur together, keep the first one.
"Fix" the doubles we may have introduced with the nasal-or-fricative/stop changes. It's easier to just put an accent on the last retroflex in a series.
Finally, change retroflex letters into their correct symbols.
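If you want to write rules in the same spirit for your own language, here is a sketch of how three of the ideas above (trimming runs of the same vowel, trimming long vowel strings, and collapsing doubled consonants) could be written as JavaScript regular expressions. These are not the Kartaran Predef's actual rules: the plain a/e/i/o/u vowel set, the consonant set, and the names `sketchRules` and `tidy` are assumptions for the example, and I'm using JavaScript's own "$1" replacement syntax, which may or may not be what a given generator accepts.

```javascript
// Hypothetical patterns, NOT the Kartaran Predef's real rules.
// Assumes a plain five-vowel inventory and unaccented Latin consonants.
const VOWEL = "[aeiou]";
const CONSONANT = "[b-df-hj-np-tv-z]";

const sketchRules = [
  // Three or more of the same vowel in a row -> two of that vowel.
  // "\\1" refers back to whatever the first parenthesized group matched.
  [new RegExp(`(${VOWEL})\\1{2,}`, "g"), "$1$1"],
  // Three or more vowels in a row -> just the first two.
  [new RegExp(`(${VOWEL}{2})${VOWEL}+`, "g"), "$1"],
  // Doubled (or tripled, etc.) consonants -> a single consonant.
  [new RegExp(`(${CONSONANT})\\1+`, "g"), "$1"],
];

// Apply the rules in order, top to bottom.
function tidy(word) {
  return sketchRules.reduce(
    (w, [pattern, replacement]) => w.replace(pattern, replacement),
    word
  );
}

console.log(tidy("kaaaat")); // kaat
console.log(tidy("maeiot")); // maet
console.log(tidy("tappa"));  // tapa
```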