GenWord - A Tutorial

This is a tutorial for GenWord, a JavaScript-based vocabulary generator for conlangs based on Mark Rosenfelder's Gen. It's designed to randomly create words based on the settings you give it. There are several defaults built in to give you an idea of how it can perform: if you haven't done so already, try choosing one from the drop-down menu towards the bottom of the screen and clicking the Load Predef button.

Before you start, you ought to come up with a general idea of what sounds your language will have, and how you will choose to write them out. This will give you a foundation to work from. For the sake of this tutorial, we will work with a fictional language that sounds a lot like English.

We'll take some of the sections out of order.

Categories are where you define what letters are in your words by assigning them, well, categories. The most basic categories would be consonants and vowels. Open GenWord in another tab and hit the Clear Input button, then put the following in the categories box:

Categories:

From there, you can define what syllables look like. By default, the box marked Use only one type of syllables should be checked. If it's not, go ahead and check it. It keeps things simple. Now, put this in the Syllables box:

Syllables:

Now go ahead and click the Generate button and see what happens. You'll get something much like this:

Woaw rowot ette retay yorray piwo. Re orray o tipwoaw we? Hutre aar erkayrar layew tarirew. Totyitra woylu la wayot. Ra weruwar row worote. Wowawxat walraugow tawatjajet lewitror arwo yiw ertar. Topi powawotaa wojtetdew arhalriyye ow to. Jatru ewtawa rorlowaplepku ora yortuwpa otwar. Kekap we oaygorul gihirrara. Awwew ra iwertor ow wowwo wuwtuytit? Awfet yatew er wootwetewey taw roy. Pujrow wawlalpoyawro owlarawtijtuw parurookar ekol. Wiwot upoad row yarat urir irrowfoywaw? Watyowway ge yaw pasowwewwiw wu.

Wow, that's a lot of Ws, isn't it? That's because the generator looks at your categories and uses the first letter in a category more often than the next letter, and so on down the line. We can affect this by choosing a new Dropoff setting (Equiprobable eliminates it entirely!), but let's try to avoid that, because natural languages use some sounds more than others. Try changing your categories to this:

Categories:

That's a more normal distribution of letters in English. Hit Generate again and you'll get something like this:

Ehhat et ehoh itsodo undehtamna tennelis nudi sottottidot een tantot. Datfun son tomdohto tutno at tine. Tot hetas notaef utetserfussam essa salut. Rotetosa gen sutres rora eno netteb. Nenansuhlate teen neto nusen ettit leshaaen. Das tatone nesesir nani natut solot? Natewus tit nateiit etew mene usnered onah. Tusos tetse he dosnet rotwate? Tudoto ialat woh ho te het. Otet natri e tiol seso. Seiese nistiutitten atana ritutesna er soddutommetton. Tema turentedos otheta marnihtete tandathen nele. Tunto ettaso saryiwnat tesana wanmoon et orahet.

That still looks funky, especially those Hs at the beginning. Turns out, the H doesn't appear by itself in English all that often. Usually, it gets paired with a T, S or C to form th, sh and ch. How could we produce those combinations with the generator when it only outputs one letter at a time?

The answer lies in the Rewrite Rules box. It's used to reconfigure the output in a variety of ways. Each line is a separate rule in the form X||Y, which tells the generator to replace all instances of X with Y.
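To make the X||Y format concrete, here's a minimal JavaScript sketch of how a list of rules could be parsed and applied. It is not GenWord's own code: the names parseRule and applyRules and the x||ks rule are made up for illustration, and it assumes X is treated as a JavaScript regular expression that gets replaced everywhere it matches.

```javascript
// Illustrative only: NOT GenWord's actual implementation.

// Turn one "X||Y" line into a [pattern, replacement] pair.
function parseRule(line) {
  const splitAt = line.indexOf("||");
  if (splitAt === -1) {
    throw new Error("Not a rewrite rule: " + line);
  }
  const pattern = line.slice(0, splitAt);
  const replacement = line.slice(splitAt + 2);
  // The "g" flag replaces every instance of X, not just the first one.
  return [new RegExp(pattern, "g"), replacement];
}

// Apply each rule to the text, one after another, top to bottom.
function applyRules(text, ruleLines) {
  return ruleLines
    .map(parseRule)
    .reduce((out, [pattern, replacement]) => out.replace(pattern, replacement), text);
}

console.log(applyRules("taxi", ["x||ks"])); // taksi
```

Each rule works on the result of the rule before it, which is why the order of the lines in the box can matter.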

First off, let's change up the Categories a bit. We can use capital letters to represent the th, sh and ch pairs.

Categories:

Now, try putting these rules in:

Rewrite rules:

Didja notice I added an extra rule to change "q" to "qu"? English almost never has a bare Q; the exceptions are a few words borrowed from other languages. Go ahead and hit Generate and see what happens:

Tot e lemedaernet nuoth rushe iretvet en nem dashsontasre. Tossas ite rensen et let tethdaes? Oroa ruriteter neror tin. Sa tototeshesh erone tannetron addat rasisninruro. Dechoson noon ath ni neo ani. Unnetlen rosautos wetetoen atela nosut rutsha. Setus na shetersin otnodtid os. Tenere ostotte utdattotnat tod retna tete? Nando? Tosishshas enotlatu atiqulen laut nan. Sottonronse osen. Sanninusepit rutut tetas tedrin usut te. Tudetenu ewi ded rutanton teushtir lathmiutnetut ta. Terran midte detatte afshe su oshsonmarit? Nin sasnesh eniryit susi. Ro sattedi larsha san tettas nusat?

You can see where our rules have affected the output: look at the qu in atiqulen, for example, or the th, sh and ch in nuoth, rushe and Dechoson!

There are other facets of English we can incorporate into our generator if we start playing with the other syllable boxes. Uncheck that Use only one type of syllables box and a few things change. The Syllables box is now named Word-initial syllables (and will now be used to generate the first syllable of multisyllabic words), and three new boxes have appeared. Let's add some syllable types to them.

Single-word syllables:

This one's pretty straightforward. It's only used when the generator wants to make a single-syllable word, like "I", "you", or "or". I changed the order around from the Word-initial syllables box because the syllable types get chosen in a way similar to the letters in a category: the ones on top are picked more often than the ones below them. You can smooth this out by checking the Slow syllable dropoff box.
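The "ones on top get picked more" behavior, for letters in a category and for syllable types alike, can be sketched like this. It's a guess at one mechanism that produces that effect, not GenWord's real code: the names makeRng, pickWithDropoff, countPicks and stopChance are hypothetical, and so is the list of syllable types.

```javascript
// Illustrative only: this is NOT GenWord's actual selection code.

// Small seeded random number generator so runs are repeatable.
function makeRng(seed) {
  let state = seed >>> 0;
  return function () {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Walk down the list; at each entry, stop with probability `stopChance`.
// Earlier entries therefore win more often. stopChance = 0 means equiprobable.
function pickWithDropoff(entries, stopChance, rng) {
  if (stopChance <= 0) {
    return entries[Math.floor(rng() * entries.length)];
  }
  let i = 0;
  while (rng() >= stopChance) {
    i = (i + 1) % entries.length; // wrap around if we fall off the end
  }
  return entries[i];
}

// Count how often each entry is chosen over `draws` picks.
function countPicks(entries, stopChance, draws, seed) {
  const rng = makeRng(seed);
  const counts = new Map();
  for (const entry of entries) counts.set(entry, 0);
  for (let n = 0; n < draws; n++) {
    const picked = pickWithDropoff(entries, stopChance, rng);
    counts.set(picked, counts.get(picked) + 1);
  }
  return counts;
}

// Hypothetical syllable types, listed top to bottom as in a syllable box.
console.log(countPicks(["CV", "V", "CVC"], 0.3, 30000, 1)); // steep: top entry dominates
console.log(countPicks(["CV", "V", "CVC"], 0, 30000, 1));   // flat: roughly equal counts
```

In this sketch, a smaller stopChance flattens the curve, which is the kind of effect a slower dropoff setting is after.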

Mid-word syllables:

This one is only used when putting syllables in the middle of a word, like the "la" in "syllables". I've dropped the CVC because it tends not to happen, and because it makes ugly syllables in the middle of words.

Word-final syllables:

This box is only used for the final syllable in words, like "nal" in "final" or "ble" in "syllable".
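Putting the four boxes together, here's a small sketch of which box supplies each syllable of a word, going by the descriptions above. The function names boxForSyllable and boxesForWord are hypothetical; only the box labels come from GenWord.

```javascript
// Illustrative only: maps a syllable's position to the box described for it.
function boxForSyllable(index, syllableCount) {
  if (syllableCount === 1) return "Single-word syllables";
  if (index === 0) return "Word-initial syllables";
  if (index === syllableCount - 1) return "Word-final syllables";
  return "Mid-word syllables";
}

// List the boxes used for every syllable of a word, first to last.
function boxesForWord(syllableCount) {
  return Array.from({ length: syllableCount }, (_, i) => boxForSyllable(i, syllableCount));
}

console.log(boxesForWord(1)); // one-syllable word
console.log(boxesForWord(3)); // e.g. "syl-la-bles"
```

Note that a two-syllable word never touches the Mid-word box at all.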

Now click Generate and see what happens.

Tenut sedthena sinnee terartad nisetmenem. Moergano tunenis res metoet ni tati. Ta thulsalonean ses inetore otdi po quit? Onar o rattinutotel utro rar lidtaer. Eshshe noner lem nefsos nota to. Nadta urratan soshuta ash ton tanonas. Terra ne nennat tolnion setsu dutomat? Nonane testusoni re radoan tetlase. Im shaetagem ini quit setef shedni. Itreedsu af eutsoensa nurate netquo ete?

Looking better, but there's one glaring non-English cluster in there: the "shsh" in Eshshe. We can fix that with a new rewrite rule. But be careful: the rules are used in order from top to bottom. Look at this:

Rewrite rules:

S||sh
SS||sh

In that one, the second rule would never get used, because the first rule would change any "SS" to "shsh" beforehand.
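If you'd like to see the ordering problem outside of GenWord, here's a tiny sketch that runs both orders on the same word. The raw word "ESSe" is my assumption about what the generator produced before rewriting, and applyInOrder is a made-up helper that does plain text replacement.

```javascript
// Illustrative only: plain find-and-replace, applied top to bottom.
function applyInOrder(word, rules) {
  return rules.reduce((out, [from, to]) => out.split(from).join(to), word);
}

// Single-S rule first: every S is already gone before the SS rule runs.
const singleFirst = applyInOrder("ESSe", [["S", "sh"], ["SS", "sh"]]);

// Double-S rule first: the pair is caught before the single-S rule sees it.
const doubleFirst = applyInOrder("ESSe", [["SS", "sh"], ["S", "sh"]]);

console.log(singleFirst); // Eshshe
console.log(doubleFirst); // Eshe
```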

Rewrite rules:

SS||sh
S||sh

That will work correctly. But we can do even better!

Rewrite rules:

SS?||sh
S+||sh

Either of those rules by itself will do the job of the two rules we used before. They make use of regular expressions, a compact notation for describing patterns in text. I can't do a full tutorial on them (Google "javascript regular expression patterns" if you're interested), but I'll briefly explain the two I just used.

SS? means S, optionally followed by another S. The "?" makes the item before it optional. It will match "S" and "SS", but it will only match the first two Ss in "SSS". That's ok, because our syllable rules never allow three consonants in a row.

S+ means one or more Ss in a row. The "+" means "match the item before me at least once, and as many times as possible". So it can match "S", "SS", "SSS", all the way up to "SSSSSSSSSSSSSSSS" and beyond!
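You can try both patterns directly in JavaScript. This sketch uses the built-in String.prototype.replace with the "g" flag, on the assumption that a rewrite rule replaces every match; the helper names are made up.

```javascript
// SS? : one S, optionally followed by a second S.
const withOptional = (word) => word.replace(/SS?/g, "sh");

// S+ : one or more Ss, as many as it can grab.
const withPlus = (word) => word.replace(/S+/g, "sh");

console.log(withOptional("eSe"));  // eshe
console.log(withOptional("eSSe")); // eshe
console.log(withOptional("SSS"));  // shsh (first two Ss, then the leftover S)
console.log(withPlus("SSS"));      // sh
```

The two only disagree on runs of three or more, which our syllable rules never produce.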

But let's go back to our last generated output. It looks a bit English-y, but not really. It's more like Latin or Italian or something. That's because we're not using the full inventory of English. In particular, English is pretty odd in that it allows tons of consonant clusters. Look at this word: strengths. That's three consonant sounds, a vowel, and three more consonant sounds! That's crazy!

We can solve this problem by adding more categories and syllable types. S and TH are voiceless fricatives, T is a voiceless stop, R is an approximant or liquid, and NG is a nasal stop. (Those are technical terms, and you'll have to start getting used to them if you're going to make languages, so try not to be afraid and just go with me here, all right?)

I went ahead and changed up the following boxes to try and make use of these distinctions. I tried to make the category names have some relation to their contents. See if you can follow along:

Categories:
Rewrite rules:
Word-initial syllables:
Word-final syllables:

Notice that I added a new wrinkle in the word-initial syllables box: sSLV. Everything in GenWord is case-sensitive, so s and S aren't treated the same. S will be replaced with a letter from the S category, but s won't match any category, so the generator will just output it without changing it at all. Check out strideot in the output below for an example:

Etas eseon anwolu strideot wo tednitedasi? Tat liote elthiro an soenatusinsh ser. Tetungth setilaon otush stleadsionsh ad. Tae plaonth nedaeati nassi tosasti. Sena tlaotethata elen sethe ultoeo methea atonie. Teus streat sanat ul on lano? Itoet nette ni nidme taat doso stlemiti rebesen. Nutetshem tes rortaer le pretatet ninontoet se. Erithtanasa stlonener shotoot etudtash.

I underlined some of the new syllables we're getting from this change-up. Plaonth and strideot seem particularly English-like! We could probably do better by changing the other syllable boxes, too.
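The case-sensitive lookup behind sSLV can be sketched in a few lines. This isn't GenWord's code: expandPattern and pickFirst are hypothetical names, and the category contents below are stand-ins I made up for the example (they are not the categories used in this tutorial).

```javascript
// Hypothetical categories, for illustration only.
const exampleCategories = { S: "tpk", L: "rl", V: "eaoiu" };

// Replace each symbol that names a category with a letter from it;
// anything else (like lowercase "s") is copied to the output unchanged.
function expandPattern(pattern, categories, pick) {
  let out = "";
  for (const symbol of pattern) {
    const isCategory = Object.prototype.hasOwnProperty.call(categories, symbol);
    out += isCategory ? pick(categories[symbol]) : symbol;
  }
  return out;
}

// Always take the first letter so the result is predictable.
const pickFirst = (letters) => letters[0];

console.log(expandPattern("sSLV", exampleCategories, pickFirst)); // stre
console.log(expandPattern("SLV", exampleCategories, pickFirst));  // tre
```

A real generator would pick letters at random instead of always taking the first one, but the lookup step is the same.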

I'm going to leave off here, but there's so much more you can do. Here are a few ideas you can try on your own to test your skills:

And as a final note: always remember to have fun!

The following is a quick summary of the rewrite rules of the Kartaran Predef. It's based on Ancient Kartara, my first conlang. Many of the rules use JavaScript regular expressions.

The line above looks for a string of the same monophthong vowel. If it contains three or more in a row, it replaces it with a string of only two.

The above looks for a string of two or more of the same diphthong vowels in a row, and replaces it with only one.

The above looks for a string of three or more vowels in a row and reduces it to the first two.

The above is made up of several rules about the letter h. They are designed to preserve h only when it's at the start or end of a word, or when it's a part of a penultimate syllable when the word ends with a vowel.

  1. If more than one h happens in a row, reduce them to a single h.
  2. If an h occurs before a vowel and a final consonant at the end of the word, change it to H.
  3. If an h occurs before a vowel, followed by an ending consonant, or else followed by 0-2 consonants and a vowel at the end of the word, change it to H.
  4. If an h occurs between two vowels at the end of a word, change it to H.
  5. If an h occurs at the beginning of a word, change it to H.
  6. If an h occurs at the end of a word, change it to H.
  7. If any h is left, delete it.
  8. Change every H back into h.

The rules above change the diphthongs to their two-letter symbols.

A ĭ before an i gets reduced to just i.

A ĭ before a retroflex consonant makes it into a non-retroflex consonant.

If a monophthong is followed by an i, and they aren't at the start of a word, they get turned into a diphthong.

Any doubled consonants are reduced to one.
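Several rules in this summary shrink runs of a repeated letter. The Predef's own rule lines aren't reproduced here, so this is only a sketch of how JavaScript backreferences can express the idea. The vowel set aeiou is a placeholder rather than Kartaran's real inventory, and the function names are made up.

```javascript
// Placeholder inventory: NOT the real Kartaran vowels.
const VOWELS = "aeiou";

// Three or more of the same vowel in a row become exactly two.
// \1 means "the same character the first group just matched".
function capVowelRuns(word) {
  const sameVowelRun = new RegExp("([" + VOWELS + "])\\1{2,}", "g");
  return word.replace(sameVowelRun, "$1$1");
}

// A repeated non-vowel character is reduced to a single one.
// (Anything not in VOWELS counts as a consonant in this sketch.)
function collapseDoubledConsonants(word) {
  const sameConsonantRun = new RegExp("([^" + VOWELS + "])\\1+", "g");
  return word.replace(sameConsonantRun, "$1");
}

console.log(capVowelRuns("kaaaro"));               // kaaro
console.log(collapseDoubledConsonants("kattara")); // katara
```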

If a stop is followed by r, remove the stop.

If a nasal and a fricative occur together, change the second to match the first's place of articulation. If a k is followed by a nasal, remove the nasal.

If a stop and a fricative occur together, change the second to match the first's place of articulation.

Change certain difficult fricative/stop pairs into easier-to-pronounce pairs.

If retroflex and dental/alveolar occur together, keep the first one.

"Fix" the doubles we may have introduced with the nasal-or-fricative/stop changes. It's easier to just put an accent on the last retroflex in a series.

Finally, change retroflex letters into their correct symbols.
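The h rules earlier in this summary depend on a temporary marker: an h worth keeping is changed to H, every remaining h is deleted, and H is changed back at the end. Here's a simplified sketch of that trick. It covers only steps 1, 5, 6, 7 and 8 of the list, assumes each word is processed on its own, and uses a made-up function name, so don't take it as the Predef's actual rules.

```javascript
// Illustrative only: keeps h at the start or end of a word, drops it elsewhere.
function keepEdgeH(word) {
  return word
    .replace(/h+/g, "h") // step 1: collapse runs of h into one
    .replace(/^h/, "H")  // step 5: protect a word-initial h
    .replace(/h$/, "H")  // step 6: protect a word-final h
    .replace(/h/g, "")   // step 7: delete every h that wasn't protected
    .replace(/H/g, "h"); // step 8: turn the protected H back into h
}

console.log(keepEdgeH("hahhah")); // haah
console.log(keepEdgeH("aha"));    // aa
```

The marker only works because nothing else in the word uses a capital H while the rules are running.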