Dev Diary #5 – January-February 2019

Hey Folks

Movies without voices and ambient sounds are unthinkable – of course, this also applies to computer games. Both have been missing in The Guild 3 so far and that will change now!
In this ‘Developer Diary’ we want to give you an insight into what we have been doing lately to implement voice output and ambient sounds.

Voice Output

Voice output or voice overs – VOs – are divided into off-voices and on-voices or character-voices. If you hear a character speak because you clicked him/her or the character is complimenting someone, that’s an on- or character-voice. But if you hear a voice that does not come from a character and that tells you something about an event or an incoming message, then that’s usually the off-voice, which is why the off-voice is also called the narrator voice.

There are a number of renowned studios specializing in computer game voice output. For German voice recordings we have chosen ‘4-real’ and for English voice recordings we have chosen ‘Pit Stop’. But before we could go to the studio, the preliminary work had to be done.

The alpha and omega of voice recordings is the definition of the individual roles for the voice actors, followed by the selection of the voice actors – of course, all the texts must be final before that. In The Guild 3 we have the narrator, 3 male and 6 female voices and 1 children’s voice. A specialty of our game, however, is that no voice actor speaks for just one specific role; instead, all voice actors must speak several roles. This is because every character can do anything or be anyone: compliment or insult someone, be a righteous craftsman or a sinister thief, or be the chairman of the court. With the exception of the narrator and the children’s voice, all the voice actors had to speak about a dozen different roles without changing their voice too much.

Before the voice recordings, the so-called scripts for each voice actor must be created, including all the texts that he or she will speak, along with the name of the ‘take’ and clear instructions on how the voice must sound. A good example is the take called ‘$Cutscenes.Reputation.Renovating’, which is used for an insult and should sound sarcastic. The text is: “I hear they’re renovating the church. Maybe you should go over and talk to the stonemasons. I’m sure they’re still looking for models for the gargoyles!”

Once everything is prepared and the voice actors are selected and booked, the recordings can start. At the time of Guild 1 and Guild 2, someone from the team was always on site in the studio to personally accompany the voice recordings. Nowadays, there is very good software that allows you to dial into a recording ‘session’ – and I’m not talking about Skype or similar tools here 🙂

A session goes something like this: The voice actor sits inside a soundproofed room in the studio, where a microphone and a monitor are located. On the monitor, he or she sees the text that needs to be spoken, along with the instruction for the emphasis. Via a headset, he or she can get instructions from the recording manager and the team member who has dialed into the session – the software prevents these instructions from being recorded. The recording manager marks or cuts the individual takes during the recording and names them as specified in the script. If the voice actor makes a mistake or the emphasis is not as desired, the take is recorded again. After the recordings, it usually takes a few days until the studio is finished with the editing of the individual takes. Sometimes the studio has to edit a few takes afterwards and in the worst case even re-record them, but fortunately that was not necessary for us.

We got all VOs in the second half of January and immediately started implementing them. First, we implemented the “click comments” – every character in the game gives a spoken comment when you select him or her. Your controllable characters speak a confirmation when you give them an order. Then it was time to implement the first spoken words for conversations. We also started with the cutscenes – now the priest says something if you want to adopt an orphan or if you marry. You can also hear the narrator’s voice on some notifications. In addition, we have implemented the intros for the scenario maps… oops, we shouldn’t give away too much.

Ambient Sounds

The technical term for sounds is SFX (Sound Effect, not to be confused with Special Effect). The SFX in The Guild 3 are divided into UI effects (e.g. sound when clicking on a button), trigger effects (a house catches fire and ‘triggers’ the corresponding sound), animation SFX (a character plays the ‘walk’ animation and the animation starts the sound), character SFX (mumbling, coughing, etc. including mooing of cows and cackling of chickens 😉 ) and sound spheres (ambient sounds in the scenario maps).

The implementation of SFX in The Guild 3 basically works like this: a sound entry is created within a script. This entry has a unique ID (identification number). Now, a number of SFX sounds which belong to the same soundscape – e.g. marketplace sounds – are added to the entry. When a sound ID is to be played in the game, a sound from this list of sounds is randomly selected and played.
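To make this a bit more concrete, here is a minimal Python sketch of the idea. It is not the actual game code: the names `SoundEntry`, `register` and `play`, the ID `1001` and the file names are all made up for illustration, and the function returns the chosen file name instead of playing audio.

```python
import random


class SoundEntry:
    """One sound entry: a unique ID plus a list of interchangeable sounds."""

    def __init__(self, sound_id, sounds):
        if not sounds:
            raise ValueError("a sound entry needs at least one sound")
        self.sound_id = sound_id
        self.sounds = list(sounds)


# Hypothetical registry that maps each sound ID to its entry.
SOUND_ENTRIES = {}


def register(entry):
    """Add an entry to the registry; IDs must be unique."""
    if entry.sound_id in SOUND_ENTRIES:
        raise ValueError(f"duplicate sound ID: {entry.sound_id}")
    SOUND_ENTRIES[entry.sound_id] = entry


def play(sound_id, rng=random):
    """Pick one sound of the entry at random and return its name.

    A real engine would hand the chosen file to the audio system here.
    """
    entry = SOUND_ENTRIES[sound_id]
    return rng.choice(entry.sounds)


# Example: several marketplace sounds share one ID (file names are invented).
register(SoundEntry(1001, [
    "market_chatter_01.ogg",
    "market_chatter_02.ogg",
    "market_haggling_01.ogg",
]))
```

Calling `play(1001)` several times returns varying marketplace sounds, which is what keeps a soundscape from sounding repetitive.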

For character SFX and animal SFX, the designers can specify the radius around the character or animal where the sound can be heard and how often the sound is played. Rats will squeak often but you can only hear them when you’re close. Horses neigh from time to time while chickens cackle constantly. And sometimes you hear passers-by cough or mumble – we will add more here.
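As a rough Python sketch, such per-character settings could look like the following. The class `EmitterSettings`, the function `is_audible`, the sound IDs and all numbers are assumptions for illustration only, and "how often" is modelled here as a simple interval in seconds.

```python
import math
from dataclasses import dataclass


@dataclass
class EmitterSettings:
    """Designer-tunable settings for a character or animal sound (hypothetical)."""
    sound_id: int
    radius: float    # distance within which the sound can be heard
    interval: float  # assumed meaning of "how often": seconds between plays


def is_audible(settings, emitter_pos, listener_pos):
    """Return True if the listener is within the emitter's audible radius."""
    return math.dist(emitter_pos, listener_pos) <= settings.radius


# Invented values: rats squeak often but only carry a short distance,
# horses neigh rarely but can be heard from further away.
RAT = EmitterSettings(sound_id=2001, radius=5.0, interval=3.0)
HORSE = EmitterSettings(sound_id=2002, radius=30.0, interval=45.0)
```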

We have already mentioned the walk animation and the sound for it. But animation SFX also include animations that are played by objects. The campfire at the robber camp is a good example, as are the rotating blades of the windmill.

We place sound spheres in the maps with our scenario editor. If the camera is inside one of these spheres, a sound ID is played. An example is a sound sphere that encloses a forest and contains forest sounds. Sound spheres have the following settings: day, night, spring, summer, autumn, winter, rain and snow. Map designers can set them in any combination to create a howling wolf sound for example, which is played at night in all four seasons, but not during the day or in rain and snow. Additionally, the map designer can determine the minimum and the maximum radius of the sphere and the interval at which the sound is played. The volume is determined by the sphere itself. When you are on the edge of the sphere the sound is very quiet, while it is played loud when you are in the center of it.
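The howling wolf example can be sketched in Python roughly like this. Everything here is an assumption for illustration: the class `SoundSphere`, its field names, the ID and the linear volume falloff are not taken from the game. The sketch also uses a single radius and leaves out the minimum/maximum radius and the play interval.

```python
import math
from dataclasses import dataclass

ALL_SEASONS = frozenset({"spring", "summer", "autumn", "winter"})


@dataclass
class SoundSphere:
    """A sound sphere placed on a map (hypothetical data layout)."""
    sound_id: int
    center: tuple
    radius: float
    times: frozenset      # subset of {"day", "night"}
    seasons: frozenset    # subset of ALL_SEASONS
    in_rain: bool = False
    in_snow: bool = False

    def is_active(self, time_of_day, season, weather):
        """Check the day/night, season and weather settings."""
        if time_of_day not in self.times or season not in self.seasons:
            return False
        if weather == "rain":
            return self.in_rain
        if weather == "snow":
            return self.in_snow
        return True

    def volume(self, camera_pos):
        """Loud in the center, quiet at the edge (assumed linear falloff)."""
        distance = math.dist(self.center, camera_pos)
        if distance >= self.radius:
            return 0.0
        return 1.0 - distance / self.radius


# The wolf howls at night in all four seasons, but not in rain or snow.
wolf = SoundSphere(
    sound_id=3001,
    center=(0.0, 0.0),
    radius=100.0,
    times=frozenset({"night"}),
    seasons=ALL_SEASONS,
)
```

With these settings, `wolf.is_active("night", "winter", "clear")` is true, while the same call with `"day"` or with `"rain"` is false.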

This is just a rough outline of everything that is related to voice recordings, voice output and sounds. But we hope that you got a good impression. We are very much looking forward to presenting the first results of our work on the VOs and the SFX to you with the upcoming patch. Of course we will continue to work on it, and in the future we will add more takes and even more SFX.

All the best,

Purple Lamp Studios