A short tutorial: How to record speaking voices in the field

My colleagues and students often ask me how to record speech and they expect a concise set of instructions or hints. This normally happens a few hours before they leave for fieldwork or immediately after they have recorded some people and have found that something has gone wrong with the sound. In other words, most frequently it happens just too late.

Having recorded several hundred hours of speech and made dozens of mistakes myself, I can say at least that the problem is not only technical. As a rule, it involves serious methodological issues that should be taken into consideration at a very early stage of planning your research. After a while, you will realise that this is a battlefield where your assumed methodological perfection faces technological, environmental, social and psychological limitations and obstacles.

Nowadays high quality recording equipment that meets most linguistic and phonetic requirements is relatively cheap, and easy to buy and operate. What you get at the end of the day is mostly a question of how (wisely) you use it. Below you will find a list of topics that deserve some thought: some of them in the project design phase, some just before the recordings, some while making them, and some afterwards. They include (1) defining your research goals and checking them for technical feasibility, (2) taking a closer look at the profile of the people you intend to record, (3) analysing the characteristics of the speech material, (4) analysing the settings (environment) and their potential adjustments, (5) legal issues, (6) designing and following the recording procedure, (7) setting up your equipment, (8) preparing for metadata collection, (9) processing and archiving.

Obviously, you can find many texts that somehow touch upon the topic of speech recording from various perspectives. This one is focused on field recording.

The technical and organisational aspects of speech recording should be part of research project design from the very beginning as they may influence or even determine your methodological approach.


Setting the goals precisely will help you to make a number of decisions in the process of data collection. How are you going to use your recordings? Are you planning phonetic analyses? Orthographic transcription? Conversation Analysis type of transcription? Will you use them as samples to play back during your lecture?

However, there are some external factors that you may also want to take into consideration. For example, if you deal with the last speakers of a given language, or it seems that you won’t have any opportunity to return for more recordings for years, it may be reasonable to record more material than you need for your immediate research plans.

In brief, you should think of a reasonable balance among the factors that are important, e.g. the technical quality of recordings, the degree of spontaneity of the speakers, things they speak about and how they will speak.

While you can always assume that finding technical solutions will be a part of your project, some situations

Before you define the goals of your project, make sure what is technically possible in terms of collecting speech material.


You will deal with humans. Some of them may have a bad day. Not all of them are outgoing extroverts. You may have to convince them to speak into the microphone. It may take a lot of time, but sometimes it pays to have more meetings and just talk before you mention that you intend to make some recordings.

No matter whether you want to record realisations of isolated syllables for some in-depth acoustic-phonetic analysis or just free conversation, relaxed speakers are better speakers (unless you want to study speech under stress). Let them know you care: ask how they feel, let them tell you about themselves, tell them something about yourself as well, provide water or tea, and talk for some time before you switch the recorder on.

Take care of the speakers. Adjust the recording task and procedure to their age and health conditions. Ensure that they are relaxed but not distracted. Plan pauses in recordings. Provide water or other beverages.

The acoustic quality of the voice changes significantly throughout the lifespan. Elderly people may have weaker voices and less stable, less precise articulation, while children may have problems with amplitude and breath control. Changes in voice may indicate some serious medical conditions. Even though you are not a doctor, you may suggest that they get diagnosed if you suspect something.

Cognitive abilities change with age. Plan the task and write instructions adequate for speakers. Explain everything in accessible language adjusted to the age of the speakers. Children and elderly people may have problems with focusing attention for a longer time, prolonged sitting or speaking.

Sometimes people sharing a certain health condition may be in your focus as speakers. Take into account that, because of this condition (as with any external factor), they may share some speech deficits or, e.g., peculiar speaking styles.

Many people are afraid of being recorded because they are afraid of being judged. Let them clearly know that you are actually interested in their “way of speaking” – just as THEY speak. It may also help if you clearly declare that the recordings will not be public and only you and your team will listen to them and analyse them in the lab. You may later ask them if they agree e.g. to publish some transcripts. But the awareness that your speech will be heard by thousands of people may be paralysing.

Humans. Each of them brings so many uncontrolled variables! But isn’t it what we, humanists, indulge in?


What are you going to record? Try to answer the following questions:

  • How many speakers do you need (or can you have – well, you may e.g. find the one and only speaker of a given language)?
  • How long will the recordings be? How long will each session be?
  • Will you record monologues (e.g., narration) or polylogues (e.g., dialogues), i.e. single or multiple speakers at once?
  • What degree of spontaneity do you expect from the speakers?
  • How will the people (probably) speak? Do you expect people to whisper, scream, speak very loudly or quietly? This may be important for technical reasons.
  • How is the session going to be organised? E.g. any breaks or non-stop recording, any changes in the recording setup?

All these (and probably some more) factors may be important for the design of the recording procedure and the audio equipment setup. For example, with multiple speakers (recording discussions) you may want to use more microphones and even try to somehow isolate the speakers acoustically from each other. For spontaneous speech, you will probably be more cautious with the sensitivity level, as speakers may start to speak louder at some point, which may result in distortion. Moreover, these factors may also influence the way you process and archive your recordings. You may want to cut structured sessions into a number of audio files according to e.g. the topics you introduced as the animator during the talk or the questions you asked the speakers. If you expect people to whisper and shout in the same session, you may consider using condenser microphones, which can deal with such a wide dynamic range.


4. SETTINGS (and it’s not about the settings of your audio recorder)

There are many reasons to record people in their natural, everyday environment, and often this is the only option. However, you can nearly always make some small adjustments that will significantly improve the quality of your recordings.

Inspect the room acoustically. If possible, place the microphone exactly where it will be located during the recordings, run the recorder, set the sensitivity and monitor the signal. Listen for background noise (fridge, clock, trams and cars outside), then try to eliminate it if possible. Close the window, move the clock to another room, etc.

A good room for recordings: furniture, shelves with books on the wall, heavy curtains on the windows, soft floor, relatively high ceiling.

A poor room for recordings: empty walls, few pieces of furniture

Recording in an open field may have many unexpected consequences. Some microphones are sensitive to the wind. Be sure to have a (sorry!) dead cat, which is a type of microphone windscreen designed to minimize or eliminate such issues. Outdoor recordings may be full of extremely interesting but sometimes equally annoying sounds. Jets, helicopters, birds, insects, cars, trains, trees – all of them willingly contribute.

The settings may significantly influence the speaking style. In formal settings, in a lab or in a strange room, speakers may tend to be more formal.

Take an auditive look at the room: Connect headphones (and microphones if needed) to your recorder, place the microphone as for the recording, and try to listen to the room through the microphone. It may help to increase the sensitivity level for a moment. You’ll easily hear things that may be irritating for the listeners of your recordings but normally escape your attention. Is that… a refrigerator, air conditioner, an oven, who left the window open?
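If you save a few seconds of this "room tone" as a test file, you can also put a number on what you hear. The following is only a minimal sketch in Python, assuming a 16-bit PCM WAV file; the function names and the file name room_tone.wav are made up for illustration. It reports the average (RMS) level in dB relative to full scale, so that you can compare two rooms or check whether switching off the fridge actually helped.

```python
import math
import struct
import wave


def rms_dbfs(samples, full_scale=32768.0):
    """Return the RMS level of integer PCM samples in dB relative to full scale."""
    if not samples:
        raise ValueError("no samples given")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (full_scale * full_scale))


def read_pcm16(path):
    """Read all samples from a 16-bit PCM WAV file (channels stay interleaved)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wav.readframes(wav.getnframes())
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))
```

For example, rms_dbfs(read_pcm16("room_tone.wav")) returns a negative number; the further below zero, the quieter the room. The value depends on the sensitivity setting, so compare only test recordings made with the same settings and microphone position.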

When recording in old houses with old electrical installation, it is highly recommended to check for the quality of electric current. When you hear a relatively low buzzing noise in the headphones attached to your recorder, try to disconnect it from the power supply and switch to batteries. If it does not help, make sure that the recorder and microphone cables are far from any power supply cables and even from the walls where electric cables may be hidden. The noise coming from home electric network may actually be caused by old or defective home appliances like a fridge, a washing machine or an air conditioner. If possible, switch them off. While you can buy power supply filters that can eliminate such noises, they are often very expensive and heavy. The cheapest solution may be to have your own batteries.



We always want to know something (sometimes a lot) about the speakers. One of the problems is that there are certain things we want to know in advance. If you want to record speakers of Papuan Malay, you had better make sure in advance what language they actually speak so that you won’t record speakers of some other dialects by mistake. On the other hand, asking some other questions in advance may spoil the recording session or distress the speaker (e.g., asking speakers to declare how well they speak language X, and then asking questions about this language, its grammar, and so on). Sometimes it is recommended to have an informal talk before the recordings, even a couple of days earlier if possible, to learn some basic facts, and to leave filling in the questionnaire for later, after the recording session. But definitely avoid letting the speaker go without filling in the questionnaire, as you may never have a chance to meet him/her again. They may forget your e-mail, move to another country, or just not realise how important the data are to you. Online questionnaires “to be filled in later” rarely work.

Metadata often include personal data that require special caution, delicacy or secrecy. Code them as soon as possible in order to anonymize them and, even when coded, always keep the coding instructions, the original data (if you are allowed to keep them), and the coded data in different locations (see the section on archiving).
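One possible way of coding the data is sketched below in Python. It is only an illustration under stated assumptions: the field names ("name", "age", "first_language") and the "SPK-" code format are hypothetical, and your own questionnaire will look different. The function splits the records into a coded table, which you work with, and a key table linking codes to names, which you store somewhere else.

```python
import secrets


def pseudonymise(speakers, code_bytes=3):
    """Split speaker records into coded data and a separate key table.

    `speakers` is a list of dicts; the field names used here are
    hypothetical examples, not a recommended metadata scheme.
    """
    coded, key = [], {}
    for record in speakers:
        code = "SPK-" + secrets.token_hex(code_bytes).upper()
        while code in key:  # very unlikely, but keep the codes unique
            code = "SPK-" + secrets.token_hex(code_bytes).upper()
        key[code] = record["name"]  # the only place the name survives
        coded.append({
            "speaker": code,
            "age": record["age"],
            "first_language": record["first_language"],
        })
    return coded, key
```

Random codes are used on purpose: a code derived from the name or the date of birth could be traced back to the person. Note also that in a small community the remaining fields (age, first language) may still identify someone, so coding alone does not guarantee anonymity.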


Legal issues should also be taken into account from the beginning, i.e. in the design of your study. Detailed regulations vary from state to state, and local law and customs may also be of importance. In principle, you should pay attention to the following issues:

  1. Are your speakers aware that they will be recorded? Do they accept it? Are they of legal age? Can they actually decide on their own whether to participate in your recordings?
  2. What kind of use of these recordings would they accept? Can you make the recordings public? How, how long, where and on what conditions can you keep the recordings in an archive?
  3. If the speakers are not inclined to allow publication of their recordings, maybe they will agree to the use of some portions in a purely scientific context (publications, conference presentations, etc.). If not, maybe they will agree that you use authorised transcripts? Maybe you can anonymise recordings for the purpose of public presentation?
  4. In some circumstances it is better to do the “paper work” after the recording session. However, in many situations you may need formal, written consent before you actually start to record the speakers. You may ask them to sign the consent before the session and wait with other formalities.
  5. If you record elderly people and children who may not be fully aware of what the consequences may be, the problem becomes quite complex. In order to work with children you may need (signed) agreements from their parents, teachers, caretakers, and obviously from the children themselves. Sometimes you may need a formal agreement from institutions that act in a given country to prevent child abuse. Even when you deal with small children who may not be fully aware of the meaning of the agreement, make sure they feel comfortable when being recorded and if not, just quit.
  6. You will have not only recordings but also some personal data of the speakers (see the section on metadata). You will need agreements from the speakers regarding these data. Even if you declare that you will code and anonymise the data, the way of doing it may vary and ensure various levels of safety or anonymity. Be ready to describe it to the speakers in simple words.
  7. You may want to archive the recordings and metadata for a prolonged period of time, maybe for future generations. Be sure to make it clear how the data should be treated (select or formulate an adequate license, make additional comments or instructions for future users). In principle, it is crucial to decide who is and who will be the owner of the data, and what the rights of the owner are.

The actual text of a legal agreement may be very complex. Be prepared to explain it to the participant or to summarize it briefly and as clearly as possible. Make sure that at least the main questions are formulated in a very straightforward way, e.g. “Do you agree that your voice will be recorded during the entire interview?”, “Do you agree that we keep the recordings for future studies?”, “Do you agree to make the recordings public?”

The recording session and its course should also remain in accordance with the law and with the local customs. In principle, it should be obvious to each participant of your recordings that s/he can quit at any moment. On the other hand, you may always try to convince her/him to stay. You should avoid any pressure on the speakers (unless it is a part of the scenario, and you have their agreement to behave like that). Touching them, standing too close, speaking to them too loudly or in a harsh way, even if legal, may be destructive to your relation with the speakers and spoil the recordings.

You may think that the presence of parents may somehow soothe the child during the session, and put you in a safer situation as you are not the only person responsible for the child at that time. Although much depends on the age of the child and the recording situation, my experience is that the presence of parents or caretakers may actually be unfavourable. Children often act to meet their parents’ expectations, turn to them and look at them to find acceptance for their actions. If the presence of a child’s parents is for some reason necessary or recommended, maybe they can stay in the room and read newspapers or books, not paying attention, or at least pretending not to pay attention, to what is going on.

Let the participants know that you may make the recordings available to them so that they can listen to them and authorize them for public use. Let them know as well that these are just “research recordings”, not a contest of beautiful, fluent, error-free speaking, and not a radio or television interview (unless you actually record for the radio or TV).



Equipment-related aspects are important but sometimes overestimated. Nowadays you can buy a reliable hand-held digital audio recorder with a good quality internal microphone for 200-300 EUR. Most semi-professional models offer more than enough for speech recordings. Visit music stores to take a look at what is available. Among the most popular are units from OLYMPUS, ROLAND, SONY, TASCAM, and ZOOM.

Things you may want to consider when buying a digital audio recorder:

    • uncompressed recording format (see: audio file formats)
    • high quality built-in microphones
    • inputs for external microphones equipped with phantom power (that can feed condenser microphones)
    • power from replaceable batteries (you may want to avoid keeping speakers waiting for a few hours while you recharge your device)
    • a wide range of available accessories (more expensive models may offer this)
    • last but not least (!), overall build quality. Some recorders are clearly built to last: you can easily feel they are made of high quality materials. For some, you can buy a protective cover



Voice activated recording – some voice recorders feature a system that automatically starts recording when the sound level exceeds a certain value. Rarely useful in our area of interest! Better be sure to switch it off before the session.

Limiter – sometimes useful, sometimes prohibited. When the sound amplitude reaches high levels, it is compressed: the higher the amplitude, the more damped it is, so that it never exceeds the maximum safe level and you never have overload.

Automatic recording level – may work like the limiter but may also increase the recording level (sensitivity) when the incoming sound is too quiet. This feature, useful as it may be, may make your recordings useless when you want to proceed with some phonetic analyses.

High pass filter – some recorders are equipped with filters that eliminate or damp lower frequencies. This may be useful for outdoor recording and in some other conditions, but please remember that it will eliminate part of the speech signal as well.

Mark up – adding markers to your recording. May be very useful. You just press a certain button whenever you hear something of interest in what is spoken, and later you can find these moments immediately in the recording. It’s just like a “live tagging” session.

Sensitivity – better set it manually, experimenting with voice intensity and the position of the microphone.
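After a test take with the chosen sensitivity, you can check on a computer whether the signal reached full scale. The sketch below is a hedged illustration in Python, not a feature of any particular recorder or program: the function name and the three-sample threshold are assumptions. It counts samples sitting at the numerical limits and runs of several such samples in a row, which are a typical sign of overload.

```python
def clipping_report(samples, bits=16, min_run=3):
    """Count samples at full scale and runs of at least `min_run` such samples."""
    top = 2 ** (bits - 1) - 1      # highest possible value, e.g. 32767
    bottom = -(2 ** (bits - 1))    # lowest possible value, e.g. -32768
    clipped = 0
    runs = 0
    current = 0                    # length of the current full-scale run
    for s in samples:
        if s >= top or s <= bottom:
            clipped += 1
            current += 1
            if current == min_run:  # count each long run exactly once
                runs += 1
        else:
            current = 0
    return {"clipped_samples": clipped, "clipped_runs": runs}
```

A single full-scale sample may be harmless, but any run of them suggests that the sensitivity should be lowered or the microphone moved further away before the real session.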

Sampling frequency – how many measurements of signal amplitude are made in one second. The more the better, but 44.1kHz is usually more than enough (some other typical values are 22.05kHz, 88.2kHz, 96kHz).

Sampling resolution – how precisely the measurement of amplitude is made and coded: how many bits are dedicated to each sample, e.g. 8bit (rather low quality), 16bit (CD quality standard), 24bit or 32bit (increasingly often used in professional but also in amateur recording). With more bits per sample you can cover a wider dynamic range of the signal (catch quiet and loud portions of sound without much trouble).
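The link between bits and dynamics can be illustrated with a short calculation. This is a rough sketch, assuming plain integer (linear) PCM; the function name is made up, and the result is a theoretical upper limit that real microphones and preamplifiers do not reach.

```python
import math


def pcm_dynamic_range_db(bits):
    """Theoretical dynamic range of integer PCM: ratio of the largest
    to the smallest representable amplitude step, expressed in dB."""
    return 20.0 * math.log10(2 ** bits)


for bits in (8, 16, 24):
    print(f"{bits} bit: about {pcm_dynamic_range_db(bits):.0f} dB")
```

This prints about 48 dB for 8 bit, 96 dB for 16 bit and 144 dB for 24 bit, i.e. roughly 6 dB per bit. In practice, the extra room of 24 bit lets you set the sensitivity more conservatively without losing the quiet portions.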

Bitrate – when using compressed formats (better avoid them), one often expresses the recording quality by defining how many bits of information should be devoted to recording one second of the signal (e.g. 128 kbit or 192 kbit per second). Some compression algorithms are intelligent enough to detect that certain portions of the signal are more monotonous while others are more complex and require more bits for coding, and they reduce or increase the bitrate accordingly (variable bitrate). Of course, you can also talk about the bitrate in the case of regular, non-compressed PCM recordings.
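For non-compressed PCM the bitrate follows directly from the settings described above: sampling frequency times bits per sample times number of channels. The Python sketch below (function names are illustrative) also converts it into storage needed per hour, which helps when deciding how many memory cards to take. It ignores the few bytes of file header.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits, channels):
    """Bitrate of uncompressed PCM in kbit per second."""
    return sample_rate_hz * bits * channels / 1000.0


def megabytes_per_hour(sample_rate_hz, bits, channels):
    """Approximate storage for one hour of uncompressed PCM, in MB (10**6 bytes)."""
    bytes_per_second = sample_rate_hz * bits * channels / 8
    return bytes_per_second * 3600 / 1_000_000


# CD-quality stereo: 44.1 kHz, 16 bit, 2 channels
print(pcm_bitrate_kbps(44100, 16, 2))     # 1411.2
print(megabytes_per_hour(44100, 16, 2))   # 635.04
```

So one hour of 44.1kHz/16bit stereo takes about 635 MB, and a mono recording with the same settings takes half of that. Compare this with the 128 or 192 kbit per second typical of compressed formats to see how much information compression discards.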


Most portable recorders can be easily attached to a stand (a tripod, a table stand, etc.), and the first accessory you may want to buy will probably be such a stand. You can also use it for an external microphone if you decide to buy and use one. But please notice that microphones (especially condenser microphones) tend to be heavy and require not only strong but also heavy stands to keep their balance.

Headphones should not be considered an option – they are a must. Before you start recording, listen to what the microphones hear. During the recording, check that there are no overloads or other technical problems. In principle, closed headphones are often better in the field as they isolate you from the acoustic environment and allow you to focus on what is coming via the microphone. But what is an advantage in many situations may be difficult to accept in others. Closed headphones are not recommended for prolonged use, for higher temperatures, and for situations where you must control the overall situation during the recording (e.g., when you are alone and you are not only recording but also interviewing people at the same time). Sometimes people use closed headphones on one ear only, leaving the other open to monitor the environment directly and not only via the microphones.

Another useful accessory you may consider is a small acoustic screen (sometimes referred to as a reflection filter, microphone isolation filter, etc.). It makes sense if you have one speaker or your speakers are all on the same side of the microphone. Most acoustic screens are not comfortable to transport; you can hardly put them in your backpack. Nevertheless, you may find foldable ones. And it is not that difficult to prepare something yourself. In a noisy or very “echoic” environment a screen may significantly improve the quality of your recordings. On the other hand, as the screen is relatively large, it may also become an obstacle and make it more difficult, e.g., for the participants to see each other.

This is quite important but often neglected. Buy high quality, reliable cables from renowned manufacturers. You may be reluctant to pay 50 euro when you see a very similar cable for 10 euro. But when you step on a cheap cable or pull it a bit too hard, you may end its life, and often it cannot be repaired. Please note, on the other hand, that we definitely do not need audiophile cables costing hundreds of euros. You don’t need silver, carbon or golden cables.



Another step towards higher quality of recordings would be an external microphone. Internal microphones in portable recorders are always small and have some limitations when it comes to catching lower frequencies, and sometimes they have lower sensitivity than full-size microphones. As they are a part of the recorder, operating the recorder often means moving the microphone from its proper place. If you have a hand-held microphone, the speaker may hold it in his/her hand while you start or stop the recording, add tags, change sensitivity, and so on, during the session. You are also much more flexible when it comes to finding a place for an external mike.

The quality of the microphone (often referred to just as “mike”) is in most cases more important than the quality of the recorder itself as even basic recorders are often good enough for linguistic applications. The microphone may influence the recording to a much higher degree.

  1. If you want to record people in a room, a condenser microphone may be a good choice. It is delicate and requires so-called phantom power, but it is also very sensitive and has a very wide frequency response. It is often more sensitive to lower frequencies than other types. Condenser mikes are very sensitive to pressure changes. If you close the door or open a window during the recording, it will be surprisingly noticeable in your recording. They are also sensitive to shaking and knocks. They require shockmounts to isolate them from what can be transmitted by the microphone stand from the surface on which it is placed. Large-membrane condenser microphones require solid, heavy stands, and are not designed to be held in the hand. However, you can also buy a small-membrane condenser mike that looks just like a dynamic microphone while still working on the same principle as the condenser one.
  2. If you want to record in the real field, a condenser microphone may be too sensitive; then you may choose a dynamic microphone. Typically used as vocal stage microphones, they are usually rugged and sturdy, often solid like tanks. They do not need an external power supply, but are less sensitive and have a narrower band of efficiently transduced frequencies (typically 150Hz – 15kHz vs. 20Hz – 20kHz typical of large-membrane condenser mikes). This may be a significant difference only for certain instrumental phonetic analyses.
  3. For some applications, lavalier microphones, attached to the clothing, are highly recommended. Many of them are electret microphones that work on a similar principle to the condenser ones but do not need phantom power. They are small and may not be perfect in catching the lower end of the spectrum. But as they stay at a relatively constant distance from the speaker’s mouth and relatively far from other speakers, they may be especially good for simultaneous recording of multiple speakers.
  4. Head-worn microphones can also be condenser or dynamic. While even better than lavaliers, as they are at a fixed distance from the speaker’s mouth and do not touch clothing, they may be somewhat difficult to use. Few speakers can behave and speak spontaneously and relax with something attached to their heads and something just at their mouths (these mikes are not placed “in front of” but rather at the side of the mouth).

A pop screen can be useful to slightly disperse the energy of plosive bursts produced by speakers. It also helps to keep the microphone clean by protecting it from drops of saliva.

Dead cat. This horrifying name refers to something that looks like a piece of fur that you put on the mike to limit the noises coming from the wind. When watching news, you may see that the microphone held by the reporter is sometimes covered with foam or fur-like material.

Condenser or dynamic? When you work in the (real) field, in difficult conditions, outside, in low or high temperatures, etc., a regular dynamic microphone should be your choice. It is lighter and more robust. For recordings at homes, libraries, and other relatively quiet interiors, a condenser one may be a better choice.

Where to place the microphones? In a studio, you just place the microphone at a recommended distance from the mouth of the speaker, usually almost straight in front of him or her (some exceptions are known). A distance of ca. 30 cm is adequate in most situations. For single speakers and experimental recordings, you may want to shorten this distance. But in the field, when recording free speech, spontaneous conversations or emotional monologues, you can hardly tell the speakers to remain at a fixed distance from the microphone during the entire session. This is one of the reasons to use more microphones that would be able to capture what the speaker says even if s/he turns left or right. You may also consider using lavalier microphones, attached in most cases to the clothing of your speakers. But it may happen that there won’t be anything that you can attach the microphone to or, on the contrary, the clothes will be just too rich and make noise with each small movement of the speaker.

One reason for using multiple microphones is to catch sound coming from various directions. But in some cases you may want to have speakers recorded on separate tracks, acoustically isolated from each other. However, perfect isolation is extremely difficult to achieve if you don’t close each of them in a separate, acoustically isolated room. And this is not a natural situation for conversation anymore! Use lavalier or head-worn microphones and try to reduce their sensitivity, seat the speakers as far from each other as possible, and place some acoustic obstacles between them (e.g., if they sit at a table, you may place some piles of books or magazines on it). Flowers will be good as long as they don’t reduce mutual visibility (unless this is what you want to achieve). If you use regular, tripod-based microphones, use acoustic screens and select a more directional characteristic (if such an option is available).

Design your recording setup. Make a checklist and always use it for packing your things before you leave for recordings: spare batteries, charger or power supply, memory cards (tested and formatted, if necessary), cables.


Type – dynamic, condenser, and electret are three major types of our interest

Sensitivity – how effective the microphone is in transforming sound pressure into an electrical signal

Self noise – powered microphones tend to produce noise of their own; usually it is at a very low level, but please notice that some cult microphones used in music recordings are not that quiet at all, as it is not of primary importance there: the singer will be significantly louder than the noise level. But when it comes to speaking, things may look different.

Frequency range – the range of frequencies that are captured by the microphone efficiently enough (the precise definition used by manufacturers may vary), e.g. from 150Hz to 15kHz (typical of dynamic microphones), or from 20Hz to 20kHz (to be expected in good condenser microphones).

Dynamic range – the range of signal amplitudes that can be handled by the microphone (from quiet to loud sounds), e.g. 120dB SPL.

Directionality / polar characteristics – the directions from which the microphone picks up sound most efficiently. Cardioid is the most popular directional characteristic. But there are also omnidirectional microphones that are equally sensitive to sound from all directions, as well as unidirectional microphones that are focused on sounds coming from one particular direction.


A well-defined procedure is a key to your success. Provided you diligently follow it.

We are always tempted to take shortcuts and skip some of the steps (“obviously, there is enough space on the memory card”, “I checked battery level ten minutes ago, it must be okay”, “people were asked once to switch their phones off, why bother them and ask again!?”). Believe it or not, there is a good chance that you won’t save but lose a lot of time this way.

The procedure should cover all the steps, including the way of dealing with the speakers. This is especially important if you intend to obtain comparable recordings (e.g., different speakers speak on the same topic in the same environment, and you want to measure their emotional reaction). It is also highly recommended to have a procedure for preparing for field recordings, i.e. packing your things. Believe it or not, it is not that difficult to forget about memory cards or spare batteries. And if you have a lot of equipment, always arranging it in the same way in your bags makes setting it up on location much faster.

“But how should I speak?”
In most cases you can anticipate this question and include a hint in the instruction for the speaker. Depending on your expectations, this may be something like “speak as you normally speak”, “we don’t expect any specific way of speaking, just speak as you like”, “try to speak clearly and slowly but without exaggeration”, “speak as fast as you can”, “speak in a convincing way”. For some kinds of recordings, speakers and situations, a hint on how to speak may be obligatory. Hints like those above may include suggestions applying directly to speech but also to the emotional layer, behaviour, etc. (“Speak in a soothing, nice way.”). Depending on your goals, some kinds of instructions may not be allowed at all. If you want, for example, to study spontaneous expression of emotions in speech, you shouldn’t tell your speakers “spontaneously express your emotions in the way you speak”.

When recording spontaneous conversations or narration, you may want to skip instructions on how to speak but you should always be prepared for questions from the speakers.

People are often uneasy and think that they are too close to or too far from the microphone. You should make them feel comfortable, but you may recommend keeping a certain distance from the mike.

Even if you record spontaneous speech in a more or less natural environment (like a pub, which is, of course, not the best idea), there are still some steps that must be taken. Depending on the conditions, the sequence of steps and their content may differ. When planning your procedure, you may take the following steps into consideration:

  1. Check the settings (where to sit, where to place microphone, what is the acoustic situation, etc.)
  2. Install, run and test the equipment
  3. Invite and greet speakers (you may want to talk with them just to establish better contact and relax the atmosphere)
  4. Inform the speakers on the nature of the session and the way the recordings will be used
  5. Instruct the speakers 
  6. Collect the metadata (e.g. by having a questionnaire filled in immediately after the session)
  7. Express your gratitude to the speakers and assure them that they did very well. All our speakers are… the best speakers
  8. Pack your equipment 


Some things should be checked before you start, some may be worth checking before each part of the session.  Design your own list if you need more specific instructions.

  1. Check and diagnose the acoustics of the room / recording space.
  2. Remove potential major sources of noise and think of potential acoustic dangers (e.g. trains, planes, loud neighbours)
  3. Check if all the cables are properly connected and safe (so that people won’t step on them and won’t catch them when walking by)
  4. Check if the power source is ready and connected (batteries or power supply unit; are the batteries charged?)
  5. Does the memory card / memory of the recorder offer enough space for recordings?
  6. Are the microphones safely placed in the most adequate positions? (not too far from speakers, on a stable surface, etc.)
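The question of whether the memory card offers enough space can be answered with simple arithmetic. The sketch below assumes uncompressed PCM (WAV) recording; the sampling rate, bit depth and number of channels are example values and not recommendations taken from this tutorial, and the function names are made up for illustration. File headers and file-system overhead are ignored, so leave a generous margin.

```python
def bytes_per_second(sample_rate: int, bit_depth: int, channels: int) -> int:
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate * (bit_depth // 8) * channels


def pcm_bytes(duration_s: float, sample_rate: int = 44100,
              bit_depth: int = 16, channels: int = 2) -> int:
    """Approximate size of a recording in bytes (WAV header not included)."""
    return int(duration_s * bytes_per_second(sample_rate, bit_depth, channels))


def recording_time_s(free_bytes: int, sample_rate: int = 44100,
                     bit_depth: int = 16, channels: int = 2) -> float:
    """How many seconds of audio fit into the given amount of free space."""
    return free_bytes / bytes_per_second(sample_rate, bit_depth, channels)


if __name__ == "__main__":
    # Example settings only (assumed): one hour of 16-bit stereo at 44.1 kHz.
    one_hour = pcm_bytes(3600, sample_rate=44100, bit_depth=16, channels=2)
    print(f"One hour needs about {one_hour / 10**6:.0f} MB")

    # Hypothetical card with 32 * 10**9 bytes free, 24-bit stereo at 48 kHz.
    seconds = recording_time_s(32 * 10**9, sample_rate=48000,
                               bit_depth=24, channels=2)
    print(f"The card holds about {seconds / 3600:.1f} hours")
```

Running the numbers before the session, with the settings you actually use, tells you how many spare cards to pack.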



Before any processing takes place, make your data safe. Even before you insert the memory card from your audio recorder into the card slot of the computer, check whether the card is write-protected (this is often done with a small slider).
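One possible way of making the data safe before any processing is to copy the whole card to a backup folder and to verify each copy with a checksum. The following is only a minimal sketch using the Python standard library; the function names and the example paths are hypothetical and need to be adapted to your own setup.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(source_dir: Path, target_dir: Path) -> dict[str, str]:
    """Copy all files from source_dir to target_dir and verify each copy.

    Returns a mapping from relative file path to checksum. Raises OSError
    if a copy differs from its original.
    """
    checksums: dict[str, str] = {}
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_dir)
        target = target_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also keeps the timestamps
        checksum = sha256_of(source)
        if sha256_of(target) != checksum:
            raise OSError(f"Copy of {relative} does not match the original")
        checksums[relative.as_posix()] = checksum
    return checksums


if __name__ == "__main__":
    # Hypothetical paths: the mounted memory card and a backup folder.
    result = copy_verified(Path("/media/card"), Path("backup/ORG"))
    for name, checksum in result.items():
        print(checksum, name)
```

Keeping the printed list of checksums together with the backup makes it possible to check later whether any file has been changed or damaged.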

A side-but-still-IMPORTANT comment on audio processing:

You cannot get more from a sound file than is already in it. It is digital. There is a fixed number of bytes and… nothing more. For linguists, magical “audio cleaning” or “recovering” software is of very limited use, and that use is usually restricted to very old recordings, as these techniques often involve adding things to the signal that were never there or removing things that may prove not to be merely redundant. It is true that a number of techniques can improve the perceived quality of recordings and help those who are to transcribe them; if that is necessary, just go for it. But if you go too far, you may change the original features of the voices, and the resulting signal may prove useless for phonetic analyses. Sound restoration practices are common in the music business and people (at least many of them) like to listen to “improved” (remastered, remixed, re-…) versions of The Beatles or The Rolling Stones. In the realm of research, this kind of approach is, with a few exceptions, useless or highly doubtful.

First of all, preserve your original recordings and metadata, preferably in at least two copies that have the quality of the original and are located in physically distant places (i.e. not on two external hard drives that you keep on your desk). Do it in such a way that, should you decide to spend the rest of your life in Goa and not contact any of your colleagues, other linguists will still be able to determine what the disc contains. A very brief description, a “label”, in a simple txt file in the main folder may be enough. You should mention:

  • time and place of the recordings
  • participants (speakers – how many, who – in terms of gender, nationality, or other relevant features)
  • authors (who collected the recordings – you, your collaborators) and contact – if possible
  • where is what (e.g. original signals are in the folder “ORG”, processed signals are in the folder “PROC”, metadata are in “META”)
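Such a label can of course be typed by hand. If you archive many sessions, a small script helps to keep the labels consistent. The sketch below is only an illustration: the field names, the file name LABEL.txt and the placeholder values are assumptions, while the folder names ORG, PROC and META follow the example in the list above.

```python
from pathlib import Path


def build_label(when: str, where: str, participants: str,
                authors: str, contact: str, layout: dict[str, str]) -> str:
    """Build the text of a short plain-text label for an archive folder."""
    lines = [
        f"Time of recordings: {when}",
        f"Place of recordings: {where}",
        f"Participants: {participants}",
        f"Authors: {authors}",
        f"Contact: {contact}",
        "Where is what:",
    ]
    lines += [f"  {folder}: {content}" for folder, content in layout.items()]
    return "\n".join(lines) + "\n"


def write_label(root: Path, text: str, filename: str = "LABEL.txt") -> Path:
    """Write the label into the main folder of the archive."""
    path = root / filename
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    # All values below are placeholders to be replaced with real information.
    label = build_label(
        when="YYYY-MM-DD",
        where="town, country",
        participants="number of speakers, gender, other relevant features",
        authors="names of the people who collected the recordings",
        contact="e-mail address, if possible",
        layout={
            "ORG": "original signals",
            "PROC": "processed signals",
            "META": "metadata",
        },
    )
    print(write_label(Path("."), label))
```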

Are the speakers really anonymous?
You are often obliged to anonymize your metadata, i.e. eliminate all the information that may directly help to identify the speaker. In fact, every piece of information helps to identify a speaker. Male or female? Age? Nationality? And finally, the voice itself is not anonymous, as it contains data that can help to identify the speaker.

Sometimes you want to keep the names and addresses of the speakers for further contact. Even if you are allowed to keep these data, never keep them together with a coding table that translates the names into the codes (Peter Newman -> PENE_M, Penelope Newman -> PENE_F; with more “PENE” speakers you may think about additional indexing, e.g. PENE_01_F).

In fact, the coding scheme shown above as an example is not very safe. You may want to create a more elaborate system, or use only numbers or alphanumeric symbols in a less obvious way. With many files, it may be a good idea to write a macro or a script that automatically codes the file names.
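A script of this kind might look like the sketch below. It is only one possible approach: each speaker gets a random numeric code that reveals nothing about the name or gender, and the coding table is written to a separate file that should be stored away from the recordings. The function names, the code format (a prefix plus four digits) and the CSV layout are assumptions made for this illustration.

```python
import csv
import secrets
from pathlib import Path


def assign_codes(speakers: list[str], prefix: str = "S",
                 digits: int = 4) -> dict[str, str]:
    """Give each speaker a unique random code, e.g. a prefix plus four digits."""
    unique_speakers = list(dict.fromkeys(speakers))  # drop repeated names
    if len(unique_speakers) > 10 ** digits:
        raise ValueError("Not enough digits for this number of speakers")
    codes: dict[str, str] = {}
    used: set[str] = set()
    for name in unique_speakers:
        while True:
            code = f"{prefix}{secrets.randbelow(10 ** digits):0{digits}d}"
            if code not in used:
                break
        used.add(code)
        codes[name] = code
    return codes


def coded_name(original: Path, speaker_code: str, index: int) -> str:
    """New file name: speaker code, running number, original extension."""
    return f"{speaker_code}_{index:03d}{original.suffix.lower()}"


def save_coding_table(codes: dict[str, str], table_path: Path) -> None:
    """Write the name-to-code table; keep this file apart from the audio."""
    with table_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["speaker", "code"])
        writer.writerows(sorted(codes.items()))


if __name__ == "__main__":
    codes = assign_codes(["Peter Newman", "Penelope Newman"])
    for name, code in codes.items():
        # Print the planned new file name for each speaker's first recording.
        print(name, "->", coded_name(Path("recording.WAV"), code, 1))
```

The sketch only plans the new names. Rename copies rather than the originals, and test the script on a few files first.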

Even if your recordings are described in detail and stored in a database system, keeping the original material is quite important. WAV files will remain readable for many years, while databases and corpus-management systems may evolve, become very expensive or simply difficult to run. When all else fails, you should still be able to access the source material.

If you do not use any particular database or corpus-management system, you may take a number of approaches to archiving your data. What you should take into consideration is:

  • How to cut (if at all) your audio files? Which units will be most convenient for further use? Phrases? Words?
  • How to describe each unit or set of units? What should be included? (e.g.
  • How to link the description with the signal (same folder, corresponding file names, etc.)


More information about fieldwork recording can be found here

Section author: Maciej Karpiński maciej.karpinski@amu.edu.pl