Personally vetted instructors
Neutral American English tutors, lessons & classes
Welcome. Where the network voiceover actually opens the session, three notes lower than the room.
Personally vetted coaches for Neutral American English, the broadcast-standard register used in voiceover, narration, presentation, and professional contexts where the audience hears no regional markers and no foreign substrate.
Your instructors
Neutral American English tutors for private lessons & classes
Strommen has offered Neutral American English coaching since 2006, starting with broadcast and voiceover professionals and expanding to executives, presentation clients, and fluent non-native English speakers preparing for high-stakes professional contexts. Our roster includes working voice talent, broadcast-trained coaches, credentialed speech-language pathologists, certified accent reduction specialists, and presentation-coaching specialists with executive client lists. We vetted every tutor below personally, in person or through a thorough video interview. No marketplace. No automated profiles.
Filter by location, age, or price. Then book a 30-minute free trial.
Below are the Strommen tutors who specialize in Neutral American English coaching. Photos, ratings, and rates are real. Click any card to read their bio and book a free 30-minute trial.
Broadcast — voice & register
5 features that define Neutral American English
Five working features every effective Neutral American coach drills first. Screenshot them for your next demo reel or executive-prep conversation.
-
01
The rhotic R, held under pressure
Neutral American English pronounces the R after every vowel: car, water, here, father all keep their R audibly r-colored. The harder skill is holding the rhotic R consistently through long persuasive passages, emotional narration, fast spontaneous speech, and the fourth page of an unbroken voiceover read. British-substrate speakers, traditional Boston and New York speakers, and Australian-substrate speakers drop the R first under that pressure; the coaching is built around the durability of the rhotic, not just the production of it.
e.g. Hold the R through: <em>The water is over there, and the car is parked further down.</em>
-
02
The schwa, on every unstressed syllable
The schwa is the most-used vowel in spoken Neutral American English, the relaxed uh that appears in the unstressed syllables of high-frequency words: about, banana, problem, sofa, possible, support, official, photograph. Native Neutral American speakers hit dozens of schwas per minute. The single largest marker of a non-native or regional voice is over-pronouncing those unstressed vowels to their full value, which makes the cadence audibly foreign or regional even when no other sound is wrong.
e.g. Neutral: <em>I have a problem with the sofa.</em> Over-pronounced: <em>I have AH problem with the SO-FAH.</em>
-
03
The flap-T, mid-word and mid-phrase
Neutral American turns the T between vowels into a fast voiced tap, almost a D: butter as buh-der, water as wah-der, better as beh-der, thirty as thir-dee. The flap also crosses word boundaries: get up becomes geh-dup, not at all becomes nah-da-dall. A clean British or Australian crisp T in those words places the speaker outside the Neutral register instantly. Mid-Atlantic is the deliberate exception and is not what voiceover or executive Neutral American is asking for.
e.g. Voiceover Neutral: <em>Get a bottle of water.</em> Off-register crisp T: <em>Get a bot-tle of wa-ter.</em>
-
04
Stress-timed rhythm, the deepest single feature
Neutral American is stress-timed: stressed syllables fall at roughly equal intervals, and unstressed syllables compress and reduce to schwa to fit the rhythm. Spanish, Italian, French, Mandarin, Japanese, Korean (among others) are syllable-timed: every syllable gets roughly equal duration. The rhythmic reshape from syllable-timed to stress-timed is the deepest single change in Neutral American training and the one that gives the trained voice its characteristic music. It takes the longest and pays the most.
e.g. Syllable-timed: <em>I-am-go-ing-to-the-stu-di-o.</em> Stress-timed Neutral: <em>I'm GO-ing tuh thuh STU-dee-o.</em>
-
05
No regional markers, no generational markers, no substrate
The brief is concrete: the audience hears American, identifies no region inside the United States, identifies no generational code, and identifies no foreign substrate. Coaches drill the absence of regional features (Southern, New York, Boston, Chicago, California) and the absence of generational features (uptalk, creaky voice, aggressive vocal fry on every clause-final syllable) alongside the presence of the rhotic R, schwa, flap-T, dark-L, and stress timing. The deliverable is the voice that the audience trusts and stops trying to place.
e.g. A Neutral American read: no audible postmark from any region, generation, or first language.
About Neutral American English
The American voice with no postmark
Neutral American English is the register a listener hears as American and cannot place inside the country. No Southern drawl. No New York raised vowels. No Boston non-rhotic R, no Chicago Inland North shift, no California uptalk, no Midwest cot-caught merger pulled to one extreme, no AAVE features, no Chicano English markers. No audible foreign substrate either: no Spanish syllable-timing pulling against the rhythm, no Mandarin tonal contours under the statements, no Russian sentence-final fall, no Indian-English stress placement. The register is the broadcast standard that grew out of the postwar Midwest into the national-media voice and now anchors voiceover, narration, e-learning, audiobook, IVR, corporate video, presentation coaching, and any context where the audience needs to hear an authoritative American speaker without forming a regional or cultural assumption about who is speaking. Casting calls describe it variously as General American, Standard American, Broadcast Neutral, Network Voice, or Mid-American. The deliverable is the same.
Most students who book this specialty come from one of three contexts. The first is voice talent, professional or aspiring: voiceover actors, narrators, audiobook readers, e-learning narrators, podcast hosts who want to widen their commercial range, IVR voices building demo reels, and broadcast professionals (television and radio) who need the register on tap for on-air work. The second is corporate, executive, and presentation context: senior leaders giving keynotes, conference panel moderators, internal-communications presenters, sales-leadership voices preparing for customer-facing video, technology executives prepping for investor calls, and lawyers and physicians preparing for testimony, expert-witness work, or patient-facing video. The third is fluent non-native English speakers whose existing accent is professional and confident but who want a code-switchable Neutral American register available for specific high-stakes contexts: an investor pitch, a keynote, a public hearing, an academic talk, a press appearance. None of these students are asking to erase their existing voice. They are asking for a second register to be available when the work calls for it.
The phonetic profile of Neutral American English is concrete and trainable. The rhotic R is fully pronounced after every vowel: car, water, here, father all keep their R audibly r-colored, and the R is held consistently through long emotional, persuasive, or narrative passages where a non-rhotic substrate would reassert itself. The schwa carries the unstressed syllables: about, banana, problem, sofa, support, possible all reduce their unstressed vowels to a relaxed uh, at a rate of dozens of schwas per minute of speech. The flap-T turns intervocalic Ts into fast voiced taps: butter, better, water, thirty all use the flap. The cot-caught merger sits on the merged side: cot and caught, Don and dawn, stock and stalk are heard as the same vowel, the contemporary American mainstream. The diphthongs land cleanly: house, boy, buy, day, go all glide from one vowel quality to the next rather than flattening into a single pure vowel or breaking into two separate ones. The dark-L appears at the end of syllables: bell, milk, full, cool. The TH sounds, voiced and voiceless, stay distinct instead of slipping into S, Z, T, D, or F substitutes. And the prosody is stress-timed: stressed syllables fall at roughly equal intervals, unstressed syllables compress to fit, and the resulting cadence is the one a listener identifies as American without consciously analyzing why.
What the register pointedly avoids is also part of the brief. No regional markers, which means no Southern monophthongization of the long-I, no New York raised AW vowel, no Boston non-rhotic R, no Pittsburgh monophthongized OW, no Inland North vowel shift, no California Shift, no Pacific Northwest pre-velar raising, no Upper Midwest monophthongization. No generational markers either: no uptalk (the high-rising terminal on declarative statements), no creaky voice or aggressive vocal fry on every clause-final syllable. No code markers from any regional or ethnic variety. And no audible substrate from any first language: not the Spanish dental T, not the Mandarin lack of final stops, not the German devoiced final consonants, not the Russian palatalized consonants, not the Indian-English retroflex T and D, not the French uvular R, not the Japanese vowel insertion in consonant clusters. The trained Neutral American voice is the one the audience will, on first hearing, place inside the United States and stop trying to place further.
The coaching process has a defined shape. Session one is a recorded diagnostic: a reading passage (often the Rainbow Passage or the Stella passage), recorded spontaneous speech (a two-minute self-introduction, a description of your work, an off-the-cuff response to a question), and a recorded conversation segment. The coach listens with you, marks the specific deviations from Neutral American (regional features, generational features, substrate features), shows you the IPA transcription of what you said versus what a Neutral American speaker would produce, and proposes a priority list of features to drill. Sessions two through six work the highest-impact targets: usually some combination of the schwa, rhotic R consistency, vowel-length contrasts, stress placement, and the rhythm reshape, depending on where the student is starting. Sessions after that integrate the work into scripted reads (for voice talent students), prepared presentation material (for executive students), and spontaneous speech under recording (for everyone). Recording every session, every week, is the single most important practice: the student's ear sharpens alongside their production, and the recordings capture both.
A few honest tutor observations on where students typically stall in Neutral American work. The schwa is the most-missed single feature, and it accounts for more substrate detection than any other single phoneme: non-native speakers and some regional American speakers over-pronounce unstressed vowels, and the audience hears the cadence as foreign or regional even when the consonants land. The rhotic R comes next, especially for British-substrate speakers and for traditional Boston, New York, and Eastern New England speakers; the R drops first under emotional or persuasive pressure. The flap-T is fast to drill but slow to internalize; many otherwise fluent speakers keep crisp Ts in butter and better well into their training. Stress placement in word families and pairs like PHOtograph versus phoTOGrapher, PROduce the noun versus proDUCE the verb, and REcord the noun versus reCORD the verb is a category many adult learners and some American regional speakers quietly get wrong, and it is the largest single source of comprehension friction on professional calls. And the rhythm reshape from syllable-timed to stress-timed is the deepest single change in the work, especially for speakers whose first language (Spanish, Italian, French, Mandarin, Japanese, Korean) is syllable-timed; it takes the longest and pays the most.
The time horizon is honest. Voice talent students preparing demo reels typically reach a usable Neutral American register within two to four months of focused weekly work plus daily home practice, with continued refinement over the following year. Executive students preparing for a single high-stakes event (keynote, investor pitch, public hearing) typically reach event-ready Neutral American in six to twelve weeks of focused pre-event work. Fluent non-native speakers building Neutral American as a second register for code-switching typically reach reliable code-switching control within six to twelve months. Passing as a native speaker is possible for many adult students with sustained work, but it is not the brief most professionals need; reliable code-switching into the register on demand is the brief, and that is reachable on the timelines above. Your existing voice does not disappear; the Neutral American register becomes one more voice you can call up.
Between sessions, immersion is the multiplier. For broadcast-neutral General American, NPR and the major-network newscasters are the working reference standard. For long-form narration, audiobooks read by working voice talent are the closest thing to the deliverable most students are preparing for; the Audible catalog is a reference library. For corporate and presentation cadence, podcasts like Marketplace, Hidden Brain, and How I Built This carry the register. For commercial voiceover work specifically, watching national TV commercials with a critical ear and shadowing the voice talent line by line is one of the most effective home practices in the trade. The blog's guide to American accents covers the regional landscape this register sits inside. For related specialties, our American Accent for actors page covers script-led on-camera work for actors taking on American roles, our American Accent Training page covers diagnostic, IPA-grounded work for fluent non-native English speakers, and our Accent Modification page covers the speech-pathology context.
Lessons are one-on-one and calibrated to the deliverable. A voice talent student preparing a commercial demo reel runs a different curriculum than an executive preparing a six-minute keynote, which runs a different curriculum than a non-native physician preparing for patient-facing video in a US hospital. The trial is free, the coach runs the diagnostic with you, and the curriculum comes out of that. Bring the script, the demo prompt, or the presentation deck if you have one. The full Strommen tutor directory shows the wider roster if you want to compare before booking.
What you'll cover
Lessons & classes tailored to Neutral American English
Diagnostic and IPA-grounded baseline
First session diagnostic with recorded reading, recorded spontaneous speech, and recorded conversation. IPA-marked breakdown of the specific deviations from Neutral American present in your current voice: regional features carried from your home region, generational features carried from your peer group, substrate features carried from your first language. The coach proposes the priority list of features to drill, and the curriculum is built around it.
Phoneme and prosody work
Targeted drill on the highest-leverage features: the rhotic R consistency, the schwa across unstressed syllables, the flap-T mid-word and mid-phrase, the dark-L in syllable codas, the diphthong glides, the TH distinctions, the vowel-length contrasts. Stress-placement work on the words where adult learners and some regional American speakers carry quietly wrong stress. Rhythm reshape from syllable-timed to stress-timed for students whose first language carries syllable-timed cadence.
Voiceover, narration, and demo reel preparation
For voice talent students: commercial copy reads, e-learning narration, audiobook narration, IVR scripts, corporate video voiceover, podcast hosting. Coaching builds the trained Neutral American register into recorded reads at performance pace, addresses booth posture and breath, calibrates the read for the medium (45-second commercial versus 8-hour audiobook), and prepares submission-ready demo reel material. Pricing reflects experience and the coach's credit list in the trade.
Executive, presentation, and code-switch coaching
For executives, attorneys, physicians, and fluent non-native English speakers: presentation prep for keynotes, investor pitches, panel moderation, public hearings, expert-witness testimony, patient-facing video, internal communications. Coaching builds Neutral American as a code-switchable second register available on demand without erasing the student's existing voice. The deliverable is reliable register control under high-stakes conditions, not a permanent identity change.
FAQ
About Neutral American English lessons & classes
What is Neutral American English, exactly?
Neutral American English is the broadcast-standard American register that listeners place inside the United States and cannot place further: no Southern drawl, no New York raised vowels, no Boston non-rhotic R, no California Shift, no AAVE features, no Chicano English markers, no audible foreign substrate. Casting calls and corporate briefs describe it as General American, Standard American, Broadcast Neutral, Network Voice, or Mid-American. It is the register voiceover, narration, e-learning, audiobook, IVR, corporate video, and executive presentation work asks for when the brief is regionally and culturally unmarked.
Is Neutral American the same as the American accent for actors?
Closely related but a different brief. American Accent coaching for actors is script-led and serves the part: a Southern character, a Brooklyn character, a Mid-Atlantic character, a General American character. Neutral American English coaching for voice talent, executives, and non-native English speakers is register-led and serves the deliverable: a voiceover read, a keynote, an investor pitch, a patient-facing video. The phonetic toolkit overlaps almost entirely (rhotic R, schwa, flap-T, dark-L, diphthong glides, stress timing). The application differs.
Will my regional American accent disappear?
Not unless you want it to. The standard brief is to build Neutral American as a second register you can call up on demand, not to overwrite your home voice. Most students preserve full code-switching control: their Sunday-dinner voice stays exactly what it has always been, and the Neutral American register comes online for the contexts that ask for it. A small number of students do choose to migrate fully, usually for sustained on-air or voiceover careers, but that is a chosen outcome rather than a default one.
I am a fluent non-native English speaker. How is this different from accent training?
Closely related, and the toolkit overlaps. Strommen's American Accent Training page covers diagnostic, IPA-grounded work for fluent non-native English speakers more broadly, with a focus on professional clarity and code-switching ability. Neutral American English coaching applies the same toolkit to a specific deliverable: a broadcast-grade register for voiceover, narration, presentation, or executive contexts. Many non-native English students start with American Accent Training and move to Neutral American work when a specific high-stakes event calls for it.
I am a working voiceover or audiobook talent. Will you help me build a demo reel?
Yes. Demo reel preparation is one of the most common briefs on this specialty. The coach works through copy at performance pace, calibrates the read for the medium (45-second commercial versus 8-hour audiobook versus IVR menu versus corporate explainer), addresses booth technique, and prepares submission-ready material. Several Strommen coaches are themselves working voice talent with current commercial, network, and audiobook credits, and bring the inside view of the trade.
Can you prep me for a single high-stakes event in six to twelve weeks?
Yes, with scope defined honestly. A six-to-twelve-week sprint to event-ready Neutral American on a specific prepared script (keynote, investor pitch, public hearing, expert-witness testimony, patient-facing video) is realistic for most fluent speakers, and is a common engagement on this specialty. A longer arc applies if the deliverable is reliable code-switching across all contexts rather than performance on a single prepared piece. Tell the coach the date and the deliverable at the trial.
Do you offer on-set or in-studio coaching during production or recording?
Yes. For voiceover sessions and audiobook recording, remote direction over Zoom during the session is common for voice talent students, and several coaches are available for in-studio direction at major LA voiceover studios. For executive presentation work, Zoom and in-person dry-run coaching close to event day is common practice. Rates and availability for production-day work are scoped per project at the trial.
What does the trial include?
30 minutes, free, with the coach you select. Bring the script, the demo prompt, the presentation deck, or the brief, whichever applies. The coach runs the recorded diagnostic, marks the highest-impact features to drill, and proposes a study plan calibrated to your event date or your professional timeline. Most students continue with their trial coach; swapping is easy and quick if the fit is not right.
Ready for Neutral American English lessons or classes?
Book a free 30-minute trial with one of our personally vetted tutors. Private lessons or small-group classes — your choice.