Someone made a pair of wireless Walkie-Talkies using ESP32s, and so can you
#Hardware


Mobile Reporter

ESP32 walkie-talkies use ESP-NOW over Wi-Fi hardware for direct device-to-device audio with no radio license needed. The build uses the XIAO ESP32S3 Sense with its onboard PDM mic, an I2S amp, a PTT button, and thread-safe audio buffering. External antennas push range to ~200 meters outdoors, and the same firmware auto-detects each board's role.


Walkie-talkies have a certain nostalgic charm. The push-to-talk simplicity, the satisfying click of the button, the knowledge that you are communicating directly with someone without routing through some cloud server somewhere. Traditionally though, building your own meant dealing with radio frequencies, licensing requirements, and analog circuitry that could test even experienced makers. That changes with this ESP32-based project from Tech Talkies on YouTube, spotted by Hackster.io. These walkie-talkies use Espressif's ESP-NOW protocol to stream audio directly between two devices over Wi-Fi hardware, with no access point, internet connection, or radio license required.

What Makes This Different From Traditional Walkie-Talkies

Most consumer walkie-talkies in the United States use FM modulation on UHF channels around 462 MHz and 467 MHz (the FRS and GMRS services). Consumer devices are limited to specific channels and power levels, and building your own from scratch still requires compliance with FCC regulations. That means proper licensing, approved components, and careful attention to transmit power.

The ESP32 approach sidesteps all of this. ESP-NOW is a protocol developed by Espressif that runs on top of Wi-Fi hardware but operates independently of any Wi-Fi network. Think of it as a lightweight peer-to-peer layer that lets two ESP32 devices exchange data directly using their MAC addresses. Because it transmits in the 2.4 GHz ISM band at the same modest power levels as ordinary Wi-Fi devices, it falls under the rules for unlicensed operation rather than those for purpose-built radio transmitters.

The practical result: you can build, use, and modify these devices without worrying about spectrum licenses. That is a significant barrier removed for hobbyists, educators, and anyone who wants to experiment with wireless communication without navigating regulatory paperwork.

The Hardware Stack

The project uses the Seeed Studio XIAO ESP32S3 Sense, a compact development board that packs quite a bit into its small form factor. The key specs that matter for this project:

  • Onboard PDM microphone: The board includes a built-in pulse-density modulation microphone, which means no external mic wiring for basic operation. PDM is a digital audio encoding scheme that uses a single data line and a clock signal, making it simpler to interface than I2S microphones in some configurations.
  • ESP32-S3 processor: This is Espressif's higher-end chip with two Xtensa LX7 cores running at up to 240 MHz, native support for USB OTG, and more GPIO than the original ESP32.
  • Wi-Fi with ESP-NOW support: The S3 supports ESP-NOW natively, which is essential for this project's communication layer.
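
To make the PDM idea concrete, here is a minimal sketch of how a 1-bit PDM stream can be turned into ordinary PCM samples by measuring the density of ones in each window. This is an illustration only, not the project's code: the function name `pdmToPcm` and the boxcar (simple averaging) filter are assumptions, and in practice the chip's I2S peripheral or driver usually performs this conversion with a better filter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a 1-bit PDM stream (one bit per vector element) into 16-bit PCM by
// counting the ones in each window of `decimation` bits. Real decoders use
// proper low-pass filters; this only illustrates the principle.
std::vector<int16_t> pdmToPcm(const std::vector<uint8_t>& pdmBits, std::size_t decimation) {
    assert(decimation > 0);
    std::vector<int16_t> pcm;
    for (std::size_t start = 0; start + decimation <= pdmBits.size(); start += decimation) {
        long ones = 0;
        for (std::size_t i = 0; i < decimation; ++i) {
            ones += pdmBits[start + i] ? 1 : 0;
        }
        // Map pulse density 0..1 onto the range -32767..+32767.
        const long span = static_cast<long>(decimation);
        const long centered = 2 * ones - span;
        pcm.push_back(static_cast<int16_t>(centered * 32767 / span));
    }
    return pcm;
}
```

A window of all ones maps to the maximum positive sample, all zeros to the maximum negative sample, and an even mix of ones and zeros to silence.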

The audio output side uses a MAX98357A I2S amplifier, a popular Class D amplifier that takes I2S digital audio input and drives a speaker directly. I2S (Inter-IC Sound) is a serial bus interface standard for connecting digital audio devices. The MAX98357A handles the digital-to-analog conversion and amplification in one package, which keeps the wiring straightforward.

A push-to-talk button completes the input side. Hold it down to transmit, release to listen. Simple, tactile, and immediate.
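
Mechanical buttons tend to bounce for a few milliseconds, so firmware like this usually filters the raw reading before switching between transmit and listen. The sketch below is one common way to do that, not the project's actual code: the class name `PttButton` and the 20 ms default are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical debounced push-to-talk state. Feed `update` the raw button
// reading and a millisecond timestamp; it returns true while transmitting.
class PttButton {
public:
    explicit PttButton(uint32_t debounceMs = 20) : debounceMs_(debounceMs) {}

    bool update(bool rawPressed, uint32_t nowMs) {
        if (rawPressed != lastRaw_) {
            lastRaw_ = rawPressed;
            lastChangeMs_ = nowMs;   // reading changed: restart the timer
        } else if (rawPressed != stable_ && nowMs - lastChangeMs_ >= debounceMs_) {
            stable_ = rawPressed;    // reading held steady long enough
        }
        return stable_;
    }

private:
    uint32_t debounceMs_;
    uint32_t lastChangeMs_ = 0;
    bool lastRaw_ = false;
    bool stable_ = false;
};
```

On the board, `rawPressed` would come from reading the button's GPIO pin and `nowMs` from a millisecond timer. The unsigned subtraction keeps the comparison correct even when the timer wraps around.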

The Software Architecture


The firmware handles several challenges that are not immediately obvious when you first think about streaming real-time audio over Wi-Fi.

ESP-NOW Audio Streaming

ESP-NOW supports a maximum payload of 250 bytes per frame. A frame that size holds only a few milliseconds to a few tens of milliseconds of uncompressed audio, depending on the sample rate and bit depth, so a continuous stream cannot travel as one message. The solution involves splitting the audio into small chunks and sending them in rapid succession. The receiver buffers these chunks and plays them back in order.
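
As a rough illustration of the chunking step, the sketch below splits 16-bit PCM samples into frames that fit the 250-byte limit. The sample format, the 2-byte sequence-number header, and the name `packetize` are all assumptions made for the example, not details taken from the project's firmware. With those assumptions each frame carries 124 samples, which at a 16 kHz sample rate would be 7.75 ms of audio, or roughly 130 frames per second.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

constexpr std::size_t kMaxFrameBytes = 250;  // ESP-NOW payload limit cited above
constexpr std::size_t kHeaderBytes = 2;      // hypothetical 16-bit sequence number
constexpr std::size_t kSamplesPerFrame =
    (kMaxFrameBytes - kHeaderBytes) / sizeof(int16_t);  // 124 samples

// Split PCM samples into frames of at most 250 bytes. Each frame starts with
// a little-endian sequence number so the receiver can spot loss or reordering.
std::vector<std::vector<uint8_t>> packetize(const std::vector<int16_t>& samples,
                                            uint16_t firstSeq) {
    std::vector<std::vector<uint8_t>> frames;
    uint16_t seq = firstSeq;
    for (std::size_t pos = 0; pos < samples.size(); pos += kSamplesPerFrame) {
        const std::size_t count = std::min(kSamplesPerFrame, samples.size() - pos);
        std::vector<uint8_t> frame(kHeaderBytes + count * sizeof(int16_t));
        frame[0] = static_cast<uint8_t>(seq & 0xFF);
        frame[1] = static_cast<uint8_t>(seq >> 8);
        // Samples are copied in host byte order; both ends run the same chip.
        std::memcpy(frame.data() + kHeaderBytes, samples.data() + pos,
                    count * sizeof(int16_t));
        assert(frame.size() <= kMaxFrameBytes);
        frames.push_back(std::move(frame));
        ++seq;
    }
    return frames;
}
```

On the device, each resulting frame would be handed to the ESP-NOW send call one at a time while the push-to-talk button is held.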

This is not like TCP-based streaming where the network stack handles ordering and reliability. ESP-NOW is a fire-and-forget protocol at the application level. Frames can be lost, arrive out of order, or get corrupted. The firmware needs to handle all of this gracefully.
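
One simple way to cope with missing or late frames is to number them and have the receiver track what it expects next. The sketch below assumes a 16-bit sequence number in each frame, which is a hypothetical header field rather than something the source describes, and the class name `SequenceTracker` is likewise made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Tracks 16-bit sequence numbers on the receive side: accepts newer frames,
// rejects duplicates and late arrivals, and counts how many were skipped.
class SequenceTracker {
public:
    // Returns true if the frame should be played, false if it is stale.
    bool accept(uint16_t seq) {
        if (!started_) {
            started_ = true;
            expected_ = static_cast<uint16_t>(seq + 1);
            return true;
        }
        // Signed 16-bit distance, so wraparound from 65535 to 0 still works.
        const int16_t delta =
            static_cast<int16_t>(static_cast<uint16_t>(seq - expected_));
        if (delta < 0) return false;             // duplicate or arrived too late
        lost_ += static_cast<uint32_t>(delta);   // frames skipped before this one
        expected_ = static_cast<uint16_t>(seq + 1);
        return true;
    }

    uint32_t lostFrames() const { return lost_; }

private:
    bool started_ = false;
    uint16_t expected_ = 0;
    uint32_t lost_ = 0;
};
```

For voice, dropping a stale frame is usually better than playing it late. A receiver could also use the gap count to insert a matching stretch of silence so playback timing stays steady.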

Thread-Safe Audio Ring Buffer

This is where the project demonstrates some thoughtful engineering. The ESP32 runs a dual-core processor, and the Wi-Fi stack operates in its own context. When audio data arrives via ESP-NOW, it triggers a callback function that runs in the Wi-Fi task context, not the main Arduino loop.

If that callback writes incoming samples into a buffer while the main loop is reading from the same buffer to feed the I2S hardware, unsynchronized access corrupts the data, and you hear glitches, static, or complete silence. The solution is a thread-safe ring buffer: a circular data structure that uses atomic operations or mutexes to coordinate access between the producer (the Wi-Fi callback) and the consumer (the main loop writing to I2S).

Ring buffers are well-suited for this because they have a fixed size, O(1) write and O(1) read operations, and naturally handle the continuous flow of real-time data. The producer writes samples, the consumer reads them, and the buffer wraps around without ever needing to shift data or reallocate memory.
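
Here is a minimal sketch of the atomic-operations variant, written as a single-producer, single-consumer ring buffer in standard C++. It is an illustration of the pattern rather than the project's implementation: the class name `AudioRing`, the power-of-two capacity requirement, and the choice to drop samples when full are all assumptions.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Single-producer, single-consumer ring buffer of 16-bit samples.
// Capacity must be a power of two so indices can wrap with a bit mask.
template <std::size_t Capacity>
class AudioRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side only (for example, the receive callback).
    bool push(int16_t sample) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;  // full: drop the sample
        data_[head & (Capacity - 1)] = sample;
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer side only (for example, the loop feeding I2S).
    bool pop(int16_t& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;             // empty: caller can play silence
        out = data_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    std::size_t size() const {
        return head_.load(std::memory_order_acquire) -
               tail_.load(std::memory_order_acquire);
    }

private:
    std::array<int16_t, Capacity> data_{};
    std::atomic<std::size_t> head_{0};  // total samples written so far
    std::atomic<std::size_t> tail_{0};  // total samples read so far
};
```

Because only the producer ever writes `head_` and only the consumer ever writes `tail_`, no lock is needed as long as there is exactly one of each. With more than one writer or reader, a mutex-based version would be the safer choice.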

Automatic Peer Identity

One of the more elegant features: you flash the same firmware to both boards, and each one figures out its own role automatically at boot time. The firmware reads each board's MAC address (a unique hardware identifier burned into the Wi-Fi chip) and uses it to assign a peer ID. Board A knows to send to Board B, and vice versa, without any manual configuration.

This eliminates the need to hardcode addresses or maintain separate firmware builds for each device. The same .ino file works for both units. For anyone who has dealt with pairing Bluetooth devices or configuring network addresses manually, this automatic detection is a welcome quality-of-life feature.
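
The source does not spell out exactly how the firmware maps a MAC address to a role, so the following is only one plausible way to do it: compile a two-entry table of the paired boards into the firmware and let each board look itself up. The table `kBoards`, its addresses, and the helper names are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Mac = std::array<uint8_t, 6>;

// Hypothetical table of the two paired boards. These addresses are made up.
constexpr std::array<Mac, 2> kBoards = {{
    {{0x02, 0x00, 0x00, 0x00, 0x00, 0x01}},
    {{0x02, 0x00, 0x00, 0x00, 0x00, 0x02}},
}};

// Returns this board's index in kBoards, or -1 if its MAC is not listed.
int selfIndex(const Mac& self) {
    for (std::size_t i = 0; i < kBoards.size(); ++i) {
        if (kBoards[i] == self) return static_cast<int>(i);
    }
    return -1;
}

// With only two boards, the peer is simply the other entry in the table.
const Mac& peerOf(int index) {
    assert(index == 0 || index == 1);
    return kBoards[static_cast<std::size_t>(1 - index)];
}
```

At boot, the firmware would read its own address (the Arduino core exposes it through `WiFi.macAddress()`), call `selfIndex`, and register `peerOf(index)` as its ESP-NOW peer. Both units can then run the identical binary.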

Range and Antenna Considerations

The XIAO ESP32S3 Sense includes a built-in trace antenna on the PCB. For short-range use, say across a single room or within line of sight at close distance, it works adequately. The project creator found that indoor range with the built-in antenna tops out at a few rooms before connection drops become noticeable.

Adding external antennas changes the picture significantly. The boards support external antenna connections, and with proper antennas attached, the walkie-talkies maintain a stable connection at roughly 200 meters outdoors. That is a substantial improvement for what amounts to a few minutes of soldering and a couple of inexpensive antenna modules.

The 2.4 GHz frequency that ESP-NOW operates on has specific propagation characteristics. It penetrates walls reasonably well but attenuates more quickly than lower frequencies. Outdoor range benefits from the lack of obstacles and reflections. Indoor environments with lots of metal, concrete, and interference from other 2.4 GHz devices (microwaves, Bluetooth, other Wi-Fi networks) will always be more challenging.
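
To put numbers on this, the standard free-space path loss formula gives a best-case estimate of how much signal is lost over a clear line of sight. The sketch below uses 2437 MHz (Wi-Fi channel 6) as an assumed operating frequency, since the source does not say which channel the project uses, and the function name is made up for the example.

```cpp
#include <cassert>
#include <cmath>

// Free-space path loss in dB for a distance in meters and a frequency in MHz.
// Standard formula: FSPL = 20*log10(d_km) + 20*log10(f_MHz) + 32.44.
double freeSpacePathLossDb(double distanceM, double frequencyMHz) {
    assert(distanceM > 0.0 && frequencyMHz > 0.0);
    return 20.0 * std::log10(distanceM / 1000.0) +
           20.0 * std::log10(frequencyMHz) + 32.44;
}
```

At 200 meters this works out to about 86 dB of loss at 2437 MHz, compared with about 72 dB at 462 MHz, a gap of roughly 14 dB in favor of the lower band. Every tenfold increase in distance adds another 20 dB. Walls, bodies, and interference all add further loss on top of this ideal figure.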

For developers who want to push range further, there are options: higher-gain directional antennas, repeater nodes that forward packets between devices, or mesh networking libraries built on top of ESP-NOW that can hop messages across multiple intermediate nodes.

Why This Matters for Mobile and Embedded Developers

If you work in mobile development, embedded projects like this are worth paying attention to for several reasons.

IoT integration: Mobile apps increasingly need to communicate with hardware devices. Understanding protocols like ESP-NOW, and their trade-offs compared to Bluetooth LE or Wi-Fi Direct, gives you a more complete toolkit for building connected experiences.

Cross-platform considerations: The ESP32 ecosystem spans Arduino, ESP-IDF, MicroPython, and various RTOS options. The pattern of writing firmware that auto-detects its role and configures itself accordingly is a good practice for any hardware project where you want to minimize user configuration.

Real-time audio challenges: Streaming audio with low latency is hard regardless of the platform. The ring buffer pattern used here solves the same producer-consumer synchronization problems you encounter in mobile audio apps, game engines, and streaming software.

Cost and accessibility: The XIAO ESP32S3 Sense board costs around $13-15. The MAX98357A amplifier is under $5. Add a small speaker, a button, a battery, and a 3D-printed enclosure, and you have a functional walkie-talkie for well under $30 total. That makes this an accessible project for learning, teaching, or prototyping.

Building Your Own

The project is well-documented, and the firmware is open source. To build a pair, you will need:

  • 2x Seeed Studio XIAO ESP32S3 Sense boards
  • 2x MAX98357A I2S amplifier boards
  • 2x small 8-ohm speakers
  • 2x push-to-talk buttons
  • 2x external 2.4 GHz antennas (optional, but recommended for range)
  • Battery power solution (LiPo batteries with charging circuits work well)
  • Enclosure (3D printed or improvised)

The build process follows a standard embedded development workflow: install the Arduino IDE or PlatformIO, flash the firmware to each board, wire the amplifier and speaker to the I2S pins, connect the PTT button, and assemble.

Check out the Tech Talkies YouTube channel for the full build video and firmware walkthrough. The Hackster.io coverage includes additional context on the project's design decisions.

Limitations and Trade-offs

No project is without compromises, and it is worth understanding what ESP-NOW gives up compared to other approaches.

No internet connectivity: ESP-NOW is strictly device-to-device. If you want to route audio through a phone app or to a remote server, you need a different protocol stack.

Limited range compared to UHF radio: Commercial walkie-talkies operating on UHF frequencies can reach several kilometers with proper antennas. ESP-NOW at 2.4 GHz with modest antennas maxes out at a few hundred meters.

No encryption by default: ESP-NOW frames are not encrypted out of the box. For a walkie-talkie project this is probably fine, but for any application where privacy matters, you would need to enable ESP-NOW's optional per-peer encryption keys or add encryption at the application layer.

Single channel: Unlike multi-channel radio walkie-talkies, these operate on a single Wi-Fi channel that both units must share. There is no channel selector for dodging interference, so moving to a quieter channel means changing it on both devices.

Despite these trade-offs, the project demonstrates how capable modern microcontrollers have become. A decade ago, streaming real-time audio between two custom-built wireless devices required specialized hardware and significant RF engineering knowledge. Now, two $15 development boards and some open-source firmware get you there.

The intersection of affordable hardware, mature development tools, and creative maker communities continues to produce projects that would have been impractical or impossible for individual developers just a few years ago. This ESP32 walkie-talkie project is a solid example of that trend in action.
