You might not need a sync server for real-time collaboration

Lessons from building multiplayer games.

Real-time collaboration software is notoriously difficult. You’d think understanding the elegantly written Figma multiplayer blogpost would give you a sufficient mental model to roll your own collaborative layer, but that’s only the beginning. You still have to figure out your data model and conflict resolution strategy specific to your application and manage the infrastructure (not to mention the cost).

That’s why collaboration engine products exist and are expensive. For many collaborative scenarios, that approach might be overkill, especially for low-throughput use cases with a small number of concurrent users. You might be able to sidestep most of the complexity by getting rid of the sync server entirely and having the peers talk directly to each other. That’s what I did in my latest project, and this blogpost walks through the approach.

funcalling.com is a platform where you can play board games and motion-controlled games over video calls. Video calls today are passive. You sit, you talk, maybe you share your screen. I wanted something more interactive that recreates the fun of Xbox Kinect party games but works over a simple video call. MediaPipe made client-side pose detection surprisingly fast, even on mobile. The challenge was figuring out how to keep the game states in sync between the two players.
Sword dueling with my friend!

Here’s my friend and me playing a sword fighting game over video call, using just our fingers as weapons. There’s no sync server. Our browsers are talking directly to each other, syncing health bars and strike animations in real-time.

Peer-to-Peer Architecture with Host Authority

I settled on a peer-to-peer architecture. I still had to host a lightweight signaling server to help the peers find each other, but once connected, the game state is kept by the peers, without an authoritative server to sync the states with. This kept things simple. Without a middleman, all game code (logic and graphics) is in one place and I could prototype new games quickly.

But without an authoritative server, how do the two peers agree on the game state? I decided to adapt the authoritative server approach by making one of the peers the authoritative host. Since the P2P signaling process (perfect negotiation) already involves assigning one of the peers as impolite and the other one as polite, we’ll just assign the impolite side as the authoritative host.

On a high level: when the non-host wants to make changes, it sends an action to the host. The host applies the action using game logic, and if the state changes, propagates the new state to the non-host. When the host wants to make changes, it sends an action to itself and follows the same process.

Figure: Non-host sends action; host applies action to its own state and responds with updated state.
Figure: Host applies action to its own state directly, then propagates state to non-host.
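
As a rough sketch of this loop (not the actual funcalling.com code), the host side can be a small wrapper around a reducer. The names `Versioned`, `Reducer`, `HostAuthority`, and the `sendState` callback are hypothetical, and the callback stands in for whatever writes to the data channel.

```typescript
// Hypothetical names throughout; this only illustrates the flow described above.
interface Versioned<S> {
	id: number; // bumped on every accepted state change
	state: S;
}

// Game logic: returns a new state, or the same reference if the action changes nothing.
type Reducer<S, A> = (state: S, action: A) => S;

class HostAuthority<S, A> {
	private current: Versioned<S>;
	private readonly reduce: Reducer<S, A>;
	private readonly sendState: (snapshot: Versioned<S>) => void;

	constructor(initial: S, reduce: Reducer<S, A>, sendState: (snapshot: Versioned<S>) => void) {
		this.current = { id: 0, state: initial };
		this.reduce = reduce;
		this.sendState = sendState;
	}

	// Actions from the non-host and from the host itself both go through here.
	handleAction(action: A): void {
		const next = this.reduce(this.current.state, action);
		if (next === this.current.state) return; // invalid or no-op: nothing to propagate
		this.current = { id: this.current.id + 1, state: next };
		this.sendState(this.current);
	}

	getSnapshot(): Versioned<S> {
		return this.current;
	}
}

// Toy game: a shared counter where only positive increments are valid.
type CounterAction = { kind: "add"; amount: number };
const counterReducer: Reducer<number, CounterAction> = (state, action) =>
	action.amount > 0 ? state + action.amount : state;

const sentSnapshots: Versioned<number>[] = [];
const hostAuthority = new HostAuthority<number, CounterAction>(0, counterReducer, (snapshot) => {
	sentSnapshots.push(snapshot); // a real app would send this to the non-host
});

hostAuthority.handleAction({ kind: "add", amount: 2 }); // accepted, state propagated
hostAuthority.handleAction({ kind: "add", amount: -1 }); // rejected, nothing sent
```

Returning the same reference for a rejected action keeps the "did the state change?" check down to a single identity comparison.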

Since the games that I envisioned for this project involved simple game states, the states could just be propagated in full. In competitive games, you might hide opponent positions to prevent cheating, but for casual games, this was not necessary. Sending the full state gives us another nice property: even if the host drops the call and rejoins later, we can still recover the last known state using the state on the non-host’s side. When the dropped peer rejoins, both peers exchange their full state. Whichever has the higher ID (incremented on every state change) wins.
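
The rejoin rule can be sketched in a few lines. `VersionedState` and `reconcile` are hypothetical names, and keeping the local copy on a tie is my assumption: with a single authoritative host, equal IDs should mean equal states.

```typescript
// Hypothetical shape: a full game state tagged with its change counter.
interface VersionedState<S> {
	id: number;
	state: S;
}

// Both peers run this after exchanging full states on reconnect.
// The state with the higher ID wins; on a tie, keep the local copy (assumption).
function reconcile<S>(local: VersionedState<S>, remote: VersionedState<S>): VersionedState<S> {
	return remote.id > local.id ? remote : local;
}

// The host dropped and rejoined with a fresh state; the non-host still holds ID 7.
const rejoinedHost: VersionedState<string[]> = { id: 0, state: [] };
const survivingNonHost: VersionedState<string[]> = { id: 7, state: ["X", "O", "X"] };

const hostAfterRejoin = reconcile(rejoinedHost, survivingNonHost);
const nonHostAfterRejoin = reconcile(survivingNonHost, rejoinedHost);
```

Both peers end up holding the non-host's copy, so the game resumes from the last known state.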

This state-sync protocol worked well for turn-based games like Tic Tac Toe and Four in a Row, but it was clunky for real-time interactions.

The main problem was that some things need to happen once, right when they occur. For example, in Sword Duel (a finger-tracked sword fighting game where you battle your opponent by tilting your hand), when you land a strike, both players need to see the hit animation at that moment. With states alone, there’s no simple way to distinguish “opponent struck you just now” from “opponent struck you previously and no new state has arrived since.” A naive approach is to keep a local copy of the previous state, deep-compare it against each incoming state, and trigger animations when the relevant parts change, but this gets complicated quickly.

So on top of states, I added events, which are ephemeral messages that don’t require consensus.

Figure: Either peer can send events directly to the other.

Any side can just send an event message to the other peer. Events allowed me to prototype the collaborative piano experience.
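
One way to picture events sharing a data channel with actions and states is a tagged message plus a small dispatcher. This is a sketch under my own assumptions: the `WireMessage` shape, the `channel` tag, and `EventRouter` are hypothetical and not the real wire format.

```typescript
// Hypothetical wire format: one tagged union for everything sent over the channel.
type WireMessage =
	| { channel: "action"; payload: unknown }
	| { channel: "state"; id: number; payload: unknown }
	| { channel: "event"; name: string; data: unknown };

type EventHandler = (data: unknown) => void;

class EventRouter {
	private readonly handlers = new Map<string, EventHandler[]>();

	on(name: string, handler: EventHandler): void {
		const list = this.handlers.get(name) ?? [];
		list.push(handler);
		this.handlers.set(name, list);
	}

	// Returns true if the message was an event; events are fire-and-forget,
	// so they are dispatched immediately and never touch the agreed state.
	receive(raw: string): boolean {
		const message = JSON.parse(raw) as WireMessage;
		if (message.channel !== "event") return false;
		for (const handler of this.handlers.get(message.name) ?? []) {
			handler(message.data);
		}
		return true;
	}
}

const eventRouter = new EventRouter();
const receivedAngles: unknown[] = [];
eventRouter.on("swordAngle", (data) => {
	receivedAngles.push(data);
});

const wasEvent = eventRouter.receive(
	JSON.stringify({ channel: "event", name: "swordAngle", data: { angle: 42 } }),
);
const wasStateMessage = eventRouter.receive(
	JSON.stringify({ channel: "state", id: 3, payload: {} }),
);
```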

Playing a piano duet in real-time with my friend!

When you play notes, you’re sending notePress and noteRelease events. On press, you trigger the note using Web Audio API, and then clean it up on release.
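
A minimal sketch of the receiving side, assuming each event carries a `note` string (my assumption about the payload). `Voice` and `startVoice` are hypothetical stand-ins for the audio layer; in the browser, `startVoice` would typically create and start a Web Audio `OscillatorNode`, and `stop()` would stop it.

```typescript
// Assumed payload shape; the real notePress/noteRelease events may carry more data.
type NoteEvent =
	| { name: "notePress"; data: { note: string } }
	| { name: "noteRelease"; data: { note: string } };

// Stand-in for a sounding note (e.g. a wrapper around an OscillatorNode).
interface Voice {
	stop(): void;
}

class RemotePiano {
	private readonly active = new Map<string, Voice>();
	private readonly startVoice: (note: string) => Voice;

	constructor(startVoice: (note: string) => Voice) {
		this.startVoice = startVoice;
	}

	handleEvent(event: NoteEvent): void {
		const note = event.data.note;
		if (event.name === "notePress") {
			if (this.active.has(note)) return; // already sounding: ignore repeated press
			this.active.set(note, this.startVoice(note));
		} else {
			this.active.get(note)?.stop(); // clean up on release
			this.active.delete(note);
		}
	}

	activeNotes(): string[] {
		return Array.from(this.active.keys());
	}
}

// Fake audio layer that records what happened, so the sketch runs anywhere.
const audioLog: string[] = [];
const remotePiano = new RemotePiano((note) => {
	audioLog.push(`start ${note}`);
	return { stop: () => { audioLog.push(`stop ${note}`); } };
});

remotePiano.handleEvent({ name: "notePress", data: { note: "C4" } });
remotePiano.handleEvent({ name: "notePress", data: { note: "E4" } });
remotePiano.handleEvent({ name: "noteRelease", data: { note: "C4" } });
```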

Note that RTCDataChannel guarantees ordered delivery by default, regardless of whether the WebRTC connection uses UDP or TCP. So there’s no need to worry about out-of-order events.

After I prototyped the turn-based games and the event-based piano, I had the tools to build more sophisticated games that used a combination of states and events.

Examples of states and events working together

Sword Duel

The Sword Duel game needed both primitives working together.

The state:

interface SwordDuelState {
	health: Record<"host" | "nonHost", number>;
	playerStates: Record<"host" | "nonHost", "sword" | "shield" | "switching" | "stunned">;
	winner: "host" | "nonHost" | null;
	status: "countdown" | "playing" | "finished";
}

These are facts that both players must agree on. When you land a hit, you send an action:

sessionStateManager.sendAction({
	kind: "game",
	type: "swordduel",
	gameAction: { kind: "hit", player: localPlayer },
});

The host validates this and broadcasts the new health state, but actions alone are not enough.
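
Here is roughly what the host-side handling of a hit could look like. This is a sketch, not the real game logic: `applyHit`, the `HIT_DAMAGE` value, and the reading of `player` as the striker are my assumptions, and the only validation shown is that the game is in progress.

```typescript
type Player = "host" | "nonHost";

interface SwordDuelState {
	health: Record<Player, number>;
	playerStates: Record<Player, "sword" | "shield" | "switching" | "stunned">;
	winner: Player | null;
	status: "countdown" | "playing" | "finished";
}

type HitAction = { kind: "hit"; player: Player }; // `player` assumed to be the striker

const HIT_DAMAGE = 20; // assumed value, for illustration only

// Returns the same reference when the action is rejected, so the caller
// can tell that there is nothing to broadcast.
function applyHit(state: SwordDuelState, action: HitAction): SwordDuelState {
	if (state.status !== "playing") return state;
	const defender: Player = action.player === "host" ? "nonHost" : "host";
	const health: Record<Player, number> = { ...state.health };
	health[defender] = Math.max(0, health[defender] - HIT_DAMAGE);
	const knockedOut = health[defender] === 0;
	return {
		...state,
		health,
		winner: knockedOut ? action.player : state.winner,
		status: knockedOut ? "finished" : state.status,
	};
}

const midGame: SwordDuelState = {
	health: { host: 100, nonHost: 20 },
	playerStates: { host: "sword", nonHost: "sword" },
	winner: null,
	status: "playing",
};
const afterFinalHit = applyHit(midGame, { kind: "hit", player: "host" });
const duringCountdown = applyHit({ ...midGame, status: "countdown" }, { kind: "hit", player: "host" });
```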

To make the game feel more lively, we also use events to stream your sword angle (determined by the angle of your camera-tracked finger) in real-time.

sessionStateManager.sendGameEvent({
	gameType: "swordduel",
	name: "swordAngle",
	data: { angle: currentAngle },
});

We also use events for visual effects that both players should see instantly:

Event          Purpose
slash          Slash animation
blocked        Blue shield bloom effect and stun stars animation
damaged        Red hit effect + screen shake
switchWeapon   Weapon switching animation

Actions change the points while events make it look like a fight. When you slash and your opponent blocks, the blocked event triggers the visual effect on both screens instantly. Then the action updates the attacker to “stunned.”

Draw Together

Draw Together is a collaborative canvas, like a tiny multiplayer whiteboard. To let you see your friend’s strokes in real time while keeping a consistent record of all the strokes, I used both events and states.

The state:

interface DrawTogetherState {
	mode: "together";
	strokes: Stroke[];  // Completed strokes
	hostRedoStack: Stroke[];
	nonHostRedoStack: Stroke[];
}

While you’re drawing, you stream the in-progress stroke as an event:

// Sent on every pointer move while drawing
sessionStateManager.sendGameEvent({
	gameType: "drawduel",
	name: "strokeUpdate",
	data: {
		points: [...myCurrentStroke.points, newPoint],
		color: currentColor,
		width: currentWidth,
	},
});

When you finish a stroke (lift your finger), you make it permanent by sending an action:

sessionStateManager.sendAction({
	kind: "game",
	type: "drawduel",
	gameAction: {
		kind: "addStroke",
		points: myCurrentStroke.points,
		color: myCurrentStroke.color,
		width: myCurrentStroke.width,
		player: myRole,
	},
});

The host adds it to the strokes array, and everyone has the same canvas.
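
On the host, handling `addStroke` might look something like this. The `Stroke` and `Point` shapes are guesses based on the action fields above, and clearing the drawer's redo stack on a new stroke is my assumption (the usual undo/redo convention), not something confirmed by the real code.

```typescript
type DrawRole = "host" | "nonHost";

interface Point { x: number; y: number; }

// Assumed shape, inferred from the addStroke action fields.
interface Stroke {
	points: Point[];
	color: string;
	width: number;
	player: DrawRole;
}

interface DrawTogetherState {
	mode: "together";
	strokes: Stroke[]; // Completed strokes
	hostRedoStack: Stroke[];
	nonHostRedoStack: Stroke[];
}

interface AddStrokeAction extends Stroke {
	kind: "addStroke";
}

function applyAddStroke(state: DrawTogetherState, action: AddStrokeAction): DrawTogetherState {
	if (action.points.length === 0) return state; // nothing drawn: reject
	const stroke: Stroke = {
		points: action.points,
		color: action.color,
		width: action.width,
		player: action.player,
	};
	return {
		...state,
		strokes: [...state.strokes, stroke],
		// Assumption: a new stroke invalidates that player's redo history.
		hostRedoStack: action.player === "host" ? [] : state.hostRedoStack,
		nonHostRedoStack: action.player === "nonHost" ? [] : state.nonHostRedoStack,
	};
}

const emptyCanvas: DrawTogetherState = {
	mode: "together",
	strokes: [],
	hostRedoStack: [],
	nonHostRedoStack: [],
};
const canvasAfterStroke = applyAddStroke(emptyCanvas, {
	kind: "addStroke",
	points: [{ x: 0, y: 0 }, { x: 10, y: 5 }],
	color: "#ff0000",
	width: 3,
	player: "nonHost",
});
```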

Streaming the in-progress stroke as events lets you see your friend’s stroke forming in real time, before it’s committed to state.

We also stream cursor presence (inspired by Figma’s cursors):

sessionStateManager.sendGameEvent({
	gameType: "drawduel",
	name: "presence",
	data: { x: point.x, y: point.y },
});

So you can see where your friend is hovering even when they’re not drawing.

Once I had a better mental model of states and events, I gave detailed instructions to Claude to prototype new games and activities. I prototyped the Word Duel game, which used both states and events, in just a day with Claude Opus 4.5, and then cleaned up the rough edges the following day.

Limitations

One shortcoming of the states/events system is that there’s no anti-cheat. During a game, you can send any action. As long as it’s valid according to the game logic, it gets accepted, even if you didn’t actually perform the move. In Sword Duel, you could spam the strike action until your opponent loses. Or, if you joined first, you could tamper with your local state and give it a high ID, forcing the other peer to accept it on connect. But for this kind of application — casual games with friends or loved ones — the social cost of cheating prevents this kind of behavior, so I didn’t see a point in overengineering this.

There’s also a latency problem in Sword Duel: when your opponent switches to shield, it takes 100–1000ms for that state change to reach you. If you strike during this window, you’ll damage them even though they’re already supposed to be blocking strikes on their screen. The fix would be to make the peer that is being struck authoritative so the striker sends a strike event to this peer, who then validates and sends the validated strike as an action. This, however, adds a full round-trip delay to every hit, making the game feel sluggish in general. For a casual game played with friends, I prioritized responsiveness. Strikes are validated on the striker’s side using their local view of the game state. This makes the game feel snappy at the small cost of the occasional complaint from the defender.

Takeaways

The key simplification is the authoritative host. Instead of distributed consensus or server-based conflict resolution, one peer is the source of truth and the others defer to it. With two peers, this falls into place naturally: WebRTC’s perfect negotiation already assigns roles, and video bandwidth between two callers is manageable. The video streaming requirement is the main constraint that keeps this peer-to-peer approach limited to two peers. Without it, bandwidth isn’t an issue for small groups, though you’d need to handle host assignment explicitly.

Beyond games, this approach could work for any collaborative editing scenario that doesn’t require handling high concurrency or throughput: e.g., pair programming, shared whiteboards, remote tutoring.

When would you need something more sophisticated? When edits happen faster than state can propagate, when there’s high data throughput, when offline support matters, or when conflicts can’t be resolved by “host wins.” But for the small-group, real-time, online-only case, this approach gets you surprisingly far with minimal complexity.


I’m currently open for projects or full-time roles. Contact me at hikevinmake@gmail.com.