Document Databases in 2025: Tackling Real-Time Data with a Fresh Perspective

Real-time data is no longer a niche requirement. Streaming analytics, live dashboards, IoT telemetry, and collaborative editing all demand sub-second reads and writes—often from globally distributed users. Document databases have become a default choice for these workloads, but the 2025 landscape is more nuanced than a simple 'MongoDB vs. Couchbase' debate. Teams need to think about schema evolution, consistency models, indexing strategies, and operational overhead in ways that the early NoSQL era rarely required.

This guide is for architects and senior engineers who already understand basic document store concepts. We will not rehash what BSON is or how to install a cluster. Instead, we focus on the process-level decisions that separate a smooth real-time deployment from a costly refactor. We use composite scenarios—drawn from patterns we have observed across many teams—to illustrate trade-offs without leaning on fake case studies. By the end, you should have a clearer framework for choosing and tuning a document database for your specific real-time constraints.

Why Real-Time Data Demands a Rethink of Document Database Choices

The promise of document databases has always been flexibility: store JSON-like documents, query them with a rich API, and scale horizontally. For many years, that was enough. Batch processing, nightly ETL jobs, and moderate read volumes did not stress the architecture. Real-time workloads change the equation. When every millisecond counts, the document model's strengths—schema-on-read, nested structures, dynamic fields—can become liabilities if not managed carefully.

Consider a fleet of delivery drones sending telemetry every 200 milliseconds. Each message includes GPS coordinates, battery level, motor temperature, and a status code. In a traditional document store, you might store each reading as a separate document with a timestamp and device ID. That is straightforward, but the write volume can saturate a single shard if the shard key is poorly chosen. More subtly, the read patterns are not uniform: a dashboard needs the latest reading from every drone (a scatter-gather query), while a historical analysis might need all readings from one drone over a time range. The same data, two very different access patterns, and the document database must serve both without grinding to a halt.

Another pressure is consistency. Real-time systems often need to reflect state changes quickly. If a drone reports a critical battery warning, the operations team should see it within seconds. But document databases traditionally offer tunable consistency: you can trade read freshness for lower latency. In 2025, many teams find that the default eventual consistency is not enough for operational decisions. They need read-your-writes guarantees or even linearizability for certain updates. That shifts the choice of database and the configuration of replication.

Cost is a third factor. Real-time ingestion at scale generates terabytes of data per day. Storing every event as a separate document is expensive in storage and index size. Teams are increasingly turning to time-series optimizations, bucketing multiple readings into a single document, or using compression features built into newer database versions. The decision of how to model time-series data in a document store is a process question: it affects query performance, write throughput, and operational complexity. We will explore these trade-offs in detail later.

The key takeaway is that real-time workloads force teams to be intentional about schema design, indexing, consistency, and cost from day one. The old 'just throw it in a document and figure it out later' approach leads to painful migrations. This guide provides a framework for making those intentional choices.

The Shift from Schema-on-Read to Schema Governance

Schema-on-read was a rallying cry of early NoSQL. It meant you could insert documents with different fields and let the application handle the variability. In real-time systems, that freedom creates hidden costs. Every query must handle missing fields, type mismatches, and unexpected nesting. Indexes become harder to maintain because the index key might not exist in every document. Teams in 2025 are moving toward lightweight schema governance: using JSON Schema validation at the database level or application-side schemas with tools like Mongoose or Pydantic. This is not a return to rigid relational schemas, but a pragmatic middle ground that reduces surprises in production.
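
As a rough illustration of lightweight governance, the sketch below defines a MongoDB-style `$jsonSchema` validator as a plain Python dictionary and pairs it with a small application-side pre-check. The field names follow the drone example; the types, bounds, and the `missing_required` helper are assumptions made for illustration, not a recommended schema.

```python
# Hypothetical validator for drone telemetry. The required fields and bounds
# are assumptions chosen for illustration.
TELEMETRY_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["device_id", "timestamp", "battery_pct", "status"],
        "properties": {
            "device_id": {"bsonType": "string"},
            "timestamp": {"bsonType": "date"},
            "battery_pct": {"bsonType": ["int", "double"], "minimum": 0, "maximum": 100},
            "status": {"enum": ["ok", "warning", "critical"]},
        },
    }
}


def missing_required(doc, validator=TELEMETRY_VALIDATOR):
    """Application-side pre-check: list required fields absent from a document."""
    required = validator["$jsonSchema"]["required"]
    return [name for name in required if name not in doc]


print(missing_required({"device_id": "drone-1", "status": "ok"}))
# ['timestamp', 'battery_pct']
```

With PyMongo, a dictionary of this shape can be passed as the `validator` option when creating a collection, so the database rejects malformed documents even if an application bug slips through.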

Core Idea: Treating Documents as Events, Not Records

The mental model shift that underpins successful real-time document database usage is to think of each document as an event rather than a record. A record is a snapshot of state that you overwrite. An event is an immutable fact that you append. This distinction changes everything about how you design your data model, indexes, and queries.

In a record-oriented approach, you might have a 'drone' document that you update every time a telemetry reading arrives. The document contains the latest GPS, battery, and status. This is simple to query—you just fetch the drone document—but it creates write contention. Every update must lock or retry, and concurrent updates from the same drone can cause lost updates. In an event-oriented approach, each telemetry reading is a separate document with a timestamp and drone ID. The latest state is computed by aggregating the most recent event per drone, often using a materialized view or a change stream.

The event model scales better for write-heavy workloads because each write is an insert, not an update. Inserts are append-only: no two writers compete for the same document, so there is no read-modify-write cycle to retry. They also provide a full audit trail: you can replay events to reconstruct state at any point in time. The downside is that reads become more expensive because you need to aggregate events to get the current state. This is a classic trade-off, and the right choice depends on your read-to-write ratio.
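
The replay property can be sketched in a few lines. The example below keeps events in an in-memory list (a stand-in for an event collection) and reconstructs one drone's state as of a given time. The `state_at` helper and the integer timestamps are hypothetical.

```python
def state_at(events, device_id, as_of):
    """Replay one drone's events and return its newest reading at or before as_of."""
    state = None
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["device_id"] == device_id and event["timestamp"] <= as_of:
            state = event
    return state


events = [
    {"device_id": "drone-1", "timestamp": 1000, "battery_pct": 81},
    {"device_id": "drone-2", "timestamp": 1200, "battery_pct": 64},
    {"device_id": "drone-1", "timestamp": 1200, "battery_pct": 80},
    {"device_id": "drone-1", "timestamp": 1400, "battery_pct": 78},
]

print(state_at(events, "drone-1", as_of=1300)["battery_pct"])  # 80
print(state_at(events, "drone-1", as_of=900))                  # None
```

The cost the text describes is visible here: every read scans and orders events, which is why a precomputed view becomes attractive as read volume grows.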

For the drone fleet, we found that a hybrid approach works well: store raw telemetry as events in one collection, and maintain a separate 'latest state' collection that is updated asynchronously via a change stream or a lightweight trigger. The latest state collection uses a record model (one document per drone, updated on each event), but the updates are batched or deduplicated to reduce write contention. This gives fast reads for the dashboard and full historical data for analytics.

Another implication of the event model is that you need a robust way to handle out-of-order events. In distributed systems, events from the same drone can arrive in the wrong order due to network delays. Your application must either order events by a client-provided timestamp or use a version vector to resolve conflicts. Most document databases let you implement optimistic concurrency control with a version field and a conditional update, but you have to write the conflict resolution logic yourself. This is a process detail that many teams overlook until they see inconsistent dashboards.
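
A minimal sketch of version-based optimistic concurrency, using an in-memory dictionary in place of a collection. In a real document store the same check would usually be expressed as a conditional update whose filter includes the expected version. All names here are hypothetical.

```python
class VersionConflict(Exception):
    """Raised when the stored version no longer matches the one the caller read."""


def compare_and_set(store, key, expected_version, new_fields):
    """Apply new_fields only if the document is still at expected_version."""
    doc = store[key]
    if doc["version"] != expected_version:
        raise VersionConflict(f"{key}: expected v{expected_version}, found v{doc['version']}")
    store[key] = {**doc, **new_fields, "version": expected_version + 1}
    return store[key]


store = {"drone-1": {"status": "ok", "version": 3}}
compare_and_set(store, "drone-1", 3, {"status": "warning"})  # succeeds, version becomes 4

stale_rejected = False
try:
    compare_and_set(store, "drone-1", 3, {"status": "ok"})   # writer holding a stale version
except VersionConflict:
    stale_rejected = True  # the caller decides: re-read and retry, merge, or drop
```

What happens after a conflict (retry, merge, or discard) is the application-specific part that the database does not decide for you.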

Why Event Sourcing Patterns Fit Document Stores

Event sourcing is a well-known pattern in the CQRS world, and it maps naturally to document databases. Each event is a document, and the state of an aggregate is rebuilt from the stream of events that share its ID. The database's native support for arrays and nested objects makes it easy to store event metadata like causation IDs or correlation IDs. Some databases, like MongoDB, offer change streams that can feed events into a materialized view or a separate analytics system. This pattern is becoming standard in 2025 for real-time applications that need both low-latency reads and a complete history.

How It Works Under the Hood: Indexing, Sharding, and Consistency in Practice

Understanding the internals of document databases helps you make better process decisions. We will look at three key mechanisms: indexing for real-time queries, sharding for write scaling, and consistency models for read freshness.

Indexing. In a real-time system, you cannot afford full collection scans. Most document databases support B-tree indexes, compound indexes, and partial indexes. The critical insight is that index maintenance adds latency to writes. Every insert or update must update all relevant indexes. If you have many indexes, write throughput drops. For event-style collections with high write volume, you want to minimize indexes. Often, a single compound index on (device_id, timestamp) is enough for the most common query: get the latest event for a device. Additional indexes for ad-hoc queries can be added on a secondary collection that is updated asynchronously.
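
To see why a single compound index on (device_id, timestamp) serves the "latest event for a device" query, the sketch below models the index as a sorted list of tuples and answers the query with one binary search. This is an illustration of the ordering only, not how any particular database implements its B-tree.

```python
import bisect

# Index entries kept sorted by (device_id, timestamp), as a B-tree would keep them.
index = sorted([
    ("drone-1", 1000), ("drone-1", 1200), ("drone-1", 1400),
    ("drone-2", 1000), ("drone-2", 1200),
    ("drone-3", 1100),
])


def latest_timestamp(index, device_id):
    """Find the newest entry for one device with a single binary search."""
    pos = bisect.bisect_right(index, (device_id, float("inf")))
    if pos == 0 or index[pos - 1][0] != device_id:
        return None  # no entries for this device
    return index[pos - 1][1]


print(latest_timestamp(index, "drone-1"))  # 1400
print(latest_timestamp(index, "drone-9"))  # None
```

Because all entries for one device sit next to each other in timestamp order, the same index also covers time-range queries for a single device. In PyMongo, an index like this would typically be created with `create_index([("device_id", 1), ("timestamp", -1)])`.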

Another indexing technique is the use of TTL (time-to-live) indexes to automatically expire old events. This keeps the collection size manageable without manual cleanup. For the drone fleet, we set a TTL of 30 days on raw telemetry events. The latest state collection has no TTL because it is small and updated in place.

Sharding. Sharding distributes data across nodes based on a shard key. Choosing the right shard key is the most impactful decision for write scalability. A common mistake is to use a monotonically increasing key like a timestamp with range-based sharding. All new writes then land in the range that holds the highest key values, so one shard (the hot shard) absorbs the entire insert load while the others sit idle. Instead, use a shard key with high cardinality and even distribution, such as a hash of the device ID. Some databases support hashed shard keys that automatically distribute writes. For the drone fleet, we shard on hashed device_id, which spreads writes evenly across shards. The trade-off is that range queries across devices become scatter-gather operations, but queries that target a single device are routed to one shard and stay efficient.
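
The difference between the two shard key choices can be simulated without a database. The sketch below routes the same batch of writes once by a hash of the device ID and once by timestamp range. The shard count, the range boundaries, and the use of SHA-256 are assumptions for illustration; real databases use their own hash functions and chunk layouts.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def hashed_shard(device_id):
    """Pick a shard from a stable hash of the device ID."""
    digest = hashlib.sha256(device_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def range_shard(timestamp, boundaries):
    """Pick a shard by timestamp range; boundaries are upper bounds of shards 0..n-2."""
    for shard, upper in enumerate(boundaries):
        if timestamp < upper:
            return shard
    return len(boundaries)


# A burst of writes from 1,000 drones, all carrying "current" timestamps.
writes = [(f"drone-{i}", 10_000 + i % 5) for i in range(1000)]

by_hash = Counter(hashed_shard(dev) for dev, _ in writes)
by_range = Counter(range_shard(ts, [2_500, 5_000, 7_500]) for _, ts in writes)

print(sorted(by_hash.items()))  # writes spread over all four shards
print(dict(by_range))           # {3: 1000}: every write hits the last shard
```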

Consistency. Document databases offer various consistency levels. In MongoDB, read concern 'majority' returns only data that a majority of replica set members have acknowledged, so a read never observes a write that could later be rolled back. It does not by itself guarantee that you see the most recent write; that requires 'linearizable' read concern, which adds latency because the primary must confirm with a majority before answering. For real-time dashboards, many teams use 'local' read concern (the fastest) and accept that a read may return data that is later rolled back, or stale data when it is served by a secondary. To mitigate, they use a hybrid: critical updates (like battery warnings) are written with write concern 'majority' and read with read concern 'majority', while non-critical telemetry uses write concern w: 1 and read concern 'local'. This requires application-level awareness of consistency levels, but it balances speed and correctness.
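
One way to keep that application-level awareness in a single place is a small policy function that maps an event's status to the concerns to use. The sketch below returns plain dictionaries; treating both 'warning' and 'critical' as the critical path, and using w=1 as the write-side counterpart of 'local' reads, are assumptions.

```python
# Statuses that take the slower, durable path (an assumption for this sketch).
CRITICAL_STATUSES = {"warning", "critical"}


def concerns_for(status):
    """Return (write_concern, read_concern) settings for an event of this status."""
    if status in CRITICAL_STATUSES:
        return {"w": "majority"}, {"level": "majority"}
    return {"w": 1}, {"level": "local"}


write_concern, read_concern = concerns_for("critical")
print(write_concern, read_concern)  # {'w': 'majority'} {'level': 'majority'}
print(concerns_for("ok"))           # ({'w': 1}, {'level': 'local'})
```

With PyMongo, these settings would typically be turned into `WriteConcern` and `ReadConcern` objects and applied to a collection handle via `with_options`, so the rest of the code never chooses a consistency level ad hoc.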

Change Streams as a Real-Time Backbone

Change streams are a MongoDB feature that lets applications subscribe to real-time changes in a collection; Couchbase exposes comparable change feeds through its Database Change Protocol (DCP) and the Eventing service. In the drone fleet, we use change streams to feed events into a materialized view (the latest state collection) and also to trigger alerts when certain conditions are met. Change streams are ordered and resumable, which gives at-least-once processing when the consumer stores its resume position, but they can fall behind under heavy write load. Monitoring the change stream lag is essential. We set up alerts if lag exceeds 5 seconds.
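
Lag monitoring can be as simple as comparing the time an event was written with the time the listener handled it. The sketch below assumes both are available as seconds since the epoch and uses the 5-second threshold from the text; the function names are hypothetical.

```python
LAG_ALERT_SECONDS = 5.0  # threshold used in the text


def change_stream_lag(event_time, processed_time):
    """Seconds between when an event was written and when the listener handled it."""
    return max(0.0, processed_time - event_time)


def should_alert(lags, threshold=LAG_ALERT_SECONDS):
    """Alert when the worst lag in the recent sample exceeds the threshold."""
    return bool(lags) and max(lags) > threshold


recent = [change_stream_lag(100.0, 100.4), change_stream_lag(101.0, 107.5)]
print(recent)                # [0.4000000000000057, 6.5]
print(should_alert(recent))  # True
```

Using the worst recent lag rather than the average is a deliberate choice here: a listener that is mostly fast but occasionally stalls still delays the dashboard.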

Worked Example: Building a Real-Time Fleet Dashboard with Document Stores

Let us walk through a composite scenario: a logistics company manages 10,000 delivery drones. Each drone sends a telemetry event every 200 milliseconds: device_id, timestamp, lat, lng, battery_pct, motor_temp_c, status (ok, warning, critical). The operations team needs a dashboard showing the latest status of every drone, updated within 2 seconds. They also need historical queries: battery trends for a specific drone over the last hour.

We chose MongoDB for this project because of its mature change streams, flexible indexing, and wide deployment. The data model uses two collections:

  • telemetry_events: one document per event, with fields as above, plus a TTL index on timestamp (30 days). Shard key: hashed device_id.
  • drone_latest: one document per drone, with device_id as _id, and fields for the latest values of lat, lng, battery_pct, motor_temp_c, status, and last_updated. This collection is updated by a change stream listener that runs in a microservice.

The dashboard queries drone_latest for all devices (a scatter-gather query across shards). To keep that query fast, we ensure the collection is small (10,000 documents) and rely on the default _id index, since device_id is stored as _id. The historical queries use telemetry_events with a compound index on (device_id, timestamp).

The change stream listener is the critical piece. It reads from telemetry_events in real time, batches updates to drone_latest every 500 milliseconds, and uses upsert operations. If the listener crashes, it can resume from the last checkpoint stored in a separate collection. We also handle out-of-order events by comparing timestamps: if an incoming event has an older timestamp than the current latest, we ignore it (since we assume the newer event is more accurate). This is a simple last-writer-wins strategy that works for this use case.
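
The batching step can be sketched as a pure function: given the current latest state and one window of events, produce at most one upsert per drone and drop anything older than what is already stored. The dictionaries below stand in for drone_latest and for the events read from the change stream during one 500 ms window; `build_batch` is a hypothetical helper.

```python
def build_batch(current, events):
    """Collapse one window of events into at most one upsert per drone."""
    pending = {}
    for event in events:
        dev = event["device_id"]
        newest_known = pending.get(dev, current.get(dev))
        if newest_known is None or event["timestamp"] > newest_known["timestamp"]:
            pending[dev] = event
    return pending


current = {"drone-1": {"device_id": "drone-1", "timestamp": 1400, "status": "ok"}}
window = [
    {"device_id": "drone-1", "timestamp": 1200, "status": "warning"},  # older than stored state
    {"device_id": "drone-1", "timestamp": 1600, "status": "ok"},
    {"device_id": "drone-2", "timestamp": 1600, "status": "ok"},
    {"device_id": "drone-2", "timestamp": 1800, "status": "warning"},
]

batch = build_batch(current, window)
current.update(batch)  # stands in for the bulk upsert into drone_latest
print(len(batch))      # 2 upserts for 4 events
```

Note that the stale 'warning' for drone-1 is silently dropped by this rule, which is exactly the behavior that needs care when the dropped event is more severe than the stored one.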

Performance results: the system ingests 50,000 events per second (10,000 drones × 5 events/sec) with average write latency under 10 ms. The dashboard reads show data within 1–2 seconds of the event. Historical queries over one hour for a single drone return in under 100 ms. The main bottleneck is the change stream listener, which must keep up with the write rate. We scaled it by partitioning the change stream by shard (each shard has its own listener).
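
The ingest figures follow directly from the scenario parameters, and the same arithmetic shows what the 30-day retention and the 500 ms batching imply. The average document size below is an assumption added purely to give a feel for raw volume; it does not come from the scenario.

```python
DRONES = 10_000
INTERVAL_MS = 200
RETENTION_DAYS = 30
FLUSH_INTERVAL_MS = 500
ASSUMED_DOC_BYTES = 200  # hypothetical average event size, before compression

events_per_drone_per_sec = 1000 // INTERVAL_MS      # 5
events_per_sec = DRONES * events_per_drone_per_sec  # 50,000
events_per_day = events_per_sec * 86_400            # 4,320,000,000
events_retained = events_per_day * RETENTION_DAYS   # 129,600,000,000

# With one upsert per drone per flush, drone_latest sees at most this many writes.
max_upserts_per_sec = DRONES * (1000 // FLUSH_INTERVAL_MS)  # 20,000

raw_gb_per_day = events_per_day * ASSUMED_DOC_BYTES / 1e9   # 864.0

print(events_per_sec, events_per_day, events_retained)
print(max_upserts_per_sec, raw_gb_per_day)
```

Two things stand out: batching caps writes to the latest-state collection well below the raw event rate, and the retained event count is large enough that index size and compression deserve attention early.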

This scenario illustrates the process decisions: choosing an event model for writes, a materialized view for reads, and careful use of change streams to bridge the two. The same pattern applies to other real-time domains like IoT sensor networks, financial tick data, or social media feeds.

Edge Cases and Exceptions: When the Model Breaks

Even a well-designed system hits edge cases. Here are three that frequently trip up teams.

Out-of-order events with critical updates. Suppose a drone sends a critical battery warning and then a normal update, but the normal update arrives first due to network latency. The normal update is applied to the latest state. When the critical warning finally arrives, it carries an older timestamp, so our timestamp-based last-writer-wins rule ignores it and the warning is lost. That is dangerous. The fix is to treat status changes as special: if the incoming event has a higher severity (critical > warning > ok) than the stored state, we always apply it regardless of timestamp. This requires adding a severity field (or a severity ordering over the existing status values) and modifying the merge logic.
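
A severity-aware merge rule might look like the sketch below. It derives severity from the existing status values rather than a separate field, which is an assumption, and the `merge` function is hypothetical.

```python
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


def merge(current, incoming):
    """Return the document to keep: higher severity wins, then newer timestamp."""
    if current is None:
        return incoming
    if SEVERITY[incoming["status"]] > SEVERITY[current["status"]]:
        return incoming  # escalation is applied even if it is older
    if incoming["timestamp"] > current["timestamp"]:
        return incoming  # otherwise fall back to last-writer-wins
    return current


stored = {"timestamp": 1200, "status": "ok"}        # normal update arrived first
late = {"timestamp": 1000, "status": "critical"}    # delayed critical warning

stored = merge(stored, late)
print(stored["status"])  # critical
```

Under this rule a later, newer 'ok' event still replaces the critical state, so whether an alert should stay visible until acknowledged is a separate product decision.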

Schema migration at scale. After six months, the team decides to add a new field (ambient_temperature) to telemetry events. With schema-on-read, old documents lack this field. Queries that expect ambient_temperature must handle nulls. More critically, depending on the index type (for example, a sparse or partial index), an index on the new field may not include old documents until they are updated. A background migration that reads old documents and rewrites them with the new field (set to null) can take days. During that time, queries may be incomplete. The process solution is a three-step migration: first, make the application code handle missing fields; second, run a batch job to backfill old documents; third, add the index. This is not unique to document databases, but the lack of a schema migration tool (like ALTER TABLE) means you have to build it yourself.
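
The first two steps of such a migration can be sketched against an in-memory list of documents. `ambient_temp` is the tolerant read path and `backfill` is the batch job; both names and the tiny batch size are hypothetical.

```python
def ambient_temp(doc):
    """Step 1: the read path tolerates documents written before the field existed."""
    return doc.get("ambient_temperature")


def backfill(docs, batch_size=2):
    """Step 2: add the missing field in small batches; returns the number of batches run."""
    pending = [d for d in docs if "ambient_temperature" not in d]
    batches = 0
    for start in range(0, len(pending), batch_size):
        for doc in pending[start:start + batch_size]:
            doc["ambient_temperature"] = None
        batches += 1
    return batches


docs = [
    {"device_id": "drone-1", "battery_pct": 80},
    {"device_id": "drone-2", "battery_pct": 75},
    {"device_id": "drone-3", "battery_pct": 60},
    {"device_id": "drone-4", "battery_pct": 90, "ambient_temperature": 21.5},
]

print(ambient_temp(docs[0]))  # None, before and after the backfill
print(backfill(docs))         # 2 batches for 3 old documents
print(backfill(docs))         # 0: rerunning the job is a no-op
```

Selecting only documents that still lack the field is what makes the job safe to interrupt and rerun, which matters when a backfill runs for days.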

Partial failures in change streams. If the change stream listener crashes and restarts, it resumes from the last checkpoint, so everything it processed after that checkpoint is processed again. In our system, we checkpoint every 1000 events, which means a crash can replay up to 1000 events and produce duplicate updates to drone_latest. That is acceptable because those updates are idempotent (upsert). It is not acceptable for side effects that are not idempotent: replayed events can cause duplicate entries in a log collection if you are not careful. The fix is to make the downstream operation idempotent or use a transactional outbox pattern.
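
The replay behavior is easy to reproduce in a small simulation. The sketch below processes a list of events with a periodic checkpoint, simulates a crash, and resumes. The checkpoint interval is reduced from 1000 to 3 so the effect is visible, and `run_listener` is a hypothetical stand-in for the real listener.

```python
CHECKPOINT_EVERY = 3  # the text uses 1000; kept small so the replay is visible


def run_listener(events, checkpoint, latest, log, crash_after=None):
    """Process events from `checkpoint`; return the position to resume from."""
    processed = 0
    for position in range(checkpoint, len(events)):
        if processed == crash_after:
            return checkpoint  # simulated crash: progress since last checkpoint is lost
        event = events[position]
        latest[event["device_id"]] = event  # idempotent: replay leaves the same result
        log.append(event["seq"])            # not idempotent: replay adds duplicates
        processed += 1
        if processed % CHECKPOINT_EVERY == 0:
            checkpoint = position + 1       # persist progress
    return len(events)


events = [{"seq": i, "device_id": f"drone-{i % 2}"} for i in range(8)]
latest, log = {}, []

resume_from = run_listener(events, 0, latest, log, crash_after=5)
run_listener(events, resume_from, latest, log)

print(resume_from)  # 3
print(log)          # [0, 1, 2, 3, 4, 3, 4, 5, 6, 7]
```

The latest-state dictionary ends up correct despite the replay, while the log records events 3 and 4 twice. Deduplicating on a unique event ID is one common way to make the log side idempotent as well.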

These edge cases highlight that real-time systems require careful handling of ordering, idempotency, and failure recovery. Document databases provide the primitives, but the application logic must fill the gaps.

Limits of the Approach: When Document Databases Struggle for Real-Time

Document databases are not a universal solution. There are scenarios where they fall short, and teams should consider alternatives.

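High-frequency trading or other microsecond-latency workloads. Document databases typically have overhead from JSON parsing, B-tree index lookups, and network round trips. If you need single-digit microsecond latencies, an in-memory data grid or an in-process data structure is a better fit, because any networked database pays at least one round trip per operation. For high-volume metrics where ingest rate and compression matter more than per-operation latency, a specialized time-series database like InfluxDB may be better. In both cases the document model adds more abstraction than the workload can afford.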

Complex multi-document transactions with strong consistency. While MongoDB and Couchbase now support multi-document ACID transactions, they come with performance penalties. In real-time systems, using transactions across shards can cause contention and latency spikes. If your workload requires many cross-document transactions (e.g., a banking ledger), a relational database with native transaction support might be simpler and faster.

Very large documents (>16 MB). MongoDB's document size limit is 16 MB. If your real-time events include large binary payloads (e.g., images from drones), you need to store them externally and reference them. This adds complexity. Some document databases like Couchbase have higher limits, but large documents still impact performance because they are read and written as a whole.

Ad-hoc analytical queries on event streams. Document databases are optimized for operational queries (point lookups, small range scans). If you need to run complex aggregations over millions of events (e.g., average battery temperature across all drones over a week), a columnar database or a data warehouse is more efficient. Many teams use a document database for real-time ingestion and a separate analytics store for historical queries, connected via a change stream or ETL pipeline.

These limits are not deal-breakers for most real-time applications, but they define the boundaries of the approach. Knowing them helps you avoid over-engineering or hitting a wall later.

Reader FAQ

Can I use joins in a document database for real-time queries?

Support varies. MongoDB offers the $lookup aggregation stage and Couchbase supports joins in its SQL++ query language, but joins at query time are often too slow for real-time paths. The recommended approach is to embed related data or use denormalization. For real-time systems, avoid joins at query time; instead, pre-join data in a materialized view or use application-side aggregation.

How do I handle schema changes without downtime?

Use a backward-compatible schema: add fields with default values, and make the application handle missing fields. Migrate old documents in the background using a batch job. Avoid renaming or removing fields until all old documents are migrated. This is similar to a zero-downtime migration in relational databases but requires more manual steps.

What is the best shard key for time-series data?

A hashed shard key on a high-cardinality field like device_id distributes writes evenly. Avoid using timestamp alone as a shard key because it creates hot shards. If you need range queries on time, consider a compound shard key like (device_id, timestamp) with a hashed prefix, or use a bucketing strategy where you store multiple events in one document per time window.
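
The bucketing strategy can be sketched as follows: readings from one device that fall in the same time window are appended to a single bucket document. The 60-second window, the bucket key format, and the in-memory `buckets` dictionary are assumptions for illustration.

```python
BUCKET_SECONDS = 60  # hypothetical window size


def bucket_id(device_id, timestamp):
    """Key of the bucket document that holds this reading."""
    window_start = timestamp - (timestamp % BUCKET_SECONDS)
    return f"{device_id}:{window_start}"


def add_reading(buckets, device_id, timestamp, reading):
    """Append a reading to its bucket, creating the bucket on first use."""
    key = bucket_id(device_id, timestamp)
    bucket = buckets.setdefault(key, {"device_id": device_id, "count": 0, "readings": []})
    bucket["readings"].append({"t": timestamp, **reading})
    bucket["count"] += 1
    return key


buckets = {}
add_reading(buckets, "drone-1", 120, {"battery_pct": 80})
add_reading(buckets, "drone-1", 125, {"battery_pct": 80})
add_reading(buckets, "drone-1", 181, {"battery_pct": 79})

print(sorted(buckets))                  # ['drone-1:120', 'drone-1:180']
print(buckets["drone-1:120"]["count"])  # 2
```

Bucketing trades insert-only writes for updates to a growing document, so the window should be sized to keep each bucket comfortably below the database's document size limit.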

How do I estimate cost for a real-time document database deployment?

Cost depends on write throughput, storage, and index size. For write-heavy workloads, the number of shards and replica sets drives cost. Estimate your peak writes per second and choose a cluster size that can handle roughly 2x that peak, which leaves headroom for bursts and node failures. Storage costs are driven by document size and TTL policies. Use compression and TTL indexes to reduce storage. Many cloud providers offer reserved instances for predictable costs.
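
As a back-of-the-envelope sketch of the sizing rule, the function below turns a peak write rate and a per-shard capacity into a shard count. The per-shard capacity is a made-up figure; it should come from load tests against your own documents and indexes.

```python
import math


def shards_needed(peak_writes_per_sec, per_shard_capacity, headroom=2.0):
    """Shards required to absorb the peak write rate with the given headroom factor."""
    return math.ceil(peak_writes_per_sec * headroom / per_shard_capacity)


# 50,000 writes/sec peak, and a hypothetical 15,000 writes/sec per shard.
print(shards_needed(50_000, 15_000))  # 7
```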

Is eventual consistency acceptable for real-time dashboards?

It depends on the use case. For monitoring dashboards where a few seconds of staleness is acceptable, eventual consistency with local reads works well. For operational decisions (e.g., triggering an emergency stop), you need strong consistency. A common pattern is to use strong consistency for critical fields and eventual consistency for others, with application-level logic to enforce the distinction.

These answers reflect common patterns we have seen across teams. Your specific requirements may vary, so always test with your own workload.
