Data Lifecycle — Common Patterns
Recognizing Patterns
After mapping enough systems, you'll see the same structures repeatedly. Learning to recognize these patterns lets you quickly understand new systems by saying "oh, this is basically a pipeline with a fan-out at the end" instead of mapping every stage from scratch.
Every real system is a combination of these patterns. The coffee shop is CRUD + Request/Response. The ATM is Request/Response with a two-phase commit. The social media post is CRUD + Pipeline + Event-Driven (Fan-Out). Knowing the patterns lets you identify the building blocks.
Pattern 1: CRUD (Create, Read, Update, Delete)
The most basic lifecycle. Data is created, read back, modified, and eventually removed.
Structure
Create: Input → Validate (Transform) → Store (Storage)
Read: Request (Transport) → Retrieve (Storage) → Return (Transport)
Update: Input → Validate (Transform) → Overwrite (Storage)
Delete: Request → Remove (Storage) → Confirm (Transport)
Real-World Example: Contact List App
| Operation | Lifecycle Steps |
|---|---|
| Create a contact | User enters name + phone → app validates (phone number format check) → saved to database |
| Read contacts | User opens app → app requests contacts from database → database returns list → app displays them sorted alphabetically |
| Update a contact | User edits phone number → app validates new number → database overwrites old record |
| Delete a contact | User taps delete → app asks "are you sure?" → sends delete request → database removes record → app removes from displayed list |
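The four operations in the table can be sketched as a small in-memory store. This is a minimal illustration, not a real persistence layer: `ContactStore`, its method names, and the phone-number pattern are all assumptions made for the example.

```python
import re

# Hypothetical format check: optional "+", then 7 to 15 digits.
PHONE_RE = re.compile(r"^\+?\d{7,15}$")


class ContactStore:
    """In-memory stand-in for the database in the contact list example."""

    def __init__(self):
        self._contacts = {}  # name -> phone

    @staticmethod
    def _validate(phone):
        # Transform step: reject input that fails the format check.
        if not PHONE_RE.match(phone):
            raise ValueError(f"invalid phone number: {phone!r}")
        return phone

    def create(self, name, phone):
        if name in self._contacts:
            raise KeyError(f"contact already exists: {name}")
        self._contacts[name] = self._validate(phone)

    def read_all(self):
        # Reading includes a transform: sort alphabetically by name.
        return sorted(self._contacts.items(), key=lambda item: item[0].lower())

    def update(self, name, phone):
        if name not in self._contacts:
            raise KeyError(f"no such contact: {name}")
        self._contacts[name] = self._validate(phone)

    def delete(self, name):
        # Returns True if a record was removed, so the caller can confirm.
        return self._contacts.pop(name, None) is not None


store = ContactStore()
store.create("Zoe", "+15550100")
store.create("adam", "5550199")
store.update("Zoe", "+15550111")
print(store.read_all())
store.delete("adam")
```

Even in this toy version, `read_all` does more than fetch: the sort is a transform that happens on every read.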
What to Watch For
- Read is rarely just "get the data." There's almost always sorting, filtering, or pagination involved, and those are transforms.
- Delete is rarely simple. What about related data? If you delete a customer, what happens to their orders? Their reviews? Their saved addresses?
- Update conflicts. What if two people update the same record at the same time? Does the last write win, does the first write win, or is the second writer told there's a conflict?
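One way to make the update-conflict question concrete is optimistic versioning: each record carries a version number, and an update is accepted only if the writer saw the current version. The sketch below shows one possible policy (the second writer is told there is a conflict), not the only valid one. `VersionedStore` and `ConflictError` are hypothetical names.

```python
class ConflictError(Exception):
    """Raised when a writer's copy of the record is out of date."""


class VersionedStore:
    def __init__(self):
        self._records = {}  # key -> (version, value)

    def create(self, key, value):
        self._records[key] = (1, value)

    def read(self, key):
        # Returns (version, value); the caller passes the version back on update.
        return self._records[key]

    def update(self, key, value, expected_version):
        current_version, _ = self._records[key]
        if current_version != expected_version:
            raise ConflictError(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._records[key] = (current_version + 1, value)


store = VersionedStore()
store.create("alice", "555-0100")

# Two editors read the same version of the record.
version_a, _ = store.read("alice")
version_b, _ = store.read("alice")

store.update("alice", "555-0111", version_a)      # first write succeeds
try:
    store.update("alice", "555-0122", version_b)  # second write is stale
except ConflictError as err:
    print("conflict:", err)
```

Dropping the version check would turn this into "last write wins", which is simpler but silently discards the first editor's change.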
Lifecycle Map
┌────────────┐     validate     ┌────────────┐     store      ┌────────────┐
│ User Input │ ───────────────► │   Server   │ ─────────────► │  Database  │
│            │                  │ (transform)│                │ (storage)  │
│            │ ◄─────────────── │            │ ◄───────────── │            │
│            │  display result  │            │    retrieve    │            │
└────────────┘                  └────────────┘                └────────────┘
Pattern 2: Pipeline
Data flows through a series of transforms in sequence. Each step's output is the next step's input. Intermediate steps don't store data permanently; only the final result is stored or delivered.
Structure
Input → Step 1 (Transform) → Step 2 (Transform) → Step 3 (Transform) → Output
Real-World Example: Photo Upload Processing
When a user uploads a profile photo, it doesn't just get saved. It passes through a pipeline:
| Step | Input | Transform | Output |
|---|---|---|---|
| 1. Receive | Raw uploaded file | Verify it's actually an image (not an executable or other file disguised as one) | Validated image file |
| 2. Strip metadata | Validated image | Remove EXIF data (GPS, camera info — privacy) | Clean image |
| 3. Resize | Clean image | Create thumbnail (100px), medium (400px), large (800px) versions | Three image files |
| 4. Compress | Three images | Optimize file sizes for web delivery | Three compressed images |
| 5. Content scan | Compressed images (or original) | Automated check for prohibited content | Same images + moderation flag (pass/fail) |
| 6. Store | Compressed images | Save all three sizes to file storage | URLs for each size |
| 7. Update record | Three URLs + moderation flag | Update user's profile record with new photo URLs | Updated database record |
Pipeline Characteristics
Order matters. You can't compress before you resize (you'd compress the wrong sizes). You can't strip metadata after you store (the metadata would already be in storage). Each step depends on the previous step's output.
Failure stops the pipeline. If step 5 (content scan) flags the image, steps 6 and 7 never execute. The pipeline has a clear "abort" path at every stage.
Each step is independently testable. Give step 3 a known image, check that the output is three images of the right size. You don't need the rest of the pipeline to test this one step.
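A pipeline like this can be sketched as an ordered list of functions, where each function receives the previous one's output and any step may abort the run. The step bodies below are placeholders that operate on plain dicts rather than real image data, and the validation only checks for a PNG signature. `run_pipeline`, `PipelineAbort`, and all step names are assumptions for illustration.

```python
class PipelineAbort(Exception):
    """Raised by a step to stop the pipeline; later steps never run."""


def validate(upload):
    # Simplified check: real PNG files begin with these bytes.
    if not upload["bytes"].startswith(b"\x89PNG"):
        raise PipelineAbort("not a PNG image")
    return upload


def strip_metadata(image):
    # Drop the placeholder metadata key.
    return {k: v for k, v in image.items() if k != "exif"}


def resize(image):
    # Placeholder: record target widths instead of really resizing.
    return [{**image, "width": w} for w in (100, 400, 800)]


def content_scan(images):
    if any(img.get("prohibited") for img in images):
        raise PipelineAbort("flagged by content scan")
    return images


def run_pipeline(data, steps):
    for step in steps:
        data = step(data)  # each step's output is the next step's input
    return data


STEPS = [validate, strip_metadata, resize, content_scan]

upload = {"bytes": b"\x89PNG...", "exif": {"gps": "40.7,-74.0"}}
result = run_pipeline(upload, STEPS)
print([img["width"] for img in result])

# Each step is testable on its own, without the rest of the pipeline.
assert [img["width"] for img in resize({"bytes": b""})] == [100, 400, 800]
```

Because the order lives in one list (`STEPS`), the ordering constraints described above are visible in a single place and easy to review.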
Lifecycle Map
Raw Upload → [Validate] → [Strip EXIF] → [Resize] → [Compress] → [Scan] → [Store] → [Update Record]
                                                                   │
                                                                   ▼ (if flagged)
                                                         [Reject + Notify User]
Pattern 3: Request/Response
One system asks a question, another system answers it. The data makes a round trip.
Structure
Requester → Question (Transport) → Responder → Process (Transform) → Answer (Transport) → Requester
Real-World Example: Weather App
| Step | What Happens | Category |
|---|---|---|
| 1 | User opens weather app | — (no data yet) |
| 2 | App sends request: "What's the weather for ZIP 10001?" | Transport (app → weather API) |
| 3 | Weather API receives request | Transport complete |
| 4 | API looks up current conditions for ZIP 10001 | Storage (read from weather database) |
| 5 | API formats response (temperature, conditions, humidity, forecast) | Transform (raw data → structured response) |
| 6 | Response sent back to app | Transport (API → app) |
| 7 | App stores response in local cache | Storage (temporary — expires after 15 minutes) |
| 8 | App displays weather to user | Transport (app memory → screen) |
| 9 | User checks again 5 minutes later | Storage (served from local cache; no new request) |
| 10 | 15 minutes pass, cache expires | Storage (cache entry deleted) |
| 11 | User checks again | Transport (cache miss; repeat from step 2) |
Request/Response Characteristics
There's always a waiting period. Between sending the request and receiving the response, the requester is waiting. What does it show the user? A spinner? Stale cached data? Nothing?
Timeouts are essential. What if the response never comes? The requester must decide: wait forever? Give up after 5 seconds? Show an error?
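A timeout policy can be made explicit in code. The sketch below uses Python's `concurrent.futures` to stop waiting on a slow call after a fixed budget and return a fallback value instead. `call_with_timeout` and `slow_weather_lookup` are hypothetical, and the short delays stand in for the 5-second budget mentioned above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(fn, timeout_s, fallback):
    """Run fn in a worker thread; return fallback if it takes too long."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback
    finally:
        # Don't block the caller; the slow call is abandoned, not cancelled.
        executor.shutdown(wait=False)


def slow_weather_lookup():
    time.sleep(0.5)  # simulates a responder that is too slow
    return {"temp_f": 72}


result = call_with_timeout(slow_weather_lookup, timeout_s=0.05, fallback=None)
print("timed out" if result is None else result)
```

What the fallback should be (an error message, stale cached data, nothing) is the design decision the questions above are asking about; the code only makes the decision point visible.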
Caching changes the lifecycle. If you cache responses, you now have the data stored in two places (the source and the cache). They can get out of sync. How stale is acceptable? Who invalidates the cache?
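The 15-minute cache from the weather example can be sketched as a small time-to-live (TTL) cache. This is a minimal illustration under stated assumptions: `TTLCache` and `fetch_weather` are hypothetical, and the clock is injected so expiry can be demonstrated without actually waiting.

```python
import time


class TTLCache:
    """Caches one value per key; entries older than ttl_s count as expired."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl_s:
            return entry[1]   # fresh: serve from cache, no new request
        value = fetch(key)    # missing or expired: go back to the source
        self._entries[key] = (now, value)
        return value


calls = []


def fetch_weather(zip_code):
    # Stand-in for the real API call; records each request made.
    calls.append(zip_code)
    return {"zip": zip_code, "temp_f": 72}


fake_now = [0.0]
cache = TTLCache(ttl_s=15 * 60, clock=lambda: fake_now[0])

cache.get_or_fetch("10001", fetch_weather)  # first check: request sent
fake_now[0] = 5 * 60
cache.get_or_fetch("10001", fetch_weather)  # 5 minutes later: cache hit
fake_now[0] = 16 * 60
cache.get_or_fetch("10001", fetch_weather)  # after expiry: request sent again
print(len(calls))  # 2
```

A fixed TTL answers "how stale is acceptable?" with a single number. It does not answer "who invalidates the cache?" when the source changes earlier than expected.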
Pattern 4: Event-Driven (Fan-Out)
Something happens, and multiple independent parts of the system react — each with their own lifecycle.
Structure
Event Occurs → Broadcast (Transport)
├─→ Listener A → (its own lifecycle)
├─→ Listener B → (its own lifecycle)
└─→ Listener C → (its own lifecycle)
Real-World Example: New User Signs Up
A user creates an account. This single event triggers many independent reactions:
| Listener | What It Does | Its Own Lifecycle |
|---|---|---|
| Welcome Email Service | Sends a welcome email | Retrieve email template (storage) → Fill in user's name (transform) → Send email (transport) → Log delivery (storage) |
| Default Settings Service | Creates the user's default preferences | Generate default settings (transform) → Save to database (storage) |
| Analytics Service | Records the signup event | Format event data (transform) → Write to analytics store (storage) |
| Onboarding Service | Creates a guided tutorial checklist | Generate checklist (transform) → Save progress tracker (storage) |
| Admin Dashboard | Updates the "new signups today" counter | Increment counter (transform) → Update dashboard data (storage) |
| Fraud Detection | Checks if signup looks legitimate | Analyze email domain, IP address, behavior patterns (transform) → Flag or clear (storage) |
Fan-Out Characteristics
Listeners are independent. If the welcome email fails, the default settings should still be created. Each listener has its own success/failure path.
The event producer doesn't know (or care) about the listeners. The signup module just says "a user signed up." It doesn't know that six other systems are listening. This is intentional — it keeps the boundary clean.
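Listener independence and producer ignorance can both be shown with a tiny in-process event bus. Real systems usually put a message broker or queue in this role; the synchronous `EventBus` below, the event name `user_signed_up`, and the three listener functions are assumptions chosen to keep the sketch short.

```python
from collections import defaultdict


class EventBus:
    def __init__(self):
        self._listeners = defaultdict(list)  # event name -> handlers

    def subscribe(self, event_name, handler):
        self._listeners[event_name].append(handler)

    def publish(self, event_name, payload):
        """Call every listener; one listener's failure doesn't stop the others."""
        failures = []
        for handler in self._listeners[event_name]:
            try:
                handler(payload)
            except Exception as exc:
                failures.append((handler.__name__, exc))
        return failures


settings = {}
signup_count = [0]


def send_welcome_email(user):
    raise ConnectionError("mail server unreachable")  # simulated failure


def create_default_settings(user):
    settings[user["id"]] = {"theme": "light"}


def increment_signup_counter(user):
    signup_count[0] += 1


bus = EventBus()
for listener in (send_welcome_email, create_default_settings, increment_signup_counter):
    bus.subscribe("user_signed_up", listener)

# The producer only announces the event; it never names a listener.
failures = bus.publish("user_signed_up", {"id": 42, "email": "a@example.com"})
print([name for name, _ in failures])  # ['send_welcome_email']
```

The email listener fails, yet the default settings are still created and the counter is still incremented, because each handler runs inside its own error boundary.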
Order usually doesn't matter. The welcome email can arrive before or after the default settings are created. But sometimes order does matter — the tutorial can't reference the user's settings if settings haven't been created yet. These ordering dependencies need to be explicit.
Fan-out can cascade. The welcome email might trigger its own event ("email sent"), which another listener responds to ("update email tracking dashboard"). One event can cascade into dozens of downstream data lifecycle chains.
Pattern 5: Batch Processing
Data accumulates over time, then is processed all at once on a schedule.
Structure
Events accumulate (Storage) → Timer fires → Retrieve batch (Transport) → Process all (Transform) → Store results (Storage) → Deliver (Transport)
Real-World Example: Daily Sales Report
| Step | What Happens | Category | Timing |
|---|---|---|---|
| 1 | Orders happen throughout the day | Storage (each order written to database as it occurs) | Ongoing, real-time |
| 2 | Midnight: report job triggers | — (timer event) | Scheduled |
| 3 | Job queries all orders for the day | Transport (database → report service) | ~Midnight |
| 4 | Orders aggregated by category, region, payment method | Transform (aggregation) | ~Midnight |
| 5 | Summary formatted into report | Transform (formatting) | ~Midnight |
| 6 | Report stored | Storage (persistent — saved to reports archive) | ~Midnight |
| 7 | Report emailed to management | Transport (report service → email service → inboxes) | ~Midnight |
Batch Processing Characteristics
There's a delay between event and processing. An order at 9am isn't reflected in the report until midnight. This is by design — but stakeholders must understand it.
The batch window is critical. If 100,000 orders need processing and the job takes 3 hours, it must start early enough to finish before anyone needs the results. What if order volume doubles?
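As a rough worked example of the batch-window question, the sketch below assumes processing time scales linearly with order count (real jobs often scale worse) and a hypothetical 6 a.m. deadline. The function names are illustrative only.

```python
def batch_hours(order_count, baseline_orders=100_000, baseline_hours=3.0):
    """Estimated run time, assuming time scales linearly with volume."""
    return baseline_hours * order_count / baseline_orders


def latest_start_hour(deadline_hour, order_count):
    """Latest start (hours after midnight) that still meets the deadline."""
    return deadline_hour - batch_hours(order_count)


print(batch_hours(100_000))           # 3.0
print(batch_hours(200_000))           # 6.0
print(latest_start_hour(6, 200_000))  # 0.0, i.e. midnight with no slack
```

Under these assumptions, doubling the volume uses up the entire window between midnight and 6 a.m., leaving no room for a retry.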
Failed batches are painful. If the midnight job fails, there's no report in the morning. Is there a retry? A manual trigger? Does someone get alerted?
Idempotency matters. If the job runs twice (maybe it was retried), does it produce the same report or a duplicate? The job must be safe to re-run.
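One common way to make a report job safe to re-run is to key the output by the reporting date and overwrite rather than append. The sketch below illustrates that idea with an in-memory dict standing in for the reports archive. `run_daily_report`, the order fields, and the sample data are all invented for the example.

```python
from collections import defaultdict

reports = {}  # report date -> summary; stand-in for the reports archive


def run_daily_report(report_date, orders):
    """Aggregate one day's orders by category and store under the date key.

    Keying by date and overwriting means a retry replaces the report
    instead of adding a duplicate.
    """
    totals = defaultdict(float)
    for order in orders:
        if order["date"] == report_date:
            totals[order["category"]] += order["amount"]
    reports[report_date] = dict(totals)
    return reports[report_date]


orders = [
    {"date": "2024-01-15", "category": "books", "amount": 20.0},
    {"date": "2024-01-15", "category": "books", "amount": 5.0},
    {"date": "2024-01-15", "category": "games", "amount": 60.0},
    {"date": "2024-01-16", "category": "books", "amount": 12.0},
]

first = run_daily_report("2024-01-15", orders)
second = run_daily_report("2024-01-15", orders)  # simulated retry
print(first == second, len(reports))  # True 1
```

Delivery is a separate concern: storing the report idempotently does not stop a retried job from emailing it twice, so the send step would need its own "already sent" check.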
Using Patterns to Analyze New Systems
When you encounter a new system, ask:
- What's the dominant pattern? (Most features are CRUD at their core)
- Where are the pipelines? (Any time data is processed in steps)
- Where are the request/response boundaries? (Any time two systems talk)
- Where are the fan-out points? (Any time one action triggers multiple reactions)
- Is there batch processing? (Any time you hear "nightly," "weekly," "scheduled")
Most systems are a combination. "When a user signs up (CRUD: create user), send a welcome email (event-driven), process their uploaded profile photo (pipeline), and load their personalized dashboard (request/response pulling data from multiple sources)."
Naming the pattern lets you immediately know what lifecycle questions to ask, what failure modes to expect, and how the data flows.