Data Lifecycle — Common Patterns

Recognizing Patterns

After mapping enough systems, you'll see the same structures repeatedly. Learning to recognize these patterns lets you quickly understand new systems by saying "oh, this is basically a pipeline with a fan-out at the end" instead of mapping every stage from scratch.

Every real system is a combination of these patterns. The coffee shop is CRUD + Request/Response. The ATM is Request/Response with a two-phase commit. The social media post is CRUD + Pipeline + Event-Driven (Fan-Out). Knowing the patterns lets you identify the building blocks.


Pattern 1: CRUD (Create, Read, Update, Delete)

The most basic lifecycle. Data is created, read back, modified, and eventually removed.

Structure

Create:  Input → Validate (Transform) → Store (Storage)
Read:    Request (Transport) → Retrieve (Storage) → Return (Transport)  
Update:  Input → Validate (Transform) → Overwrite (Storage)
Delete:  Request → Remove (Storage) → Confirm (Transport)

Real-World Example: Contact List App

Operation | Lifecycle Steps
Create a contact | User enters name + phone → app validates (phone number format check) → saved to database
Read contacts | User opens app → app requests contacts from database → database returns list → app displays them sorted alphabetically
Update a contact | User edits phone number → app validates new number → database overwrites old record
Delete a contact | User taps delete → app asks "are you sure?" → sends delete request → database removes record → app removes from displayed list
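
The four operations in the table can be sketched with an in-memory store. This is a minimal illustration under stated assumptions, not a real app: the `ContactStore` class, its method names, and the phone-format rule are all hypothetical, and a dict stands in for the database.

```python
import re

# Hypothetical rule for illustration: optional "+", then 7 to 15 digits.
PHONE_PATTERN = re.compile(r"^\+?\d{7,15}$")


def validate_phone(phone: str) -> str:
    """Transform step: strip separators, then check the format."""
    cleaned = re.sub(r"[\s\-()]", "", phone)
    if not PHONE_PATTERN.match(cleaned):
        raise ValueError(f"invalid phone number: {phone!r}")
    return cleaned


class ContactStore:
    """In-memory stand-in for the database (storage)."""

    def __init__(self) -> None:
        self._contacts: dict[str, str] = {}

    def create(self, name: str, phone: str) -> None:
        if name in self._contacts:
            raise KeyError(f"contact already exists: {name}")
        self._contacts[name] = validate_phone(phone)

    def read_all(self) -> list[tuple[str, str]]:
        # Read includes a transform: sort alphabetically, ignoring case.
        return sorted(self._contacts.items(), key=lambda item: item[0].lower())

    def update(self, name: str, phone: str) -> None:
        if name not in self._contacts:
            raise KeyError(f"no such contact: {name}")
        self._contacts[name] = validate_phone(phone)

    def delete(self, name: str) -> None:
        # Raises KeyError if the contact does not exist.
        del self._contacts[name]
```

Create and Update share the same validation step, which mirrors the table: both pass through a transform before anything reaches storage.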

What to Watch For

  • Read is never just "get the data." There's almost always sorting, filtering, or pagination involved — those are transforms.
  • Delete is rarely simple. What about related data? If you delete a customer, what happens to their orders? Their reviews? Their saved addresses?
  • Update conflicts. What if two people update the same record at the same time? Does the last write win? Does the first? Or is the second writer told there's a conflict?
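
One common answer to the update-conflict question is optimistic concurrency: each record carries a version number, and a write succeeds only if the writer saw the latest version. The sketch below is illustrative only. `VersionedStore`, `ConflictError`, and the record layout are hypothetical names, and a real database would need to perform the version check and the write atomically.

```python
class ConflictError(Exception):
    """Raised when a writer's copy of the record is out of date."""


class VersionedStore:
    def __init__(self) -> None:
        # Maps key -> (version, value).
        self._rows: dict[str, tuple[int, str]] = {}

    def create(self, key: str, value: str) -> None:
        self._rows[key] = (1, value)

    def read(self, key: str) -> tuple[int, str]:
        return self._rows[key]

    def update(self, key: str, value: str, expected_version: int) -> int:
        """Write only if the caller read the current version."""
        current_version, _ = self._rows[key]
        if current_version != expected_version:
            raise ConflictError(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._rows[key] = (current_version + 1, value)
        return current_version + 1
```

If two editors both read version 1, the first update succeeds and bumps the record to version 2; the second editor's update then raises `ConflictError` and the app can ask that user to reload. Dropping the version check would give "last write wins" instead.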

Lifecycle Map

┌────────────┐     validate      ┌────────────┐     store       ┌────────────┐
│ User Input │ ───────────────► │  Server    │ ─────────────► │  Database  │
│            │                   │ (transform)│                 │ (storage)  │
│            │ ◄─────────────── │            │ ◄───────────── │            │
│            │   display result  │            │   retrieve      │            │
└────────────┘                   └────────────┘                 └────────────┘

Pattern 2: Pipeline

Data flows through a series of transforms in sequence. Each step's output is the next step's input. Intermediate steps don't store data permanently; only the final result is stored or delivered.

Structure

Input → Step 1 (Transform) → Step 2 (Transform) → Step 3 (Transform) → Output

Real-World Example: Photo Upload Processing

When a user uploads a profile photo, it doesn't just get saved. It passes through a pipeline:

Step | Input | Transform | Output
1. Receive | Raw uploaded file | Verify it's actually an image (not a virus) | Validated image file
2. Strip metadata | Validated image | Remove EXIF data (GPS, camera info) for privacy | Clean image
3. Resize | Clean image | Create thumbnail (100px), medium (400px), large (800px) versions | Three image files
4. Compress | Three images | Optimize file sizes for web delivery | Three compressed images
5. Content scan | Compressed images (or original) | Automated check for prohibited content | Same images + moderation flag (pass/fail)
6. Store | Compressed images | Save all three sizes to file storage | URLs for each size
7. Update record | Three URLs + moderation flag | Update user's profile record with new photo URLs | Updated database record

Pipeline Characteristics

Order matters. You can't compress before you resize (you'd compress the wrong sizes). You can't strip metadata after you store (the metadata would already be in storage). Each step depends on the previous step's output.

Failure stops the pipeline. If step 5 (content scan) flags the image, steps 6 and 7 never execute. The pipeline has a clear "abort" path at every stage.

Each step is independently testable. Give step 3 a known image, check that the output is three images of the right size. You don't need the rest of the pipeline to test this one step.
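
These three characteristics can be seen in a small sketch where each step is a plain function and the pipeline is a loop. Everything named here is an assumption for illustration: `Photo`, `PipelineAbort`, and the step functions are hypothetical, only three of the seven steps are shown, and no real image processing happens (the "validation" is a placeholder header check).

```python
from dataclasses import dataclass, field
from typing import Callable


class PipelineAbort(Exception):
    """Raised by a step to stop the pipeline."""


@dataclass
class Photo:
    data: bytes
    metadata: dict = field(default_factory=dict)
    sizes: dict = field(default_factory=dict)  # label -> width in pixels


Step = Callable[[Photo], Photo]


def validate(photo: Photo) -> Photo:
    # Placeholder check: real code would inspect the whole file.
    if not photo.data.startswith(b"\x89PNG"):
        raise PipelineAbort("not a PNG image")
    return photo


def strip_metadata(photo: Photo) -> Photo:
    return Photo(data=photo.data, metadata={}, sizes=photo.sizes)


def resize(photo: Photo) -> Photo:
    # Records the target widths only; no pixels are actually resized.
    sizes = {"thumbnail": 100, "medium": 400, "large": 800}
    return Photo(data=photo.data, metadata=photo.metadata, sizes=sizes)


def run_pipeline(photo: Photo, steps: list[Step]) -> Photo:
    for step in steps:
        photo = step(photo)  # a PipelineAbort here skips all later steps
    return photo
```

The order of the `steps` list is the order of execution, an exception from any step is the abort path, and `resize` can be called on its own with a known `Photo` to test it without the rest of the pipeline.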

Lifecycle Map

Raw Upload → [Validate] → [Strip EXIF] → [Resize] → [Compress] → [Scan] → [Store] → [Update Record]
                                                                     │
                                                                     ▼ (if flagged)
                                                              [Reject + Notify User]

Pattern 3: Request/Response

One system asks a question, another system answers it. The data makes a round trip.

Structure

Requester → Question (Transport) → Responder → Process (Transform) → Answer (Transport) → Requester

Real-World Example: Weather App

Step | What Happens | Category
1 | User opens weather app | None (no data yet)
2 | App sends request: "What's the weather for ZIP 10001?" | Transport (app → weather API)
3 | Weather API receives request | Transport complete
4 | API looks up current conditions for ZIP 10001 | Storage (read from weather database)
5 | API formats response (temperature, conditions, humidity, forecast) | Transform (raw data → structured response)
6 | Response sent back to app | Transport (API → app)
7 | App stores response in local cache | Storage (temporary; expires after 15 minutes)
8 | App displays weather to user | Transport (app memory → screen)
9 | User checks again 5 minutes later | App serves from cache (no new request)
10 | 15 minutes pass, cache expires | Cache entry deleted
11 | User checks again | Repeat from step 2

Request/Response Characteristics

There's always a waiting period. Between sending the request and receiving the response, the requester is waiting. What does it show the user? A spinner? Stale cached data? Nothing?

Timeouts are essential. What if the response never comes? The requester must decide: wait forever? Give up after 5 seconds? Show an error?

Caching changes the lifecycle. If you cache responses, you now have the data stored in two places (the source and the cache). They can get out of sync. How stale is acceptable? Who invalidates the cache?
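
The cache behavior in the weather example (serve from cache while fresh, refetch after expiry) can be sketched as a small time-to-live cache. This is a minimal sketch, not a production cache: `TTLCache` and the `fetch` callback are hypothetical names, the clock is injectable so the expiry logic can be tested without waiting, and there is no size limit or explicit invalidation.

```python
import time
from typing import Callable


class TTLCache:
    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # the real request (e.g. an API call)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, key: str) -> dict:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            stored_at, value = entry
            if now - stored_at < self._ttl:
                return value         # fresh: serve from cache, no new request
            del self._entries[key]   # expired: drop it and fall through
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

With `ttl_seconds=900`, a second lookup five minutes after the first is answered from the cache, and a lookup after fifteen minutes triggers a new request. The TTL is the answer this sketch gives to "how stale is acceptable?"; it does nothing about the source changing in the meantime.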


Pattern 4: Event-Driven (Fan-Out)

Something happens, and multiple independent parts of the system react — each with their own lifecycle.

Structure

Event Occurs → Broadcast (Transport)
                ├─→ Listener A → (its own lifecycle)
                ├─→ Listener B → (its own lifecycle)
                └─→ Listener C → (its own lifecycle)

Real-World Example: New User Signs Up

A user creates an account. This single event triggers many independent reactions:

Listener | What It Does | Its Own Lifecycle
Welcome Email Service | Sends a welcome email | Retrieve email template (storage) → Fill in user's name (transform) → Send email (transport) → Log delivery (storage)
Default Settings Service | Creates the user's default preferences | Generate default settings (transform) → Save to database (storage)
Analytics Service | Records the signup event | Format event data (transform) → Write to analytics store (storage)
Onboarding Service | Creates a guided tutorial checklist | Generate checklist (transform) → Save progress tracker (storage)
Admin Dashboard | Updates the "new signups today" counter | Increment counter (transform) → Update dashboard data (storage)
Fraud Detection | Checks if signup looks legitimate | Analyze email domain, IP address, behavior patterns (transform) → Flag or clear (storage)

Fan-Out Characteristics

Listeners are independent. If the welcome email fails, the default settings should still be created. Each listener has its own success/failure path.

The event producer doesn't know (or care) about the listeners. The signup module just says "a user signed up." It doesn't know that six other systems are listening. This is intentional — it keeps the boundary clean.

Order usually doesn't matter. The welcome email can arrive before or after the default settings are created. But sometimes order does matter — the tutorial can't reference the user's settings if settings haven't been created yet. These ordering dependencies need to be explicit.

Fan-out can cascade. The welcome email might trigger its own event ("email sent"), which another listener responds to ("update email tracking dashboard"). One event can cascade into dozens of downstream data lifecycle chains.
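
Listener independence and the producer's ignorance of its listeners can both be sketched with a tiny in-process event bus. This is an illustration under simplifying assumptions: `EventBus` and its methods are hypothetical names, delivery is synchronous and in subscription order, and a real system would more likely use a message queue or broker.

```python
from collections import defaultdict
from typing import Callable

Listener = Callable[[dict], None]


class EventBus:
    def __init__(self) -> None:
        self._listeners: dict[str, list[Listener]] = defaultdict(list)

    def subscribe(self, event_name: str, listener: Listener) -> None:
        self._listeners[event_name].append(listener)

    def publish(self, event_name: str, payload: dict) -> list[tuple[str, Exception]]:
        """Deliver to every listener; return the failures instead of raising."""
        failures = []
        for listener in self._listeners[event_name]:
            try:
                listener(payload)
            except Exception as exc:  # one failure must not block the others
                name = getattr(listener, "__name__", repr(listener))
                failures.append((name, exc))
        return failures
```

The signup code would call only `publish("user_signed_up", {...})`; it never names the email or settings services. If the welcome-email listener raises, the remaining listeners still run, and the returned list of failures gives each one its own error path.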


Pattern 5: Batch Processing

Data accumulates over time, then is processed all at once on a schedule.

Structure

Events accumulate (Storage) → Timer fires → Retrieve batch (Transport) → Process all (Transform) → Store results (Storage) → Deliver (Transport)

Real-World Example: Daily Sales Report

Step | What Happens | Category | Timing
1 | Orders happen throughout the day | Storage (each order written to database as it occurs) | Ongoing, real-time
2 | Midnight: report job triggers | None (timer event) | Scheduled
3 | Job queries all orders for the day | Transport (database → report service) | ~Midnight
4 | Orders aggregated by category, region, payment method | Transform (aggregation) | ~Midnight
5 | Summary formatted into report | Transform (formatting) | ~Midnight
6 | Report stored | Storage (persistent; saved to reports archive) | ~Midnight
7 | Report emailed to management | Transport (report service → email service → inboxes) | ~Midnight

Batch Processing Characteristics

There's a delay between event and processing. An order at 9am isn't reflected in the report until midnight. This is by design — but stakeholders must understand it.

The batch window is critical. If 100,000 orders need processing and the job takes 3 hours, it must start early enough to finish before anyone needs the results. What if order volume doubles?
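
The batch-window question is simple arithmetic once you assume how runtime scales. The sketch below assumes runtime grows linearly with order count, which is an assumption for illustration and often not true in practice; `latest_start` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta


def latest_start(deadline: datetime, order_count: int,
                 orders_per_hour: float, safety_margin: timedelta) -> datetime:
    """Latest time the job can start and still finish before the deadline.

    Assumes runtime scales linearly with the number of orders.
    """
    runtime = timedelta(hours=order_count / orders_per_hour)
    return deadline - runtime - safety_margin
```

Using the figures above, 100,000 orders in 3 hours is a throughput of about 33,333 orders per hour. With a hypothetical 8:00 deadline and a 30-minute margin, the job must start by 4:30. If volume doubles to 200,000 orders, the runtime becomes 6 hours and the latest start moves to 1:30, which leaves far less room for a retry after a failure.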

Failed batches are painful. If the midnight job fails, there's no report in the morning. Is there a retry? A manual trigger? Does someone get alerted?

Idempotency matters. If the job runs twice (maybe it was retried), does it produce the same report or a duplicate? The job must be safe to re-run.
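
One way to make the report job safe to re-run is to key its output by the report date, so a second run overwrites the first instead of adding a duplicate. The sketch below is a minimal illustration: the function names, the order fields (`date`, `category`, `amount`), and the dict standing in for the reports archive are all assumptions.

```python
from collections import defaultdict
from datetime import date


def build_daily_report(orders: list[dict], day: date) -> dict[str, float]:
    """Transform: total the day's order amounts by category."""
    totals: dict[str, float] = defaultdict(float)
    for order in orders:
        if order["date"] == day:
            totals[order["category"]] += order["amount"]
    return dict(totals)


def run_report_job(orders: list[dict], day: date,
                   archive: dict[date, dict[str, float]]) -> None:
    # Keying the archive by report date makes a re-run overwrite the
    # earlier result instead of appending a duplicate.
    archive[day] = build_daily_report(orders, day)
```

Running `run_report_job` twice for the same day leaves the archive with exactly one report for that day. The delivery step is a separate concern: re-sending the email on a retry would still need its own guard.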


Using Patterns to Analyze New Systems

When you encounter a new system, ask:

  1. What's the dominant pattern? (Most features are CRUD at their core)
  2. Where are the pipelines? (Any time data is processed in steps)
  3. Where are the request/response boundaries? (Any time two systems talk)
  4. Where are the fan-out points? (Any time one action triggers multiple reactions)
  5. Is there batch processing? (Any time you hear "nightly," "weekly," "scheduled")

Most systems are a combination. "When a user signs up (CRUD: create user), send a welcome email (event-driven), process their uploaded profile photo (pipeline), and load their personalized dashboard (request/response pulling data from multiple sources)."

Naming the pattern lets you immediately know what lifecycle questions to ask, what failure modes to expect, and how the data flows.