Data Lifecycle — Common Patterns

Recognizing Patterns

After mapping enough systems, you'll see the same structures repeatedly. Learning to recognize these patterns lets you quickly understand new systems by saying "oh, this is basically a pipeline with a fan-out at the end" instead of mapping every stage from scratch.

Every real system is a combination of these patterns. The coffee shop is CRUD + Request/Response. The ATM is Request/Response with a two-phase commit. The social media post is CRUD + Pipeline + Fan-Out + Event-Driven. Knowing the patterns lets you identify the building blocks.

Pattern 1: CRUD (Create, Read, Update, Delete)

The most basic lifecycle. Data is created, read back, modified, and eventually removed.

Structure

Create:  Input → Validate (Transform) → Store (Storage)
Read:    Request (Transport) → Retrieve (Storage) → Return (Transport)  
Update:  Input → Validate (Transform) → Overwrite (Storage)
Delete:  Request → Remove (Storage) → Confirm (Transport)

Real-World Example: Contact List App

Operation	Lifecycle Steps
Create a contact	User enters name + phone → app validates (phone number format check) → saved to database
Read contacts	User opens app → app requests contacts from database → database returns list → app displays them sorted alphabetically
Update a contact	User edits phone number → app validates new number → database overwrites old record
Delete a contact	User taps delete → app asks "are you sure?" → sends delete request → database removes record → app removes from displayed list
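
The four operations in the table can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes an in-memory SQLite database, a hypothetical `contacts` table, and a made-up phone format rule (optional `+`, then 7 to 15 digits). The "are you sure?" prompt and the display step are left to the UI.

```python
import re
import sqlite3

# Hypothetical format rule: optional "+", then 7 to 15 digits.
PHONE_RE = re.compile(r"^\+?\d{7,15}$")


def validate_phone(phone: str) -> str:
    """Transform: strip common separators, then check the format."""
    cleaned = re.sub(r"[\s\-().]", "", phone)
    if not PHONE_RE.fullmatch(cleaned):
        raise ValueError(f"invalid phone number: {phone!r}")
    return cleaned


def open_store() -> sqlite3.Connection:
    """Storage: an in-memory database with a single contacts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT NOT NULL)"
    )
    return conn


def create_contact(conn: sqlite3.Connection, name: str, phone: str) -> int:
    """Create: validate, then store. Returns the new record's id."""
    cur = conn.execute(
        "INSERT INTO contacts (name, phone) VALUES (?, ?)",
        (name.strip(), validate_phone(phone)),
    )
    return cur.lastrowid


def read_contacts(conn: sqlite3.Connection) -> list:
    """Read: retrieve and sort alphabetically (the sort is a transform)."""
    query = "SELECT id, name, phone FROM contacts ORDER BY name COLLATE NOCASE"
    return conn.execute(query).fetchall()


def update_phone(conn: sqlite3.Connection, contact_id: int, phone: str) -> bool:
    """Update: validate the new number, then overwrite the old one."""
    cur = conn.execute(
        "UPDATE contacts SET phone = ? WHERE id = ?",
        (validate_phone(phone), contact_id),
    )
    return cur.rowcount == 1


def delete_contact(conn: sqlite3.Connection, contact_id: int) -> bool:
    """Delete: remove the record and report whether anything was removed."""
    cur = conn.execute("DELETE FROM contacts WHERE id = ?", (contact_id,))
    return cur.rowcount == 1
```

Note that `read_contacts` already contains a transform (the `ORDER BY`), and `delete_contact` returns a confirmation the caller can use to update the displayed list.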

What to Watch For

Read is never just "get the data." There's almost always sorting, filtering, or pagination involved — those are transforms.
Delete is rarely simple. What about related data? If you delete a customer, what happens to their orders? Their reviews? Their saved addresses?
Update conflicts. What if two people update the same record at the same time? Does the last write win? Does the first write win? Or is the second writer told there's a conflict?
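
One common answer to the update-conflict question is optimistic concurrency: every record carries a version number, and an update only succeeds if the caller read the latest version. The sketch below is one possible approach under that assumption; `ContactStore`, `Record`, and `ConflictError` are hypothetical names, and the store is a plain in-memory dictionary.

```python
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when a record changed after the caller last read it."""


@dataclass
class Record:
    phone: str
    version: int = 1


class ContactStore:
    """In-memory store that rejects updates based on a stale read."""

    def __init__(self):
        self._records = {}  # contact id -> Record

    def create(self, contact_id: int, phone: str) -> None:
        self._records[contact_id] = Record(phone)

    def read(self, contact_id: int) -> Record:
        rec = self._records[contact_id]
        return Record(rec.phone, rec.version)  # copy, so callers hold a snapshot

    def update(self, contact_id: int, phone: str, expected_version: int) -> int:
        rec = self._records[contact_id]
        if rec.version != expected_version:
            raise ConflictError(
                f"expected version {expected_version}, found {rec.version}"
            )
        rec.phone = phone
        rec.version += 1
        return rec.version
```

With this scheme the first write wins and the second writer is told about the conflict, so the application can reload the record and ask the user what to do.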

Lifecycle Map

┌────────────┐     validate      ┌────────────┐     store       ┌────────────┐
│ User Input │ ───────────────► │  Server    │ ─────────────► │  Database  │
│            │                   │ (transform)│                 │ (storage)  │
│            │ ◄─────────────── │            │ ◄───────────── │            │
│            │   display result  │            │   retrieve      │            │
└────────────┘                   └────────────┘                 └────────────┘

Pattern 2: Pipeline

Data flows through a series of transforms in sequence. Each step's output is the next step's input. Intermediate steps don't store data permanently; only the final result is stored or delivered.

Structure

Input → Step 1 (Transform) → Step 2 (Transform) → Step 3 (Transform) → Output

Real-World Example: Photo Upload Processing

When a user uploads a profile photo, it doesn't just get saved. It passes through a pipeline:

Step	Input	Transform	Output
1. Receive	Raw uploaded file	Verify it's actually an image (not malware or another file type with a renamed extension)	Validated image file
2. Strip metadata	Validated image	Remove EXIF data (GPS, camera info — privacy)	Clean image
3. Resize	Clean image	Create thumbnail (100px), medium (400px), large (800px) versions	Three image files
4. Compress	Three images	Optimize file sizes for web delivery	Three compressed images
5. Content scan	Compressed images (or original)	Automated check for prohibited content	Same images + moderation flag (pass/fail)
6. Store	Compressed images	Save all three sizes to file storage	URLs for each size
7. Update record	Three URLs + moderation flag	Update user's profile record with new photo URLs	Updated database record

Pipeline Characteristics

Order matters. You can't compress before you resize (you'd compress the wrong sizes). You can't strip metadata after you store (the metadata would already be in storage). Each step depends on the previous step's output.

Failure stops the pipeline. If step 5 (content scan) flags the image, steps 6 and 7 never execute. The pipeline has a clear "abort" path at every stage.

Each step is independently testable. Give step 3 a known image, check that the output is three images of the right size. You don't need the rest of the pipeline to test this one step.
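
The three characteristics above can be illustrated with a small sketch. It makes several simplifying assumptions: the photo is a plain data object rather than real image bytes, resizing produces one size instead of three, and the names `Photo`, `PipelineAbort`, and `run_pipeline` are hypothetical.

```python
from dataclasses import dataclass, field, replace


class PipelineAbort(Exception):
    """Raised by any step to stop the pipeline."""

    def __init__(self, step: str, reason: str):
        super().__init__(f"{step}: {reason}")
        self.step = step
        self.reason = reason


@dataclass(frozen=True)
class Photo:
    content_type: str
    width: int
    exif: dict = field(default_factory=dict)
    flagged: bool = False  # stand-in for a real content scanner's verdict


def validate(photo: Photo) -> Photo:
    if not photo.content_type.startswith("image/"):
        raise PipelineAbort("validate", "not an image")
    return photo


def strip_metadata(photo: Photo) -> Photo:
    return replace(photo, exif={})


def resize(photo: Photo, target_width: int = 400) -> Photo:
    # Only shrink; never enlarge a small image.
    return replace(photo, width=min(photo.width, target_width))


def content_scan(photo: Photo) -> Photo:
    if photo.flagged:
        raise PipelineAbort("content_scan", "prohibited content")
    return photo


def run_pipeline(photo: Photo, steps: list) -> tuple:
    """Feed each step's output into the next; an exception aborts the rest."""
    completed = []
    for step in steps:
        photo = step(photo)
        completed.append(step.__name__)
    return photo, completed
```

Because each step is a plain function from `Photo` to `Photo`, a step such as `resize` can be called on its own in a test, and reordering the `steps` list changes the pipeline without touching any step.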

Lifecycle Map

Raw Upload → [Validate] → [Strip EXIF] → [Resize] → [Compress] → [Scan] → [Store] → [Update Record]
                                                                     │
                                                                     ▼ (if flagged)
                                                              [Reject + Notify User]

Pattern 3: Request/Response

One system asks a question, another system answers it. The data makes a round trip.

Structure

Requester → Question (Transport) → Responder → Process (Transform) → Answer (Transport) → Requester

Real-World Example: Weather App

Step	What Happens	Category
1	User opens weather app	— (no data yet)
2	App sends request: "What's the weather for ZIP 10001?"	Transport (app → weather API)
3	Weather API receives request	Transport complete
4	API looks up current conditions for ZIP 10001	Storage (read from weather database)
5	API formats response (temperature, conditions, humidity, forecast)	Transform (raw data → structured response)
6	Response sent back to app	Transport (API → app)
7	App stores response in local cache	Storage (temporary — expires after 15 minutes)
8	App displays weather to user	Transport (app memory → screen)
9	User checks again 5 minutes later	Storage (read from local cache; no new request)
10	15 minutes pass, cache expires	Storage (cache entry deleted)
11	User checks again	Repeat from step 2
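
Steps 7 through 11 describe a time-to-live (TTL) cache in front of the request. A minimal sketch follows, assuming a hypothetical `CachedClient` wrapper, a `fetch` callable standing in for the weather API call, and an injectable clock so that expiry can be tested without waiting 15 minutes.

```python
import time


class CachedClient:
    """Serve repeated requests from a local cache until the entry expires."""

    def __init__(self, fetch, ttl_seconds: float = 15 * 60, clock=time.monotonic):
        self._fetch = fetch      # callable that performs the real request
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so tests can control time
        self._cache = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]      # step 9: fresh cache hit, no new request
        value = self._fetch(key)  # steps 2 to 6: full round trip
        self._cache[key] = (now + self._ttl, value)  # step 7: store with expiry
        return value
```

In this sketch an expired entry is simply overwritten on the next request rather than deleted by a background job; either approach matches the table.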

Request/Response Characteristics

There's always a waiting period. Between sending the request and receiving the response, the requester is waiting. What does it show the user? A spinner? Stale cached data? Nothing?

Timeouts are essential. What if the response never comes? The requester must decide: wait forever? Give up after 5 seconds? Show an error?
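
One way to make the timeout decision explicit is to wrap the request and name the fallback up front. The helper below is a sketch using only the standard library; `call_with_timeout` is a hypothetical name, and running the request in a worker thread is an assumption made to keep the example self-contained. Real HTTP clients usually accept a timeout argument directly.

```python
import concurrent.futures
import time


def call_with_timeout(request, timeout_seconds: float, fallback):
    """Run request(); if no answer arrives in time, return the fallback.

    The worker thread is not cancelled on timeout. It finishes in the
    background and its result is discarded.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(request)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        return fallback
    finally:
        pool.shutdown(wait=False)  # do not block the caller on a slow request


def slow_request():
    time.sleep(0.5)  # simulates a responder that answers too late
    return "fresh data"
```

For example, `call_with_timeout(slow_request, 0.05, "stale cached data")` gives up after 50 milliseconds and returns the fallback, which answers both "how long do we wait?" and "what do we show instead?" in one place.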

Caching changes the lifecycle. If you cache responses, you now have the data stored in two places (the source and the cache). They can get out of sync. How stale is acceptable? Who invalidates the cache?

Pattern 4: Event-Driven (Fan-Out)

Something happens, and multiple independent parts of the system react — each with their own lifecycle.

Structure

Event Occurs → Broadcast (Transport)
                ├─→ Listener A → (its own lifecycle)
                ├─→ Listener B → (its own lifecycle)
                └─→ Listener C → (its own lifecycle)

Real-World Example: New User Signs Up

A user creates an account. This single event triggers many independent reactions:

Listener	What It Does	Its Own Lifecycle
Welcome Email Service	Sends a welcome email	Retrieve email template (storage) → Fill in user's name (transform) → Send email (transport) → Log delivery (storage)
Default Settings Service	Creates the user's default preferences	Generate default settings (transform) → Save to database (storage)
Analytics Service	Records the signup event	Format event data (transform) → Write to analytics store (storage)
Onboarding Service	Creates a guided tutorial checklist	Generate checklist (transform) → Save progress tracker (storage)
Admin Dashboard	Updates the "new signups today" counter	Increment counter (transform) → Update dashboard data (storage)
Fraud Detection	Checks if signup looks legitimate	Analyze email domain, IP address, behavior patterns (transform) → Flag or clear (storage)

Fan-Out Characteristics

Listeners are independent. If the welcome email fails, the default settings should still be created. Each listener has its own success/failure path.

The event producer doesn't know (or care) about the listeners. The signup module just says "a user signed up." It doesn't know that six other systems are listening. This is intentional — it keeps the boundary clean.

Order usually doesn't matter. The welcome email can arrive before or after the default settings are created. But sometimes order does matter — the tutorial can't reference the user's settings if settings haven't been created yet. These ordering dependencies need to be explicit.

Fan-out can cascade. The welcome email might trigger its own event ("email sent"), which another listener responds to ("update email tracking dashboard"). One event can cascade into dozens of downstream data lifecycle chains.
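
The first two characteristics, independent listeners and a producer that doesn't know who is listening, can be sketched with a tiny in-process event bus. This is an illustration only: `EventBus` is a hypothetical class, delivery is synchronous, and a real system would more likely use a message broker or queue.

```python
from collections import defaultdict


class EventBus:
    """Broadcast an event to every subscribed listener."""

    def __init__(self):
        self._listeners = defaultdict(list)  # event name -> list of callables

    def subscribe(self, event_name: str, listener) -> None:
        self._listeners[event_name].append(listener)

    def publish(self, event_name: str, payload) -> list:
        """Call every listener; one failure never blocks the others.

        Returns a list of (listener name, exception) pairs so the caller
        can log or retry failures without knowing who the listeners are.
        """
        failures = []
        for listener in list(self._listeners[event_name]):
            try:
                listener(payload)
            except Exception as exc:
                failures.append((listener.__name__, exc))
        return failures
```

The signup code would only call `bus.publish("user_signed_up", user)`. Adding a seventh listener means one more `subscribe` call elsewhere, with no change to the producer. Listeners here run in subscription order, so an ordering dependency would need to be handled explicitly, for example by having the dependent listener react to a follow-up event.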

Pattern 5: Batch Processing

Data accumulates over time, then is processed all at once on a schedule.

Structure

Events accumulate (Storage) → Timer fires → Retrieve batch (Transport) → Process all (Transform) → Store results (Storage) → Deliver (Transport)

Real-World Example: Daily Sales Report

Step	What Happens	Category	Timing
1	Orders happen throughout the day	Storage (each order written to database as it occurs)	Ongoing, real-time
2	Midnight: report job triggers	— (timer event)	Scheduled
3	Job queries all orders for the day	Transport (database → report service)	~Midnight
4	Orders aggregated by category, region, payment method	Transform (aggregation)	~Midnight
5	Summary formatted into report	Transform (formatting)	~Midnight
6	Report stored	Storage (persistent — saved to reports archive)	~Midnight
7	Report emailed to management	Transport (report service → email service → inboxes)	~Midnight
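
Steps 3 through 5 amount to a filter followed by an aggregation. The sketch below assumes a hypothetical order shape (a dictionary with `placed_at`, `category`, and `amount` keys) and aggregates by category only; region and payment method would follow the same structure.

```python
from collections import defaultdict
from datetime import date, datetime
from decimal import Decimal


def daily_sales_report(orders: list, report_date: date) -> dict:
    """Aggregate one day's orders into per-category counts and revenue."""
    totals = defaultdict(lambda: {"orders": 0, "revenue": Decimal("0")})
    for order in orders:
        # Step 3: keep only orders placed on the report date.
        if order["placed_at"].date() != report_date:
            continue
        # Step 4: aggregate by category.
        bucket = totals[order["category"]]
        bucket["orders"] += 1
        bucket["revenue"] += order["amount"]
    # Step 5: format into a stable, sorted summary.
    return {
        "date": report_date.isoformat(),
        "by_category": dict(sorted(totals.items())),
    }


sample_orders = [
    {"placed_at": datetime(2024, 3, 1, 9, 0), "category": "books", "amount": Decimal("12.50")},
    {"placed_at": datetime(2024, 3, 1, 17, 30), "category": "books", "amount": Decimal("7.50")},
    {"placed_at": datetime(2024, 3, 1, 23, 59), "category": "games", "amount": Decimal("60.00")},
    {"placed_at": datetime(2024, 3, 2, 0, 5), "category": "games", "amount": Decimal("60.00")},
]
```

Calling `daily_sales_report(sample_orders, date(2024, 3, 1))` excludes the order placed five minutes after midnight, which shows why the boundary of "the day" needs a precise definition (time zone included) before the job is written.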

Batch Processing Characteristics

There's a delay between event and processing. An order at 9am isn't reflected in the report until midnight. This is by design — but stakeholders must understand it.

The batch window is critical. If 100,000 orders need processing and the job takes 3 hours, it must start early enough to finish before anyone needs the results. What if order volume doubles?
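
The batch-window question is simple arithmetic once the processing rate is known. The sketch below assumes processing time scales linearly with order count and uses a hypothetical 06:00 deadline; neither assumption comes from a real system.

```python
def batch_duration_hours(order_count: int, orders_per_hour: float) -> float:
    """Hours the job needs at a steady processing rate."""
    return order_count / orders_per_hour


def latest_start_hour(deadline_hour: float, order_count: int, orders_per_hour: float) -> float:
    """Latest start time (hours after midnight) that still meets the deadline."""
    return deadline_hour - batch_duration_hours(order_count, orders_per_hour)


RATE = 100_000 / 3  # orders per hour, implied by "100,000 orders in 3 hours"
DEADLINE = 6.0      # hypothetical: results are needed by 06:00
```

At the current volume, `latest_start_hour(DEADLINE, 100_000, RATE)` is about 3, so a midnight start leaves roughly three hours of slack. At double the volume the job needs about six hours and the latest start is about 0: the midnight job finishes just as the deadline arrives, with no room for a retry.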

Failed batches are painful. If the midnight job fails, there's no report in the morning. Is there a retry? A manual trigger? Does someone get alerted?

Idempotency matters. If the job runs twice (maybe it was retried), does it produce the same single report, or a duplicate report and a second email? The job must be safe to re-run.
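
One way to make the job safe to re-run is to key everything on the report date: storing overwrites the existing entry instead of appending, and delivery is recorded so it happens only once. The sketch below assumes a hypothetical `ReportJob` class with in-memory state; a real job would keep the archive and the delivery log in durable storage.

```python
from datetime import date


class ReportJob:
    """A report job that can be retried without creating duplicates."""

    def __init__(self, build_report, send_email):
        self._build = build_report  # callable: report date -> report
        self._send = send_email     # callable: report -> None
        self.archive = {}           # ISO date string -> report
        self.delivered = set()      # ISO date strings already emailed

    def run(self, report_date: date):
        key = report_date.isoformat()
        # Upsert: a second run replaces the report instead of adding a copy.
        self.archive[key] = self._build(report_date)
        # Deliver once per date. If sending raises, the date is not marked
        # as delivered, so a retry will attempt the email again.
        if key not in self.delivered:
            self._send(self.archive[key])
            self.delivered.add(key)
        return self.archive[key]
```

Running the job twice for the same date leaves one archived report and one email. Whether a re-run should refresh the stored report (as here) or leave the first version untouched is a design choice that depends on whether late-arriving orders should be picked up.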

Using Patterns to Analyze New Systems

When you encounter a new system, ask:

What's the dominant pattern? (Most features are CRUD at their core)
Where are the pipelines? (Any time data is processed in steps)
Where are the request/response boundaries? (Any time two systems talk)
Where are the fan-out points? (Any time one action triggers multiple reactions)
Is there batch processing? (Any time you hear "nightly," "weekly," "scheduled")

Most systems are a combination. "When a user signs up (CRUD: create user), send a welcome email (event-driven), process their uploaded profile photo (pipeline), and load their personalized dashboard (request/response pulling data from multiple sources)."

Naming the pattern lets you immediately know what lifecycle questions to ask, what failure modes to expect, and how the data flows.

Systems Thinking for Engineers