High-level data flow:
1. Frontend (ClickSteam.NextJS) hosted on Amplify.
2. The frontend sends POST requests to clickstream-http-api at POST /clickstream.
3. The HTTP API invokes clickstream-lambda-ingest.
4. The Lambda adds _ingest metadata and writes one JSON file per event to:
   s3://clickstream-s3-ingest/events/YYYY/MM/DD/HH/event-<uuid>.json (UTC hour partition)

The design stays stateless and append-only, which suits downstream batch ETL and idempotent inserts.

Two buckets are relevant:
Asset Bucket — clickstream-s3-sbw
RAW Clickstream Bucket — clickstream-s3-ingest
Object keys in the raw bucket follow this layout: events/YYYY/MM/DD/HH/event-<uuid>.json

Hour partitions make batch ETL easier (e.g., process the previous hour or a specific day/hour prefix).
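
The key layout above can be sketched as a pair of small helpers. This is a minimal illustration, not the pipeline's actual code: the function names `buildEventKey` and `previousHourPrefix` are hypothetical, and the sketch assumes the partition is taken from the UTC time at which the event was received.

```typescript
// Builds the S3 object key for one event, partitioned by UTC hour.
// toISOString() is always UTC, e.g. "2024-03-05T07:09:00.000Z".
function buildEventKey(receivedAt: Date, eventUuid: string): string {
  const iso = receivedAt.toISOString();
  const yyyy = iso.slice(0, 4);
  const mm = iso.slice(5, 7);
  const dd = iso.slice(8, 10);
  const hh = iso.slice(11, 13);
  return `events/${yyyy}/${mm}/${dd}/${hh}/event-${eventUuid}.json`;
}

// Returns the key prefix for the hour before `now`, which a batch job
// could use to list only the previous hour's objects.
function previousHourPrefix(now: Date): string {
  const prev = new Date(now.getTime() - 60 * 60 * 1000);
  const iso = prev.toISOString();
  return `events/${iso.slice(0, 4)}/${iso.slice(5, 7)}/${iso.slice(8, 10)}/${iso.slice(11, 13)}/`;
}
```

Because the prefix is computed from a shifted timestamp rather than by decrementing the hour field, day, month, and year rollovers are handled automatically.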

clickstream-lambda-ingest
The clickstream-lambda-ingest function:
- Adds _ingest metadata only: receivedAt, sourceIp, userAgent, method, path, requestId, apiId, stage, traceId.

The execution role should allow:

- s3:PutObject on: arn:aws:s3:::clickstream-s3-ingest/events/*

No read permissions are needed for this function.

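
To make the _ingest block concrete, here is a hedged sketch of how the metadata could be assembled from an API Gateway HTTP API event (payload format 2.0). The type and function names (`HttpApiEvent`, `buildIngestMetadata`, `enrichEvent`) are hypothetical, the source of traceId (the `x-amzn-trace-id` header) is an assumption, and the body is assumed to be a plain (not base64-encoded) JSON object. The S3 write itself is left out so the sketch stays dependency-free.

```typescript
// Minimal subset of an API Gateway HTTP API event (payload format 2.0).
interface HttpApiEvent {
  headers?: Record<string, string | undefined>;
  body?: string;
  requestContext: {
    apiId: string;
    requestId: string;
    stage: string;
    http: { method: string; path: string; sourceIp: string; userAgent: string };
  };
}

interface IngestMetadata {
  receivedAt: string; // ISO-8601, UTC
  sourceIp: string;
  userAgent: string;
  method: string;
  path: string;
  requestId: string;
  apiId: string;
  stage: string;
  traceId: string | null;
}

// Collects request metadata without touching the client payload.
function buildIngestMetadata(event: HttpApiEvent, now: Date): IngestMetadata {
  const ctx = event.requestContext;
  const headers = event.headers ?? {};
  return {
    receivedAt: now.toISOString(),
    sourceIp: ctx.http.sourceIp,
    userAgent: ctx.http.userAgent,
    method: ctx.http.method,
    path: ctx.http.path,
    requestId: ctx.requestId,
    apiId: ctx.apiId,
    stage: ctx.stage,
    // Assumption: the trace id is read from the X-Amzn-Trace-Id header.
    traceId: headers["x-amzn-trace-id"] ?? null,
  };
}

// Returns the object that would be serialized and written to S3:
// the original payload plus a single added _ingest key.
function enrichEvent(event: HttpApiEvent, now: Date): Record<string, unknown> {
  const payload = JSON.parse(event.body ?? "{}") as Record<string, unknown>;
  return { ...payload, _ingest: buildIngestMetadata(event, now) };
}
```

Passing `now` in as a parameter keeps the function deterministic, which makes it easy to unit-test without mocking the clock.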
clickstream-http-api

The HTTP API provides a public HTTPS endpoint for ingestion:
POST /clickstream → Lambda clickstream-lambda-ingest
Recommended options:

The frontend payload includes:

- eventId per event (UUID) for idempotent inserts.
- clientId in localStorage (sticky per browser).
- sessionId in sessionStorage (30m idle timeout) and isFirstVisit.
- userId, userLoginState, optional identity_source.
- pageUrl, referrer, element metadata for clicks (tag/id/role/text/dataset).
- product.{id,name,category,brand,price,discountPrice,urlPath}; ETL maps to DW context_product_* columns.

Event names:

- Automatic events: page_view, global click.
- Domain events: home_view, category_view, product_view, add_to_cart_click, remove_from_cart_click, wishlist_toggle, share_click, login_open, login_success, logout, checkout_start, checkout_complete.

Where each domain event fires:

- home_view: home page load (tracker component).
- category_view: category listing render (slug/params).
- product_view: product detail render (has product context).
- add_to_cart_click / remove_from_cart_click: cart add/remove handlers.
- wishlist_toggle: wishlist button handler.
- share_click: share button handler.
- login_open / login_success / logout: auth flows.
- checkout_start / checkout_complete: checkout entry and success flows.

Frontend files:

- lib/clickstreamClient.ts: identity/session handling, base builders, console logging, fire-and-forget fetch to NEXT_PUBLIC_CLICKSTREAM_ENDPOINT (required env).
- lib/clickstreamEvents.ts: domain helpers that wrap trackCustom and build product/cart/order context.
- contexts/ClickstreamProvider.tsx + app/layout.tsx: wire the global provider, auto page_view, and global click listener.
- HomeTracker.tsx, CategoryTracker.tsx, ProductViewTracker.tsx skip the auto page_view and emit domain events.
- AddToCartButton.tsx, FavoriteButton.tsx, app/(client)/cart/page.tsx emit add/remove cart, wishlist, and checkout events, and mark buttons with global-clickstream-ignore-click to avoid duplicate global clicks.

| Field / Block | Frontend payload | Raw S3 (after Ingest) | DW (PostgreSQL) | Notes |
|---|---|---|---|---|
| event_id | eventId generated on client | same as payload | event_id (maps from eventId; ETL fallback UUID) | Primary key, ON CONFLICT DO NOTHING |
| event_timestamp | - | - (S3 LastModified exists) | _ingest.receivedAt > payload > LastModified | Derived by ETL |
| event_name | eventName | eventName | event_name | |
| page_url | pageUrl | pageUrl | - | Not stored in DW |
| referrer | referrer | referrer | - | Not stored in DW |
| user_id | userId | userId | user_id | |
| user_login_state | userLoginState | userLoginState | user_login_state | |
| identity_source | optional | optional | identity_source | Needs frontend/auth to populate |
| client_id | clientId | clientId | client_id | |
| session_id | sessionId | sessionId | session_id | |
| is_first_visit | isFirstVisit | isFirstVisit | is_first_visit | |
| product id | product.id | product.id | context_product_id | |
| product name | product.name | product.name | context_product_name | |
| product category | product.category | product.category | context_product_category | |
| product brand | product.brandName / brand?.name / brand?.title / brand | product.brand or product.brandName | context_product_brand | Frontend maps brand from DB fields |
| product price | product.price | product.price | context_product_price | BIGINT |
| product discount price | product.discountPrice | product.discountPrice | context_product_discount_price | BIGINT |
| product url path | product.urlPath | product.urlPath | context_product_url_path | |
| element metadata | element.{tag,id,role,text,dataset} | element... | - | Not stored (needs schema change) |
| ingest metadata | - | _ingest.{receivedAt,sourceIp,userAgent,method,path,requestId,apiId,stage,traceId} | - | Not stored; available during ETL |
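
The mapping in the table can be sketched as a single function that turns one raw S3 event into a DW row. This is an illustration under assumptions, not the ETL's actual code: the names `RawEvent`, `DwRow`, and `toDwRow` are hypothetical, ids are assumed to be strings, and event_timestamp is omitted because it is derived separately. Fields the table marks as not stored (pageUrl, referrer, element, _ingest) are simply dropped.

```typescript
interface RawProduct {
  id?: string; name?: string; category?: string;
  brand?: string; brandName?: string;
  price?: number; discountPrice?: number; urlPath?: string;
}

interface RawEvent {
  eventId?: string;
  eventName: string;
  userId?: string | null;
  userLoginState?: string;
  identity_source?: string;
  clientId?: string;
  sessionId?: string;
  isFirstVisit?: boolean;
  product?: RawProduct;
}

interface DwRow {
  event_id: string;
  event_name: string;
  user_id: string | null;
  user_login_state: string | null;
  identity_source: string | null;
  client_id: string | null;
  session_id: string | null;
  is_first_visit: boolean | null;
  context_product_id: string | null;
  context_product_name: string | null;
  context_product_category: string | null;
  context_product_brand: string | null;
  context_product_price: number | null;          // BIGINT in the DW
  context_product_discount_price: number | null; // BIGINT in the DW
  context_product_url_path: string | null;
}

// Maps camelCase payload fields to snake_case DW columns.
// `fallbackEventId` is used only when the client sent no eventId.
function toDwRow(raw: RawEvent, fallbackEventId: string): DwRow {
  const p: RawProduct = raw.product ?? {};
  return {
    event_id: raw.eventId ?? fallbackEventId,
    event_name: raw.eventName,
    user_id: raw.userId ?? null,
    user_login_state: raw.userLoginState ?? null,
    identity_source: raw.identity_source ?? null,
    client_id: raw.clientId ?? null,
    session_id: raw.sessionId ?? null,
    is_first_visit: raw.isFirstVisit ?? null,
    context_product_id: p.id ?? null,
    context_product_name: p.name ?? null,
    context_product_category: p.category ?? null,
    // Raw S3 may carry either product.brand or product.brandName.
    context_product_brand: p.brand ?? p.brandName ?? null,
    context_product_price: p.price ?? null,
    context_product_discount_price: p.discountPrice ?? null,
    context_product_url_path: p.urlPath ?? null,
  };
}
```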
Call pattern (one event per request):
```ts
await fetch("https://<api-id>.execute-api.ap-southeast-1.amazonaws.com/clickstream", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(eventPayload),
  keepalive: true,
});
```
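
The eventPayload sent in this call carries clientId, sessionId, and isFirstVisit. The following is a minimal sketch of how those values might be derived. It is not the code in lib/clickstreamClient.ts: the storage keys, the `KeyValueStore` interface, and the function names are hypothetical, and treating isFirstVisit as "no clientId existed yet" is an assumption about its semantics.

```typescript
// Structural subset of the Web Storage API, so the logic can run
// against localStorage/sessionStorage or an in-memory stand-in.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const CLIENT_KEY = "cs_client_id";       // hypothetical key name
const SESSION_KEY = "cs_session";        // hypothetical key name
const SESSION_IDLE_MS = 30 * 60 * 1000;  // 30-minute idle timeout

// Sticky per-browser id: created once, then reused.
function getClientId(
  store: KeyValueStore,
  newId: () => string,
): { clientId: string; isFirstVisit: boolean } {
  const existing = store.getItem(CLIENT_KEY);
  if (existing !== null) return { clientId: existing, isFirstVisit: false };
  const clientId = newId();
  store.setItem(CLIENT_KEY, clientId);
  return { clientId, isFirstVisit: true };
}

// Reuses the stored session id unless it has been idle for more than
// 30 minutes; every call refreshes the last-activity timestamp.
function getSessionId(store: KeyValueStore, nowMs: number, newId: () => string): string {
  let sessionId: string | null = null;
  const raw = store.getItem(SESSION_KEY);
  if (raw !== null) {
    try {
      const saved = JSON.parse(raw) as { id?: unknown; lastActivityMs?: unknown };
      if (
        typeof saved.id === "string" &&
        typeof saved.lastActivityMs === "number" &&
        nowMs - saved.lastActivityMs <= SESSION_IDLE_MS
      ) {
        sessionId = saved.id;
      }
    } catch {
      // Corrupted value: fall through and start a new session.
    }
  }
  if (sessionId === null) sessionId = newId();
  store.setItem(SESSION_KEY, JSON.stringify({ id: sessionId, lastActivityMs: nowMs }));
  return sessionId;
}
```

In a browser, `localStorage` and `sessionStorage` both satisfy `KeyValueStore`, and `() => crypto.randomUUID()` is one possible `newId`.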
ETL maps eventId -> event_id (PK with ON CONFLICT DO NOTHING), derives event_timestamp from _ingest.receivedAt > payload > S3 LastModified, and inserts into clickstream_dw.public.clickstream_events.
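
The timestamp precedence and the idempotent insert described above could look roughly like this. The function name `deriveEventTimestamp` is hypothetical. The document does not name the payload timestamp field, so it is passed in as a plain argument. The SQL uses `$1`-style placeholders as an assumption about the database client and lists only three columns for brevity.

```typescript
// Picks the first usable timestamp in priority order:
// _ingest.receivedAt, then the payload timestamp, then S3 LastModified.
function deriveEventTimestamp(
  ingestReceivedAt: string | undefined,
  payloadTimestamp: string | undefined,
  s3LastModified: Date,
): Date {
  for (const candidate of [ingestReceivedAt, payloadTimestamp]) {
    if (candidate) {
      const parsed = new Date(candidate);
      // Skip values that do not parse as a date.
      if (!isNaN(parsed.getTime())) return parsed;
    }
  }
  return s3LastModified;
}

// Re-running the ETL over the same objects is safe: a row whose
// event_id already exists is skipped instead of raising an error.
const INSERT_EVENT_SQL = `
  INSERT INTO public.clickstream_events (event_id, event_timestamp, event_name)
  VALUES ($1, $2, $3)
  ON CONFLICT (event_id) DO NOTHING
`;
```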
To validate ingestion:
- In clickstream-s3-ingest: events/YYYY/MM/DD/HH/event-<uuid>.json files appear.
- _ingest metadata and product context are present.
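
These checks can also be expressed as a small validator run against a downloaded object. This is a sketch under assumptions: `checkStoredEvent` and `EVENT_KEY_PATTERN` are hypothetical names, the UUID is assumed to be the usual 36-character hyphenated hex form, and product context is only required when the caller says the event should have it.

```typescript
// Matches events/YYYY/MM/DD/HH/event-<uuid>.json
const EVENT_KEY_PATTERN =
  /^events\/\d{4}\/\d{2}\/\d{2}\/\d{2}\/event-[0-9a-fA-F-]{36}\.json$/;

// Returns a list of problems; an empty list means the object looks valid.
function checkStoredEvent(key: string, body: string, expectProduct: boolean): string[] {
  const problems: string[] = [];
  if (!EVENT_KEY_PATTERN.test(key)) {
    problems.push("key does not match events/YYYY/MM/DD/HH/event-<uuid>.json");
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    problems.push("body is not valid JSON");
    return problems;
  }
  if (typeof parsed !== "object" || parsed === null) {
    problems.push("body is not a JSON object");
    return problems;
  }
  const obj = parsed as Record<string, unknown>;
  const ingest = obj["_ingest"];
  if (typeof ingest !== "object" || ingest === null) {
    problems.push("missing _ingest metadata");
  }
  const product = obj["product"];
  if (expectProduct && (typeof product !== "object" || product === null)) {
    problems.push("missing product context");
  }
  return problems;
}
```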