AWS Data Exchange — Product catalog (Husky Technology Pte Ltd, seller account 969735114743)

Purpose: Define the AWS DX product line-up, pricing tiers, and the Phase-1 free-sample contents.
Upstream: AWS DX Listing Strategy 2026-04-20 · Positioning & ICP Review
Data source: s3://agg-data.huskyai.com/agg-maid/ (Delta Lake, Parquet) · catalog: public/data/segments.json (300 branded) + segmentcount.csv (75,184 upstream wholesale rows, 202510 snapshot)

Confidentiality note (policy): Husky's upstream data suppliers are confidential; their names do not appear in this document or any other. If you see a segmentcount.csv-style file with a supplier prefix in the path_new column, treat it as raw source data that must not be redistributed verbatim. All derivative artifacts use Husky-owned naming only.

1. Data inventory confirmed 2026-04-20

Asset	Location	Scale	Format	Refresh	Fit for DX
`agg-maid`	`s3://agg-data.huskyai.com/agg-maid/date=YYYYMM/country=XXX/platform=android|ios/`	JPN/android 202602 alone = 110.77M rows / 211 GB / 1,412 files	Delta Lake (Parquet)	Monthly	✅ Flagship
`agg-hem`	same bucket	similar scale	Delta Lake	Monthly	✅ Phase 2
`agg-phone`	same bucket	similar scale	Delta Lake	Monthly	✅ Phase 2
`agg-ip`	mentioned by CEO, not yet inspected	TBD	TBD	TBD	🟡 Phase 2
Husky Data branded catalog	`public/data/segments.json` (repo)	300 segments × 14 verticals × Asia × 30d lifetime	JSON	Manual curation	✅ Product descriptors
Upstream wholesale catalog	`segmentcount.csv` (repo, 202510 snapshot)	75,184 segment×country rows across 14 countries, with per-DSP CPM pricing on 13 distribution platforms	CSV	Unknown — snapshot only in repo	✅ Wholesale product
ActiveData	`/Users/benjaminwong/Downloads/counts/` sample	143 columns × 320,905 rows in one file	Parquet (Snappy)	TBD	🔴 Private-offer only (contains ethnicity, religion, political, credit_rating, net_worth — GDPR Art.9 / PDPA / CCPA sensitive PI)

Schema — `agg-maid` Delta table

user_id      string    -- MAID (android AAID or iOS IDFA)
seg_list     string    -- comma-separated integer segment IDs, e.g. "119217,119254,119256"
country      string    -- partition, ISO3 (14 values: AUS HKG IDN IND JPN KOR MYS NZL PHL SGP THA TWN USA VNM)
platform     string    -- partition, "android" or "ios"

Path: s3://agg-data.huskyai.com/agg-maid/date=YYYYMM/country=XXX/platform=YYY/*.parquet
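A consumer of this table typically needs two small helpers: one to split seg_list into integer IDs and one to build the partition prefix. The sketch below assumes the schema and path pattern exactly as listed above; the function names parse_seg_list and partition_prefix are hypothetical and not part of any existing Husky tooling.

```python
from typing import List

# Values taken from the schema listing above.
BUCKET_ROOT = "s3://agg-data.huskyai.com/agg-maid"
COUNTRIES = {
    "AUS", "HKG", "IDN", "IND", "JPN", "KOR", "MYS",
    "NZL", "PHL", "SGP", "THA", "TWN", "USA", "VNM",
}
PLATFORMS = {"android", "ios"}


def parse_seg_list(seg_list: str) -> List[int]:
    """Split the comma-separated seg_list column into integer segment IDs."""
    return [int(tok) for tok in seg_list.split(",") if tok.strip()]


def partition_prefix(date_yyyymm: str, country: str, platform: str) -> str:
    """Build the S3 prefix for one date/country/platform partition."""
    if len(date_yyyymm) != 6 or not date_yyyymm.isdigit():
        raise ValueError("date must be YYYYMM")
    if country not in COUNTRIES:
        raise ValueError("country must be one of the 14 ISO3 partition values")
    if platform not in PLATFORMS:
        raise ValueError("platform must be 'android' or 'ios'")
    return f"{BUCKET_ROOT}/date={date_yyyymm}/country={country}/platform={platform}/"
```

Validating the partition values up front keeps a typo such as a two-letter country code from silently resolving to an empty prefix.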

Namespace / brand split

⚠️ SUPPLIER CONFIDENTIALITY (policy flagged 2026-04-20, CEO, updated 2026-04-21): Husky's upstream data suppliers are confidential and their names must NOT appear anywhere Husky writes, buyer-facing OR internal. Buyer-facing materials are the obvious case: segment names, methodology PDF, schema docs, AWS DX / Snowflake listing copy, free sample files. Internal docs are covered too: we don't want staff to mentally anchor on a single supplier, and we don't want accidental leakage via shared drives, contractor access, or screen shares. Raw source files (e.g. segmentcount.csv) may arrive with supplier-prefixed names in their path_new column; those are raw source artifacts and must not be propagated. All derivative artifacts scrub the prefix.

Why strict today, revisitable later (CEO note 2026-04-21): The rule is strictest right now because the primary upstream supplier is effectively our single biggest source, so revealing any one supplier name carries disproportionate leverage. As Husky's supplier portfolio diversifies and no single name carries that leverage, the internal-documentation restriction can be relaxed. External / buyer-facing confidentiality stays tight regardless: it is governed by supplier contracts, not our comfort level.

Husky Data branded (300 segments): curated APAC catalog, public-facing at huskydata.io/segments, compliance-forward naming with no third-party brand. Already Husky-owned naming — no rebrand needed.
Upstream wholesale (65-70K segments): the production source catalog arrives with supplier-prefixed names in the raw CSV. Segment IDs span namespaces 80xxxxxx, 10xxxxxx, 11xxxxxx, 14xxxxxx, 15xxxxxx, 16xxxxxx, 30xxxxxx, 40xxxxxx, 50xxxxxx, 60xxxxxx. All derived artifacts strip the prefix and ship under Husky taxonomy only.
Third-party-enhanced sub-catalog (~5-10K segments): a separate license track from our primary upstream source. Per-segment rebrand AND legal review of redistribution rights required before any listing. Default: exclude from Phase 2 wholesale product until license terms confirm redistribution permitted.
Legacy partner-branded segments (a historical legacy-partner segment column of ActiveData): another confidential partner brand. Consolidate under Husky or retire; never publish under the legacy partner name externally.

2. AWS DX product line-up

Product 1 — Husky Data APAC Mobile Audience — Sample (Free)

Field	Value
Product type	AWS Data Exchange free product
Price	$0 / month
Data format	CSV + Parquet (both shipped in single revision)
Contents	Synthetic 1,000-row sample + segment catalog + methodology PDF + schema doc
Refresh	Quarterly (methodology + catalog updates)
Purpose	Procurement pre-eval; buyer validates ingestion + data shape before committing to paid

Revision contents:
1. husky_agg-maid_sample_202602_SGP_android.csv: 1,000 synthetic UUIDv4 user_ids × real Husky segment IDs, randomly assigned. Schema matches production exactly.
2. husky_agg-maid_sample_202602_SGP_android.parquet: same rows in Parquet, for pipeline validation.
3. husky_segment_catalog_branded_300.csv: all 300 branded Husky Data segments with IAB Taxonomy 1.1 mapping (columns: husky_name, husky_slug, vertical, sub_vertical, iab_tier_1, iab_tier_2, iab_tier_3, iab_tier_4, typical_30d_reach_apac, data_sources, description).
4. husky_schema.md: field definitions, partition layout, refresh cadence, exact S3 path pattern.
5. husky_methodology.pdf: data pipeline overview (5 supplier feeds → processing → segments), IAB Taxonomy 1.1 + Data Transparency 1.2 conformance, compliance coverage (CCPA, GDPR, PDPA SG/HK/JP/TH/KR/MY/PH), refresh cadence.
6. husky_data_transparency.json: IAB Tech Lab Data Transparency Standard 1.2 compliant metadata.

Hero country for synthetic rows: Singapore (SGP) / Android, which aligns with the HQ narrative. Add JPN and USA in quarterly revisions if buyer demand warrants it.
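The synthetic sample described above could be generated along these lines. This is a minimal sketch, not the production generator: the function names, the cap of five segments per user, and the seeded RNG are all assumptions, and the segment IDs passed in the usage note are the illustrative ones from the schema listing rather than a real branded set.

```python
import csv
import io
import random
import uuid
from typing import Dict, List, Sequence

FIELDS = ["user_id", "seg_list", "country", "platform"]


def make_synthetic_rows(
    segment_ids: Sequence[int],
    n_rows: int = 1000,
    country: str = "SGP",
    platform: str = "android",
    max_segments: int = 5,  # assumption: upper bound on segments per synthetic user
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Return rows shaped like agg-maid, with synthetic UUIDv4 user_ids."""
    rng = random.Random(seed)
    ids = list(segment_ids)
    rows = []
    for _ in range(n_rows):
        # Build a UUIDv4 from seeded random bits so the sample is reproducible.
        user_id = str(uuid.UUID(int=rng.getrandbits(128), version=4))
        k = rng.randint(1, min(max_segments, len(ids)))
        segs = sorted(rng.sample(ids, k))
        rows.append({
            "user_id": user_id,
            "seg_list": ",".join(str(s) for s in segs),
            "country": country,
            "platform": platform,
        })
    return rows


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, rows_to_csv(make_synthetic_rows([119217, 119254, 119256])) yields a header plus 1,000 data lines. Because the user_ids are random UUIDs rather than transformed MAIDs, no real device identifier can be recovered from the sample.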

Product 2 — Husky Data APAC Mobile Audience — 300 Branded Segments (Paid)

Field	Value
Product type	AWS Data Exchange paid product
Pricing model	Monthly subscription tiered by country scope
Pricing — Single country	$3,500 / month (any 1 of 14 countries)
Pricing — APAC-7 bundle	$15,000 / month (SGP, HKG, JPN, KOR, TWN, AUS, NZL — Tier-1 APAC)
Pricing — APAC Full (13)	$25,000 / month (all APAC incl. IDN, IND, MYS, PHL, THA, VNM)
Pricing — Global (14 incl. USA)	$35,000 / month
Free trial	14 days, single country only, via AWS DX free-trial mechanism
Private offers	Custom bundles, annual commits, volume discounts
Data format	Parquet (Delta Lake-compatible, with `_delta_log/`)
Refresh	Monthly revisions (calendar month close +5 business days)
Backfill	24-month rolling lookback on initial subscription
Contents per revision	`agg-maid/date=YYYYMM/country=XXX/platform=android|ios/*.parquet`, filtered to the 300 branded segment IDs only
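The "filtered to the 300 branded segment IDs only" step amounts to intersecting each row's seg_list with an allowlist. The sketch below shows that logic on plain dictionaries; at production scale it would presumably run in Spark or a similar engine. Dropping rows that end up with no branded segment is an assumption, and the helper names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional, Set


def filter_seg_list(seg_list: str, allowed_ids: Set[int]) -> Optional[str]:
    """Keep only allowed segment IDs; return None if nothing remains."""
    kept = [
        tok.strip()
        for tok in seg_list.split(",")
        if tok.strip() and int(tok) in allowed_ids
    ]
    return ",".join(kept) if kept else None


def filter_rows(
    rows: Iterable[Dict[str, str]], allowed_ids: Set[int]
) -> Iterator[Dict[str, str]]:
    """Yield rows with seg_list reduced to the allowlist.

    Assumption: a user with no remaining branded segment is dropped entirely.
    """
    for row in rows:
        kept = filter_seg_list(row["seg_list"], allowed_ids)
        if kept is not None:
            yield {**row, "seg_list": kept}
```

The allowlist itself would come from the branded-to-wholesale bridge table, which is why that table gates this product.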

Revision cadence: One revision per calendar month. Partitioning in revision mirrors production path pattern so buyers can point Athena/Databricks/Snowflake directly at the mount.
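The refresh rule for this product (calendar month close +5 business days) can be turned into a concrete due date. The sketch below assumes business days are Monday to Friday and ignores public holidays, which the spec does not address; revision_due_date is a hypothetical helper name.

```python
from datetime import date, timedelta


def revision_due_date(year: int, month: int, business_days: int = 5) -> date:
    """Date of the Nth business day after the close of the given data month.

    Assumption: business days are Mon-Fri; public holidays are not considered.
    """
    # Month close is the day before the first day of the following month.
    if month == 12:
        first_of_next = date(year + 1, 1, 1)
    else:
        first_of_next = date(year, month + 1, 1)
    current = first_of_next - timedelta(days=1)

    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday-Friday
            remaining -= 1
    return current
```

Under these assumptions the 202602 data month would be due on 2026-03-06. If a holiday calendar matters for the SLA, it would need to be added as an explicit input.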

Product 3 — Husky Data APAC+US Wholesale — ~65K Segments (Paid, Private Offer Tier)

Field	Value
Product type	AWS Data Exchange paid product — private offer only (not self-serve)
Pricing	Negotiated; benchmark $100-250k/year enterprise, $0.15-0.80 CPM on DSP onboard
Target buyers	DSPs, DMPs, CDP vendors, attribution platforms, programmatic agencies
Data format	Parquet
Refresh	Monthly
Contents	Rebranded wholesale catalog across 14 countries, supplier-identifying names stripped and replaced with Husky taxonomy. Third-party-enhanced sub-catalog EXCLUDED from default product — separate legal track to confirm redistribution rights before inclusion. Expected scope after that exclusion: ~65-70K segments (down from the 75,184-row internal snapshot).
Gating	Signed DPA + NDA required before onboarding. Branded naming must be finalized via rebrand pipeline (§3b) before publish.
Brand	Marketed as "Husky Data Wholesale" or similar — supplier names never appear in buyer-facing materials

Product 4 — Husky Data APAC — HEM / Phone / IP Channel Products (Paid, Phase 2 Q3 2026)

One DX product per identifier channel (same segment catalog, different identity primitive). Same pricing ladder as Product 2.

Product 5 — ActiveData Enriched Profile (Paid, Private Offer + DPA, Phase 3 Q4 2026+)

Field	Value
Product type	Private offer only, NOT publicly listed
Pricing	$50k-200k / year per customer, per-country scope
Target buyers	Identity resolution vendors, CDP vendors, large enterprise direct
Data shape	Full 143-column profile (subset tailored per customer based on license scope)
Compliance	Signed DPA with sensitive-category Article 9 basis, customer allowlist maintained, per-regime sensitive-data carve-outs (e.g., no ethnicity/religion/political to customers outside OECD, no credit_rating to non-FCRA-covered use cases)
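The per-regime carve-outs in the Compliance row can be expressed as a column filter applied to each customer's licensed subset. This sketch encodes only the two examples given in the table; the flags customer_in_oecd and fcra_covered_use and the function name are hypothetical, and the treatment of other sensitive columns such as net_worth is not specified in this document, so it is left untouched here.

```python
from typing import List, Sequence

# Column names as listed in the data inventory for ActiveData.
SPECIAL_CATEGORY_COLUMNS = {"ethnicity", "religion", "political"}
FCRA_GATED_COLUMNS = {"credit_rating"}


def allowed_columns(
    licensed_columns: Sequence[str],
    customer_in_oecd: bool,
    fcra_covered_use: bool,
) -> List[str]:
    """Return the licensed columns minus any carve-outs for this customer."""
    blocked = set()
    if not customer_in_oecd:
        blocked |= SPECIAL_CATEGORY_COLUMNS
    if not fcra_covered_use:
        blocked |= FCRA_GATED_COLUMNS
    # Preserve the customer's column order for a stable output schema.
    return [col for col in licensed_columns if col not in blocked]
```

Expressing the carve-outs as data rather than per-customer scripts makes it easier to review them with Legal and to test that a blocked column never reaches an export.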

3. Bridge table — branded ↔ wholesale IDs (GAP, needs build)

Problem: public/data/segments.json has 300 branded segment names but no segment_id. segmentcount.csv has 75,184 segment×country rows keyed by segment_id, with supplier-prefixed names in path_new. Need a mapping table: husky_branded_slug → [source_segment_ids].

Proposed bridge format (CSV, versioned in repo under docs/internal/, currently docs/internal/segment_bridge_draft.csv; not under public/data/):

husky_slug,husky_name,vertical,source_segment_ids,countries_available,iab_tier_1,iab_tier_2,iab_tier_3,iab_tier_4,notes
travel-intent-luxury-hotels,APAC Husky Data - Travel - Intent - Luxury Hotels,Travel,"80010127,80010128,80010129",AUS;JPN;KOR;SGP;HKG;TWN,Travel,Accommodations,Hotels,Luxury Hotels,
auto-intent-luxury-cars,APAC Husky Data - Auto - Intent - Luxury Cars,Auto,"80020045,80020046",AUS;JPN;KOR;SGP;HKG;TWN,Automotive,Auto Buying & Selling,Luxury Vehicles,,
...
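A loader for this bridge format might look like the following. It is a sketch under two assumptions: the ID column is named source_segment_ids, matching the husky_branded_slug → [source_segment_ids] mapping described in the problem statement, and a repeated country within one row is treated as a data error. load_bridge is a hypothetical name.

```python
import csv
import io
from typing import Dict, List


def load_bridge(csv_text: str) -> Dict[str, Dict[str, List]]:
    """Parse bridge CSV text into {husky_slug: {segment_ids, countries}}."""
    bridge: Dict[str, Dict[str, List]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        slug = row["husky_slug"]
        # Segment IDs are comma-separated inside one quoted CSV field.
        segment_ids = [int(x) for x in row["source_segment_ids"].split(",") if x.strip()]
        # Countries are semicolon-separated ISO3 codes.
        countries = [c for c in row["countries_available"].split(";") if c]
        if len(set(countries)) != len(countries):
            raise ValueError(f"duplicate country code for slug {slug}")
        if slug in bridge:
            raise ValueError(f"duplicate slug {slug}")
        bridge[slug] = {"segment_ids": segment_ids, "countries": countries}
    return bridge
```

Failing loudly on duplicates is a design choice: the bridge feeds the Product 2 segment filter, so a silently merged or repeated entry would change what buyers receive.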

Build approach:
1. Auto-match by name similarity between the two catalogs (Husky "Auto - Intent - Luxury Cars" ↔ internal wholesale "Australia > Intent > Auto > Luxury (All)"). Matching runs on the full internal names, including the supplier prefix; the prefix is purely a matching aid and never surfaces downstream.
2. Human review of fuzzy matches; CEO or Kelvin signs off.
3. One-time build; only needs an update when the branded catalog changes.
4. Bridge table stays internal (docs/internal/, not public/data/): it contains supplier segment IDs that are proprietary even though the buyer-facing names are Husky-owned.
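One simple way to do the auto-match step is token-set (Jaccard) similarity over the words in each name, with a threshold that decides which candidates go to human review. The spec does not name an algorithm, so Jaccard is a stand-in here, and the 0.5 threshold and function names are assumptions to be tuned on real data.

```python
import re
from typing import List, Set, Tuple


def tokens(name: str) -> Set[str]:
    """Lowercased word tokens, ignoring separators and the '(All)' suffix."""
    return set(re.findall(r"[a-z0-9]+", name.replace("(All)", "").lower()))


def jaccard(a: str, b: str) -> float:
    """Size of the token intersection divided by size of the token union."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def propose_matches(
    husky_name: str, wholesale_names: List[str], threshold: float = 0.5
) -> List[Tuple[float, str]]:
    """Return (score, wholesale_name) candidates at or above threshold, best first."""
    scored = [(jaccard(husky_name, w), w) for w in wholesale_names]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)
```

With the example pair from the build approach, the shared tokens are auto, intent, and luxury out of five distinct tokens, so the score is 0.6. The country token lowers every score slightly; stripping it before matching is an obvious refinement once the country list is fixed.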

3b. Wholesale rebrand pipeline (GAP, needs build)

Problem: Raw source catalog names arrive with supplier brand in the name (e.g. <SUPPLIER>: Singapore > Intent > Travel > Luxury Hotels (All)). These names are plumbed through the path_new column of segmentcount.csv. If shipped to buyers verbatim they leak supplier identity; if propagated into internal derivative files they create retention risk. On ingest, strip the prefix and store only the de-branded path.

Solution: Deterministic rebrand transformer that maps internal names → Husky-owned names, applied to every buyer-facing artifact.

Transformer rules (first draft):

"<SUPPLIER_PREFIX>: <Country> > <Category> > <Sub> > <Leaf> (All)"
  →  "Husky Data | <Country> | <Category> | <Sub> | <Leaf>"

"<THIRD_PARTY_ENHANCED_PREFIX>: <Country> > ..."
  →  EXCLUDE (separate legal track)

"<SUPPLIER_PREFIX>: <Country> > Intent > Auto > Luxury (All)"
  →  "Husky Data | <Country> | Automotive | Luxury Vehicle Intenders"

# Leaf node vocabulary normalized to Husky brand guide terms:
#   "Intent → <Noun>"        → "<Noun> Intenders"
#   "Interest → <Noun>"      → "<Noun> Enthusiasts"
#   "Location Visited → ..." → "<Brand/Category> Visitors"
#   "Past Purchase → ..."    → "<Category> Purchasers"
#   "Demographics → ..."     → kept as-is, no audience-intent suffix
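The generic rule and the exclusion rule above could be implemented roughly as follows. This is a sketch only: the prefix strings are placeholders standing in for values that would live in restricted config, rebrand is a hypothetical name, and the leaf-vocabulary normalization is omitted because the brand guide mapping is not part of this document.

```python
import re
from typing import Optional

# Placeholders only. Real prefixes belong in restricted config, never in docs or code.
SUPPLIER_PREFIXES = {"SUPPLIER_PREFIX"}
EXCLUDED_PREFIXES = {"THIRD_PARTY_ENHANCED_PREFIX"}


def rebrand(raw_name: str) -> Optional[str]:
    """Map a raw prefixed path to a Husky-owned name, or None if excluded."""
    prefix, sep, path = raw_name.partition(": ")
    if not sep:
        raise ValueError("raw name has no '<prefix>: ' part")
    if prefix in EXCLUDED_PREFIXES:
        return None  # separate legal track; never shipped by default
    if prefix not in SUPPLIER_PREFIXES:
        # Do not echo the raw name, so error logs cannot leak a supplier brand.
        raise ValueError("unrecognized prefix")
    path = re.sub(r"\s*\(All\)\s*$", "", path)  # drop the trailing "(All)"
    parts = [p.strip() for p in path.split(">")]
    return "Husky Data | " + " | ".join(parts)
```

Because the function is a pure string transform, the same raw name always produces the same output, which is what makes the rebrand deterministic and unit-testable.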

Buyer-facing artifacts governed by this pipeline:
- AWS DX listing metadata (title, description, asset filenames)
- Revision files (Parquet path_new / segment-name columns must be rewritten in outgoing revisions)
- Segment catalog CSVs shipped to buyers
- Methodology PDF
- Free sample files
- Schema docs

Build effort: 1-2 days (transformer script + unit tests + legal review of normalized taxonomy), applied offline before each monthly DX revision.

Owner: Product engineering.

Safety net: a pre-publish (and pre-commit) checker that greps artifacts for literal supplier names on a maintained blocklist (the list itself lives in a restricted config file, not in this doc); refuses publish OR commit if any hit. Applies to both buyer-facing deliverables and internal documentation.
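The core of such a checker is a case-insensitive literal search of each artifact against the blocklist. The sketch below returns only line numbers; the function name and that return shape are assumptions, and loading the blocklist from the restricted config file is left out.

```python
import re
from typing import Iterable, List


def find_leak_lines(text: str, blocklist: Iterable[str]) -> List[int]:
    """Return 1-based line numbers that contain any blocklisted name.

    Matching is literal (names are regex-escaped) and case-insensitive.
    """
    patterns = [re.compile(re.escape(name), re.IGNORECASE) for name in blocklist if name]
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in patterns):
            hits.append(lineno)
    return hits


def is_clean(text: str, blocklist: Iterable[str]) -> bool:
    """True when the artifact may be published or committed."""
    return not find_leak_lines(text, blocklist)
```

Reporting line numbers rather than the matched name is deliberate: the checker's own output ends up in CI logs and terminal scrollback, and it should not become a leak itself.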

4. IAB Taxonomy 1.1 mapping (GAP, needs build)

All 300 branded segments need an IAB Audience Taxonomy 1.1 tier-1/2/3/4 assignment. Current segments.json has vertical + subVertical but these are Husky-internal, not IAB.

Mapping approach:
- Source: IAB Audience Taxonomy 1.1 official spec
- Husky vertical → IAB tier 1 (e.g., "Auto" → "Automotive", "Travel" → "Travel")
- Husky subVertical + segment name → IAB tier 2/3/4
- Store mapping in the bridge table (one column per IAB tier)

Estimated effort: 4-6 hours (300 segments × manual IAB assignment with taxonomy reference).
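The tier-1 step is a straight lookup, so it can be scripted to leave only tiers 2 to 4 for manual assignment. The sketch below contains just the two example mappings given above; the remaining verticals would need to be filled in from the IAB spec. The slug field name and the function name are assumptions about the segments.json shape.

```python
from typing import Dict, List, Tuple

# Only the two mappings stated in this document; the rest must come from the IAB spec.
VERTICAL_TO_IAB_TIER_1 = {
    "Auto": "Automotive",
    "Travel": "Travel",
}


def assign_tier_1(segments: List[Dict[str, str]]) -> Tuple[List[Dict[str, str]], List[str]]:
    """Attach iab_tier_1 where the vertical is mapped; collect slugs that are not."""
    mapped, unmapped = [], []
    for seg in segments:
        tier_1 = VERTICAL_TO_IAB_TIER_1.get(seg["vertical"])
        if tier_1 is None:
            unmapped.append(seg["slug"])  # surfaced for manual review
        else:
            mapped.append({**seg, "iab_tier_1": tier_1})
    return mapped, unmapped
```

Returning the unmapped slugs separately, instead of defaulting them to a catch-all tier, keeps gaps visible until someone assigns them by hand.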

5. Pre-publish gates (blockers)

#	Gate	Owner	Status
1	Marketplace seller Business Verification (KYC Step 1)	Benjamin / AWS	Not started — must complete before any product can publish (free or paid)
2	Business Location HK → SG correction	AWS support ticket	In flight (opened 2026-04-20)
3	Bank account micro-deposit verification	OCBC + AWS	Pending micro-deposit arrival (1-5 business days)
4	Public profile propagation (display name, logo)	AWS	Propagating (up to 30 min post-edit)
5	Build branded↔wholesale bridge table (§3) — kept internal	Product	Not started
6	Build IAB Taxonomy 1.1 mapping for 300 segments (§4)	Product	Not started
7	Generate synthetic sample CSV + Parquet	Product	Spec'd in §2 Product 1
8	Write methodology PDF (no supplier brands)	Product / Legal	Not started
9	Verify third-party-enhanced sub-catalog redistribution rights for DX	Legal	Not started — gates Product 3 inclusion of that subset
10	Legacy partner-branded segment namespace decision (consolidate into Husky / retire / keep internal-only)	CEO	Open
11	Build wholesale rebrand transformer (§3b)	Product Eng	Not started — gates Product 3
12	Build pre-publish supplier-name leakage checker	Product Eng	Not started — safety gate on all outgoing DX artifacts
13	Audit all existing public/buyer-facing copy for supplier-brand mentions (website, methodology PDFs, sales decks, public GitHub repos, IAB DTL submission)	Marketing + Legal	Not started — any pre-existing leakage must be scrubbed

6. Phase timeline

Phase	Window	Output
Phase 0 — Prerequisites	2026-04-20 → ~2026-05-05	Close gates 1-4
Phase 1 — Soft launch	~2026-05-06 → 2026-05-20	Publish Product 1 (free sample) + Product 2 (branded, SGP-only as pilot)
Phase 2 — Expansion	2026-06 → 2026-07	Extend Product 2 to full 14 countries; launch HEM/Phone/IP variants; soft-launch Product 3 (wholesale) in private-offer mode
Phase 3 — Enterprise	2026-Q4	Selective ActiveData pilots (Product 5) under DPA

7. Audit trail

2026-04-20 — Doc created. Data inventory confirmed via S3 + local-repo inspection. agg-maid schema parsed from Delta log (no actual MAID data read). 300 branded segments counted from public/data/segments.json; 75,184 wholesale rows counted from segmentcount.csv. ActiveData 143-col schema parsed from local Parquet sample (no actual data read).
2026-04-20 — CEO flagged supplier-confidentiality constraint: upstream supplier names must NEVER appear in buyer-facing materials. Spec updated — Product 3 renamed "Husky Data Wholesale", §3b rebrand pipeline added, pre-publish leakage checker added to gates, audit of existing public surface area added.
2026-04-21 — Confidentiality policy tightened: supplier names must not appear in INTERNAL docs either (defense in depth + cultural — don't anchor on a single supplier). All prior references in this doc rewritten to neutral terms ("upstream source", "third-party enhanced sub-catalog", "legacy partner-branded"). Bridge table CSV (docs/internal/segment_bridge_draft.csv) rewritten with neutral column names (source_segment_id, source_path, source_country) and path_new prefix scrubbed from values. Task #54 expanded to include pre-commit git hook + internal docs in scope.
Next review: when KYC Step 1 clears, when bridge table built, when rebrand transformer built, when public-surface audit complete.