Well-Architected Review - App Entry with Checkout Service Store
- Overview
- 1. Operational Excellence
- 1.1 Identity Verification Operations
- 1.2 Charge Calculation and Order Delegation Operations
- 1.3 Checkout Service Operations
- 1.4 Mobile App Readiness
- 1.5 Retry and Error Handling
- 1.6 Observability, Deployment, and Readiness
- 1.7 Change Management Readiness
- 1.8 Incident Management
- 1.9 Partner Incident Collaboration
- 2. Security
- 3. Reliability
- 4. Performance Efficiency
- 5. Cost Optimization
- 6. Store Reporting
- 7. Post-Purchase Operations
- 8. Sustainability
Disclaimer: This document contains sample content for illustrative purposes only. Organizations should follow their own established best practices, security requirements, and compliance standards to ensure solutions are production-ready.
Overview
This questionnaire is designed for Just Walk Out store implementations that use app-based entry combined with the Amazon Checkout Service API for payment processing. In this model, the retailer manages shopper identity verification through a mobile app (Verify Identity Keys API), calculates pricing through Order Delegation (Create Purchases API), and submits the priced cart to Amazon via the Checkout Service API for payment and receipt generation. The following APIs are in scope:
- Verify Identity Keys API (POST /v1/identity/identity-keys): Shopper identity verification at the gate
- Create Purchases API (POST /v1/order/purchases): Order Delegation
- Checkout Service API (POST /v1/checkout/carts): Retailer calls Amazon to charge the shopper's payment instrument
1. Operational Excellence
1.1 Identity Verification Operations
- How do you monitor the success/failure rate of identity key verifications in real time?
- What runbooks exist for handling identity verification service degradation or outages?
- How do you track the ratio of successful entries vs. rejected entries to detect anomalies?
- What is your process for managing and rotating valid identity keys?
- How do you handle associate entry separately from shopper entry in monitoring and reporting?
1.2 Charge Calculation and Order Delegation Operations
- How do you monitor Create Purchases API success/failure rates and response times?
- How do you validate that charge calculations return accurate pricing, promotions, and tax for each cart?
- What monitoring detects charge calculation latency spikes or failures?
- How do you handle carts with unidentifiable SKUs routed to the bad cart process?
- How do you handle empty carts (returning an empty purchaseId and triggering pre-auth cancellation)?
- How do you ensure all requests are handled idempotently using the idempotentShoppingTripId?
- What alerting is in place when carts are received but not priced before the pre-auth expiration window?
- What is your process for updating pricing rules, promotions, or tax configurations in your POS system?
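As one illustration of the idempotency question above, a handler could store the first response per shopping trip and replay it on every retry. This is a minimal Python sketch under stated assumptions, not the Create Purchases contract: the function name `handle_create_purchase`, the in-memory store, the `price_cart` callback, and every response field other than `purchaseId` are hypothetical.

```python
import uuid
from typing import Callable, Dict, List

# Hypothetical in-memory store keyed by idempotentShoppingTripId.
# A real connector would need a durable store with a conditional write
# so that two concurrent retries cannot both price the same trip.
_responses_by_trip: Dict[str, dict] = {}


def handle_create_purchase(
    idempotent_shopping_trip_id: str,
    cart_items: List[dict],
    price_cart: Callable[[List[dict]], dict],
) -> dict:
    """Return the stored response when the same trip is delivered again."""
    stored = _responses_by_trip.get(idempotent_shopping_trip_id)
    if stored is not None:
        return stored  # retry: no second pricing call, no second record

    if not cart_items:
        # Empty cart: an empty purchaseId, as described in the list above.
        response = {"purchaseId": ""}
    else:
        # price_cart stands in for the POS / pricing engine lookup.
        response = {"purchaseId": str(uuid.uuid4()), **price_cart(cart_items)}

    _responses_by_trip[idempotent_shopping_trip_id] = response
    return response
```

Storing the full response, rather than only a "seen" flag, means a retried call receives the same purchaseId and totals as the first call.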
1.3 Checkout Service Operations
- How do you monitor Checkout Service API availability and response times?
- What alerting is in place when the Checkout Service returns errors or degraded performance?
- How do you track the end-to-end payment lifecycle managed by the Checkout Service?
- What dashboards provide visibility into Checkout Service transaction volumes and success rates?
- How do you handle Checkout Service outages or degradation?
1.4 Mobile App Readiness
- Do you have a mobile app available on iOS and Android that supports QR code generation for JWO store entry?
- How do you generate Scan Key Codes in the app per JWO specifications (JWO prefix, customer prefix, recognition token, timestamp)?
- What is the QR code refresh interval to ensure shoppers always present a valid, unexpired code at the gate?
- How do you handle QR code generation when the shopper's device is offline or has intermittent connectivity?
- What minimum app version enforcement strategy prevents shoppers from using outdated versions?
- How do you test QR code generation across different device types, OS versions, and screen sizes?
- What monitoring tracks app-side QR code generation success/failure rates and scan success rates at the gate?
- How do you handle shopper support when the app fails to generate a valid QR code (e.g., fallback entry method)?
1.5 Retry and Error Handling
- How do you handle incoming Verify Identity Keys API requests that fail due to internal processing errors?
- How do you handle incoming Create Purchases API requests that fail due to internal processing errors (e.g., POS lookup failure, pricing engine timeout)?
- How do you ensure Create Purchases idempotency using the idempotentShoppingTripId when Amazon retries the call to your endpoint?
- What retry strategy with exponential backoff is implemented for outbound Checkout Service API calls?
- How do you handle 429 (Too Many Requests) responses from the Checkout Service with the Retry-After header?
- How do you handle 503 (ServiceUnavailable) responses from the Checkout Service with the retryAfter value?
- What alerting is in place when maximum retry attempts are exhausted for Checkout Service calls?
- How does the gate behave during identity verification retries (for example, does it remain closed until a definitive response is received)?
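The retry questions above for outbound Checkout Service calls can be sketched as follows. This is an illustrative Python outline, not a reference client: the `send` callable, its `(status, retry_after_seconds, body)` return shape, and the default base, cap, and attempt values are assumptions. Parsing the Retry-After header or the retryAfter value into seconds is left to the caller because this document does not specify their formats.

```python
import random
import time
from typing import Callable, Optional, Tuple

RETRYABLE_STATUSES = {429, 503}

# Hypothetical shape: (HTTP status, server wait hint in seconds or None, body)
Response = Tuple[int, Optional[float], dict]


def backoff_delay(attempt: int, retry_after: Optional[float],
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Prefer the server's hint; otherwise use exponential backoff with full jitter."""
    if retry_after is not None:
        return max(retry_after, 0.0)
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(send: Callable[[], Response],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, retry_after, body = send()
        if status not in RETRYABLE_STATUSES:
            return {"status": status, "body": body}
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, retry_after))
    # Exhaustion is the point where alerting would be triggered.
    raise RuntimeError("Checkout Service retries exhausted")
```

Injecting `sleep` keeps the function testable without real delays; the jitter spreads retries from many clients so they do not arrive in lockstep.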
1.6 Observability, Deployment, and Readiness
- How do you implement observability across the identity verification, order delegation, and checkout flow?
- How do you mitigate deployment risks for identity connector and ordering connector changes?
- How do you know that you are ready to support the workload?
1.7 Change Management Readiness
1.7.1 Change Control Process
- What formal change management process governs modifications to the Identity Connector, Ordering Connector, and supporting infrastructure?
- Who approves changes to production systems and what is the approval workflow?
- How do you classify changes by risk level (standard, normal, emergency)?
- How do you maintain a change log that records all modifications, approvers, and deployment timestamps?
1.7.2 Identity Verification Changes
- What is the process for updating identity key validation logic (e.g., new QR code format, new customer prefix)?
- How do you deploy changes to the Verify Identity Keys API without disrupting active shopper entry?
- What testing is required before modifying gate decision logic (200/401/500 response behavior)?
- How do you coordinate identity key rotation or expiry window changes with the mobile app team?
- What is the rollback procedure if an identity verification change causes unexpected gate closures?
1.7.3 Charge Calculation Changes
- What is the process for deploying pricing rule, promotion, or tax configuration changes to the Ordering Connector?
- How do you ensure pricing changes are synchronized between your POS system and the Create Purchases API?
- What validation confirms that charge calculation changes produce correct totals before production deployment?
- How do you handle promotion activation/deactivation without impacting in-flight shopping trips?
- What is the rollback procedure if a pricing or tax change produces incorrect charges?
1.7.4 Infrastructure and API Changes
- What is the process for updating API Gateway configurations, Lambda functions, or IAM policies?
- How do you deploy infrastructure changes without service interruption?
- What blue/green or canary deployment strategies are used for API changes?
- How do you handle Amazon-initiated API changes (new fields, deprecations) in your Identity and Ordering Connectors?
- What is the process for updating IAM role permissions or rotating API credentials?
1.7.5 Mobile App Changes
- What is the process for deploying mobile app updates that affect QR code generation or Scan Key Code format?
- How do you ensure backward compatibility when the app and Identity Connector are updated at different times?
- What forced-update or minimum-version strategy prevents shoppers from using outdated app versions?
- How do you coordinate app store release timelines with backend Identity Connector changes?
1.7.6 Testing and Validation
- What pre-deployment testing is required for all change types (unit, integration, E2E, load)?
- How do you validate changes in a staging environment that mirrors production before deployment?
- What smoke tests confirm system health immediately after a production deployment?
- How do you test changes against the full shopper journey (entry → shopping → exit → checkout)?
1.7.7 Rollback and Recovery
- What is the maximum acceptable rollback time for each component (Identity Connector, Ordering Connector, infrastructure)?
- How do you ensure every deployment is reversible and what automated rollback triggers are in place?
- What is the communication plan when a rollback is initiated during store operating hours?
- How do you handle data inconsistencies that may result from a partial deployment or rollback?
1.7.8 Communication and Coordination
- How do you communicate planned changes to stakeholders (store operations, Amazon team, mobile app team)?
- What maintenance windows are defined and how are they communicated to affected parties?
- How do you coordinate changes that span multiple teams (e.g., mobile app + Identity Connector + POS)?
- What post-deployment review process captures lessons learned from each change?
1.8 Incident Management
- What is your incident classification framework (severity levels, impact criteria, response time SLAs)?
- What on-call rotation and escalation procedures are in place for identity verification, charge calculation, and checkout incidents?
- How do you detect incidents (automated alerting, customer reports, Amazon notifications, gate failure patterns)?
- What is the communication plan during an active incident (internal stakeholders, store operations, Amazon team, shoppers)?
- What incident commander or response team structure is activated during a major incident?
- How do you coordinate with Amazon during incidents that involve both retailer and Amazon systems?
- How do you track incident timelines (detection, acknowledgment, mitigation, resolution)?
- What post-incident review (PIR) process captures root cause, contributing factors, and corrective actions?
- What metrics track incident frequency, mean time to detect (MTTD), mean time to resolve (MTTR), and recurrence rate?
- How do you conduct incident response drills or game days to validate readiness?
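The detection and resolution metrics asked about above can be derived from the incident timeline. The Python sketch below is one possible calculation, assuming each incident record carries hypothetical `started`, `detected`, and `resolved` timestamps, and assuming MTTR is measured from detection to resolution (some teams measure it from the start of impact instead).

```python
from datetime import datetime
from statistics import mean
from typing import Dict, List

# Hypothetical record shape: {"started": ..., "detected": ..., "resolved": ...}
Incident = Dict[str, datetime]


def mean_minutes(incidents: List[Incident], start: str, end: str) -> float:
    """Average elapsed minutes between two timeline fields."""
    return mean((i[end] - i[start]).total_seconds() / 60 for i in incidents)


def incident_metrics(incidents: List[Incident]) -> Dict[str, float]:
    return {
        "mttd_minutes": mean_minutes(incidents, "started", "detected"),
        # Assumption: resolve time is counted from detection, not from start.
        "mttr_minutes": mean_minutes(incidents, "detected", "resolved"),
    }
```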
1.9 Partner Incident Collaboration
1.9.1 Ticket Intake and Triage
- When Amazon reports an issue, how does it enter your ticketing system? What system do you use?
- Is there a specific format or template required from Amazon to create a ticket on your side?
- Who receives incident reports from Amazon — a single POC, team inbox, or on-call rotation?
- What happens if the primary incident contact is unavailable?
- How do you classify and prioritize JWO incidents internally (severity levels, criteria such as revenue impact, number of stores affected, customer-facing impact)?
- What is your typical triage time from receiving a report to beginning investigation?
- Are there times (weekends, holidays, after-hours) when incident response is limited or unavailable?
1.9.2 Investigation and Diagnosis
- What data or identifiers do you need from Amazon to begin investigating an issue (e.g., store ID, transaction ID, session ID, connector ID, SKU, barcode)?
- What identifiers do NOT work in your systems that Amazon might provide?
- What tools and systems do you use to investigate connector, ordering, or checkout issues?
- Do you have access to connector logs on your side, or do you need Amazon to provide transaction-level data?
- What is your typical investigation workflow from receiving an incident report to identifying root cause?
- What are the top 3 recurring issue types you see from JWO stores?
- Do you have runbooks or documented procedures for common JWO-related issues?
1.9.3 Resolution and Fix Deployment
- Once root cause is identified, what does your fix and deployment process look like?
- How long does a typical fix take to deploy (by issue type: connector, catalog/SKU, store-specific)?
- Do you have staging/test environments for JWO fixes, or do fixes go directly to production?
- Are there fix types that require coordination with Amazon before deployment? If so, who do you contact?
- What does "resolved" mean on your side — how do you confirm resolution back to Amazon?
- Do you monitor after a fix to ensure it holds? What is the monitoring duration?
- What are your average resolution times for system/connector issues, catalog/SKU requests, and store-specific issues?
1.9.4 Cross-Organization Escalation
- How many escalation levels exist within your organization for JWO issues?
- What triggers an internal escalation?
- When you need something from Amazon to resolve an issue, how do you escalate to Amazon?
- Have there been incidents where you felt blocked by Amazon? What happened and how was it resolved?
- Is there a scenario where you would escalate directly to Amazon leadership? What would trigger that?
- Do you have SLAs or internal targets for resolution time on JWO-related issues?
1.9.5 Communication Preferences
- What is your preferred communication channel for incident management (email, shared Slack, ticketing portal, phone, other)?
- Does the preferred channel differ by severity (e.g., phone for critical, email for routine)?
- Do you prefer one thread per issue, or is it acceptable to have multiple threads for multi-store problems?
- What cadence of updates do you expect from Amazon on open issues (daily, weekly, only on status change)?
- What cadence of updates can Amazon expect from you?
- Is there a preferred format for incident reports from Amazon (structured template, specific fields required)?
- Who should be CC'd or included on incident communications from your side?
1.9.6 Tooling and Access
- Do you have visibility into Amazon's ticketing system or any shared incident dashboard? Would shared visibility be helpful?
- Do you have connector access or admin access to the JWO integration? Is there confusion about what access you have vs. what you need?
- Are there self-service tools you wish existed that would reduce your dependency on Amazon for resolution?
- Do you have monitoring and alerting on your side for the JWO connector, so that you know something is down before Amazon tells you?
- What does your alerting configuration look like for JWO-related services?
2. Security
2.1 Identity Key Security
- How are identity keys encrypted in transit and at rest?
- What controls prevent replay attacks using previously used identity keys?
- How do you detect and prevent tampered or forged identity keys?
- What is the expiration and rotation policy for identity keys?
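One common control for the replay question above is to combine a short expiry window with a record of keys already accepted inside that window. The Python sketch below illustrates the idea only: the `ReplayGuard` class, the 30-second window, and the assumption that a token and its issue time have already been extracted and authenticated are all hypothetical, and this document does not define the key format.

```python
import time
from typing import Dict, Optional


class ReplayGuard:
    """Accept each identity key at most once within its validity window."""

    def __init__(self, max_age_seconds: float = 30.0):
        self.max_age = max_age_seconds
        self._seen: Dict[str, float] = {}  # token -> issued_at

    def accept(self, token: str, issued_at: float,
               now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Forget tokens that have expired; they are rejected by age anyway.
        self._seen = {t: ts for t, ts in self._seen.items()
                      if now - ts <= self.max_age}
        if issued_at > now or now - issued_at > self.max_age:
            return False  # future-dated or expired (no clock-skew allowance here)
        if token in self._seen:
            return False  # replay of a key already used
        self._seen[token] = issued_at
        return True
```

In a multi-instance deployment the seen-set would need to live in a shared store with an atomic "insert if absent" operation; the in-memory dictionary here is only for illustration.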
2.2 Entry Gate Security
- How do you ensure the gate remains closed when identity verification fails or is inconclusive?
- What physical and logical controls prevent unauthorized store entry?
- How do you handle scenarios where the identity verification service is unreachable?
- What audit trail exists for all entry and exit events?
2.3 Charge Calculation Data Security
- How is cart data (item SKUs, quantities, pricing, shopper identity) protected in transit and at rest?
- What input validation prevents injection attacks through malformed cart payloads?
- How do you ensure sensitive shopper data is not logged or exposed in Create Purchases API error messages?
- What audit trail exists for all charge calculation requests and responses?
2.4 Payment Data Security
- How is payment data handled given that Amazon manages the Checkout Service?
- What PCI DSS compliance responsibilities remain with the retailer in this model?
- What tokenization or masking strategies are used when displaying payment data?
2.5 API Authentication and Authorization
- How are API credentials for the Verify Identity Keys, Create Purchases, and Checkout Service APIs managed and rotated?
- What controls prevent unauthorized access to these APIs?
- How do you detect and respond to abnormal API usage patterns (e.g., brute force identity key attempts)?
- What role-based access controls govern which systems can invoke each API?
2.6 Security Events, Data Classification, and Incident Response
- How do you detect and investigate security events?
- How do you classify your data?
- How do you protect data at rest?
- How do you anticipate, respond to, and recover from incidents?
3. Reliability
3.1 Identity Verification Availability
- What is the target availability SLA for the identity verification service?
- What fallback behavior exists when the identity verification service is unavailable?
- How do you handle failed entry followed by a successful retry (no duplicate processing)?
- What is the recovery process when the identity service returns intermittent errors?
3.2 Charge Calculation Reliability
- What is the target availability SLA for the Create Purchases API (Ordering Connector)?
- How do you handle empty carts where no pricing is required and a pre-auth cancellation must be triggered?
- What happens when a cart contains an item SKU that cannot be identified (bad cart process)?
- How do you ensure charge calculations complete before the pre-auth window expires?
- How do you handle idempotent retries using the idempotentShoppingTripId without creating duplicate purchase records?
3.3 Checkout Service Reliability
- What is the target availability SLA for the Checkout Service API?
- What fallback behavior exists when the Checkout Service is unavailable?
- How do you handle scenarios where the Checkout Service processes payment but confirmation is not received?
- What reconciliation process detects and resolves incomplete checkout transactions?
3.4 End-to-End Resilience
- What is the expected end-to-end latency from identity verification to completed checkout?
- How do you handle cascading failures across the identity → order delegation → checkout pipeline?
- What circuit breaker patterns are implemented to prevent system overload?
- How do you handle concurrent operations on the same shopping trip (race conditions)?
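For the circuit breaker question above, a basic breaker counts consecutive failures, rejects calls while open, and lets a probe through after a cooldown. The Python sketch below is a simplified illustration; the class name, thresholds, and injected clock are assumptions rather than anything prescribed by this review.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Open after N consecutive failures; allow a probe after a cooldown."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: calls flow normally
        # Half-open once the cooldown has elapsed. This simple version
        # does not limit how many probes pass at the same time.
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open, or re-open after a failed probe
```

A caller would check `allow()` before invoking a downstream dependency, then report the outcome with `record_success()` or `record_failure()`.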
3.5 Data Protection and Fault Tolerance
- How do you back up data?
- How do you design your workload to withstand component failures?
3.6 Backup and Recovery
- What is the backup strategy for identity verification configuration and charge calculation configuration (pricing rules, promotions, tax rates)?
- What is the Recovery Point Objective (RPO) for each critical data store (identity keys, pricing, transaction logs)?
- What is the Recovery Time Objective (RTO) for restoring each service after a failure?
- How do you validate that backups are complete, consistent, and restorable through regular restore testing?
- How do you ensure backups are stored in a separate AWS Region or account for disaster recovery?
- What is the escalation process when automated recovery fails?
- How do you conduct disaster recovery drills and how frequently are they performed?
4. Performance Efficiency
4.1 Identity Verification Performance
- What is the p99 response time for identity key verification?
- How does the system perform under peak load (e.g., 100 concurrent identity verifications)?
- What is the acceptable gate open latency from scan to entry?
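When answering the p99 question above, it helps to be explicit about how the percentile is computed. The Python sketch below uses the nearest-rank method, which is one of several accepted definitions; the function name and the sample values in the usage note are illustrative.

```python
import math
from typing import Iterable


def percentile(samples: Iterable[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]
```

For example, `percentile(latencies_ms, 99)` over the latency samples from one reporting interval gives the p99 for that interval. Monitoring tools may interpolate between samples instead, so small differences from this method are expected.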
4.2 Charge Calculation Performance
- What is the p99 response time for Create Purchases API calls?
- How does calculation performance scale with cart complexity (number of items, promotions, tax categories)?
- What optimizations are in place for high-volume concurrent charge calculations?
4.3 Checkout Service Performance
- What is the p99 response time for Checkout Service API calls?
- How does the Checkout Service perform under peak load (e.g., many concurrent checkouts)?
- What monitoring tracks Checkout Service latency and throughput?
4.4 Rate Limiting
- What rate limits are configured for the Verify Identity Keys API?
- How does the system handle 429 responses from the Checkout Service with the Retry-After header?
- What queuing or throttling strategies prevent hitting rate limits during peak periods?
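One client-side throttling strategy for the question above is a token bucket that caps the outbound call rate while permitting short bursts. The Python sketch below is illustrative only; the class name and the rate and capacity values are assumptions, and actual limits would come from the service documentation.

```python
import time
from typing import Callable


class TokenBucket:
    """Permit up to `capacity` calls in a burst, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should queue or delay the request
```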
4.5 Demand Management
- How do you design your workload to adapt to changes in demand?
5. Cost Optimization
5.1 Compute and Infrastructure
- How are compute resources scaled for identity verification and charge calculation services?
- What auto-scaling policies handle peak vs. off-peak traffic?
- Are there opportunities to use reserved capacity or savings plans for predictable workloads?
5.2 Checkout Service Costs
- What is the cost per transaction through the Amazon Checkout Service?
- How do you track and forecast Checkout Service costs based on transaction volumes?
- What is the cost impact of failed or cancelled checkout transactions?
5.3 API and Data Transfer Costs
- What is the total cost per shopping trip across all API calls (Verify Identity Keys, Create Purchases, Checkout Service)?
- How do you minimize unnecessary API calls (e.g., caching identity resolutions)?
- What is the cost impact of retry logic across all APIs?
6. Store Reporting
6.1 Reporting Mode Selection
- Have you evaluated which reporting mode best fits your organization (Merchant Portal daily reports, Intra-day S3 reporting, Event feed via EventBridge)?
- What is your required frequency of data refresh (daily, hourly, every 15 minutes, near real-time)?
- Does your existing data ingestion infrastructure support CSV-based (Intra-day) or JSON/API-based (Event feed) formats?
- If managing multiple stores or merchant accounts, have you considered the Event feed solution for scalability?
6.2 Merchant Portal Reporting
- Are daily reports (Orders, Catalog, Payments) being downloaded and reviewed from the JWO Merchant Portal?
- How do you consume the dashboard data that refreshes every 30 minutes (sales details, item details)?
- What process exports and integrates Merchant Portal data into your internal reporting systems?
- Who is responsible for reviewing daily reports and what is the escalation process for anomalies?
6.3 Intra-Day Reporting Operations
- Have you onboarded to the Intra-day reporting solution (IAM role, SNS subscription, SQS queue, Lambda processor)?
- How do you monitor the 96 daily files (4 per hour) for completeness and timeliness?
- What de-duplication logic prevents processing the same report file multiple times?
- How do you handle orders that span multiple files using upsert (update/insert) logic?
- What alerting is in place when expected report files are not received within the 15-minute window?
- How do you handle KMS decryption failures when accessing report files from the Amazon S3 bucket?
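The completeness, de-duplication, and upsert questions above can be illustrated together. In the Python sketch below, the 96 expected slots follow from 4 files per hour over 24 hours; the way a file maps to a slot, the `file_key` identifier, and the `orderId` field name are assumptions, since this document does not describe the file naming or schema.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set

FILES_PER_DAY = 96  # 4 per hour x 24 hours


def expected_slots(day_start: datetime) -> List[datetime]:
    """One slot every 15 minutes, starting at day_start."""
    return [day_start + timedelta(minutes=15 * i) for i in range(FILES_PER_DAY)]


def missing_slots(day_start: datetime, received: Iterable[datetime]) -> List[datetime]:
    """Slots for which no file has arrived; candidates for alerting."""
    got = set(received)
    return [slot for slot in expected_slots(day_start) if slot not in got]


class ReportIngestor:
    def __init__(self) -> None:
        self.processed_files: Set[str] = set()
        self.orders: Dict[str, dict] = {}

    def ingest(self, file_key: str, rows: Iterable[dict]) -> bool:
        """Skip files already processed; upsert rows by orderId (hypothetical field)."""
        if file_key in self.processed_files:
            return False
        for row in rows:
            current = self.orders.get(row["orderId"], {})
            self.orders[row["orderId"]] = {**current, **row}  # later fields win
        self.processed_files.add(file_key)
        return True
```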
6.4 Event Feed Reporting Operations
- Have you onboarded to the Event feed solution (EventBridge event bus, event rules, targets)?
- Are EventBridge rules configured correctly for CART and PAYMENT event types?
- How do you monitor EventBridge event delivery success and failure rates?
- What targets are configured for incoming events (S3, database, API endpoint)?
- How do you handle schema translation from Amazon event format to your internal reporting format?
- What alerting is in place when events are not received within expected timeframes after a shopping trip?
- How do you process PAYMENT event subtypes (AuthorizationApproved, CaptureApproved, AuthorizationDeclined, CaptureDeclined)?
- What dead-letter queue or retry strategy handles failed event processing?
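As an illustration of the PAYMENT subtype and dead-letter questions above, a processor might map each subtype to an internal state and set aside anything it cannot interpret. The four subtype names come from the list above; the event field names (`subtype`, `shoppingTripId`), the internal state labels, and the in-memory dead-letter list are hypothetical.

```python
from typing import Dict, List

# Subtype names are from this document; the state labels are illustrative.
STATE_BY_SUBTYPE = {
    "AuthorizationApproved": "authorized",
    "CaptureApproved": "captured",
    "AuthorizationDeclined": "authorization_declined",
    "CaptureDeclined": "capture_declined",
}


def process_payment_event(event: dict,
                          trip_states: Dict[str, str],
                          dead_letters: List[dict]) -> bool:
    """Record the payment state for a trip, or dead-letter the event."""
    state = STATE_BY_SUBTYPE.get(event.get("subtype"))
    trip_id = event.get("shoppingTripId")
    if state is None or not trip_id:
        dead_letters.append(event)  # unknown subtype or missing identifier
        return False
    # Simplification: the latest event wins. A production processor would
    # also guard against out-of-order delivery.
    trip_states[trip_id] = state
    return True
```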
6.5 Reporting Data Integrity
- How do you reconcile reporting data against charge calculation records and payment transactions?
- What validation ensures cart event data (SKUs, quantities, prices, promotions) matches your POS records?
- How do you detect and investigate discrepancies between Amazon reporting data and your internal systems?
- What process handles delayed orders that appear in later report files or event deliveries?
6.6 Reporting Security and Access
- How are IAM roles and KMS keys for reporting access managed and rotated?
- What access controls restrict who can view or download reporting data?
- How do you ensure PII in reporting data (card last four digits, shopper identity) is handled per privacy regulations?
- What audit logging captures all reporting data access and downloads?
7. Post-Purchase Operations
7.1 Receipt Generation and Delivery
- Have you determined the receipt ownership model for each entry method (Amazon-managed for credit card entry, retailer-managed for app entry)?
- For credit card entry shoppers: Have you configured your store logo, contact information, and branding in the Merchant Portal for Amazon-generated receipts?
- For app entry shoppers: How do you consume payment and cart data from Amazon APIs to build itemized receipts?
- How do you deliver receipts to app entry shoppers (email, mobile app push notification, in-app history)?
- How do you handle receipt delivery failures for app entry shoppers?
- What is the expected delivery time for receipts after a shopping trip for each entry method?
7.2 Receipt Lookup
- For credit card entry shoppers: Have you verified that shoppers can locate historical shopping trips via the Just Walk Out receipt portal?
- For app entry shoppers: How do you store and index receipt records for shopper self-service retrieval in your app?
- What is the data retention period for receipt records and how is it communicated to shoppers?
- What authentication is required for shoppers to access their receipt history?
7.3 Refund Processing
- Have you determined the refund ownership model for each entry method?
- For credit card entry shoppers: What is the process for directing shoppers to Amazon support for refunds?
- For app entry shoppers: Have you integrated and tested the Refund API (POST /v1/refund) with all required fields?
- How do you ensure refund idempotency using unique refundRequestId values?
- How do you enforce the 30-day refund window from the date of the shopping trip?
- What monitoring tracks refund success/failure rates and average refund processing time?
- What alerting is in place for abnormal refund rates that may indicate fraud or operational issues?
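The refund idempotency and 30-day window questions above can be sketched as a thin wrapper in front of the Refund API call. This Python example is illustrative: the `RefundService` class, the `submit` callable standing in for the API request, the payload and result fields other than `refundRequestId`, and the choice to treat day 30 as still inside the window are all assumptions.

```python
from datetime import date, timedelta
from typing import Callable, Dict

REFUND_WINDOW = timedelta(days=30)


def within_refund_window(trip_date: date, request_date: date) -> bool:
    """Assumption: the 30th day after the trip is still refundable."""
    return timedelta(0) <= request_date - trip_date <= REFUND_WINDOW


class RefundService:
    def __init__(self, submit: Callable[[dict], dict]):
        self._submit = submit  # stands in for the Refund API request
        self._results: Dict[str, dict] = {}

    def refund(self, refund_request_id: str, trip_date: date,
               request_date: date, amount: str) -> dict:
        # A repeated refundRequestId returns the stored outcome unchanged.
        if refund_request_id in self._results:
            return self._results[refund_request_id]
        if not within_refund_window(trip_date, request_date):
            result = {"status": "REJECTED", "reason": "outside 30-day window"}
        else:
            result = self._submit(
                {"refundRequestId": refund_request_id, "amount": amount})
        self._results[refund_request_id] = result
        return result
```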
7.4 Bad Debt Remediation
- What process identifies bad debt scenarios (e.g., failed captures, expired pre-auths with uncharged carts)?
- How do you record and track bad debt incidents with sufficient detail (shoppingTripId, amount, failure reason)?
- What automated alerting triggers when a shopping trip results in bad debt?
- What thresholds trigger investigation when bad debt rates exceed acceptable levels?
- How do you prevent repeat bad debt from the same shopper?
- What reporting tracks bad debt metrics (total amount, frequency, recovery rate)?
8. Sustainability
8.1 Resource Efficiency
- How do you minimize compute usage during low-traffic periods across identity verification and charge calculation services?
- What strategies reduce unnecessary data processing for empty carts or rejected identity verifications?
- How do you optimize data retention policies for transaction logs, entry/exit events, and audit trails?
8.2 Data Lifecycle Management
- How do you optimize retention of transaction records, identity verification logs, and charge calculation data?
- What archiving strategies minimize long-term storage for completed transactions?
- How do you efficiently purge obsolete transaction data?
8.3 Network and Transfer Optimization
- How do you minimize network traffic through efficient API call patterns?
- What batching or caching strategies reduce redundant API calls?
- How do you optimize API payload sizes to reduce transfer overhead?

