Well-Architected Review - Credit Card Entry with Checkout Service Store
- Overview
- 1. Operational Excellence
- 2. Security
- 3. Reliability
- 4. Performance Efficiency
- 5. Cost Optimization
- 6. Store Reporting
- 7. Post-Purchase Operations
- 8. Sustainability
Disclaimer: This document contains sample content for illustrative purposes only. Organizations should follow their own established best practices, security requirements, and compliance standards to ensure solutions are production-ready.
Overview
This questionnaire is designed for Just Walk Out store implementations that use credit card entry with the Amazon Checkout Service API for payment processing. In this model, Amazon manages shopper identity at the gate, while the retailer handles charge calculations through the Order Delegation model. Once the cart is priced, the retailer calls Amazon through the Checkout Service API to charge the shopper's payment instrument. The following APIs are in scope:
- Create Purchases API (POST /v1/order/purchases): Order Delegation
- Checkout Service API: Retailer calls Amazon to charge the shopper's payment instrument
1. Operational Excellence
1.1 Charge Calculation and Order Delegation Operations
- How do you monitor Create Purchases API success/failure rates and response times?
- How do you validate that charge calculations return accurate pricing, promotions, and tax for each cart?
- What monitoring detects charge calculation latency spikes or failures?
- How do you handle carts with unidentifiable SKUs routed to the bad cart process?
- How do you handle empty carts (returning an empty purchaseId and triggering pre-auth cancellation)?
- How do you ensure all requests are handled idempotently using the idempotentShoppingTripId?
- What alerting is in place when carts are received but not priced before the pre-auth expiration window?
- What is your process for updating pricing rules, promotions, or tax configurations in your POS system?
- How do you handle different cart item types (SKU, SCANCODE) and quantity units (unit, weight-based)?
- How do you process group shopping trips with multiple authEvents?
1.2 Checkout Service Operations
- How do you monitor Checkout Service API availability and response times?
- What alerting is in place when the Checkout Service returns errors or degraded performance?
- How do you track the end-to-end payment lifecycle managed by the Checkout Service?
- What dashboards provide visibility into Checkout Service transaction volumes and success rates?
- How do you handle Checkout Service outages or degradation?
1.3 Retry and Error Handling
- How do you handle incoming Create Purchases API requests that fail due to internal processing errors (e.g., POS lookup failure, pricing engine timeout)?
- How do you ensure Create Purchases idempotency using the idempotentShoppingTripId when Amazon retries the call to your endpoint?
- What retry strategy with exponential backoff is implemented for outbound Checkout Service API calls?
- How do you handle 429 (Too Many Requests) responses from the Checkout Service and honor the Retry-After header?
- How do you handle 503 (ServiceUnavailable) responses from the Checkout Service and honor the retryAfter value?
- What alerting is in place when maximum retry attempts are exhausted for Checkout Service calls?
- What is the escalation process when retries fail for payment-critical Checkout Service operations?
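As a reference point for the retry questions above, the following is a minimal sketch of exponential backoff with full jitter that prefers a server-supplied wait when one is present. The `Response` type, its `retry_after` field, the choice of 429 and 503 as the only retryable statuses, and all default values are illustrative assumptions, not part of the Checkout Service API.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Optional

RETRYABLE_STATUSES = {429, 503}  # assumption: only these are retried


@dataclass
class Response:
    """Hypothetical parsed response; retry_after is in seconds if supplied."""
    status: int
    retry_after: Optional[float] = None


class RetriesExhausted(Exception):
    """Raised when every attempt returned a retryable status."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter delay for a zero-based attempt number."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep,
                    rng: Callable[[], float] = random.random) -> Response:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        resp = send()
        if resp.status not in RETRYABLE_STATUSES:
            return resp
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Prefer the server's hint; otherwise fall back to jittered backoff.
        if resp.retry_after is not None:
            delay = resp.retry_after
        else:
            delay = backoff_delay(attempt, rng=rng)
        sleep(delay)
    # The caller is expected to alert and escalate at this point.
    raise RetriesExhausted(f"gave up after {max_attempts} attempts")
```

Raising a dedicated exception when attempts run out gives the caller a single place to hook alerting and escalation for payment-critical calls.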
1.4 Observability, Deployment, and Readiness
- How do you implement observability across the order delegation and checkout flow?
- How do you mitigate deployment risks for ordering connector changes?
- How do you know that you are ready to support the workload?
1.5 Monitoring and Change Management
- How do you monitor workload resources?
- How do you implement change to charge calculation logic?
1.6 Change Management Readiness
1.6.1 Change Control Process
- What formal change management process governs modifications to the Ordering Connector and supporting infrastructure?
- Who approves changes to production systems and what is the approval workflow?
- How do you classify changes by risk level (standard, normal, emergency)?
- How do you maintain a change log that records all modifications, approvers, and deployment timestamps?
1.6.2 Charge Calculation and Order Delegation Changes
- What is the process for deploying pricing rule, promotion, or tax configuration changes to the Ordering Connector?
- How do you ensure pricing changes are synchronized between your POS system and the Create Purchases API?
- What validation confirms that charge calculation changes produce correct totals before production deployment?
- How do you handle promotion activation/deactivation without impacting in-flight shopping trips?
- What is the rollback procedure if a pricing or tax change produces incorrect charges?
1.6.3 Infrastructure and API Changes
- What is the process for updating API Gateway configurations, Lambda functions, or IAM policies?
- How do you deploy infrastructure changes without service interruption?
- What blue/green or canary deployment strategies are used for API changes?
- How do you handle Amazon-initiated API changes (new fields, deprecations) in your Ordering Connector?
- What is the process for updating IAM role permissions or rotating API credentials?
1.6.4 Testing and Validation
- What pre-deployment testing is required for all change types (unit, integration, E2E, load)?
- How do you validate changes in a staging environment that mirrors production before deployment?
- What smoke tests confirm system health immediately after a production deployment?
- How do you test changes against the full shopper journey (entry → shopping → exit → charge)?
1.6.5 Rollback and Recovery
- What is the maximum acceptable rollback time for each component (Ordering Connector, infrastructure)?
- How do you ensure every deployment is reversible and what automated rollback triggers are in place?
- What is the communication plan when a rollback is initiated during store operating hours?
- How do you handle data inconsistencies that may result from a partial deployment or rollback?
1.6.6 Communication and Coordination
- How do you communicate planned changes to stakeholders (store operations, Amazon team)?
- What maintenance windows are defined and how are they communicated to affected parties?
- What post-deployment review process captures lessons learned from each change?
1.7 Incident Management
- What is your incident classification framework (severity levels, impact criteria, response time SLAs)?
- What on-call rotation and escalation procedures are in place for checkout and charge calculation incidents?
- How do you detect incidents (automated alerting, customer reports, Amazon notifications)?
- What is the communication plan during an active incident (internal stakeholders, store operations, Amazon team, shoppers)?
- What incident commander or response team structure is activated during a major incident?
- How do you coordinate with Amazon during incidents that involve both retailer and Amazon systems?
- What war room or bridge call procedures exist for critical incidents affecting shopper experience?
- How do you track incident timelines (detection, acknowledgment, mitigation, resolution)?
- What post-incident review (PIR) process captures root cause, contributing factors, and corrective actions?
- How do you track corrective action items to completion after a post-incident review?
- What metrics track incident frequency, mean time to detect (MTTD), mean time to resolve (MTTR), and recurrence rate?
- How do you conduct incident response drills or game days to validate readiness?
- What process ensures lessons learned from incidents are incorporated into runbooks and monitoring?
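The MTTD and MTTR metrics mentioned above can be computed from incident timelines roughly as follows. This is a sketch under stated assumptions: the `Incident` fields are hypothetical, MTTD is measured from incident start to detection, and MTTR is measured from detection to resolution (other conventions exist, such as measuring MTTR from incident start).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List


@dataclass
class Incident:
    """Hypothetical incident record with three timeline points."""
    started_at: datetime
    detected_at: datetime
    resolved_at: datetime


def mttd(incidents: List[Incident]) -> timedelta:
    """Mean time from incident start to detection."""
    seconds = [(i.detected_at - i.started_at).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))


def mttr(incidents: List[Incident]) -> timedelta:
    """Mean time from detection to resolution (one of several conventions)."""
    seconds = [(i.resolved_at - i.detected_at).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))
```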
2. Security
2.1 Charge Calculation Data Security
- How is cart data (item SKUs, quantities, pricing, shopper identity) protected in transit and at rest?
- What input validation prevents injection attacks through malformed cart payloads?
- How do you ensure sensitive shopper data is not logged or exposed in Create Purchases API error messages?
- What audit trail exists for all charge calculation requests and responses?
2.2 Payment Data Security
- How is payment data handled given that Amazon manages the Checkout Service?
- What PCI DSS compliance responsibilities remain with the retailer in this model?
- What tokenization or masking strategies are used when displaying payment data?
2.3 API Authentication and Authorization
- How are API credentials for the Create Purchases API and Checkout Service API managed and rotated?
- What controls prevent unauthorized access to these APIs?
- How do you detect and respond to abnormal API usage patterns?
- What role-based access controls govern which systems can invoke each API?
2.4 Security Events, Data Classification, and Incident Response
- How do you detect and investigate security events?
- How do you classify your data?
- How do you protect data at rest?
- How do you anticipate, respond to, and recover from incidents?
3. Reliability
3.1 Charge Calculation Reliability
- What is the target availability SLA for the Create Purchases API (Ordering Connector)?
- How do you handle empty carts where no pricing is required and a pre-auth cancellation must be triggered?
- What happens when a cart contains an item SKU that cannot be identified (bad cart process)?
- How do you ensure charge calculations complete before the pre-auth window expires?
- How do you handle idempotent retries using the idempotentShoppingTripId without creating duplicate purchase records?
- What is the recovery process when the Ordering Connector returns intermittent 500 errors?
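One way to reason about the idempotency question above is a get-or-create lookup keyed on the idempotentShoppingTripId, so a retried request returns the stored result instead of pricing the cart again. The sketch below uses an in-memory dictionary and a lock purely for illustration; the class and method names are hypothetical, and a production connector would more likely rely on a conditional write in a durable store.

```python
import threading
from typing import Any, Callable, Dict


class IdempotentPurchaseStore:
    """In-memory illustration of get-or-create keyed by shopping trip ID."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._by_trip: Dict[str, Any] = {}

    def get_or_create(self, trip_id: str, price_cart: Callable[[], Any]) -> Any:
        """Return the stored result for trip_id, pricing the cart only once."""
        # Simplification: holding the lock while pricing serializes all trips.
        with self._lock:
            if trip_id in self._by_trip:
                return self._by_trip[trip_id]  # retry: replay earlier result
            result = price_cart()
            self._by_trip[trip_id] = result
            return result
```

Because the result is stored only after pricing succeeds, a pricing failure leaves no record and a later retry prices the cart afresh.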
3.2 Checkout Service Reliability
- What is the target availability SLA for the Checkout Service API?
- What fallback behavior exists when the Checkout Service is unavailable?
- How do you handle scenarios where the Checkout Service processes payment but confirmation is not received?
- What reconciliation process detects and resolves incomplete checkout transactions?
3.3 End-to-End Resilience
- What is the expected end-to-end latency from cart receipt to completed checkout?
- How do you handle cascading failures across the order delegation → checkout pipeline?
- What circuit breaker patterns are implemented to prevent system overload?
- How do you handle concurrent operations on the same shopping trip (race conditions)?
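For the circuit breaker question above, a minimal single-threaded sketch is shown here. The threshold, the reset timeout, and the class names are illustrative assumptions. After the timeout, one trial call is allowed through (the half-open state), and a failed trial reopens the circuit.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means closed

    def call(self, fn: Callable[[], Any]) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or reopen after a failed trial.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```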
3.4 Data Protection and Fault Tolerance
- How do you back up data?
- How do you design your workload to withstand component failures?
3.5 Backup and Recovery
- What is the backup strategy for charge calculation configuration (pricing rules, promotions, tax rates) and purchase records?
- What is the Recovery Point Objective (RPO) for pricing configuration and transaction data?
- What is the Recovery Time Objective (RTO) for restoring the Ordering Connector after a failure?
- How do you validate that backups are complete, consistent, and restorable through regular restore testing?
- How do you ensure backups are stored in a separate AWS Region or account for disaster recovery?
- What is the escalation process when automated recovery fails?
- How do you conduct disaster recovery drills and how frequently are they performed?
4. Performance Efficiency
4.1 Charge Calculation Performance
- What is the p99 response time for Create Purchases API calls?
- How does calculation performance scale with cart complexity (number of items, promotions, tax categories)?
- What optimizations are in place for high-volume concurrent charge calculations?
- How does the system handle large carts (many line items, weight-based items, multiple external identifiers)?
4.2 Checkout Service Performance
- What is the p99 response time for Checkout Service API calls?
- How does the Checkout Service perform under peak load (e.g., many concurrent checkouts)?
- What monitoring tracks Checkout Service latency and throughput?
4.3 Rate Limiting
- How does the system handle 429 responses and honor the Retry-After header?
- What queuing or throttling strategies prevent hitting rate limits during peak periods?
- How do you distribute API calls across time windows?
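A token bucket is one common client-side throttling strategy for staying under a rate limit during peak periods. The sketch below is illustrative only: the rate and capacity values are placeholders chosen by the caller, not published Checkout Service limits.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full
        self._last = clock()

    def try_acquire(self, tokens: float = 1.0) -> bool:
        """Take tokens if available; return False so the caller can queue or wait."""
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False
```

A caller that receives False can enqueue the request and try again later, which spreads outbound calls across time windows instead of sending them in a burst.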
4.4 Demand Management
- How do you design your workload to adapt to changes in demand?
5. Cost Optimization
5.1 Compute and Infrastructure
- How are compute resources scaled for charge calculation services?
- What auto-scaling policies handle peak vs. off-peak traffic?
- Are there opportunities to use reserved capacity or savings plans for predictable workloads?
5.2 Checkout Service Costs
- What is the cost per transaction through the Amazon Checkout Service?
- How do you track and forecast Checkout Service costs based on transaction volumes?
- What is the cost impact of failed or cancelled checkout transactions?
5.3 API and Data Transfer Costs
- What is the total cost per shopping trip across all API calls (Create Purchases, Checkout Service)?
- How do you minimize unnecessary API calls?
- What is the cost impact of retry logic across all APIs?
6. Store Reporting
6.1 Reporting Mode Selection
- Have you evaluated which reporting mode best fits your organization (Merchant Portal daily reports, Intra-day S3 reporting, Event feed via EventBridge)?
- What is your required frequency of data refresh (daily, hourly, every 15 minutes, near real-time)?
- Does your existing data ingestion infrastructure support CSV-based (Intra-day) or JSON/API-based (Event feed) formats?
- If managing multiple stores or merchant accounts, have you considered the Event feed solution for scalability?
6.2 Merchant Portal Reporting
- Are daily reports (Orders, Catalog, Payments) being downloaded and reviewed from the JWO Merchant Portal?
- How do you consume the dashboard data that refreshes every 30 minutes (sales details, item details)?
- What process exports and integrates Merchant Portal data into your internal reporting systems?
- Who is responsible for reviewing daily reports and what is the escalation process for anomalies?
6.3 Intra-Day Reporting Operations
- Have you onboarded to the Intra-day reporting solution (IAM role, SNS subscription, SQS queue, Lambda processor)?
- How do you monitor the 96 daily files (4 per hour) for completeness and timeliness?
- What de-duplication logic prevents processing the same report file multiple times?
- How do you handle orders that span multiple files using upsert (update/insert) logic?
- What alerting is in place when expected report files are not received within the 15-minute window?
- How do you handle KMS decryption failures when accessing report files from the Amazon S3 bucket?
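The completeness, de-duplication, and upsert questions above can be sketched as follows. The 96 windows per day follow from the 15-minute cadence described in this section. Everything else is an assumption for illustration: windows are identified by naive start timestamps rather than real file names, `file_key` and the `order_id` row field are hypothetical, and state is held in memory.

```python
from datetime import date, datetime, timedelta
from typing import Dict, Iterable, List, Set

WINDOW = timedelta(minutes=15)


def expected_windows(day: date) -> List[datetime]:
    """Start times of the 96 fifteen-minute windows in one day."""
    start = datetime.combine(day, datetime.min.time())
    return [start + WINDOW * i for i in range(96)]


def missing_windows(day: date, received: Iterable[datetime], now: datetime,
                    grace: timedelta = timedelta(0)) -> List[datetime]:
    """Windows that closed at least `grace` ago with no file received."""
    got = set(received)
    return [w for w in expected_windows(day)
            if w + WINDOW + grace <= now and w not in got]


class ReportIngestor:
    """Skips already-processed files and upserts rows by order ID."""

    def __init__(self) -> None:
        self.processed_files: Set[str] = set()
        self.orders: Dict[str, dict] = {}

    def ingest(self, file_key: str, rows: Iterable[dict]) -> bool:
        if file_key in self.processed_files:
            return False  # duplicate delivery: ignore
        for row in rows:
            existing = self.orders.get(row["order_id"], {})
            # Upsert: later values overwrite earlier ones for the same order.
            self.orders[row["order_id"]] = {**existing, **row}
        self.processed_files.add(file_key)
        return True
```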
6.4 Event Feed Reporting Operations
- Have you onboarded to the Event feed solution (EventBridge event bus, event rules, targets)?
- Are EventBridge rules configured correctly for CART and PAYMENT event types?
- How do you monitor EventBridge event delivery success and failure rates?
- What targets are configured for incoming events (S3, database, API endpoint)?
- How do you handle schema translation from Amazon event format to your internal reporting format?
- What alerting is in place when events are not received within expected timeframes after a shopping trip?
- How do you process PAYMENT event subtypes (AuthorizationApproved, CaptureApproved, AuthorizationDeclined, CaptureDeclined)?
- What dead-letter queue or retry strategy handles failed event processing?
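To make the PAYMENT subtype and dead-letter questions above concrete, here is a small sketch of a processor that retries a handler a bounded number of times and sets aside events it cannot process. The four subtype names come from this section. The event shape (a dictionary with a `subtype` key), the handler signature, and the in-memory dead-letter list are hypothetical stand-ins for the actual event schema and for a real queue.

```python
from collections import Counter
from typing import Callable, List, Tuple

PAYMENT_SUBTYPES = {
    "AuthorizationApproved",
    "CaptureApproved",
    "AuthorizationDeclined",
    "CaptureDeclined",
}


class PaymentEventProcessor:
    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self.counts: Counter = Counter()
        self.dead_letters: List[Tuple[dict, str]] = []  # stand-in for a DLQ

    def process(self, event: dict, handler: Callable[[dict], None]) -> bool:
        """Run handler with bounded retries; dead-letter the event on failure."""
        subtype = event.get("subtype")  # hypothetical field name
        if subtype not in PAYMENT_SUBTYPES:
            self.dead_letters.append((event, "unknown subtype"))
            return False
        last_error = ""
        for _ in range(self.max_attempts):
            try:
                handler(event)
            except Exception as exc:
                last_error = repr(exc)
                continue  # try again until attempts run out
            self.counts[subtype] += 1
            return True
        self.dead_letters.append((event, last_error))
        return False
```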
6.5 Reporting Data Integrity
- How do you reconcile reporting data against charge calculation records and payment transactions?
- What validation ensures cart event data (SKUs, quantities, prices, promotions) matches your POS records?
- How do you detect and investigate discrepancies between Amazon reporting data and your internal systems?
- What process handles delayed orders that appear in later report files or event deliveries?
- How do you validate that promotion data in cart events (merchantPromotionId, promotionValue) matches your promotion configurations?
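A basic reconciliation pass for the questions above can compare order totals from internal charge calculation records against totals from Amazon reporting data. The sketch assumes both sides have already been reduced to a mapping of order ID to total; the function name and the result keys are hypothetical.

```python
from decimal import Decimal
from typing import Dict, List, Tuple, Union

Report = Dict[str, Union[List[str], Dict[str, Tuple[Decimal, Decimal]]]]


def reconcile(internal: Dict[str, Decimal], reported: Dict[str, Decimal]) -> Report:
    """Compare two order-ID-to-total maps and list every discrepancy."""
    shared = internal.keys() & reported.keys()
    return {
        # Orders priced internally that never appeared in reporting data.
        "missing_in_report": sorted(internal.keys() - reported.keys()),
        # Orders in reporting data with no internal record.
        "missing_internally": sorted(reported.keys() - internal.keys()),
        # Orders present on both sides with different totals.
        "mismatched": {
            order_id: (internal[order_id], reported[order_id])
            for order_id in sorted(shared)
            if internal[order_id] != reported[order_id]
        },
    }
```

Using Decimal rather than float avoids rounding artifacts when comparing currency amounts. Delayed orders may show up as missing from the report until a later file or event arrives, so a practical check would likely re-run after a delay before raising a discrepancy.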
6.6 Reporting Security and Access
- How are IAM roles and KMS keys for reporting access managed and rotated?
- What access controls restrict who can view or download reporting data?
- How do you ensure PII in reporting data (card last four digits, shopper identity) is handled per privacy regulations?
- What audit logging captures all reporting data access and downloads?
7. Post-Purchase Operations
Amazon handles all post-purchase operations for this store model. Shoppers access the Just Walk Out receipt portal to view receipts, request refunds, and manage their shopping history. No retailer APIs or systems are required for post-purchase operations.
- Have you verified that shoppers can access the Just Walk Out receipt portal to view their shopping trips?
- Have you configured your store logo, contact information, and branding in the Merchant Portal for receipt display?
- What is the process for directing shoppers to the Just Walk Out receipt portal (in-store signage, email, website)?
- What is the escalation process when a shopper reports an issue with their receipt or refund through Amazon support?
8. Sustainability
8.1 Resource Efficiency
- How do you minimize compute usage during low-traffic periods across charge calculation services?
- What strategies reduce unnecessary data processing for empty carts?
- How do you optimize data retention policies for transaction logs and audit trails?
8.2 Data Lifecycle Management
- How do you optimize retention of transaction records and charge calculation data?
- What archiving strategies minimize long-term storage for completed transactions?
- How do you efficiently purge obsolete transaction data?
8.3 Network and Transfer Optimization
- How do you minimize network traffic through efficient API call patterns?
- What batching or caching strategies reduce redundant API calls?
- How do you optimize API payload sizes to reduce transfer overhead?

