Summary of the Security Design of the AWS Nitro System

While reading the whitepaper on the security design of the AWS Nitro System recently, I came across some design decisions that I really liked. The whitepaper itself is quite detailed, so I am sharing a summary here for quick reading and, where feasible, for applying these ideas in our own work.

Table of Contents

  • What is the Nitro System
  • Security Design Considerations
  • Conclusion

What is the Nitro System?

The AWS Nitro System is the underlying platform for all modern EC2 instances.

Nitro System virtualization architecture

The diagram below shows a simplified view of the architecture. A modern EC2 server is made up of a main system board and one or more Nitro Cards.

1. Main System Board

  • The system main board contains the host CPUs (Intel, AMD, or Graviton processors) and memory. 

2. Nitro Cards

  • Nitro Cards are dedicated hardware components with powerful 64-bit processing capabilities and specialized Application Specific Integrated Circuits (“ASICs”). They operate independently from the system main board, which runs all customer compute environments, including code and data processing operations.
  • The Nitro Cards implement all the outward-facing control interfaces used by the EC2 service to provision and manage compute, memory, and storage. 
  • They also provide all I/O interfaces, such as those needed to provide software-defined networking, Amazon EBS storage, and instance storage. 
[Diagram: Nitro System virtualization architecture]

Security Design Considerations

1. Separation / Isolation

  • The Nitro System is designed to provide strong logical isolation between the host components (CPU and memory) and the Nitro Cards. The physical separation between the two provides a firm and reliable boundary that contributes to that design.
  • While logically isolated and physically separate, Nitro Cards are typically contained within the same physical server enclosure as the host’s system main board and share its power supply, along with its PCIe interface.

2. No AWS Operator Access

  • By design, the Nitro System has no operator access: there is no mechanism for any system or person to log in to EC2 Nitro hosts, access the memory of EC2 instances, or access customer data stored on encrypted instance storage or encrypted EBS volumes.
  • In rare cases, subtle issues can arise that AWS operators are unable to debug in place, because there are no general access capabilities on production hardware.
  • In those rare circumstances, AWS works with customers, at the customers' request, to reproduce the issues on dedicated non-production Nitro debugging hardware.

3. High Degree of Isolation Using Passive Communication Design

  • The Nitro Controller never initiates any outbound communications on the EC2 control plane network. 
  • Even logical “push” features such as publishing CloudWatch metrics for the EC2 instances running on the host, or sending off the Nitro API logs to the EC2 control plane are implemented as a “pull” process. 
  • The control plane polls the Nitro Controller periodically to retrieve the metrics using well-defined APIs. 
  • Any attempt at outbound communication from the Nitro Controller would be a clear signal of a firmware bug or a possible system compromise. The EC2 service is designed to react to such a signal by preventing impact and raising an alarm for operator response.
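
To make the "pull" idea concrete, here is a minimal sketch in Python. It is not AWS code: the class names, method names, and metric names are all hypothetical, and a real system would sit behind authenticated network APIs. The point it illustrates is that the controller side only buffers data and answers requests, while every exchange is initiated by the control plane.

```python
from collections import deque


class NitroControllerSketch:
    """Hypothetical stand-in: buffers data locally and only answers requests."""

    def __init__(self):
        self._metrics = deque()

    def record_metric(self, name, value):
        # Metrics are queued locally; nothing is sent anywhere from here.
        self._metrics.append((name, value))

    def handle_get_metrics(self):
        # Runs only in response to an inbound poll; drains the local buffer.
        drained = list(self._metrics)
        self._metrics.clear()
        return drained


class ControlPlaneSketch:
    """Hypothetical poller: all communication is initiated from this side."""

    def __init__(self, controller):
        self._controller = controller
        self.collected = []

    def poll_once(self):
        batch = self._controller.handle_get_metrics()
        self.collected.extend(batch)
        return len(batch)


controller = NitroControllerSketch()
controller.record_metric("cpu_utilization", 41.5)
controller.record_metric("network_in_bytes", 1024)

plane = ControlPlaneSketch(controller)
first = plane.poll_once()   # retrieves both buffered metrics
second = plane.poll_once()  # nothing new has been recorded since
```

Because the controller object holds no reference to the control plane at all, any outbound call from it would have to be added deliberately, which is what makes such traffic easy to treat as an anomaly.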

4. Strong and Robust Change Management System

  • All Nitro System related configurations and code changes are subject to multi-party review and approval, and staged rollouts in both testing and production environments.
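
As a rough illustration of how multi-party approval and staged rollout can be combined into a single gate, consider the sketch below. The stage names, the approver threshold, and all function names are assumptions made for the example; the whitepaper does not describe AWS's tooling at this level.

```python
from dataclasses import dataclass, field

# Hypothetical rollout order and approval threshold, chosen for illustration.
STAGES = ["test", "production-canary", "production-wide"]
MIN_APPROVERS = 2


@dataclass
class Change:
    author: str
    approvers: set = field(default_factory=set)
    completed_stages: list = field(default_factory=list)


def is_approved(change):
    # The author's own approval does not count toward the threshold.
    independent = change.approvers - {change.author}
    return len(independent) >= MIN_APPROVERS


def next_stage(change):
    """Return the next stage, or None if the change is blocked or finished."""
    if not is_approved(change):
        return None
    done = len(change.completed_stages)
    return STAGES[done] if done < len(STAGES) else None


def advance(change):
    # Moves the change forward by exactly one stage, never skipping any.
    stage = next_stage(change)
    if stage is not None:
        change.completed_stages.append(stage)
    return stage


change = Change(author="alice", approvers={"alice", "bob"})
blocked = advance(change)       # only one independent approver so far
change.approvers.add("carol")
first_stage = advance(change)   # now allowed into the first stage
```

The design choice worth noting is that approval and staging are checked in the same place, so a change cannot reach a later stage without having passed both the review and every earlier stage.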

5. Side-Channel Handling

  • Side channels are mechanisms that potentially allow revealing secret information in a computer system through the analysis of indirect data gathered from that system.
  • An example of such indirect data may be the amount of time it takes for a system to operate on an input. 
  • Careful deployment of countermeasures such as those employed by s2n-tls, the open-source SSL/TLS implementation from AWS, can be used to protect against these forms of side-channel data disclosure. More on this here.
  • s2n-tls incorporates time-balancing countermeasures, and uses formal methods to prove them, so that process timing is negligibly influenced by secrets and no attacker-observable timing behavior depends on secrets.
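
A small example of a timing side channel and one common countermeasure is sketched below. This is not how s2n-tls is implemented (s2n-tls is a C library with its own countermeasures); it only uses Python's standard `hmac.compare_digest` to show the general idea of making running time independent of secret content. The function names and the sample tag are made up for the example.

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    # Leaky: returns at the first mismatching byte, so the running time
    # depends on how many leading bytes of a guess are correct.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    # hmac.compare_digest is designed so that its running time does not
    # depend on where (or whether) the two inputs differ.
    return hmac.compare_digest(a, b)


secret_tag = b"\x13\x37" * 16  # hypothetical 32-byte authentication tag
good = constant_time_equal(secret_tag, b"\x13\x37" * 16)
bad = constant_time_equal(secret_tag, b"\x00" * 32)
```

Both functions return the same answers; the difference is only in what an attacker who measures response times could learn about the secret.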

Conclusion

The design decisions above show how one can approach securing complex systems: difficult choices may be necessary, but they should not block other aspects of the business. For a deeper dive, refer to the AWS whitepaper.