Amazon Elastic Compute Cloud (EC2) is an integral part
of Amazon’s cloud-computing platform, known as ‘Amazon Web Services’ (AWS). This
web service allows users to rent virtual computers on which to run their own
applications, instead of renting or buying physical computers (Amazon 1, 2018). With
EC2, a user can launch many virtual servers, configure security and networking,
and manage storage (Amazon 1, 2018). Amazon’s server design provides attributes
such as state management, concurrency, replication, and client transparency.
These attributes are discussed below:
State Management: Amazon has a
secure and scalable configuration management service called ‘State Manager’,
which ensures that EC2 and hybrid infrastructure is kept in a client-defined,
consistent state (Amazon 2, 2018). According to Amazon (2018), State Manager
works as follows:
1. The client determines the state to apply to the managed instances.
2. The client specifies a schedule for when and how often to apply the state. A cron or
rate expression can be specified.
3. The client specifies the targets for the state.
4. The client binds this information (schedule, targets, documents, parameters) to the
managed instances by creating an association.
5. After the client sends the request to create an association, the status of the
association shows “Pending”. The system attempts to reach all targets and
immediately apply the state specified in the association.
6. State Manager reports the status of the request for each instance targeted by the
association.
7. After the client creates the association, State Manager reapplies the state according
to the schedule defined in the association.
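The binding step above can be illustrated with a minimal sketch that gathers the schedule, targets, document and parameters into a single association request. The field names are assumed to mirror those accepted by the Systems Manager create-association call, while the helper name `build_association_request`, the document name and the tag values are chosen only for illustration. No AWS call is made; the sketch only builds and checks the request.

```python
def build_association_request(document_name, schedule_expression, targets, parameters=None):
    """Bundle document, schedule, targets and parameters into one request.

    Hypothetical helper for illustration; it does not contact AWS.
    """
    # The schedule must be a cron or rate expression, as described above.
    if not schedule_expression.startswith(("cron(", "rate(")):
        raise ValueError("schedule must be a cron(...) or rate(...) expression")

    request = {
        "Name": document_name,                       # the document describing the state
        "ScheduleExpression": schedule_expression,   # when/how often to apply it
        "Targets": [
            {"Key": key, "Values": list(values)}     # which instances to apply it to
            for key, values in targets.items()
        ],
    }
    if parameters:
        request["Parameters"] = parameters
    return request


# Illustrative values only: apply a document every 30 minutes to tagged instances.
request = build_association_request(
    document_name="AWS-UpdateSSMAgent",
    schedule_expression="rate(30 minutes)",
    targets={"tag:Environment": ["production"]},
)
print(request)
```

In a real deployment, a dictionary of this shape would be handed to the AWS SDK, after which State Manager would report the association status for each targeted instance.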
Concurrency: Amazon helps clients to control concurrency
for reasons such as cost and regulating the time taken to process a batch of
events (Amazon 3, 2018). Amazon provides a compute service called ‘Lambda’ that
lets users run code without provisioning or managing servers (Amazon 4, 2018).
Lambda provides a concurrent execution limit control at the account level and
at the function level:
Account level: At this level, AWS Lambda limits the total concurrent executions across
all functions within a given region to 1000 by default.
Function level: At this level, the concurrent execution limit is enforced against the
sum of the concurrent executions of all functions by default; a client can
optionally set a limit for an individual function.
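The two levels of limit can be pictured with a small toy model. This is a sketch under stated assumptions, not Lambda's actual implementation: the class `ConcurrencyLimiter`, its methods and the function names are hypothetical, and the per-function cap is modeled simply as an optional upper bound on one function's executions in flight.

```python
from collections import Counter


class ConcurrencyLimiter:
    """Toy model of account-level and function-level concurrency limits."""

    def __init__(self, account_limit=1000):
        self.account_limit = account_limit  # default taken from the text above
        self.function_limits = {}           # optional per-function caps (assumption)
        self.running = Counter()            # function name -> executions in flight

    def set_function_limit(self, name, limit):
        self.function_limits[name] = limit

    def try_start(self, name):
        """Return True if an execution may start, False if it is throttled."""
        # Account level: the sum over all functions must stay under the limit.
        if sum(self.running.values()) >= self.account_limit:
            return False
        # Function level: an individual cap, if the client has set one.
        limit = self.function_limits.get(name)
        if limit is not None and self.running[name] >= limit:
            return False
        self.running[name] += 1
        return True

    def finish(self, name):
        if self.running[name] > 0:
            self.running[name] -= 1


# A tiny account limit makes the behaviour easy to see.
limiter = ConcurrencyLimiter(account_limit=3)
limiter.set_function_limit("resize", 1)
results = [
    limiter.try_start("resize"),  # starts
    limiter.try_start("resize"),  # throttled by the function-level cap
    limiter.try_start("report"),  # starts
    limiter.try_start("report"),  # starts, account is now full
    limiter.try_start("report"),  # throttled by the account-level limit
]
print(results)
```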
Replication: In order to maintain consistency and prevent loss of
‘Elastic Block Store’ (EBS) data due to failure of any component, Amazon’s EC2 replicates
EBS data across multiple servers in an Availability Zone (Amazon 5, 2018).
EBS volumes are designed for an annual failure rate (AFR) of between 0.1% and
0.2%, where failure means a complete or partial loss of data, depending on the performance
and size of the volume (Amazon 5, 2018).
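To put the AFR figures in perspective, the short calculation below estimates how many volumes would be expected to fail in a year. It is a back-of-the-envelope sketch: the fleet sizes are made-up examples, and the second function assumes that volumes fail independently of one another, which is a simplification.

```python
def expected_failures(volume_count, annual_failure_rate):
    """Expected number of volumes that fail within one year."""
    return volume_count * annual_failure_rate


def probability_of_any_failure(volume_count, annual_failure_rate):
    """Chance that at least one volume fails in a year (independence assumed)."""
    return 1.0 - (1.0 - annual_failure_rate) ** volume_count


# Hypothetical fleet of 1,000 volumes at the two ends of the quoted AFR range.
low = expected_failures(1000, 0.001)   # AFR of 0.1%
high = expected_failures(1000, 0.002)  # AFR of 0.2%
print(f"Expected failures per year: between {low:.1f} and {high:.1f}")

# Hypothetical smaller fleet of 10 volumes at the upper AFR bound.
print(f"Chance of any failure among 10 volumes: {probability_of_any_failure(10, 0.002):.4f}")
```

In other words, a client running 1,000 volumes for a year should expect roughly one or two of them to fail.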
Client transparency: Amazon EC2 supports
Vormetric Transparent Encryption, a solution that protects clients’ data (Amazon 6,
2018). The solution encrypts data within a client’s AWS instances, provides integrated
key management, policy-based data access controls, and detailed Security
Intelligence information about data access patterns (Amazon 6, 2018). The
solution is easy to deploy and operate because it is transparent to
applications and to system management processes (Amazon 6, 2018). Data can
only be accessed by authorized users and processes; hence, policies can be
created that allow privileged users to manage systems without having
visibility of the data (Amazon 6, 2018). Such privileged users can access the
encrypted data but see only ciphertext (Amazon 6, 2018).
Between April 21, 2011 and April 24, 2011, a segment of Amazon Web Services (AWS) failed,
causing sites like Quora, Reddit, Foursquare, Springpad, and Hootsuite to go offline
(Wikipedia 1, 2018). Netflix’s services were
able to remain afloat despite the outage because of what Netflix refers to as its ‘Rambo
Architecture’. The Rambo Architecture is a Netflix software architecture in
AWS which ensures that each system is capable of succeeding, no matter what (Gilbertson,
2018). Basically, Netflix designed for
failure; they designed around exception handling, clusters, redundancy, fault
tolerance, fall-back or degraded experience (Hystrix), and so on (Tseitlin, 2013).
Specifically, Netflix uses a tool
called ‘Chaos Monkey’ to identify groups of systems and randomly terminate one
of the systems in the group (Hochstein, 2017). The tool tests resilience by
intentionally disabling computers in Netflix’s production network to see how the
remaining systems respond to the outage (Wikipedia 2, 2018). Chaos Monkey is
part of a larger suite of tools called the ‘Simian Army’ which is designed to
simulate responses to different edge cases and system failure (Wikipedia 2, 2018).
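The idea behind Chaos Monkey can be sketched in a few lines. This is an illustrative simulation only, not Netflix's implementation: the function names, the group names and the instance identifiers are all hypothetical, and nothing is actually terminated.

```python
import random


def pick_victims(groups, rng):
    """Pick one instance at random from each non-empty group."""
    return {
        name: rng.choice(instances)
        for name, instances in groups.items()
        if instances
    }


def terminate(groups, victims):
    """Return the groups with the chosen victims removed (simulated outage)."""
    return {
        name: [i for i in instances if i != victims.get(name)]
        for name, instances in groups.items()
    }


def still_serving(groups):
    """A group keeps serving as long as at least one instance survives."""
    return all(len(instances) > 0 for instances in groups.values())


# Hypothetical groups of redundant systems.
groups = {
    "api": ["api-1", "api-2", "api-3"],
    "db": ["db-1", "db-2"],
}
rng = random.Random(42)  # seeded so that a run can be repeated

victims = pick_victims(groups, rng)
survivors = terminate(groups, victims)
print("Terminated:", victims)
print("Still serving:", still_serving(survivors))
```

Because every group holds more than one instance, the simulated service survives the loss of any single instance, which is the property such a test is meant to confirm.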
The Simian Army tools and their uses include:
Chaos Monkey: This is the first tool developed by Netflix (Wikipedia 2, 2018). It