Amazon SimpleDB

From Bauman National Library
This page was last modified on 17 December 2015, at 11:46.
Revision as of 11:46, 17 December 2015 by alexander irbetkin (Talk | contribs)


Amazon SimpleDB is a distributed database written in Erlang by Amazon.com. It is used as a web service in concert with Amazon Elastic Compute Cloud (EC2) and Amazon S3 and is part of Amazon Web Services. It was announced on December 13, 2007. As with EC2 and S3, Amazon charges fees for SimpleDB storage, transfer, and throughput over the Internet. On December 1, 2008, Amazon introduced new pricing with a Free Tier covering 1 GB of data and 25 machine hours. Transfer to other Amazon Web Services is free of charge.[1]

Introduction

Amazon SimpleDB is a highly available and flexible non-relational data store that offloads the work of database administration. Developers simply store and query data items via web services requests and Amazon SimpleDB does the rest.

Unbound by the strict requirements of a relational database, Amazon SimpleDB is optimized to provide high availability and flexibility, with little or no administrative burden. Behind the scenes, Amazon SimpleDB creates and manages multiple geographically distributed replicas of your data automatically to enable high availability and data durability. The service charges you only for the resources actually consumed in storing your data and serving your requests. You can change your data model on the fly, and data is automatically indexed for you. With Amazon SimpleDB, you can focus on application development without worrying about infrastructure provisioning, high availability, software maintenance, schema and index management, or performance tuning.[2]

Benefits

Low touch

The service allows you to focus fully on value-added application development, rather than arduous and time-consuming database administration. Amazon SimpleDB automatically manages infrastructure provisioning, hardware and software maintenance, replication and indexing of data items, and performance tuning.

Highly available

Amazon SimpleDB automatically creates multiple geographically distributed copies of each data item you store. This provides high availability and durability: in the unlikely event that one replica fails, Amazon SimpleDB can fail over to another replica in the system.

Flexible

As your business changes or application evolves, you can easily reflect these changes in Amazon SimpleDB without worrying about breaking a rigid schema or needing to refactor code – simply add another attribute to your Amazon SimpleDB data set when needed. You can also choose between consistent or eventually consistent read requests, gaining the flexibility to match read performance (latency and throughput) and consistency requirements to the demands of your application, or even disparate parts within your application.

Simple to use

Amazon SimpleDB provides streamlined access to the store and query functions that traditionally are achieved using a relational database cluster – while leaving out other complex, often-unused database operations. The service allows you to quickly add data and easily retrieve or edit that data through a simple set of API calls.

Designed for use with other Amazon Web Services

Amazon SimpleDB is designed to integrate easily with other AWS services such as Amazon S3 and EC2, providing the infrastructure for creating web-scale applications. For example, developers can run their applications in Amazon EC2 and store their data objects in Amazon S3. Amazon SimpleDB can then be used to query the object metadata from within the application in Amazon EC2 and return pointers to the objects stored in Amazon S3. Developers can also use Amazon SimpleDB with Amazon RDS for applications that have relational and non-relational database needs. Data transferred between Amazon SimpleDB and other Amazon Web Services within the same Region is free of charge.

Secure

Amazon SimpleDB provides an HTTPS endpoint to ensure secure, encrypted communication between your application or client and your domain. In addition, through integration with AWS Identity and Access Management, you can establish user or group-level control over access to specific SimpleDB domains and operations.

Inexpensive

Amazon SimpleDB passes on to you the financial benefits of Amazon’s scale. You pay only for resources you actually consume. For Amazon SimpleDB, this means data store reads and writes are charged by compute resources consumed by each operation, and you aren’t billed for compute resources when you aren’t actively using them (i.e. making requests).[2]

Featured Use Cases

Logging

Since Amazon SimpleDB allows you to completely offload the work required to run a production database, many developers find it an ideal, low-touch data store for logging information about conditions or events, status updates, recurring activities, workflow processes, or device and application states. Amazon SimpleDB lets you cost-effectively “set and forget” these data logs and use them for diverse purposes, such as:

  • Monitoring or tracking
  • Metering
  • Trend or business analysis
  • Auditing
  • Archival or regulation compliance

Application examples include:

  • Storing server logs centrally to reduce the space they consume on each running server
  • Logging operational metrics or the results of ongoing performance tests for later analysis
  • Auditing access entries or configuration changes for applications or networked devices
  • Capturing and monitoring environment conditions (temperature, pressure levels, humidity, etc.) at various locations and programming alerts for particular conditions
  • Logging and tracking geolocation information about objects or process status for activities in a workflow

Central, with High Availability: If your data logs were previously being trapped locally in multiple devices/objects, applications, or process silos, you’ll enjoy the benefit of being able to access your data centrally in one place in the cloud. What’s more, Amazon SimpleDB automatically and geo-redundantly replicates your data to ensure high availability. This means that unlike a centralized on-premise solution, you’re not creating a single point of failure with Amazon SimpleDB, and your data will be there when you need it. All of the data can be stored via web services requests with one solution and then accessed by any device.

Cost-efficient: Amazon SimpleDB charges inexpensive prices to store and query your data logs. Since you are paying as you go for only the resources you consume, you don’t need to do your own capacity planning or worry about database load. The service simply responds to request volume as it comes and goes, charging you only for the actual resources consumed.

Online Games

For developers of online games on any platform, Amazon SimpleDB offers a highly-available, scalable, and administration-free database solution for user and game data.

Common data that online games can store, index, and query with Amazon SimpleDB include:

  • User scores and achievements
  • User settings or preferences
  • Information about a player’s items or user-generated content
  • Game session state (when play is saved or interrupted)
  • Dynamic game content (applying a service-oriented architecture to your game and storing and serving new challenges or content for players with Amazon SimpleDB)
  • Indexed metadata for large objects used by your game and stored in Amazon S3

Multiple properties of Amazon SimpleDB make it well-suited to be a data store for online game data:

High availability (automatic geo-redundant replication and failover): Amazon SimpleDB achieves high availability by automatically creating multiple copies of your data and managing failover to an available copy in the event one copy becomes unavailable. This means you avoid the complexity of setting up database clusters, but your game and users still enjoy reliable, interruption-free access to key data.

No-touch scaling: As your user base grows and player activity fluctuates, Amazon SimpleDB simply responds to traffic and request volume as it comes and goes without the need for developer intervention. You only pay for the resources you actually consume.

Zero administrative overhead: Avoid the hassles of database management and eliminate the work of infrastructure provisioning, software setup, creating and maintaining a schema, building indices, or tuning query performance. You can get back to building fun games and features for your users, and stop being the database administrator.

Indexing Amazon S3 Object Metadata

Many developers use Amazon SimpleDB in conjunction with Amazon Simple Storage Service (Amazon S3). Amazon SimpleDB can be used to store pointers to Amazon S3 object locations and detailed information about the objects (metadata), thereby supplementing Amazon S3 with the rich query functionality of a database. For developers storing large numbers of objects in Amazon S3, Amazon SimpleDB offers a flexible, scalable, and inexpensive way to store object metadata while offloading all of the administrative overhead associated with running a database. Common examples of object metadata that can easily be stored, indexed, and queried in Amazon SimpleDB include:

  • Data type or format (image, video, document)
  • User associations or access designations
  • Dates the object was created, accessed, or modified
  • Name or location of related objects
  • User ratings and comments
  • Subject or category tags
  • Geolocation tags

Storing metadata like the examples listed above is valuable for content delivery, media applications, backup and archiving applications, and many other application types. Amazon SimpleDB makes an ideal home for metadata because it provides:

Flexible, schema-less design: Easily append additional metadata attributes without “breaking” a rigid schema. If you want to start tracking user ratings for video objects, it won’t involve time-consuming database changes.

Multi-valued attributes: A metadata attribute can have multiple values. This means photos can be tagged with multiple people or music files with multiple genres.
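A minimal sketch of how multi-valued metadata might be flattened into name/value pairs before being written. The helper name `to_attribute_pairs`, the sample attribute names, and the S3 key are hypothetical; the `Name`/`Value` keys are chosen here to mirror the attribute name and value fields of a write request, and nothing is sent to the service.

```python
def to_attribute_pairs(metadata):
    """Flatten {name: value or [values]} into a list of name/value pairs.

    An attribute with several values simply yields several pairs
    that share the same name.
    """
    pairs = []
    for name, values in metadata.items():
        if isinstance(values, str):
            values = [values]  # treat a single string as a one-element list
        for value in values:
            pairs.append({"Name": name, "Value": str(value)})
    return pairs


# Hypothetical metadata for a photo whose bytes live in Amazon S3.
photo_metadata = {
    "type": "image",
    "tagged_person": ["alice", "bob"],   # one attribute, two values
    "s3_key": "photos/2015/beach.jpg",   # pointer to the S3 object (made up)
}
pairs = to_attribute_pairs(photo_metadata)
```

Here `tagged_person` contributes two pairs, one per person, which is what lets a later query match the photo on either value.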

Zero administrative overhead: In addition to removing the hassles of infrastructure provisioning and software installation and maintenance required to run a database, Amazon SimpleDB automatically indexes your data, tunes query performance, and creates geo-redundant copies of your data.

Amazon SimpleDB also provides low-friction scaling, automatically responding to changes in request volume, and only charging you a cost-effective amount for the resources you actually consume.

API Summary

Amazon SimpleDB provides a small number of simple API calls which implement writing, indexing and querying data. The interface and feature set are intentionally focused on core functionality, providing a basic API for developers to build upon and making the service easy to learn and simple to use.

  • CreateDomain – Create a domain that contains your dataset.
  • DeleteDomain – Delete a domain.
  • ListDomains – List all domains.
  • DomainMetadata – Retrieve information about the domain: its creation time, storage information as counts of item names and attributes, and total size in bytes.
  • PutAttributes – Add or update an item and its attributes, or add attribute-value pairs to items that exist already. Items are automatically indexed as they are received.
  • BatchPutAttributes – For greater overall throughput of bulk writes, perform up to 25 PutAttributes operations in a single call.
  • DeleteAttributes – Delete an item, an attribute, or an attribute value.
  • BatchDeleteAttributes – For greater overall throughput of bulk deletes, perform up to 25 DeleteAttributes operations in a single call.
  • GetAttributes – Retrieve an item and all or a subset of its attributes and values.
  • Select – Query the data set using the familiar “select target from domain_name where query_expression” syntax. Supported value tests are: =, !=, <, >, <=, >=, like, not like, between, is null, is not null, and every(). Example: select * from mydomain where every(keyword) = 'Book'. Order results using the SORT operator, and count items that meet the condition(s) specified by the predicate(s) in a query using the Count operator.[3]
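Two points from the list above lend themselves to a short sketch: the limit of 25 operations per batch call, and the shape of a Select expression. The helper names (`chunk`, `quote_name`, `quote_value`, `build_select`) are hypothetical, and the quoting rules (backticks around non-simple names, single quotes around values, both escaped by doubling) are an assumption based on the SimpleDB Developer Guide that should be checked against it. The code only builds strings and lists; it does not call the service.

```python
import re

MAX_BATCH_SIZE = 25  # per-call limit stated for the batch operations

# Assumed rule: names made of letters, digits, _ and $ (not starting
# with a digit) can appear unquoted in a Select expression.
_SIMPLE_NAME = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def chunk(items, size=MAX_BATCH_SIZE):
    """Split items into consecutive lists of at most `size` elements."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def quote_name(name):
    """Quote a domain or attribute name with backticks when needed."""
    if _SIMPLE_NAME.match(name):
        return name
    return "`" + name.replace("`", "``") + "`"


def quote_value(value):
    """Quote a value with single quotes, doubling embedded quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_select(domain, attribute, value, every=False):
    """Build 'select * from <domain> where <attribute> = <value>'."""
    target = quote_name(attribute)
    if every:
        target = f"every({target})"
    return (f"select * from {quote_name(domain)} "
            f"where {target} = {quote_value(value)}")


query = build_select("mydomain", "keyword", "Book", every=True)
batches = chunk(range(60))  # 60 pending writes become 3 batch calls
```

Building the expression through quoting helpers rather than string concatenation keeps values containing quotes from breaking the query.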

Note: Amazon SimpleDB has been integrated with AWS Identity and Access Management to enable fine-grained control over Amazon SimpleDB resources. Through this integration, an AWS Account signed up to use SimpleDB can create multiple Users. In turn, these Users may be granted SimpleDB API-level permissions to access the SimpleDB domains owned by the AWS Account. See the AWS Identity and Access Management detail page for additional details.

Consistency Options

Amazon SimpleDB stores multiple geographically distributed copies of each domain to enable high availability and data durability. A successful write (using PutAttributes, BatchPutAttributes, DeleteAttributes, BatchDeleteAttributes, CreateDomain or DeleteDomain) means that all copies of the domain will durably persist. Amazon SimpleDB supports two read consistency options: eventually consistent reads and consistent reads.

  • Eventually Consistent Reads (Default): The eventual consistency option maximizes your read performance (in terms of low latency and high throughput). However, an eventually consistent read (using Select or GetAttributes) might not reflect the results of a recently completed write (using PutAttributes, BatchPutAttributes, DeleteAttributes, BatchDeleteAttributes). Consistency across all copies of data is usually reached within a second; repeating a read after a short time should return the updated data.[3]
  • Consistent Reads: In addition to eventual consistency, Amazon SimpleDB also gives you the flexibility and control to request a consistent read if your application, or an element of your application, requires it. A consistent read (using Select or GetAttributes with ConsistentRead=true) returns a result that reflects all writes that received a successful response prior to the read.

By default, GetAttributes and Select perform an eventually consistent read. Since a consistent read can potentially incur higher latency and lower read throughput, it is best to use it only when an application scenario mandates that a read operation absolutely needs to read all writes that received a successful response prior to that read. For all other scenarios the default eventually consistent read will yield the best performance. Note also that Amazon SimpleDB allows you to specify consistency settings for each individual read request, so the same application could have disparate parts following different consistency settings.[3]
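The difference between the two read modes can be illustrated with a toy in-memory model. This is only a sketch of the observable behaviour described above, not of how Amazon SimpleDB is implemented; the class `ToyReplicatedStore`, its `sync` method, and the item name are all made up for illustration.

```python
class ToyReplicatedStore:
    """Toy model: one up-to-date copy plus one replica that lags behind."""

    def __init__(self):
        self._latest = {}   # reflects every acknowledged write
        self._replica = {}  # catches up only when sync() is called

    def put(self, item, value):
        """Acknowledge a write; the replica does not see it yet."""
        self._latest[item] = value

    def sync(self):
        """Stand-in for replication catching up after a short delay."""
        self._replica = dict(self._latest)

    def get(self, item, consistent_read=False):
        """Read from the lagging replica unless a consistent read is asked for."""
        source = self._latest if consistent_read else self._replica
        return source.get(item)


store = ToyReplicatedStore()
store.put("seat-12A", "reserved")

stale = store.get("seat-12A")                        # default read may miss the write
fresh = store.get("seat-12A", consistent_read=True)  # reflects the acknowledged write
store.sync()
later = store.get("seat-12A")                        # default read after replicas converge
```

Because the flag is passed per call, one part of an application can pay for consistent reads while another keeps the faster default.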

Transactions

Amazon SimpleDB is not a relational database and sacrifices complex transactions and relations (i.e., joins) in order to provide unique functionality and performance characteristics. However, Amazon SimpleDB does offer transactional semantics such as:

  • Conditional Puts/Deletes enable you to insert, replace, or delete values for one or more attributes of an item if the existing value of an attribute matches the value you specify. If the value does not match or is not present, the update is rejected. Conditional Puts/Deletes are useful for preventing lost updates when different sources write concurrently to the same item.

Conditional puts and deletes are exposed via the PutAttributes and DeleteAttributes APIs by specifying an optional condition with an expected value. For example, if your application was reserving seats or selling tickets to an event, you might allow a purchase (i.e., write update) only if the specified seat was still available (the optional condition). These semantics can also be used to implement functionality such as counters, inserting an item only if it does not already exist, and optimistic concurrency control (OCC). An application can implement OCC by maintaining a version number (or a timestamp) attribute as part of an item and by performing a conditional put/delete based on the value of this version number. To learn more about transactional semantics or consistency with Amazon SimpleDB, please refer to the Amazon SimpleDB Developer Guide or Consistency Enhancements Whitepaper.[3]
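The optimistic concurrency pattern described above can be sketched against a toy in-memory stand-in for a domain. `ToyDomain`, `ConditionFailed`, `conditional_put`, and `increment_counter` are hypothetical names used only for illustration; the real service exposes the condition as an optional parameter of PutAttributes and DeleteAttributes, and its exact request shape is not reproduced here.

```python
class ConditionFailed(Exception):
    """Raised when the expected value does not match the stored one."""


class ToyDomain:
    """In-memory stand-in for a domain; values are kept as strings."""

    def __init__(self):
        self._items = {}

    def get(self, item):
        return dict(self._items.get(item, {}))

    def conditional_put(self, item, attributes, expected_name, expected_value):
        """Write only if expected_name currently holds expected_value.

        expected_value=None means the attribute must not exist yet.
        """
        current = self._items.get(item, {}).get(expected_name)
        if current != expected_value:
            raise ConditionFailed(
                f"{expected_name!r} is {current!r}, expected {expected_value!r}")
        self._items.setdefault(item, {}).update(attributes)


def increment_counter(domain, item, retries=5):
    """Read, bump count and version, and write back if nobody else wrote."""
    for _ in range(retries):
        state = domain.get(item)
        version = state.get("version")  # None before the first write
        new_state = {
            "count": str(int(state.get("count", "0")) + 1),
            "version": str(int(version or "0") + 1),
        }
        try:
            domain.conditional_put(item, new_state, "version", version)
            return new_state
        except ConditionFailed:
            continue  # another writer got in first; re-read and retry
    raise RuntimeError("gave up after repeated conflicts")


domain = ToyDomain()
increment_counter(domain, "page-views")
increment_counter(domain, "page-views")

# A writer still holding version "1" is rejected instead of
# silently overwriting the newer state.
try:
    domain.conditional_put("page-views", {"count": "99"}, "version", "1")
    rejected = False
except ConditionFailed:
    rejected = True
```

The same conditional write with an expected value of "not present" covers the insert-only-if-absent case mentioned above.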

Choosing an AWS Database Solution

Amazon Web Services provides a number of database alternatives for developers. You can run fully managed relational and NoSQL services or you can operate your own database in the cloud on Amazon EC2 and Amazon EBS.

Amazon RDS enables you to run a fully featured relational database while offloading database administration. Amazon DynamoDB is a fully managed NoSQL database service that provides extremely fast and predictable performance with seamless scalability. Amazon SimpleDB provides a non-relational service designed for smaller datasets. Using one of the many AMIs on Amazon EC2 and Amazon EBS gives you full control over your database without the burden of provisioning and installing hardware.[3]

There are important differences between these alternatives that may make one more appropriate for your use case.[3]

Data Storage in Amazon SimpleDB vs. Data Storage in Amazon S3

Unlike Amazon S3, Amazon SimpleDB does not store raw data. Rather, it takes your data as input and expands it to create multiple indices, thereby enabling you to quickly query that data. Additionally, Amazon S3 and Amazon SimpleDB use different types of physical storage. Amazon S3 uses dense storage drives that are optimized for storing larger objects inexpensively. Amazon SimpleDB stores smaller bits of data and uses less dense drives that are optimized for data access speed.[3]

In order to optimize your costs across AWS services, large objects or files should be stored in Amazon S3, while smaller data elements or file pointers (possibly to Amazon S3 objects) are best saved in Amazon SimpleDB. Because of the close integration between services and the free data transfer within the AWS environment, developers can easily take advantage of both the speed and querying capabilities of Amazon SimpleDB as well as the low cost of storing data in Amazon S3, by integrating both services into their applications.[3]

Amazon SimpleDB currently enables individual domains to grow up to 10 GB each. If your data set is larger than 10 GB, simply take advantage of Amazon SimpleDB’s scale-out architecture and spread your data over multiple domains. Since Amazon SimpleDB is designed with parallelism in mind, spreading your data over more domains will also increase your write and read throughput potential. You are initially allocated a maximum of 250 domains; additional domains can be requested from AWS if you require them.
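One way to spread items over several domains is to derive the domain from a hash of the item name. The sketch below assumes this hashing scheme, the function name `domain_for`, and the `mydata` prefix purely for illustration; the source does not prescribe any particular partitioning method.

```python
import hashlib


def domain_for(item_name, num_domains, prefix="mydata"):
    """Map an item name to one of `num_domains` domain names.

    MD5 is used only as a stable, well-spread hash, not for security.
    The same item name always maps to the same domain.
    """
    digest = hashlib.md5(item_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % num_domains
    return f"{prefix}_{index}"


# Example: decide where each of a few hypothetical items would be stored.
placement = {name: domain_for(name, 4) for name in ("user-1", "user-2", "user-3")}
```

With this layout a lookup by item name touches a single domain, while a query over the whole data set has to be issued against each domain and merged by the application. Changing `num_domains` later remaps most items, so the count is best chosen with growth in mind.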

Estimating Your Storage Costs

With Amazon SimpleDB, the best way to predict the size of your structured data storage is as follows:

Raw byte size (GB) of all item IDs + 45 bytes per item + Raw byte size (GB) of all attribute names + 45 bytes per attribute name + Raw byte size (GB) of all attribute-value pairs + 45 bytes per attribute-value pair

To calculate your estimated monthly storage cost for the US East (Northern Virginia) Region or US West (Oregon) Region, take the resulting size in GB and multiply by $0.25. For the EU (Ireland) Region, Asia Pacific (Singapore) Region, Asia Pacific (Sydney) Region, or the US West (Northern California) Region, take the resulting size in GB and multiply by $0.275. For the Asia Pacific (Tokyo) Region, take the resulting size in GB and multiply by $0.276. For the South America (Sao Paulo) Region, take the resulting size in GB and multiply by $0.34.[3]
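The storage formula and regional prices above can be turned into a small calculator. This is a sketch under stated assumptions: the function names are hypothetical, 1 GB is taken as 1024³ bytes (the source does not say whether GB is binary or decimal), and the caller supplies the counts and raw byte totals, since how they are measured is not spelled out here.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes
OVERHEAD_BYTES = 45       # per item, per attribute name, per attribute-value pair

# Monthly price per GB, as quoted in the text above.
STORAGE_PRICE_PER_GB = {
    "US East (Northern Virginia)": 0.25,
    "US West (Oregon)": 0.25,
    "EU (Ireland)": 0.275,
    "Asia Pacific (Singapore)": 0.275,
    "Asia Pacific (Sydney)": 0.275,
    "US West (Northern California)": 0.275,
    "Asia Pacific (Tokyo)": 0.276,
    "South America (Sao Paulo)": 0.34,
}


def billable_bytes(item_count, item_id_bytes,
                   attr_name_count, attr_name_bytes,
                   pair_count, pair_bytes):
    """Apply the formula: raw sizes plus 45 bytes of overhead per entry."""
    return (item_id_bytes + OVERHEAD_BYTES * item_count
            + attr_name_bytes + OVERHEAD_BYTES * attr_name_count
            + pair_bytes + OVERHEAD_BYTES * pair_count)


def monthly_storage_cost(size_gb, price_per_gb=0.25):
    """Multiply the estimated size in GB by the regional price."""
    return size_gb * price_per_gb


# Hypothetical data set: 1,000,000 items with 10-byte IDs, 10 attribute
# names of 8 bytes each, and 10,000,000 pairs of 20 bytes each.
total_bytes = billable_bytes(1_000_000, 10_000_000, 10, 80,
                             10_000_000, 200_000_000)
cost = monthly_storage_cost(total_bytes / BYTES_PER_GB,
                            STORAGE_PRICE_PER_GB["US East (Northern Virginia)"])
```

In this made-up data set the 45-byte overhead accounts for well over half of the billable bytes, which is typical when individual values are short.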

Machine Utilization Example

Amazon SimpleDB measures the machine utilization of each request and charges based on the amount of machine capacity used to complete the particular request (SELECT, GET, PUT, etc.), normalized to the hourly capacity of a circa 2007 1.7 GHz Xeon processor. Machine utilization is driven by the amount of data (# of attributes, length of attributes) processed by each request. A GET operation that retrieves 256 attributes will use more resources than a GET that retrieves only 1 attribute. A multi-predicate SELECT that examines 100,000 attributes will cost more than a single predicate query that examines 250.

In the response message for each request, Amazon SimpleDB returns a field called Box Usage. Box Usage is the measure of machine resources consumed by each request. It does not include bandwidth or storage. Box usage is reported as the portion of a machine hour used to complete a particular request. For the US East (Northern Virginia) Region and US West (Oregon) Region, the cost of an individual request is Box Usage (expressed in hours) * $0.14 per Amazon SimpleDB Machine Hour. The cost of all your requests is the sum of Box Usage (expressed in hours) * $0.14.

For example, if over the course of a month, the sum of the Box Usage for your requests uses the equivalent of one 1.7 GHz Xeon processor for 9 hours, your charge will be:

9 hours * $0.14 per Amazon SimpleDB Machine Hour = $1.26.

If your query domains are located in the EU (Ireland) Region, Asia Pacific (Tokyo) Region, Asia Pacific (Singapore) Region, Asia Pacific (Sydney) Region, or US West (Northern California) Region, Amazon SimpleDB Machine Hours are priced at $0.154 per Machine Hour. If your query domains are located in the South America (Sao Paulo) Region, Amazon SimpleDB Machine Hours are priced at $0.19 per Machine Hour. All cost calculations should be adjusted to reflect pricing in the relevant region.[3]
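The machine utilization charge can be sketched as a sum over the Box Usage values returned with each response. The function name `machine_hour_cost` and the sample values are hypothetical; the rates are the ones quoted above, and how the Box Usage values are collected from responses is left to the application.

```python
# Price per Amazon SimpleDB Machine Hour, as quoted in the text above.
PRICE_US_EAST_AND_OREGON = 0.14
PRICE_EU_APAC_NORCAL = 0.154
PRICE_SAO_PAULO = 0.19


def machine_hour_cost(box_usage_values, price_per_machine_hour):
    """Sum Box Usage (in machine hours) and multiply by the regional rate."""
    return sum(box_usage_values) * price_per_machine_hour


# Hypothetical month: many small requests whose Box Usage adds up to 9 hours.
box_usage = [0.00009] * 100_000
monthly = machine_hour_cost(box_usage, PRICE_US_EAST_AND_OREGON)
```

With these made-up values the total is about 9 machine hours, matching the worked example of roughly $1.26 at the $0.14 rate.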

Note

Your use of this service is subject to the Amazon Web Services Customer Agreement.[4]


References

  1. https://awsdocs.s3.amazonaws.com/SDB/latest/sdb-dg.pdf
  2. https://aws.amazon.com/ru/simpledb/
  3. https://aws.amazon.com/ru/simpledb/details/
  4. https://aws.amazon.com/agreement/