A Terabyte is a “Big Round Number”. For a while, it was the benchmark of a big data warehouse. However, Moore’s law seems to apply to the cost of hard disks as well: the price of disks keeps falling. Just for fun, here are some quick benchmarks:
1 Terabyte of inexpensive PC disks: $344
(4 x 250 GB SATA disks http://www.newegg.com/Product/Product.asp?Item=N82E16822148065)
1 Terabyte of iPods: $6,650
(17 x 60 GB iPods http://store.apple.com/1-800-MY-APPLE/WebObjects/AppleStore.woa/wo/1.RSLID?mco=CC4D3CBB&nclm=iPod)
1 Terabyte of RAID 5 SCSI attached to a bare bones Xeon server: $8,148
(Dell Poweredge 2800 4x300GB SCSI 10,000 RPM)
1 Terabyte of RAID 5 Direct Attached Storage Server: $16,159
(Dell AX150 iSCSI SAN 3x500 GB)
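To put the four prices on the same footing, here is a rough sketch that turns each one into a cost per usable gigabyte. The prices and disk counts are copied from the list above. The function names are made up for illustration, and the RAID 5 rows assume that one disk's worth of capacity goes to parity, which the vendor pages may or may not match exactly.

```python
# Prices and disk counts come from the benchmarks listed above.
# Assumption: a RAID 5 array loses one disk's worth of capacity to parity.
configs = [
    # (label, price_usd, disk_count, disk_gb, is_raid5)
    ("Inexpensive PC SATA disks", 344, 4, 250, False),
    ("iPods", 6650, 17, 60, False),
    ("RAID 5 SCSI in a Xeon server", 8148, 4, 300, True),
    ("RAID 5 iSCSI storage server", 16159, 3, 500, True),
]


def usable_gb(disk_count, disk_gb, is_raid5):
    """Return usable capacity in GB, subtracting one disk for RAID 5 parity."""
    data_disks = disk_count - 1 if is_raid5 else disk_count
    return data_disks * disk_gb


def cost_per_gb(price_usd, disk_count, disk_gb, is_raid5):
    """Return the hardware price in dollars per usable GB."""
    return price_usd / usable_gb(disk_count, disk_gb, is_raid5)


for label, price, count, size, raid5 in configs:
    gb = usable_gb(count, size, raid5)
    rate = cost_per_gb(price, count, size, raid5)
    print(f"{label}: {gb} GB usable, ${rate:.2f} per GB")
```

Under those assumptions the figures work out to about $0.34, $6.52, $9.05, and $16.16 per gigabyte, so the storage server costs roughly 47 times as much per gigabyte as the bare SATA disks. Note that the 4 x 300 GB SCSI array comes to 900 GB usable, a little short of a full Terabyte.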
The problem is that the cost of storage hardware is only a small part of the cost of a Terabyte of data warehouse. A Terabyte has direct costs in server hardware, backup, and supporting software. It has indirect costs in program complexity, mostly in the labor of DBAs, ETL developers, and related program staff.
Storage is cheap. As the cost goes down, substituting storage for development effort will save time and money. For example, you can store several aggregates, keep both normalized and dimensional copies of the data, and extend retention periods. The direct costs of storage are unavoidable, but the indirect costs scale with complexity.