RAID for Rookies Page: 1

RAIDing for Rookies


As anyone with an interest in enthusiast PCs and hardware will know, acronyms are thrown around hardware circles with more reckless abandon than the viral "... I took an arrow to the knee" Skyrim quote.  The pertinent acronym for this article is... RAID.

Before we begin, let me just say that you shouldn't be afraid of RAID because it is deemed 'enthusiast' - modern computer systems allow the most complex RAIDs to be set up (far too) easily.  Instead, you should be afraid of how addictive the performance and utility of a solid RAID setup can be!

Historically, RAID was shorthand for 'redundant array of inexpensive disks'; however, as our bank managers can attest, storage media is far from inexpensive.  If you were feeling particularly anal at the time, you could probably have attempted to sue a manufacturer for price misrepresentation!

On this basis, the term was revised to represent a 'redundant array of independent disks' - cleverly emphasising the combination of multiple, separate, hard-disks without placing an expectation for cost-saving on the part of the consumer.


RAID Overview

So, right about now you'll be wondering what a RAID actually is - physically and logically.  Allow me to elaborate upon these. 

Physically, as the revised acronym suggests, a RAID is several hard-disks operating, logically, in unison as a single storage device.  It has commonly been assumed that these disks must be identical (i.e. same manufacturer and model), firstly to create the array, and secondly for the array to maintain operable integrity.  This is where raw evidence actually edges towards the 'disproven' camp.

There are two terms that are important at this point - heterogeneous and homogeneous.  The former, heterogeneous, refers to a RAID setup created from multiple different hard-disk drives.  The latter, homogeneous, is a RAID setup that consists of near-identical hard-disk drives.

Several years ago, during the construction of a new-build backup server, I tested the differences between the traditional 'same-disk' RAID and an ad hoc RAID composed of hard-disks from differing vendors.  The results were very close - the same-disk RAID setup was faster, but only slightly, in the order of 25MB/sec average read/write over the ad hoc RAID.

The importance of this factor is impossible to quantify for an enthusiast.  Re-using spare drives you already have access to brings a cost saving that could, depending on your goals, far outweigh the marginal 25MB per second read/write decrease you would notice when compared with a pricey new homogeneous RAID setup.


[Image caption: "I used to be a rookie until I took an arrow in the... disk-drive" (sorry)]



Before World of Warcraft fans get excited, this isn't about leading a group of Leeroy Jenkinses.  Instead, without further ado, we will cover the most fundamental levels of RAID schemes - RAID 0, RAID 1, RAID 1+0, RAID 5 and RAID 6.

RAID 0, otherwise known as a Striping Array, is easily remembered as an all-or-nothing RAID, built purely for performance.  At its simplest, it splits data up into 'blocks', which are distributed between the hard-disk drives in the array as equally as possible.  This allows larger data files to be processed faster, as the individual hard-disk drives read and write in unison.  Ultimately, their respective blocks can be combined to form the complete data file far quicker than the file could be sought from a standard single hard-disk drive.
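
For the code-minded, the block distribution described above can be sketched in a few lines of Python.  This is only a toy model under some loud assumptions: the 'drives' are plain Python lists, the 4-byte block size is arbitrary, and the function names `stripe` and `unstripe` are made up for illustration - no real RAID controller exposes anything like them.

```python
def stripe(data: bytes, num_drives: int, block_size: int = 4) -> list[list[bytes]]:
    """Split data into fixed-size blocks and deal them round-robin across drives."""
    drives = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        block_number = offset // block_size
        drives[block_number % num_drives].append(block)
    return drives


def unstripe(drives: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading one block from each drive in turn."""
    out = []
    longest = max(len(drive) for drive in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                out.append(drive[row])
    return b"".join(out)


drives = stripe(b"ABCDEFGHIJKLMNOP", num_drives=2)
print(drives)            # [[b'ABCD', b'IJKL'], [b'EFGH', b'MNOP']]
print(unstripe(drives))  # b'ABCDEFGHIJKLMNOP'
```

Notice that each drive only ever holds every other block of the file, which is exactly why losing either one leaves you with nothing usable.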

On the flip-side, RAID 0 offers no redundancy.  This means that if one drive errors, loses integrity or fails, the entire RAID setup is virtually impossible to salvage - hence, the data stored on the RAID 0 setup will be forfeit.  Therefore, you could liken a RAID 0 setup to a fast single hard-disk drive; you have performance but no redundancy.

RAID 1 is the antithesis of RAID 0.  It typically consists of two hard-disk drives which operate in unison, with the key difference being that they are exact sector-by-sector mirrors of each other.  This offers no performance advantage, but in the event of a hard-disk drive failure, you can run off just the one remaining hard-disk drive.  Typically, a read request is fulfilled by one drive, whereas a write request is fulfilled by both.  Read requests have been known to be shared between the drives to spread the load and lengthen the lifespan of the RAID setup before failure.
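
As a rough illustration of the read/write behaviour just described, here is a toy mirror in Python.  Everything in it is an assumption made for the sake of the sketch: the `Mirror` class is hypothetical, each 'drive' is a dictionary of sector numbers, and reads simply alternate between healthy drives.  Real controllers are considerably smarter.

```python
class Mirror:
    """Toy RAID 1: every write goes to all healthy drives, reads come from one."""

    def __init__(self, num_drives: int = 2):
        self.drives = [dict() for _ in range(num_drives)]  # sector -> bytes
        self.failed = set()
        self._reads = 0

    def write(self, sector: int, data: bytes) -> None:
        # A write request is fulfilled by every drive that is still healthy.
        for idx, drive in enumerate(self.drives):
            if idx not in self.failed:
                drive[sector] = data

    def read(self, sector: int) -> bytes:
        # A read request is fulfilled by one drive; alternate to share the load.
        healthy = [i for i in range(len(self.drives)) if i not in self.failed]
        if not healthy:
            raise IOError("all drives in the mirror have failed")
        idx = healthy[self._reads % len(healthy)]
        self._reads += 1
        return self.drives[idx][sector]

    def fail(self, idx: int) -> None:
        self.failed.add(idx)


mirror = Mirror()
mirror.write(0, b"precious data")
mirror.fail(0)          # one drive dies...
print(mirror.read(0))   # ...and the data is still there: b'precious data'
```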

RAID 1+0 (otherwise known as RAID 10) combines the elements of RAID 0 and RAID 1.  That is, it consists of a striped RAID 0 setup in which each member of the stripe is itself a mirrored (RAID 1) pair.  This is sometimes deemed a suitable compromise between speed and redundancy of data; however, in the event of a drive failure, the striped element of the RAID will not be able to operate at its original speed - thereby taking a noticeable performance hit against the full array's speed.  The mirrored element of this array allows a failed drive to be replaced and the array rebuilt to its original operational integrity with minimal difficulty.

Around-about this point, you may think that all possible beneficial setups for RAID have been established; you'd be incorrect however... 
RAID 5 is similar to RAID 1+0 in that it has both striping (for performance) and redundancy; however, it differs in the method of its redundancy.  Instead of traditional mirroring of stripes, each stripe is distributed across all bar one of the hard-disk drives in the array, and the remaining drive stores that stripe's parity.  Parity is not a copy of the data; it is a value calculated from the data blocks in the stripe (typically by XOR), from which any single missing block can be rebuilt.  This might not sound too amazing if you are thinking that the parity is retained on just one disk drive in the array; however, the RAID 5 setup distributes the parity logically, and equally, across all drives, best displayed in the following diagram:

[Diagram: RAID5 - simplified.  Maybe...]
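
If a diagram isn't your thing, the same idea can be sketched in Python.  Treat this as a simplified model rather than how any particular controller does it: it assumes XOR parity, blocks of equal length, a block count that fills whole stripes, and a parity position that simply steps back one drive per stripe.  The names `xor_blocks`, `raid5_layout` and `rebuild` are invented for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_layout(data_blocks, num_drives):
    """Return one row per stripe; row[i] is the block held on drive i."""
    per_stripe = num_drives - 1  # all bar one drive hold data in each stripe
    if len(data_blocks) % per_stripe:
        raise ValueError("this sketch assumes the data fills whole stripes")
    stripes = []
    for start in range(0, len(data_blocks), per_stripe):
        row = list(data_blocks[start:start + per_stripe])
        parity = xor_blocks(row)
        # Rotate the parity position so no single drive holds all the parity.
        parity_drive = (num_drives - 1 - len(stripes)) % num_drives
        row.insert(parity_drive, parity)
        stripes.append(row)
    return stripes


def rebuild(row, failed_drive):
    """Recover a lost block by XORing every surviving block in the stripe."""
    survivors = [block for i, block in enumerate(row) if i != failed_drive]
    return xor_blocks(survivors)


blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE", b"FFFF"]
rows = raid5_layout(blocks, num_drives=4)
# Pretend drive 1 has failed and rebuild its block in the first stripe.
print(rebuild(rows[0], failed_drive=1))  # b'BBBB'
```

The useful property of XOR is that the rebuild works the same way whether the lost block was data or parity, which is why any one drive can go missing.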


As if RAID5 couldn't be improved upon further, an additional level, RAID6, was created.

RAID6 incorporates a second parity block into each stripe, calculated independently of the first rather than simply mirroring it.  This allows up to two hard-disk drives in the array to fail without data loss, thereby offering the greatest array stability and recovery of the levels covered here, whilst still offering performance through striping.
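
To put the five levels side by side, here is a small Python sketch that works out usable capacity and the number of drive failures each level is guaranteed to survive.  It assumes identical drives, that RAID 1 mirrors the same data onto every drive, and that RAID 1+0 is built from two-drive mirrored pairs; the function name `raid_summary` and its output keys are purely illustrative.

```python
def raid_summary(level: str, num_drives: int, drive_tb: float) -> dict:
    """Usable capacity (TB) and drive failures always survived, per RAID level."""
    rules = {
        # level: (minimum drives, drives' worth of data, failures survived)
        "0": (2, num_drives, 0),
        "1": (2, 1, num_drives - 1),
        "10": (4, num_drives / 2, 1),  # a second failure is survivable only if
                                       # it hits a different mirrored pair
        "5": (3, num_drives - 1, 1),
        "6": (4, num_drives - 2, 2),
    }
    if level not in rules:
        raise ValueError(f"unknown RAID level: {level}")
    min_drives, data_drives, tolerated = rules[level]
    if num_drives < min_drives:
        raise ValueError(f"RAID {level} needs at least {min_drives} drives")
    if level == "10" and num_drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return {"usable_tb": data_drives * drive_tb, "failures_tolerated": tolerated}


for level in ("0", "1", "10", "5", "6"):
    drives = 2 if level in ("0", "1") else 4
    print(level, drives, raid_summary(level, drives, drive_tb=2.0))
```

With four 2TB drives, for instance, RAID5 gives 6TB usable and survives one failure, while RAID6 gives 4TB and survives two.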



Summary of Terms

After this brief introduction to RAID setups, you should now have an awareness of several variations of RAID (unless you did already!), and more specifically their general strengths.  To summarise several key terms:

Performance; this is gained by striping data across multiple drives - thereby allowing the drives to read and write their share of the data at the same time, rather than one drive doing all the work.  This attribute offers no safety-net, so if an error occurs on ANY hard-disk drive, it is likely to damage the entire array.

Mirroring; this copies data across multiple drives - should one drive, or data element be corrupted or damaged, it will allow the same data to be read from another drive in lieu of the original.   This attribute offers no performance gains.

Parity; this stores, alongside the data stripes, a calculated value from which missing data can be reconstructed - should a drive fail, its contents can be fully rebuilt from the parity and the data stripes on the surviving drives.

Homogeneous; in RAID-terms, this is defined as consisting of near-identical hard-disk drives.  Fully identical drives are difficult to achieve owing to variation in disk structure, spin-speed and other physical characteristics; however, same brand, same quoted storage capacity and same model are typically sufficient to fit our definition of homogeneous.

Heterogeneous; specific to RAID, this simply means 'non-identical'.  This could be as simple as different manufacturers, different models, or even, in some setups (not recommended, and difficult to construct a RAID with) different capacity hard-disk drives.


Beginner RAID FAQs

Q:  Can I use Solid-state drives for a super-fast array?

A:  Yes and no...  SATA3 and PCI-E SSDs are already ridiculously fast.  RAIDing them will either yield a negligible performance improvement in software RAID0, owing to the lack of TRIM support in RAID (at the time of writing); or, in the case of the mirroring and parity RAIDs, will increase the wear rate of the SSDs, therefore reducing their lifespan.  So it *is* possible, but it does not make financial sense in my honest opinion!

Edit: A brief side note - SSDs that support hardware-level garbage collection will avoid the performance degradation issues in a RAID0 setup, but once again, speed and cost are two factors that most of us must balance carefully!


Q:  Software RAID or Hardware RAID - which is best?

A:  Neither outweighs the other in a direct comparison; they each have their contrasting strengths and weaknesses.  Software RAIDs are forensically easier to recover (in the event of an error), whilst hardware RAIDs can be notoriously troublesome to restore.  Cost is important, and this is where software RAID once again beats hardware RAID, being a far cheaper setup.
Software RAIDs are ideal for high performance enthusiast workstations or home servers, whilst hardware RAIDs offer features (such as hot-swapping and performance increases) which are far more beneficial to enterprise level server farms and clusters.
On the subject of performance increases, the larger array setups of RAID5 and RAID6 - consisting of 4 or more hard-disk drives - carry some hefty parity-processing overheads, which can come to dominate I/O in a software/onboard RAID controlled system.  The benefit of a separate hardware RAID controller is that it frees up the core system processing for other tasks and maintains the RAID without involvement from any other hardware or software.  This often explains end-user complaints of RAID5/6 setups being particularly underwhelming, and sadly, it tends to cost more money at the end of the day, as they are forced either to abandon their RAID5/6 setup or to walk the path of the hardware RAID.


Q: So, I have 2 hard-disk drives, a 2TB and a 500GB - can I RAID them?

A: This depends entirely on the controller.  Some will allow different drives (i.e. heterogeneous) to be used together, however they will place a limiter on the larger drive.  This basically reduces the larger drive to only use the capacity of the smaller drive.  So in short, you are wasting 1.5TB of capacity, for the benefit of a RAID setup.   Not the best use of space...
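
The arithmetic behind that answer is simple enough to sketch in Python.  The function name `mixed_array` is made up, capacities are in decimal gigabytes (so 2TB = 2000GB), and it assumes the controller behaves as described above, capping every drive at the size of the smallest.

```python
def mixed_array(capacities_gb: list[int]) -> dict:
    """Capacity figures when every drive is limited to the smallest drive's size."""
    per_drive = min(capacities_gb)
    wasted = sum(capacity - per_drive for capacity in capacities_gb)
    return {
        "per_drive_gb": per_drive,
        "wasted_gb": wasted,
        "raid0_usable_gb": per_drive * len(capacities_gb),  # striped across all
        "raid1_usable_gb": per_drive,                       # one mirrored copy
    }


print(mixed_array([2000, 500]))
# {'per_drive_gb': 500, 'wasted_gb': 1500, 'raid0_usable_gb': 1000, 'raid1_usable_gb': 500}
```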

Q: Is it true RAID can only run with SATA connected drives?

A: Absolutely not - you can have IDE, SATA, SCSI and SAS drives deployed in RAID setups without issue.  The only issues will be bandwidth, noise and heat.  For instance, SAS drives are renowned for being high-speed (typically 15k RPM spin speed), which generates a lot of heat and noise; additionally, peak power draw (during start-up) can sometimes cause issues on older system PSUs unable to supply the required current.

Coming soon - RAID expanded; how to deploy your own RAID setups; more jargon and an unhealthy amount of fundamental hard-disk drive inner-workings to satiate the inner-techie in all of us!


Author's Note

This article is intended as an appetite-whetting device for enthusiasts who are new to RAID, or those who simply want to refresh their knowledge.  My colleagues and I at OC3D all know that RAID setups can become a whole lot more complex than this article suggests, so hang fire, and we will boggle you in future articles! Discuss your thoughts in the OC3D Forums.