Friday, July 13, 2007

Architectural Differences in Linux

In this second installment of the Evaluating Linux series of posts I want to discuss what is both a strength and a weakness of Linux: the architectural differences between it and the traditional UNIX platforms. The relevant architectural differences between Linux and UNIX (AIX, HP-UX, Solaris: take your pick) fall into several broad categories:
  • hardware differences
  • filesystem selection
  • flexibility
  • scalability
Hardware
Many of the core attributes of Linux come from the hardware it runs on, which in most cases means Intel-compatible systems. Whether they use actual Intel processors or the extremely competitive AMD processors, these are typically referred to as x86 or x86-64 systems.

In the commodity server market most systems are limited to 4 CPU chips per server, a limit imposed by the chips and motherboards. To address this, AMD and Intel are producing chips with two or four cores per chip, with eight or more on their near-term roadmaps. Each core is itself a full CPU, and combining several of them on a single chip lets them share resources such as cache memory, which helps reduce the demands placed on the comparatively slow RAM and I/O bus.
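To make the arithmetic concrete, here is a minimal Python sketch of how multi-core chips raise the core count of a server that is still capped at 4 chips. The function name total_cores is a hypothetical helper invented for illustration, and the figures are only the ones mentioned above, not vendor specifications.

```python
def total_cores(sockets: int, cores_per_chip: int) -> int:
    """Return the number of CPU cores in one server.

    Hypothetical helper for illustration: sockets is the number of
    CPU chips, cores_per_chip is the number of cores on each chip.
    """
    if sockets < 1 or cores_per_chip < 1:
        raise ValueError("sockets and cores_per_chip must be positive")
    return sockets * cores_per_chip


if __name__ == "__main__":
    # A 4-chip commodity server with single-, dual-, and quad-core chips.
    for cores in (1, 2, 4):
        print(f"4 chips x {cores} cores per chip = {total_cores(4, cores)} cores")
```

The chip limit stays the same in every case; only the number of cores behind each chip changes.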

Beyond physical processing limitations, most servers have a single I/O bus, which limits very I/O-intensive applications. This bus design is more than adequate for your average database or application workload, but it can fall short for applications that require truly large amounts of data movement, like data warehousing or imaging systems. Both of these limitations are areas where traditional UNIX hardware shines. Because of its use of specialized processors, buses, and I/O chips, it can scale into the dozens of CPUs per server, or into the gigabytes per second of I/O load. This also helps to put the price differences between commodity x86 servers and proprietary UNIX servers into perspective.

Filesystems
One of the aspects of open source software is that many people try to improve on the status quo, and one of its strengths is that they often succeed. This is evident in the selection of filesystems available for Oracle databases running on Linux. Almost all relevant Linux distributions ship with the EXT3 filesystem as the default, and it's not uncommon to see it housing Oracle binaries and data files. This is a non-clustered filesystem suitable for general-purpose use.

In a clustered environment, such as an architecture built around Oracle RAC, EXT3 cannot be used for the database because of its lack of cluster support. Instead Oracle offers two choices of its own: OCFS2 and ASM. OCFS2 is designed as a clustered filesystem which allows data files to be accessed simultaneously by multiple servers. As an alternative to OCFS2 there is ASM, which uses what amounts to raw disk partitions to house the data blocks, and a specially designed Oracle instance to manage them. ASM has two advantages over OCFS2: it is supported on many platforms beyond Linux, and it offers advanced features like optimized data block placement and data protection.
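The choice described above can be summarized as a small decision function. This is only a sketch of the reasoning in this post, not an Oracle tool or API: the function name choose_oracle_storage and its parameters are hypothetical, and a real decision would also weigh support, licensing, and staff experience.

```python
def choose_oracle_storage(clustered: bool, want_asm_features: bool = False) -> str:
    """Suggest where to put Oracle data files on Linux.

    Hypothetical helper for illustration. clustered means the database
    is shared by multiple servers (for example Oracle RAC).
    want_asm_features means you value ASM's cross-platform support,
    data block placement, or data protection features.
    """
    if not clustered:
        # EXT3 is the usual default and is fine for a single server.
        return "EXT3"
    # EXT3 has no cluster support, so only OCFS2 and ASM remain.
    return "ASM" if want_asm_features else "OCFS2"


if __name__ == "__main__":
    print(choose_oracle_storage(clustered=False))
    print(choose_oracle_storage(clustered=True))
    print(choose_oracle_storage(clustered=True, want_asm_features=True))
```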

Another aspect of filesystems on Linux, and a partial explanation as to why there are choices to be made, is I/O performance. This article in the Red Hat Magazine is a good source of information on the topic, and provides some tips for performance improvement. In the Oracle environment this shows up as topics such as asynchronous I/O, and I highly recommend looking into these issues on MetaLink to see how to get the best performance from a Linux database server.
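As a starting point for looking at asynchronous I/O, the Linux kernel exposes two counters under /proc/sys/fs: aio-nr (asynchronous I/O requests currently reserved system-wide) and aio-max-nr (the system-wide limit). The Python sketch below simply reads them. The function name read_aio_usage is a hypothetical helper, and this says nothing about whether a given Oracle instance is actually configured to use asynchronous I/O, which depends on database settings not covered here.

```python
from pathlib import Path
from typing import Optional, Tuple


def read_aio_usage(proc_dir: str = "/proc/sys/fs") -> Optional[Tuple[int, int]]:
    """Return (requests_in_use, system_limit) for kernel async I/O.

    Hypothetical helper for illustration. Returns None when the
    counters are missing or unreadable, for example on a non-Linux
    system or a kernel without these files.
    """
    base = Path(proc_dir)
    try:
        in_use = int((base / "aio-nr").read_text().strip())
        limit = int((base / "aio-max-nr").read_text().strip())
    except (OSError, ValueError):
        return None
    return in_use, limit


if __name__ == "__main__":
    usage = read_aio_usage()
    if usage is None:
        print("Asynchronous I/O counters are not available on this system")
    else:
        in_use, limit = usage
        print(f"aio requests in use: {in_use} of {limit}")
```

If the first number sits close to the second on a busy database server, the limit is worth investigating alongside the vendor's tuning notes.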

Flexibility
Linux is already equipped with many tools that make it ideal for services like web servers, application servers, and file servers. In contrast, these tools have to be added to traditional UNIX systems, which can be a difficult process for even a veteran sysadmin. In fact many open source tools are developed directly on Linux and then ported to other versions of UNIX.

Additionally, the potential for lower hardware costs makes it possible to implement servers that are dedicated to particular functions such as administrative tools, which typically could be cost-prohibitive in a UNIX environment.

Scalability
Hardware design limits, and advances in technology like distributed programming models, are causing vendors to write applications that can scale outwards onto multiple servers instead of upwards onto a larger one. The days of a large mega-server sitting at the heart of an enterprise application are gone; it has been replaced with a collection of servers, each running some component of the application.

This has the side effect of requiring solid management tools to perform tasks like monitoring and maintaining those distributed servers. Both Oracle and Red Hat offer some assistance on the management side by providing tools which can help with patch management.

Evolution
The final major difference between the newcomer Linux and the entrenched UNIX products is one of simple evolution. Because of its open design and the number of people contributing to it, Linux is evolving at a pace that no traditional vendor can really match. This is a double-edged sword: it helps by bringing new features into the OS at a faster pace, but sometimes it cuts the other way by forcing upgrades. The current transition from 32-bit to 64-bit systems is a prime example. In this case the hardware and operating system components were very simple, but the unknowns, and thus the pain, came from running vendor applications on 64-bit platforms, where the application was only partially supported or needed various bug fixes to make it work.

Conclusion
Linux requires a balance between complexity and support costs, since each additional server that it brings to the architecture has purchase, maintenance, and administration costs. It might also push you into decisions that you might otherwise avoid, like learning the new ASM components. On the other hand, its flexibility might also make it a better choice for a web tier or tools server, making it a logical part of a heterogeneous environment.
