Web Magazine for Information Professionals

Distributed Computing: The SETI@home Project

Eddie Young and Pete Cliff look at a particularly successful example of a distributed solution to a very large number crunching problem.

Distributed Solutions to Number Crunching Problems

There are many problems that require the large-scale number crunching capabilities of supercomputers: for instance, calculating Pi to the nth level of precision, attempting to crack the latest encryption algorithm [1], mapping the human genome, or analysing radio waves from space. For some applications a supercomputer might not be enough, nor can every project afford one. However, with a little clever software engineering applied to the Internet, there is a solution.

Even if you spend a lot of your time active at work and your PC is constantly trying to keep up, there will be moments when you stop for coffee, pop out for a sandwich, or do something else, and your computer does nothing. [2]

Multiply all this processor idle time by the number of computers in use today and you have a lot of processing power simply going to waste. One facet of "distributed computing" is the harnessing of this unused processor time to do useful work. By utilizing this idle time, projects can clock up years of processing time in days.

The growth of the power of these types of applications can be directly related to the growth of the Internet.

SETI@home

Radio telescopes have been monitoring the skies for decades, providing astronomers with a more detailed picture of the Universe than optical telescopes can provide. Many of the radio signals received are white noise (technically it is brown noise, but it sounds similar): the sound of space, emissions from stars, nebulae, and other stellar bodies. To a radio receiver outside our solar system, the Earth itself is sending out a great many radio waves. Just as our planet emits this noise, perhaps somewhere there is radio noise from other planets: noise that could indicate intelligent life.

But how do we find these patterns? The vast number of radio waves that break on our planet every day would take centuries for a single computer to analyse. (Indeed by the time it completed its work we would probably have built spaceships to find out ourselves!) This is where the power of distributed computing can help.

For the last 5 or 6 years Berkeley’s Space Sciences Laboratory has been running the SETI@home project. SETI stands for the Search for Extraterrestrial Intelligence, and the project is dedicated to searching for patterns that may be signs of intelligent life amongst the mostly random mass of radio signals that reach the Earth from space. Each member of the project offers some of their computer’s time to the cause. Membership is open to everyone with access to a computer and the Internet. There are currently 2,822,404 members and they have clocked up 582,977 years of processing time.

When you join SETI@home you supply a username in the form of your e-mail address and some details about yourself, and then download the software. The project automatically assigns you a small portion of the night sky and you begin the search right away. Each time your PC is idle the software starts up as a screen saver and gets to work on the data. Once all of the downloaded data has been analysed, the results are uploaded to the SETI main server and the next portion of data is downloaded, ready for a new analysis. Each data packet is around 300K, and uploading and downloading are fast.
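The download, analyse, upload cycle described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the SETI@home client: the names `WorkUnit`, `FakeServer`, `analyse` and `run_client` are all hypothetical, the "server" is an in-memory queue rather than a machine on the Internet, and the analysis is a placeholder.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkUnit:
    unit_id: int
    samples: list  # stand-in for the recorded radio data in one packet


class FakeServer:
    """In-memory stand-in for the project server (hypothetical)."""

    def __init__(self, units):
        self.pending = deque(units)
        self.results = {}

    def download(self):
        # Hand out the next unanalysed work unit, or None when none are left.
        return self.pending.popleft() if self.pending else None

    def upload(self, unit_id, result):
        # Record the result reported for a finished work unit.
        self.results[unit_id] = result


def analyse(unit):
    # Placeholder analysis: report the strongest sample in the unit.
    return max(unit.samples)


def run_client(server, machine_is_idle=lambda: True):
    """Process work units one at a time for as long as the machine is idle."""
    completed = 0
    while machine_is_idle():
        unit = server.download()
        if unit is None:
            break
        server.upload(unit.unit_id, analyse(unit))
        completed += 1
    return completed


server = FakeServer([WorkUnit(1, [1, 5, 2]), WorkUnit(2, [7, 3])])
print(run_client(server), server.results)
```

The `machine_is_idle` hook is where a real client would decide whether to keep working or hand the processor back to the user.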

You also have the option of running the software all the time "in the background". I have used a 450MHz PIII for above-average CPU-intensive applications (manipulating large images) with the SETI software running in the background without any noticeable loss of speed. The software is exceptionally well behaved, running at a low priority and giving up the processor to anything else that requires it. It has never crashed the machine (that is left up to Internet Explorer!).

Distributed Solutions

The idle time of a desktop PC can be anywhere from a couple of minutes to an entire day. Add up this idle time for every PC in the world and it comes to a considerable amount. Distributed computing uses the Internet to harness the power of a global network of PCs, effectively creating a single supercomputer. It is interesting to think that this supercomputer is not powerful in its own right; rather, its power is directly related to the number of active participants. (‘Active’ in the loosest sense of the word, since all you need to do is download the software and agree to take part.)

The "computer" has a processing power that is a function of the amount of ‘virtual’ time it has available to work on a problem. It is not important how fast any single processor is: 50 computers could each be working on a different part of the same problem at the same time, solving it up to 50 times faster than a single machine.
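As a rough sketch of that idea, the Python below splits one job (summing the squares of the numbers below a limit) into 50 independent pieces, hands each piece to a worker, and combines the partial answers. The function names are made up for this example, and a local thread pool merely stands in for 50 separate machines, so it shows the split-and-combine pattern rather than a real 50-fold speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(stop - start, parts)
    chunks = []
    lo = start
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        hi = lo + size + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def partial_sum_of_squares(chunk):
    """The piece of work a single participant would do."""
    lo, hi = chunk
    return sum(n * n for n in range(lo, hi))


def distributed_sum_of_squares(stop, workers=50):
    """Farm the chunks out to the workers and add up what they return."""
    chunks = split_range(0, stop, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))


print(distributed_sum_of_squares(1000))
```

The important property is that no chunk depends on any other, which is what lets the pieces be worked on at the same time.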

Effectively this means that the designers of distributed software like this have created a powerful parallel computer, where individual processors can be added easily, each becoming a cell of the whole. Is it far-fetched to draw an analogy between these individual wired PCs connected through the Internet and individual neurons in a human brain? How far away are we from writing distributed parallel code that can behave in a similar fashion to neurons, creating an unknown type of intelligence that simply uses idle PC time to communicate with other neurons to contemplate other problems? Invisible and silent. Some distributed projects are trying to simulate a very simple model of evolution, in which basic predator/prey ‘lifeforms’ are designed, complete with mutations, and left on the battlefield to kill, eat, be eaten, and reproduce.

Distributed computing can also bring like-minded people together in a virtual community, where they work on the same project and can exchange ideas and solutions. SETI themselves have developed workgroups to encourage users, with competitions and chat forums. The University of Kent Space Society have active usergroups, and meet regularly in the pub. There is an interesting site at http://www.pcfseti.co.uk/ where many users are actively studying the SETI ideas.

Individually the results may not make much sense, but collectively large complex problems can be solved.

At the time of writing the most powerful computer, IBM’s ASCI White, is rated at 12 TeraFLOPS and costs $110 million. SETI@home currently gets about 15 TeraFLOPS and has cost $500K so far. [3]
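Dividing each cost by the corresponding speed makes the comparison concrete. The short Python calculation below uses only the four figures quoted above; the function name is just for illustration.

```python
def cost_per_teraflops(cost_dollars, teraflops):
    """Dollars spent for each TeraFLOPS of processing power."""
    return cost_dollars / teraflops


# Figures quoted in the article.
asci_white = cost_per_teraflops(110_000_000, 12)
seti_at_home = cost_per_teraflops(500_000, 15)

print(f"ASCI White: ${asci_white:,.0f} per TeraFLOPS")
print(f"SETI@home:  ${seti_at_home:,.0f} per TeraFLOPS")
print(f"Ratio:      {asci_white / seti_at_home:.0f} to 1")
```

On these figures ASCI White costs roughly $9.2 million per TeraFLOPS against about $33,000 for SETI@home, a ratio of 275 to 1. The comparison leaves out what participants pay for their own PCs and electricity.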

In a two-year undergraduate project, the quadrillionth bit of Pi (the bit at position 10^15, a 1 followed by 15 zeroes) has now been calculated. During those 2 years the combined efforts of two thousand computers worldwide clocked up an astonishing 600 years of processing time. That’s the equivalent of a single P90 desktop PC working constantly on that single problem for 600 years.
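Those figures also hint at how lightly each machine was loaded. The sketch below assumes, purely for the sake of the arithmetic, that all two thousand computers took part for the whole two years and that each counts as one P90; neither assumption comes from the project itself.

```python
COMPUTERS = 2000       # "two thousand computers" in the article
PROJECT_YEARS = 2      # wall-clock length of the project
CPU_YEARS_USED = 600   # total processing time clocked up


def average_utilisation(cpu_years, computers, wall_years):
    """Fraction of each machine's available time that went to the project."""
    return cpu_years / (computers * wall_years)


share = average_utilisation(CPU_YEARS_USED, COMPUTERS, PROJECT_YEARS)
print(f"Average share of each machine's time: {share:.0%}")
```

Under those assumptions the project needed only about 15% of each machine’s time, which fits the picture of work done in otherwise idle moments.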

These are PCs that would have otherwise been doing nothing at all.

The data

The data used comes from signals collected by a radio telescope set in a natural sinkhole at Arecibo, in Puerto Rico.

The region in purple on this sky map shows the areas covered so far.

There is still an awful lot of the sky to scan. Since I installed the screen saver on my PC on 23 February 2000, I have clocked up 3554 hours of CPU time (about 150 days), and my computer is currently analysing data from the region at 18 hr 8 min 42 sec, +17 deg 46’ 12’’, recorded on Wednesday December 20 2000 at 17:35:17. That region is in the constellation Ophiuchus on the map above (marked ‘Oph’).
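For readers who prefer decimal figures, the conversions are simple. The Python below reads the position as a right ascension of 18 hours 8 minutes 42 seconds and a declination of +17 degrees 46 arcminutes 12 arcseconds; the function names are just for this example.

```python
def ra_to_degrees(hours, minutes, seconds):
    """Right ascension in degrees: 24 hours span the full 360 degrees."""
    return 15.0 * (hours + minutes / 60 + seconds / 3600)


def dec_to_degrees(degrees, arcminutes, arcseconds):
    """Declination in decimal degrees (positive, i.e. northern, values only)."""
    return degrees + arcminutes / 60 + arcseconds / 3600


ra = ra_to_degrees(18, 8, 42)
dec = dec_to_degrees(17, 46, 12)
cpu_days = 3554 / 24

print(f"RA {ra:.3f} deg, Dec +{dec:.2f} deg, {cpu_days:.0f} days of CPU time")
```

That works out at a right ascension of about 272.175 degrees, a declination of about +17.77 degrees, and roughly 148 days of CPU time.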

The Analysis

When the SETI software runs this is what you see on your screen.

That’s not a very exciting screen saver, but it does move! The graphs are actually power spectra of the regions being analysed, and as the analysis runs they update automatically.

This screen saver is actually running Fourier transforms on the data. This is a technique that separates out the component frequencies from a signal to provide some information on its structure. In particular, the software is looking for spikes and Gaussian bells.
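To give a feel for what a Fourier transform does, here is a small Python sketch that computes a power spectrum with a direct discrete Fourier transform and picks out the strongest frequency. It is an illustration only: the signal is synthetic, the function names are invented for this example, and a real analysis would use a fast Fourier transform rather than this slow direct method.

```python
import cmath
import math


def power_spectrum(samples):
    """Power in each frequency bin 0 .. N/2, using a direct O(N^2) DFT."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        spectrum.append(abs(total) ** 2)
    return spectrum


def strongest_bin(spectrum):
    """Index of the bin holding the most power, i.e. the tallest spike."""
    return max(range(len(spectrum)), key=spectrum.__getitem__)


# Synthetic signal: a strong tone at 5 cycles and a weaker one at 12 cycles
# across the 64 samples.
N = 64
signal = [
    math.sin(2 * math.pi * 5 * t / N) + 0.3 * math.sin(2 * math.pi * 12 * t / N)
    for t in range(N)
]

spectrum = power_spectrum(signal)
print("Strongest frequency bin:", strongest_bin(spectrum))
```

The two tones are mixed together in the samples, but in the spectrum they show up as two separate spikes, with the taller one in bin 5.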

Spikes: - A spike is a frequency with a significantly high amplitude. These are measured to ensure that the software and hardware are working properly, but could also provide interesting patterns.

Gaussian bells: - The analysis supposes that a distant (alien) transmitter is sending out some sort of signal. Signals from this transmitter should get stronger and then weaker as the recording telescope moves over that point in the sky. This is measured as a power increase and then decrease, which has a very characteristic shape, known in mathematics as a Gaussian bell, since it looks roughly bell shaped. (The top of the bell is the strongest signal, fading out to the edges.)

SETI use a statistical method (known as Chi squared), which returns a high or low number depending upon how closely the data match a Gaussian bell shape. The lower the value, the more closely the data match a Gaussian bell, increasing the importance of that location in the sky.
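A minimal sketch of such a test, in Python, is shown below. The article does not give the exact formula SETI use, so this is the textbook Chi squared sum with an assumed constant noise level `sigma`; the bell parameters and both sets of "observations" are made up for the example.

```python
import math


def gaussian(t, peak, centre, width):
    """Power expected at time t as the telescope drifts across a point source."""
    return peak * math.exp(-((t - centre) ** 2) / (2 * width ** 2))


def chi_squared(observed, expected, sigma=1.0):
    """Sum of squared, noise-scaled differences; lower means a closer match."""
    return sum(((o - e) / sigma) ** 2 for o, e in zip(observed, expected))


times = range(21)
model = [gaussian(t, peak=10.0, centre=10.0, width=3.0) for t in times]

# Two made-up observations: one rises and falls like the model, one is flat.
bell_like = [m + (0.1 if t % 2 else -0.1) for t, m in zip(times, model)]
flat = [5.0] * len(model)

bell_score = chi_squared(bell_like, model)
flat_score = chi_squared(flat, model)
print(f"bell-like: {bell_score:.2f}, flat: {flat_score:.2f}")
```

The bell-like data scores close to zero while the flat data scores far higher, so only the first would be flagged as worth a closer look.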

For example, if a similar project were running on distributed computers on a planet orbiting Alpha Centauri, it might not take long before they saw a Gaussian bell shape centred over our solar system.

At the time of writing nothing has been found, but they (and we) are still looking, and still hopeful.

note: 1
Encryption companies often place large-scale challenges on the Web to see just how secure their encryption methods are. Some like-minded people completed one of these challenges by using distributed processor time as outlined in the article. It would be no surprise to discover similar, illegal projects working to decrypt code in this way. Interestingly, a virus could potentially function in much the same way as the SETI@home screensaver, only keeping itself hidden.
note: 2
This article was written on a Pentium III 450MHz PC, which is pretty slow compared to desktop PCs available today, yet as I type my processor never goes above 5% usage. When I fire up Excel and open my timesheet for editing there are peaks of about 30%, and even the Flash-intensive AtomFilms site only manages a peak of 100% before settling back to around 4%. A streamed video from the same site idles at 5%. In short, in everyday usage, my machine’s processor is heavily underused. (Try this yourself and see how much CPU processing power you use during the day.)
note: 3
You can consider a ‘FLOP’ to be a single calculation: FLOP stands for FLOating Point calculation. In this case FLOPS stands for floating point calculations per second, so 1 TeraFLOPS is a trillion floating point calculations a second.

References

  1. http://setiathome.ssl.berkeley.edu/ The SETI home pages
  2. http://www.cecm.sfu.ca/projects/pihex/pihex.html calculating pi
  3. http://www.nyx.net/~kpearson/distrib.html website on distributed computing
  4. http://www.naic.edu/ The Arecibo homepage
  5. http://golem03.cs-i.brandeis.edu/
  6. A technical article on SETI@home has been published in the IEEE magazine "Computing in Science and Engineering". It is available at the IEEE Computer Society’s web page (http://www.computer.org).

Author Details

Eddie Young
Network Support Officer
UKOLN
e.young@ukoln.ac.uk

Pete Cliff
Research Officer
UKOLN
p.d.cliff@ukoln.ac.uk