CPSC 330 Fall 2008 Project 3

Due by midnight on the evening of Monday, Dec. 8

This project may be done individually or in teams of at most two.

Write a short (four-page) report on the Sun T1000 (Niagara), including the results of running a test program with a variable number of threads. Your paper should present a rationale for a computer design with multiple cores and multiple threads per core. You should also discuss what types of workloads are best served by this type of design.

Run the array program below on niagara.cs.clemson.edu and see whether you get a throughput increase similar to the one graphed in Gove's slides on page 15 of the pdf. (For best results, make sure you are the only user on the system when you run your benchmarks; if anyone else is on the system, wait and try again after a few minutes.) Also try running the program with large numbers of threads. What explains the decrease in performance?

Running other reasonable benchmarks on niagara that demonstrate the impact of threading, and explaining the results, will be a basis for extra credit.

On-line resources:

Sun Fire T1000 Server
http://www.sun.com/servers/coolthreads/t1000/

Sun T1000/T2000 Architecture White Paper
http://www.sun.com/servers/coolthreads/t1000-2000-architecture-wp.pdf

Improving Application Efficiency Through Chip Multi-Threading
http://developers.sun.com/solaris/articles/chip_multi_thread.html

Darryl Gove, Coding for Multiple Threads on a CMT System
http://www.cs.clemson.edu/~mark/330/sun_gove_slides.pdf

Be sure to include any sources you use in a bibliography. Include the URL for any graphic that you get from a paper or the web and use in your paper; put the URL in a caption underneath the graphic.

/* example multithreaded program using Solaris threads (see "man threads")
 *
 * compile this using "gcc -O2 -lthread"
 * then run as "./a.out 1", "./a.out 10", etc.
 */

#include <stdio.h>
#include <stdlib.h>
#include <thread.h>
#include <sys/time.h>

#define N 100000000   /* number of array elements */
#define T 10000       /* maximum number of threads */

int a[N];
int partition_length;
int partial_sum_results[T];

/* each thread sums its own contiguous partition of a[] */
void *thread_code(void *thread_id){
  int i,tid,local_sum;
  tid = *((int *) thread_id);
  local_sum = 0;
  for( i = tid*partition_length; i < (tid+1)*partition_length; i++ ){
    local_sum = local_sum + a[i];
  }
  partial_sum_results[tid] = local_sum;
  return NULL;
}

int main( int argc, char * argv[] ){
  thread_t t[T];
  int tid[T];
  int i,n;
  int final_sum;
  hrtime_t t_start,t_end;

  if( argc < 2 ){
    printf("usage: %s <n>, where 1 <= n <= %d\n",argv[0],T);
    exit(0);
  }
  n = atoi( argv[1] );
  if( ( n < 1 ) || ( n > T ) ){
    printf("usage: %s <n>, where 1 <= n <= %d\n",argv[0],T);
    exit(0);
  }

  for( i = 0; i < N; i++ ){
    a[i] = 1;
  }

  partition_length = N/n;
  if( ( N - (n*partition_length) ) != 0 ){
    printf("number of threads doesn't divide evenly\n");
  }else{
    printf("number of threads is %d, partition length is %d\n",
      n,partition_length);
  }

  /* timed region: create the threads, wait for them, combine results */
  t_start = gethrtime();
  for( i = 0; i < n; i++ ){
    tid[i] = i;
    thr_create(NULL, 0, thread_code, (void *) &tid[i], (long) 0, &t[i]);
  }
  /* a thread id of 0 waits for any thread; loop until none are left */
  while(thr_join(0,NULL,NULL)==0);
  final_sum = 0;
  for( i = 0; i < n; i++ ){
    final_sum = final_sum + partial_sum_results[i];
  }
  t_end = gethrtime();

  if( final_sum != N ) printf("*** error in sum\n");
  /* gethrtime() reports nanoseconds */
  printf("program with %d thread runs %12.5f secs\n",n,
    (t_end-t_start)/1000000000.0);
  return 0;
}
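
When the thread count does not divide N evenly, the program above only prints a warning. The last N - n*partition_length elements are never summed, so the final check reports an error in the sum. If you want to benchmark thread counts such as 3 or 7, one possible approach is to give the first few threads one extra element each. The following is a minimal sketch of that idea, not part of the original assignment code; the helper name partition_bounds and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the original program: computes the
 * half-open index range [*start, *end) that thread `tid`
 * (0 <= tid < nthreads) should sum. The first (total % nthreads)
 * threads each take one extra element, so every index is covered
 * exactly once even when nthreads does not divide total. */
void partition_bounds(size_t total, size_t nthreads, size_t tid,
                      size_t *start, size_t *end)
{
    assert(nthreads > 0 && tid < nthreads);

    size_t base  = total / nthreads;  /* minimum elements per thread */
    size_t extra = total % nthreads;  /* leftover elements to spread */

    /* threads before this one contributed tid*base elements plus one
     * extra element for each of them that had an index below `extra` */
    size_t lo  = tid * base + (tid < extra ? tid : extra);
    size_t len = base + (tid < extra ? 1 : 0);

    *start = lo;
    *end   = lo + len;
}
```

With total = 10 and nthreads = 3, the three ranges are [0,4), [4,7) and [7,10). In thread_code, the loop bounds tid*partition_length and (tid+1)*partition_length could be replaced by the start and end values this helper returns for (N, n, tid); this assumes the thread count n is made visible to the threads, for example through a global variable.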