16th meeting

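Today was our sixteenth meeting; it is hard to believe it has been going on this long. We mainly covered the following points:

1. I showed Prof. Yamaguchi the results on some test data. For n~2 the speedup is roughly 2-3x, and for n~3 it reaches 4.5-5.5x, so the results are fairly good.

2. The program can now choose more flexibly how to proceed, using anywhere from 1 to 8 blocks.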


A dynamic synchronization mechanism and its implementation on Cell Broadband Engine.

Title: A dynamic synchronization mechanism and its implementation on Cell Broadband Engine.

Title (Chinese version): A parallel dynamic scheduling method and its implementation on Cell/BE

Background: Placing multiple cores on a single chip is now a common trend in microprocessor architecture. One successful example is the Cell Broadband Engine, which was designed as the main engine of a game machine. For computer graphics calculations on multicore processors, the whole workload can be divided and allocated statically to each core, and the tasks can be synchronized statically, since the execution times can be estimated beforehand. The cost performance of multicore processors like the Cell Broadband Engine is very high, so many researchers are working on how to apply them to more general application fields. In general, if a big task is divided into many subtasks, the execution time of each subtask cannot be estimated beforehand, so dynamic task allocation and dynamic synchronization are needed to execute parallel tasks efficiently. The data-driven principle is a very simple and formal method for this kind of dynamic synchronization. This research therefore aims to find a dynamic synchronization mechanism for the Cell Broadband Engine and to evaluate its effectiveness.

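Background (Chinese version): Single-chip multicore processors have become a trend in processor architecture. One successful example is IBM's Cell/BE processor, an engine built specifically for games. In multicore processing, computer graphics calculations can be divided into work blocks whose execution times are known in advance, so they can be assigned statically to different cores for parallel computation and synchronization. People therefore hope to apply the powerful computing capability of multicore processors like Cell/BE to other general-purpose computing fields. In general, however, when a task is divided into subtasks, their individual execution times and results cannot be predicted, so a dynamic task scheduling and synchronization scheme is needed. Our research goal is to find an effective dynamic scheduling and synchronization scheme for Cell/BE and to evaluate it.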
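To make the data-driven principle concrete, here is a minimal single-threaded sketch, not the project's actual implementation: a subtask "fires" as soon as all of its inputs are available, so no execution order has to be fixed in advance. The function `run_data_driven`, the task names, and the dependency table are all hypothetical and chosen only for illustration.

```python
from collections import deque


def run_data_driven(deps, work):
    """Run tasks in data-driven order.

    deps maps each task name to the list of tasks whose results it needs.
    work maps each task name to a function that takes the list of input results.
    Returns the results and the order in which the tasks fired.
    """
    # Number of inputs each task is still waiting for.
    pending = {task: len(sources) for task, sources in deps.items()}
    # Reverse edges: which tasks consume the result of a given task.
    consumers = {task: [] for task in deps}
    for task, sources in deps.items():
        for src in sources:
            consumers[src].append(task)

    # Tasks with no inputs can fire immediately.
    ready = deque(task for task, count in pending.items() if count == 0)
    results, order = {}, []
    while ready:
        task = ready.popleft()
        results[task] = work[task]([results[src] for src in deps[task]])
        order.append(task)
        # Delivering a result may make its consumers ready.
        for consumer in consumers[task]:
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)
    return results, order


# Hypothetical task graph: c needs a and b, d needs c.
deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
work = {
    "a": lambda inputs: 1,
    "b": lambda inputs: 2,
    "c": lambda inputs: sum(inputs),
    "d": lambda inputs: inputs[0] * 10,
}
results, order = run_data_driven(deps, work)
```

In a real parallel setting the `ready` queue would be drained by several cores at once; the counter-per-task bookkeeping stays the same.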


14,15th meeting

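The basic idea of dynamic scheduling:

A typical parallel processor has one control unit and several execution units. The Cell/BE we are working on, for example, consists of one controlling PPE and eight data-processing SPEs. We designed the working relationship between them.

The basic workflow of the PPE is shown in Figure 2: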
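The controller/worker relationship can be sketched roughly as follows. This is an illustration using ordinary Python threads and queues, not Cell/BE code: one controller hands out work items and eight workers pull the next item whenever they become free. The names `controller` and `worker`, and the squaring workload, are assumptions made for the example.

```python
import queue
import threading

NUM_WORKERS = 8  # one per SPE in the analogy


def worker(worker_id, tasks, results):
    """Pull work items until the controller sends the stop marker (None)."""
    while True:
        item = tasks.get()
        if item is None:
            break
        index, value = item
        # Placeholder computation standing in for the real per-block work.
        results.put((index, worker_id, value * value))


def controller(values):
    """Hand out one item per value and collect the results in input order."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(i, tasks, results))
        for i in range(NUM_WORKERS)
    ]
    for t in threads:
        t.start()
    for item in enumerate(values):
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one stop marker per worker
    for t in threads:
        t.join()

    out = [None] * len(values)
    for _ in values:
        index, _worker_id, value = results.get()
        out[index] = value
    return out


squares = controller(list(range(20)))
```

Because each worker takes a new item only when it has finished the previous one, faster workers naturally process more items, which is the point of allocating work dynamically rather than statically.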


11th meeting

This week I implemented the waiting mechanism I described last week and ran it on a Cell/BE machine.

I chose 8*8 data and divided it into 4 blocks. The single-core execution time is about 0.005 s and the parallel time is about 0.010 s, so the parallel version is currently about twice as slow. I still have a lot of work to do.

Going through the code I wrote again, I found a lot of “for” loops and replacement operations, which may be slowing down the execution.
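For reference, the timings above can be turned into a speedup and a parallel efficiency with a few lines of arithmetic. The sketch assumes one worker per block (4 workers), which the notes do not state explicitly; the function names are illustrative.

```python
def speedup(serial_time, parallel_time):
    """Speedup = serial time / parallel time (values above 1 mean faster)."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, workers):
    """Parallel efficiency = speedup / number of workers."""
    return speedup(serial_time, parallel_time) / workers


# Timings from the 8*8 test; workers=4 is an assumption (one per block).
s = speedup(0.005, 0.010)
e = efficiency(0.005, 0.010, workers=4)
```

A speedup below 1 means the overhead of distributing and synchronizing the blocks currently outweighs the gain from running them in parallel, which is plausible for such a small input.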


10th meeting

1. First, analyze the parts and steps of the algorithm. Ex: the point-of-interest algorithm consists of five parts, which we divide into 9 steps.

2. Program the SPE so that each SPE can do the whole job, depending on the worknum sent to it. Ex: a single CPU can do the same as in the x86 model.

3. Define the partitioning and size of the work. Ex: here we split the work into 8 parts, so 8*9 steps need to be processed.
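The 8*9 work units from the steps above can be enumerated with a single integer per unit. The sketch below assumes a hypothetical encoding, worknum = block * 9 + step; the notes do not say how the worknum sent to an SPE is actually laid out.

```python
NUM_BLOCKS = 8  # the work is split into 8 parts
NUM_STEPS = 9   # the algorithm is divided into 9 steps


def decode_worknum(worknum):
    """Map a worknum to (block, step) under the assumed encoding."""
    if not 0 <= worknum < NUM_BLOCKS * NUM_STEPS:
        raise ValueError(f"worknum out of range: {worknum}")
    return divmod(worknum, NUM_STEPS)


def encode_worknum(block, step):
    """Inverse of decode_worknum."""
    return block * NUM_STEPS + step


# Every (block, step) pair that has to be processed: 8 * 9 = 72 units.
all_units = [decode_worknum(n) for n in range(NUM_BLOCKS * NUM_STEPS)]
```

With this kind of encoding, any worker that can run all 9 steps only needs the worknum to know which block to load and which step to apply.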
