Friday, March 29, 2019

Novel Clockwise Task Migration in Many-Core Chip

A Novel Clockwise Task Migration in Many-Core Chip Multiprocessors

Abstract- The industry trend for Chip Multiprocessors (CMPs) is moving from multi-core to many-core designs to obtain higher computing performance, flexibility, and scalability. Moreover, transistor size is constantly shrinking, and more and more transistors are integrated on a single chip, which makes it possible to design more powerful and complicated systems. However, higher computing performance requires higher power consumption, which increases on-chip hotspots and the overall chip temperature. A high peak temperature causes performance degradation, reduces reliability, shortens the chip life span, and eventually damages the system. Therefore, Runtime Thermal Management (RTM) for CMPs has become important for reducing temperature without performance degradation. In this paper, a new clockwise task migration technique is proposed for many-core CMPs. The proposed technique migrates the heavily loaded tasks placed on the central cores away from the center to the surrounding cores. It performs clockwise task migrations to distribute the various hotspots located on the central cores of the chip. Moreover, the proposed migration algorithm estimates core temperature using performance counters and the proposed equations instead of thermal sensors, and shows effective results. Simulation results indicate up to a 15% decline in the maximum temperature of the whole many-core CMP.
The efficiency of the proposed technique is shown by the temperature values of the many-core CMP, which stay below the maximum temperature limit.

Keywords- chip multiprocessors; many-core; task migration; performance counter; runtime thermal management.

Chip multiprocessors (CMPs) continue to integrate more transistors to meet the growing demand for reliability and high computing performance. At the same time, transistor sizes are constantly shrinking, and more and more transistors are integrated on a single chip, which makes it possible to design more powerful and complicated CMP architectures [1]. These advantages increase the number of cores on a CMP, so CMPs are shifting from the multi-core to the many-core era, where tens or hundreds of cores are integrated on a single chip and connected via a network-on-chip (NoC) [4-5]. Many-core CMPs provide higher computing performance by executing heavily loaded tasks, which consume more power. However, heavily loaded tasks increase the overall chip temperature and create on-chip hotspots. Hotspots are the main obstacle to wide adoption of many-core CMP architectures: they lead to performance degradation, lower chip reliability, increased cooling costs, a shorter chip life span, and eventually system failure. Therefore, to achieve better computing performance with higher scalability while maintaining reliability, efficient Runtime Thermal Management (RTM) techniques are imperative [3,6-8].

RTM not only aims to lower and distribute the temperature of the chip but also enables many-core CMPs to operate at a favorable performance level while working below a temperature threshold [1-2]. Therefore, to maintain efficient performance on many-core CMPs, the authors propose a clockwise task migration technique that serves as an alternative way to control the temperature of the cores.
The proposed migration technique migrates the heavily loaded tasks placed on the central cores away from the central part to the surrounding part of the core layer. In other words, the proposed method performs clockwise task migrations to distribute the various hotspots located on the central cores of the chip. The proposed method aims to maximize the throughput of many-core CMPs while satisfying the peak temperature constraint [5-6,9].

With the development of many-core CMPs, using expensive, high-overhead thermal sensors to measure core temperature is neither efficient nor appropriate for meeting thermal challenges [3,12]. Therefore, in this work, a new technique is provided to measure core temperature without thermal sensors. The proposed migration algorithm obtains the core temperature from the performance counters placed in each core. In this way, cores with high temperature are distributed across the chip without any performance degradation [1-3,11-13]. The contributions of this paper are as follows:

- It develops a novel runtime task migration technique for many-core systems to balance hotspots.
- Instead of using expensive, high-overhead sensors to measure core temperature, the proposed task migration technique uses performance counters.
- Experimental results show that the proposed algorithm can significantly outperform the conventional approach.

The rest of the paper is organized as follows. Section II gives a synopsis of related work. The proposed technique is introduced in Section III. Section IV presents the experimental evaluation. Finally, the conclusion is given in Section V.

While the industry trend for CMPs is to increase transistor counts exponentially, in line with Moore's law, it becomes easier to achieve more powerful and better computing performance by executing heavily loaded tasks [1-3].
However, heavily loaded tasks increase on-chip thermal hotspots and the overall peak temperature of the CMP. Thus, when hundreds of processors are integrated on a single chip, as in many-core CMPs, off-line methods are not efficient. Therefore, RTM becomes crucial for balancing on-chip thermal hotspots and the overall peak temperature [1-3,8-10]. To this end, many theoretical works have tried to dissipate and eliminate thermal hotspots with different techniques. For instance, the Dynamic Voltage and Frequency Scaling (DVFS) technique in [7] aims to control the temperature by dynamically adjusting the processor speed based on the workload. However, DVFS techniques sacrifice performance to cool down the chip. Another technique, task migration, aims to manage the on-chip temperature by balancing the task load among CMP tiles without slowing down processing. In [1-3,8,10-11] the proposed algorithms are in some cases unable to find a proper destination core because of the thermal constraints; the authors therefore fall back on DVFS, which has proved inefficient as far as performance is concerned. In [2], the authors implemented several thermal-aware algorithms that migrate tasks between processor cores to reduce thermal variation in a 3D architecture with stacked DRAM memory. However, some of these techniques perform static task migration, which in some cases can migrate a task from a cold core to a hotspot core. The authors also proposed other techniques that rely on expensive, high-overhead thermal sensors to detect on-chip hotspots. Moreover, in [2-3], the authors proposed techniques that always assign a new job to the coolest core to balance the thermal hotspots across the chip; however, this rapidly increases the number of hotspots in the system.
Therefore, when hundreds of processors are integrated on a single chip, as in many-core CMPs, off-line methods are not efficient at distributing and balancing the thermal hotspots. In this work, a novel runtime task migration technique is proposed that offers an effective solution to the thermal challenges in many-core CMPs. Furthermore, instead of using expensive, high-overhead sensors to measure core temperature, the proposed migration technique uses performance counters to estimate the temperature of the many-core CMP tiles.

Fig. 1. Many-core CMP with 64 cores and the TCU connection with a tile on the many-core CMP.

Fig. 2. Components of a tile in the 64-core many-core CMP.

Nowadays, the CMP industry trend is moving from multi-core to many-core architectures to achieve better computing performance and to maintain reliability. Many-core CMP architectures run heavily loaded tasks so that the system operates at high computing performance. However, heavy tasks increase the peak temperature of the chip and create on-chip hotspots. Thus, RTM is crucial for keeping the system balanced below the temperature threshold while executing tasks efficiently.

As shown in Figure 1, a many-core CMP with 64 tiles is considered. Each tile includes a core, a private L1 cache bank, and a shared L2 cache bank, as shown in Figure 2. The proposed technique aims to balance the thermal distribution to combat thermal issues and temperature-related reliability problems. It migrates tasks between cores at runtime, and the migration is repeated periodically at a predefined time interval. Each time interval in this work is 100 ms. At the end of each interval, each core uses its instructions per cycle (IPC) to calculate its power consumption. IPC is a critical factor in the power consumption calculation.
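The per-interval measurement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CounterSample` fields and both function names are hypothetical, and the power estimate assumes the common CMOS dynamic-power form P = IPC · C_L · V_DD² · f with IPC acting as the activity factor, which may not match the paper's own equation exactly. The capacitance and voltage values are placeholders, not figures from the text.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Cumulative counter values read from one core (hypothetical fields)."""
    instructions: int  # retired instructions so far
    cycles: int        # elapsed core clock cycles so far


def interval_ipc(start: CounterSample, end: CounterSample) -> float:
    """Instructions per cycle over one sampling interval."""
    cycles = end.cycles - start.cycles
    if cycles <= 0:
        return 0.0  # no elapsed cycles, treat the core as idle
    return (end.instructions - start.instructions) / cycles


def dynamic_power(ipc: float, c_load: float, v_dd: float, freq_hz: float) -> float:
    """Assumed form P = IPC * C_L * V_DD^2 * f, in watts."""
    return ipc * c_load * v_dd ** 2 * freq_hz


# One 100 ms interval on a 3 GHz core spans 300 million cycles.
start = CounterSample(instructions=0, cycles=0)
end = CounterSample(instructions=450_000_000, cycles=300_000_000)
ipc = interval_ipc(start, end)

# c_load and v_dd are placeholder values chosen only for the example.
power = dynamic_power(ipc, c_load=1e-9, v_dd=1.0, freq_hz=3e9)
```

Because the core frequency is held constant, the estimate scales linearly with IPC, so ranking cores by IPC and ranking them by estimated power give the same order.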
It is no wonder that cores with higher power consumption, which execute tasks at higher performance, reach higher temperatures than cores with lower power consumption [8]. The power consumption of each core is calculated with Equation 1, where P is the core power consumption, IPC is the instructions per cycle (the core activity), f is the core frequency, CL is the average capacitance, and VDD is the supply voltage. Because DVFS is expensive and inappropriate due to its performance degradation, the frequency of each core in the many-core CMP is constant, and dynamic frequency changes are not assumed in the system. As can be seen in Equation 1, the IPC plays a key role in calculating and predicting the power consumption of each core. IPC is obtained from performance counters, which are widely available in modern processors. Each core has a performance counter for IPC counting. At the end of each time interval, the IPC of each core is read from its performance counter and the power consumption is calculated with Equation 1.

According to the calculated power consumption, a look-up table in the Thermal Control Unit (TCU) is filled. An example of the look-up table is illustrated in Figure 3. In the target many-core system, the TCU is assumed to be placed close to all of the cores, as shown in Figure 1. Based on the filled table in the TCU, we divide the many-core floor plan into two parts: the central part, with one region, and the surrounding part, with four regions, as shown in Figure 4. Based on the thermal distribution of the central part and the surrounding part, we try to balance the temperature in the system. Using the look-up table of Figure 3, hot and cold cores are determined from each core's activity according to the thresholds shown in Figure 5, where th1=5, th2=10, th3=15, and th4=20.

Fig. 3. A sample of the look-up table in the TCU used at the end of each time interval.

Fig. 4. The central part and the surrounding part of the 64-tile many-core CMP.

Once hot and cold cores are determined, the proposed technique sorts the cores in both the central part and the surrounding part from hottest to coldest. It then exchanges the hottest core in the central part with the coldest core in the surrounding part. In this way, the heavily loaded tasks are migrated to the edges of the chip and the lightly loaded tasks are migrated to the central part. The edges of the chip are a better place for hot cores than the central part because neighboring cores have a large effect on each other's temperature. Since the surrounding part has three times as many cores as the central part, a hot core in the central part has several cold cores to choose from for migration.

At the end of each time interval, each core sends its IPC information (the core activity), calculated from its performance counter, to the TCU. The TCU then uses the core activities in the look-up table to build two sets of activities, one for the central part and one for the surrounding part, and sorts each set from the hottest to the coldest core. As shown in Figure 1, the TCU exchanges the hottest core in the central part with the coldest core in the surrounding part, region by region, as explained in the next subsection. The TCU migrates the hot cores in the central part to the cold cores in the surrounding part in a clockwise manner.

Fig. 5. The thresholds used for determining the temperature ranges of the cores.

Fig. 6. The proposed clockwise task migration algorithm.

A. Clockwise Migration Algorithm

To avoid gathering all of the hot cores in one region of the surrounding part, instead of spreading them over all of the surrounding regions, a novel clockwise algorithm is proposed. This clockwise migration algorithm divides the surrounding part into four regions, as shown in Figure 4. After the TCU has sorted the cores from high to low temperature in both the central part and the surrounding part, the clockwise algorithm exchanges the hottest core in the central part with the coldest core in region one of the surrounding part. After that, it exchanges the next hottest core in the central part with the coldest core in region two of the surrounding part, and so on. The system repeats this procedure periodically at the end of each time interval to migrate the hot cores in the central part to the cold cores in the four regions of the surrounding part. Phase 1 and Phase 2 of the proposed clockwise task migration technique are illustrated in Figure 6.

As shown in Figure 1, a 64-tile many-core CMP architecture with multithreaded workloads is used to evaluate the proposed clockwise task migration technique.

a) Platform Setup

To validate the efficiency of the many-core CMP architecture in this paper, the authors use traffic traces extracted from the GEM5 [15] full-system simulator to set up the basic system platform. The area of the cores and cache banks is estimated with CACTI [21] and McPAT [20]. We use multithreaded applications from the PARSEC benchmarks [14] in our experimental evaluation. The detailed system configuration is given in Table 1. For these benchmarks, one billion instructions are executed with the simlarge input set, starting from the Region of Interest (ROI). HotSpot [17] version 5.0 is employed as a grid-based thermal modelling tool for chip temperature estimation.
For the experimental evaluation, the maximum temperature limit and the dark silicon peak power budget, Tmax and Pbudget, are assumed to be 80 °C and 100 W, respectively.

Table 1. Specification of the target CMP architecture.

Component                 Description
Number of Cores           64, 8x8 mesh
Core Configuration        Alpha 21164, 3 GHz, 65 nm
Private Cache per Core    SRAM, 4-way, 32 line, size 32 KB per core
On-chip Memory            Baseline: Static random role
                          Proposed: Proposed migration technique

b) Experimental Results

In this subsection, we evaluate a many-core CMP in two different cases: the many-core CMP without any migration policy (Baseline), and the many-core CMP with the proposed clockwise migration policy (Proposed).

Figure 7 shows the normalized throughput for PARSEC and SPEC workloads, where throughput is the number of executed instructions per second (IPS). As shown in Figure 7, the Proposed architecture yields on average a 31% throughput improvement compared with the Baseline. Moreover, Figure 8 illustrates the normalized energy consumption for PARSEC and SPEC workloads. As shown in Figure 8, the Proposed architecture yields on average a 69% energy consumption improvement compared with the Baseline. In addition, Figure 9 (a) and (b) show the temperature distribution for canneal from the PARSEC workloads for the Baseline and Proposed architectures, respectively.

Also, as shown in Figure 9 (a), applying the proposed clockwise task migration technique (Proposed) ensures that all cores on the many-core CMP stay below the maximum temperature of 80 °C, whereas the Baseline spends up to 19% of the time above the maximum temperature and exhibits hotspots, as shown in Figure 9 (b). In other words, applying the proposed clockwise task migration technique to the many-core CMP architecture distributes the temperature without hotspots appearing.

Fig. 7. Comparison results of IPC.

Fig. 8. Comparison results of energy consumption.

Many-core CMPs provide higher system performance, more flexibility, and scalability. Since these advantages require increased power consumption, peak temperature becomes a serious concern. Thus, Runtime Thermal Management (RTM) of many-core CMPs becomes crucial for minimizing thermal hotspots without performance degradation. In this paper, the proposed clockwise task migration technique migrates the heavily loaded tasks from the central cores to the surrounding cores. The system estimates core temperature using the performance counters placed in each core instead of thermal sensors, since cores with higher power consumption execute tasks at higher performance and therefore reach higher temperatures. Experimental results on the 64-tile many-core CMP show a significant average improvement in normalized throughput and energy consumption. The Proposed architecture yields on average a 31% throughput improvement compared with not using the proposed technique. Moreover, it yields on average a 69% energy consumption improvement compared with not using the proposed technique. Furthermore, the results show a decline of up to 15% in the maximum temperature, and all tiles of the 64-tile many-core CMP stay below the maximum temperature limit of 80 °C.

Fig. 9. Comparison results of temperature, panels (a) and (b).
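The migration step summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions. Since Figures 4 and 6 are not reproduced here, the central part is taken to be the inner 4x4 block of the 8x8 mesh, and the four surrounding regions are taken to be the quadrants, visited clockwise (NW, NE, SE, SW). The function names, the `max_swaps` parameter, the rule of swapping only when the central core is hotter, and the choice to treat the thresholds as inclusive lower bounds are all hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Core = Tuple[int, int]            # (row, col) position on the mesh
THRESHOLDS = [5, 10, 15, 20]      # th1..th4 from the text


def thermal_level(activity: float) -> int:
    """Map a core activity to a level 0 (coldest) .. 4 (hottest)."""
    return bisect_right(THRESHOLDS, activity)


def partition(n: int = 8) -> Tuple[List[Core], List[List[Core]]]:
    """Split an n x n mesh into a central block and four outer regions.

    Assumption: the regions are quadrants, ordered clockwise NW, NE, SE, SW.
    """
    lo, hi, half = n // 4, n - n // 4, n // 2
    central: List[Core] = []
    regions: List[List[Core]] = [[], [], [], []]
    for r in range(n):
        for c in range(n):
            if lo <= r < hi and lo <= c < hi:
                central.append((r, c))
            else:
                top, left = r < half, c < half
                idx = 0 if (top and left) else 1 if top else 2 if not left else 3
                regions[idx].append((r, c))
    return central, regions


def clockwise_migrate(activity: Dict[Core, float], n: int = 8,
                      max_swaps: int = 4) -> List[Tuple[Core, Core]]:
    """Swap the loads of hot central cores with cold outer cores, in place.

    The k-th hottest central core is paired with the coldest remaining core
    of region k mod 4, so consecutive swaps rotate clockwise over the regions.
    """
    central, regions = partition(n)
    hot = sorted(central, key=lambda k: activity[k], reverse=True)
    cold = [sorted(reg, key=lambda k: activity[k]) for reg in regions]
    swaps: List[Tuple[Core, Core]] = []
    for i, src in enumerate(hot[:max_swaps]):
        region = cold[i % 4]
        if not region or activity[src] <= activity[region[0]]:
            continue  # assumed rule: skip swaps that would not cool the center
        dst = region.pop(0)
        activity[src], activity[dst] = activity[dst], activity[src]
        swaps.append((src, dst))
    return swaps
```

In a periodic setting, `clockwise_migrate` would be called once per interval on the activities collected by the TCU, and each returned pair names a central core and the outer core it exchanged tasks with.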
