Towards a HPC-oriented parallel implementation of a learning algorithm for bioinformatics applications

D'Angelo, Gianni ; Rampone, Salvatore

BMC bioinformatics, 2014, Vol.15 Suppl 5, pp.S2 [Peer-reviewed journal]

  • Title:
    Towards a HPC-oriented parallel implementation of a learning algorithm for bioinformatics applications
  • Authors: D'Angelo, Gianni ; Rampone, Salvatore
  • Content notes: The huge quantity of data produced in biomedical research needs sophisticated algorithmic methodologies for its storage, analysis, and processing. High Performance Computing (HPC) appears as a magic bullet in this challenge. However, several hard-to-solve parallelization and load balancing problems arise in this context. Here we discuss the HPC-oriented implementation of a general purpose learning algorithm, originally conceived for DNA analysis and recently extended to treat uncertainty on data (U-BRAIN). The U-BRAIN algorithm is a learning algorithm that finds a Boolean formula in disjunctive normal form (DNF), of approximately minimum complexity, that is consistent with a set of data (instances) which may have missing bits. The conjunctive terms of the formula are computed iteratively by identifying, from the given data, a family of sets of conditions that must be satisfied by all the positive instances and violated by all the negative ones; such conditions allow the computation... We find mathematical and programming solutions that lead us towards the implementation of the U-BRAIN algorithm on parallel computers. First we give a Dynamic Programming model of the U-BRAIN algorithm, then we minimize the representation of the relevances. When the data are very large, we are forced to use mass memory, and depending on where the data are actually stored, the access times can be quite different. According to the evaluation of algorithmic efficiency based on the Disk Model, in order to reduce the costs of the communications between different memories (RAM, Cache, Mass, Virtual) and to achieve efficient I/O performance, we design a mass storage structure able to access its data with a high degree of temporal and spatial locality. Then we develop a parallel implementation of the algorithm. We model it as a SPMD system together with a Message-Passing Programming Paradigm. Here, we adopt the high-level message-passing system MPI (Message Passing Interface) in the version... In the context of a collaboration between public and private institutions, the parallel model of U-BRAIN has been implemented and tested on the INTEL XEON E7xxx and E5xxx family of the CRESCO structure of the Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), developed in the framework of the European Grid Infrastructure (EGI), a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The implementation is able to minimize both the memory space and the execution time. The test data used in this study are IPDATA (Irvine Primate splice-junction DATA set), a subset of HS3D (Homo Sapiens Splice Sites Dataset) and a subset of COSMIC (the Catalogue of Somatic Mutations in Cancer). The execution time and the speed-up on IPDATA reach their best values with about 90 processors. Beyond that point, the parallelization advantage is offset by the greater cost of non-local communications between the processors....
  • Is part of: BMC bioinformatics, 2014, Vol.15 Suppl 5, pp.S2
  • Subjects: Algorithms ; Computing Methodologies ; Computational Biology -- Methods
  • Language: English
  • Type: Article
  • Identifier: E-ISSN: 1471-2105 ; PMID: 25077818 Version:1 ; DOI: 10.1186/1471-2105-15-S5-S2
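
The content notes describe a DNF formula that must be consistent with instances that may have missing bits. As a rough illustration of what such a consistency check could look like, here is a minimal Python sketch. It is not the U-BRAIN algorithm and not the authors' code: the names (`term_status`, `formula_status`, `is_consistent`), the encoding of a missing bit as `None`, and the use of three-valued logic to handle missing bits are all assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical encoding (an assumption, not taken from the paper):
#   an instance is a sequence of bits, where None marks a missing bit;
#   a literal is (variable index, required value);
#   a term is a conjunction of literals; a formula is a disjunction of terms.
Bit = Optional[int]
Literal = Tuple[int, int]
Term = Sequence[Literal]
DNF = Sequence[Term]


def term_status(term: Term, instance: Sequence[Bit]) -> Optional[bool]:
    """Evaluate a conjunction: False if any known bit contradicts a literal,
    None if no bit contradicts but at least one is missing, True otherwise."""
    unknown = False
    for index, required in term:
        bit = instance[index]
        if bit is None:
            unknown = True
        elif bit != required:
            return False
    return None if unknown else True


def formula_status(formula: DNF, instance: Sequence[Bit]) -> Optional[bool]:
    """Evaluate a disjunction of terms with the same three-valued convention."""
    unknown = False
    for term in formula:
        status = term_status(term, instance)
        if status is True:
            return True
        if status is None:
            unknown = True
    return None if unknown else False


def is_consistent(formula: DNF,
                  positives: Sequence[Sequence[Bit]],
                  negatives: Sequence[Sequence[Bit]]) -> bool:
    """Assumed criterion: every positive instance definitely satisfies the
    formula and every negative instance definitely violates it."""
    return (all(formula_status(formula, x) is True for x in positives)
            and all(formula_status(formula, x) is False for x in negatives))


# (x0 AND NOT x1) OR x2
formula = [[(0, 1), (1, 0)], [(2, 1)]]
positives = [(1, 0, None), (None, 1, 1)]
negatives = [(0, None, 0), (1, 1, 0)]
consistent = is_consistent(formula, positives, negatives)
```

Treating an undetermined evaluation as "not consistent" is a deliberately conservative choice for this sketch; the paper may handle uncertainty differently.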