Many new embedded applications require complex computations to be performed to tight schedules, while at the same time demanding low energy consumption and low cost. For implementation of these highly-demanding applications, highly-optimized application-specific multi-processor system-on-a-chip (MPSoCs) are required involving hardware multi-processors to execute the critical computations. The multi-processor accelerator design for such applications has to adequately resolve several difficult issues. Since the processors' micro- and macro-architectures, as well as, the memory and communication architectures are strongly interrelated, they have to be designed in combination. Complex mutual tradeoffs have to be resolved among the processor micro- and macro-architecture, and the corresponding memory and communication architectures, as well as, among the performance, power consumption and area. Unfortunately, the design methods and tools published till now do not address most of the design issues of the massively parallel hardware multi-processor accelerators. This paper discusses our novel quality-driven model-based multi-processor accelerator design method that adequately addresses the architecture design issues of hardware multi-processors for the modern highly-demanding embedded applications. Using the design of LDPC decoders for the latest high-speed communication system standards as an example application, we performed an extensive experimental research of the multi-processor design issues, and of our method and its design space exploration (DSE) framework. The experiments clearly demonstrated the existence of various complex architecture tradeoffs that could only be resolved through an adequate quality-driven combined design space exploration of the processors' micro- and macro-architectures, and the corresponding memory and communication architectures, as delivered by our method. (C) 2013 Elsevier B.V. All rights reserved.