Current loop buffer organizations for very large instruction word processors are essentially centralized. As a consequence, they are energy inefficient and their scalability is limited. To alleviate this problem, we propose a clustered loop buffer organization, where the loop buffers are partitioned and functional units are logically grouped to form clusters, along with two schemes for buffer control which regulate the activity in each cluster. Furthermore, we propose a design-time scheme to generate clusters by analyzing an application profile and grouping closely related functional units. The simulation results indicate that the energy consumed in the clustered loop buffers is, on average, 63 percent lower than the energy consumed in an uncompressed centralized loop buffer scheme, 35 percent lower than a centralized compressed loop buffer scheme, and 22 percent lower than a randomly clustered loop buffer scheme.