Unprecedentedly high volumes of data are becoming available with the growth of advanced metering infrastructure. These data are expected to benefit the planning and operation of future power systems and to help customers transition from a passive to an active role. In this paper, we explore, for the first time in the smart grid context, the benefits of using deep reinforcement learning, a hybrid class of methods that combines reinforcement learning with deep learning, to perform on-line optimization of schedules for building energy management systems. The learning procedure was explored using two methods, deep Q-learning and deep policy gradient, both of which have been extended to perform multiple actions simultaneously. The proposed approach was validated on the large-scale Pecan Street Inc. database. This high-dimensional database includes information about photovoltaic power generation, electric vehicles, and building appliances. Moreover, these on-line energy scheduling strategies could be used to provide real-time feedback to consumers to encourage more efficient use of electricity.