With recent advances in ConvNet-based detectors, a reliable solution for vessel detection has become feasible. However, training such detectors requires a comprehensive annotated dataset covering different maritime environments, and creating such a dataset is expensive and time-consuming. To automate this process, this paper proposes a novel self-learning framework that automatically fine-tunes a generic pre-trained model to any new environment, thereby enabling automated labeling of new dataset types. The method first explores the video frames captured from a new target environment to generate candidate vessel samples. It then applies a temporal filtering concept to verify correctly generated candidates as new labels for learning, while removing false positives. Finally, the system updates the vessel model using the resulting self-learning dataset. Experimental results on our real-world evaluation dataset show that generalizing a fine-tuned Single Shot Detector to a new target domain using the proposed self-learning framework increases the average precision and the F1-score by 12% and 5%, respectively. Additionally, the proposed temporal filter reduces the noisy detections in a sensitive setting from 58% to only 5%.
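
The temporal filtering step can be illustrated with a minimal sketch. The abstract does not specify the filter, so the rule used here is an assumption: a candidate box is kept only if overlapping candidates appear in enough neighbouring frames, with overlap measured by intersection over union (IoU). The names `temporal_filter`, `min_support`, `window`, and `iou_thr` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def temporal_filter(
    frames: List[List[Box]],
    min_support: int = 2,   # assumed: neighbouring frames that must agree
    window: int = 2,        # assumed: frames inspected on each side
    iou_thr: float = 0.5,   # assumed: overlap needed to count as agreement
) -> List[List[Box]]:
    """Keep a candidate only if it is supported by nearby frames.

    Illustrative rule only, not the paper's filter: a box in frame t is
    kept when at least `min_support` of the frames within +/- `window`
    contain a candidate overlapping it with IoU >= `iou_thr`.
    """
    kept: List[List[Box]] = []
    for t, boxes in enumerate(frames):
        lo, hi = max(0, t - window), min(len(frames), t + window + 1)
        neighbours = [frames[s] for s in range(lo, hi) if s != t]
        kept.append([
            b for b in boxes
            if sum(any(iou(b, c) >= iou_thr for c in f) for f in neighbours)
            >= min_support
        ])
    return kept


# Three consecutive frames: one slowly moving vessel plus one isolated
# detection in the middle frame that no neighbouring frame confirms.
candidates = [
    [(10, 10, 50, 50)],
    [(12, 10, 52, 50), (200, 200, 220, 220)],
    [(14, 10, 54, 50)],
]
pseudo_labels = temporal_filter(candidates)
```

In this toy input, the persistent box survives in every frame while the isolated detection is discarded, which mirrors the intended role of the filter: only temporally consistent candidates become labels for the self-learning dataset.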