Background. Portfolio learning enables students to collect evidence of their learning. The component tasks of a portfolio can be designed to relate directly to intended learning outcomes, and reflective tasks can stimulate students to recognise their own learning needs. Assessing portfolios with a rating scale tied to intended learning outcomes offers high content validity. This study evaluated a reflective portfolio used during a final-year attachment in general practice (family medicine). Students were asked to evaluate the portfolio (which used significant event analysis as a basis for reflection) as a learning tool. The validity and reliability of the portfolio as an assessment tool were also measured.

Methods. Eighty-one final-year medical students completed reflective significant event analyses as part of a portfolio created during a three-week attachment (clerkship) in general practice. As well as two reflective significant event analyses, each portfolio contained an audit and a health needs assessment. Each portfolio was marked three times: by the student's GP teacher, by the course organiser and by another teacher in the university department of general practice. Inter-rater reliability between pairs of markers was calculated. A questionnaire was used to determine the students' experience of portfolio learning.

Results. The benefits of reflective learning were limited. Students said that they thought more about the patients they wrote up in significant event analyses, but little information emerged about the nature or effect of this. Moderate inter-rater reliability (Spearman's rho = 0.65) was found between pairs of departmental raters who marked larger numbers of portfolios (20 to 60). Inter-rater reliability was very low for marking involving GP tutors, who marked only 1 to 3 portfolios. Students rated their mentoring relationship with their GP teacher highly but found the portfolio tasks time-consuming.

Conclusion.
The inter-rater reliability observed in this study should be weighed against the high validity afforded by the authenticity of the learning tasks, compared with the sample of a student's learning captured by an examination question. Validity is further enhanced by the rating scale, which links each grade directly to the intended learning outcomes. Inter-rater reliability might be improved if the portfolio were completed over a longer period and contained more component pieces of work. The questionnaire used in this study obtained only limited information about the effect of reflection on students' learning; qualitative methods of evaluation could explore students' experience in greater depth. It would also be useful to evaluate the effects of reflective learning after students have had more time to become accustomed to this unfamiliar method of learning and to overcome any difficulties in understanding the task.
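Spearman's rho, the statistic reported for inter-rater reliability, is the Pearson correlation between the ranks that two markers assign to the same set of portfolios. The following is a minimal sketch, using only the Python standard library, of how agreement between one pair of markers could be computed. The function names and the example marks are hypothetical, and the study does not state how tied or missing marks were handled; this sketch assumes ties receive the average of their rank positions.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions (hypothetical helper)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(marks_a, marks_b):
    """Spearman's rho: Pearson correlation of the two markers' rank vectors."""
    if len(marks_a) != len(marks_b) or len(marks_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 marks")
    ra, rb = average_ranks(marks_a), average_ranks(marks_b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((a - ma) * (b - mb) for a, b in zip(ra, rb))
    sa = sum((a - ma) ** 2 for a in ra) ** 0.5
    sb = sum((b - mb) ** 2 for b in rb) ** 0.5
    if sa == 0 or sb == 0:
        raise ValueError("a marker gave every portfolio the same mark")
    return cov / (sa * sb)


# Hypothetical marks from two markers for the same six portfolios.
marker_1 = [3, 4, 2, 5, 4, 3]
marker_2 = [3, 5, 2, 4, 3, 3]
print(round(spearman_rho(marker_1, marker_2), 2))
```

For a real analysis, `scipy.stats.spearmanr` computes the same statistic and also reports a p-value; the hand-rolled version is shown only to make the ranking step explicit.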