Summaries generated from medical conversations can improve recall and understanding of care plans for patients and reduce documentation burden for doctors. Recent advancements in automatic speech recognition (ASR) and natural language understanding (NLU) offer potential solutions to generate these summaries automatically. In the current paper, we focus on two tasks: classifying utterances from medical conversations according to (i) the SOAP section and (ii) the speaker role, both fundamental building blocks along the path towards an end-to-end, automated SOAP note for medical conversations. We provide details on a dataset that contains human and ASR transcriptions of medical conversations and corresponding machine-learning-optimized SOAP notes. We then present a systematic and rigorous analysis in which we adapt an existing deep learning architecture to the two aforementioned tasks. The results suggest that modelling context in a hierarchical manner, which captures both word-level and utterance-level context, yields substantial improvements on both classification tasks. Additionally, we develop and analyze a modular method for adapting our model to ASR output. Our work fills an important gap by providing a quantitative baseline for benchmarking future research on the automation of SOAP notes. We discuss its implications for future research on using deep learning to automate clinical documentation from medical conversations.