Internet of Things (IoT) services will use machine learning tools to efficiently analyze the various types of data collected by IoT devices for inference, autonomy, and control purposes. However, due to resource constraints and privacy challenges, edge IoT devices may not be able to transmit their collected data to a central controller for training machine learning models. To overcome this challenge, federated learning (FL) has been proposed as a means of enabling edge devices to train a shared machine learning model without exchanging data, thus reducing communication overhead and preserving data privacy. However, Google's seminal FL algorithm requires all devices to be directly connected to a central controller, which significantly limits its application scenarios. In this context, this article introduces a novel FL framework, called collaborative FL (CFL), which enables edge devices to implement FL with less reliance on a central controller. The fundamentals of this framework are developed, and a number of communication techniques are then proposed to improve the performance of CFL. To this end, an overview of centralized learning, Google's seminal FL, and CFL is first presented. For each type of learning, the basic architecture is introduced along with its advantages, drawbacks, and usage conditions. Then, three CFL performance metrics are presented, and a suite of communication techniques, including network formation, device scheduling, mobility management, and coding, is introduced to optimize the performance of CFL. For each technique, future research opportunities are also discussed. In a nutshell, this article showcases how the proposed CFL framework can be effectively implemented at the edge of large-scale wireless systems such as the IoT.