As social media have become an integral part of many people’s everyday lives, there has been increasing interest in exploring how the content shared through those online platforms contributes to the collaborative creation of places in physical space. Indeed, the distinction between online and physical spaces and activities is rapidly eroding. However, exploring those digital geographies is a complex task due to the quantity and variety of the data involved. In this paper, we introduce a semi-supervised, deep neural network approach to classify geolocated social media posts based on their text content, media content, and geographic location, using a limited set of arbitrary categories. Our approach combines a stacked multi-modal autoencoder neural network, which creates joint representations of text and images, with a graph convolutional neural network for semi-supervised classification. Our results show that this approach classifies social media content with higher accuracy than a traditional Support Vector Machine model. Thus, the presented approach has the potential to develop into a powerful tool to complement content analysis in the study of digital geographies.