Grapheme-to-phoneme (G2P) conversion transforms written forms into phonetic representations and has significant application value in fields such as speech synthesis and speech recognition. In recent years, methods based on pre-training paradigms and transfer learning frameworks have shown remarkable advantages in areas such as low-resource language modeling and multilingual joint modeling. First, the historical development of G2P research is examined, analyzing the paradigm shift from early rule-based models to contemporary neural network models along three dimensions: interpretability, mapping accuracy, and computational efficiency. Next, a horizontal comparison of state-of-the-art G2P methods based on attention mechanisms and multi-task joint learning is presented, highlighting the mapping accuracy of different models on the same public dataset. Then, the research hotspots in this field are systematically reviewed, and a theoretical development path is constructed based on technological evolution. Finally, three future research directions are proposed: integrating multimodal technologies, neural architecture search, and prompt learning paradigms, providing theoretical references for overcoming existing technical bottlenecks.
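To make the task concrete, the following is a minimal sketch of the classic lexicon-plus-rules approach to G2P that predates neural models. The lexicon entries, the letter-to-sound rules, and the ARPAbet-style phoneme symbols are illustrative assumptions, not drawn from any specific system; real converters use full pronunciation dictionaries and context-sensitive rules.

```python
# Minimal rule-based G2P sketch: lexicon lookup with a letter-to-sound
# rule fallback. All entries below are hypothetical example data.

# Small lexicon of whole-word pronunciations (ARPAbet-style symbols).
LEXICON = {
    "cat": ["K", "AE", "T"],
    "phone": ["F", "OW", "N"],
}

# Fallback grapheme-to-phoneme rules, tried in order (longest match first).
RULES = [
    ("ph", ["F"]),
    ("sh", ["SH"]),
    ("a", ["AE"]),
    ("c", ["K"]),
    ("e", []),        # crude handling of silent "e"
    ("o", ["OW"]),
    ("n", ["N"]),
    ("t", ["T"]),
    ("s", ["S"]),
]

def g2p(word: str) -> list[str]:
    """Return phonemes for a word: lexicon lookup first, then rule fallback."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    phones, i = [], 0
    while i < len(word):
        for grapheme, ph in RULES:
            if word.startswith(grapheme, i):
                phones.extend(ph)
                i += len(grapheme)
                break
        else:
            i += 1  # skip letters not covered by any rule
    return phones

print(g2p("cat"))     # lexicon hit
print(g2p("phones"))  # rule fallback for an out-of-lexicon word
```

The brittleness of such handwritten rules on out-of-lexicon words is precisely what motivated the shift toward the neural sequence models surveyed here.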