Environmental perception and high-precision positioning are core research areas in autonomous driving: they enable a vehicle to obtain accurate real-time position information and make correct path-planning decisions. Several problems remain, however, including the difficulty of recognizing road features and efficiently producing complete road maps, a low degree of automation, and poor positioning robustness. To address these problems, we propose a method, built on a mobile 3D data-acquisition system, that improves the accuracy of city-scale 3D point cloud data and achieves high-precision 3D data acquisition under a variety of complex road conditions with both efficiency and accuracy. On the acquired point cloud data, we further propose an end-to-end deep learning network for semantic segmentation of large-scale urban scenes, which automatically extracts map elements (curbs, street lamps, road signs, lane lines, etc.) and effectively improves the automatic understanding and perception of 3D scenes. Meanwhile, to overcome the limitations of any single positioning sensor, we developed a multi-sensor fusion positioning system that improves positioning accuracy and robustness, and hence the quality of the acquired data. In addition, using SLAM (Simultaneous Localization and Mapping) technology, we achieved integrated acquisition across indoor and outdoor, aboveground and underground environments, obtaining centimeter-level positioning and mapping in areas with degraded or no GNSS signal.
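The core idea behind multi-sensor fusion positioning, combining estimates so that the fused result is more accurate and more robust than any single sensor, can be illustrated with a minimal sketch. The function below is purely illustrative (it is not from the paper's system): it fuses two independent position estimates, e.g. GNSS and odometry, by inverse-variance weighting, which is the static special case of a Kalman filter update.

```python
def fuse_position(z_a, var_a, z_b, var_b):
    """Fuse two independent position estimates by inverse-variance
    weighting. The fused variance is always smaller than either
    input variance, which is why fusion improves accuracy."""
    w_a = 1.0 / var_a          # weight = inverse of measurement variance
    w_b = 1.0 / var_b
    fused = (w_a * z_a + w_b * z_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)
    return fused, fused_var

# Example: a GNSS fix at 10.0 m (variance 1.0) fused with an
# odometry estimate at 12.0 m (variance 1.0) yields 11.0 m with
# variance 0.5 -- better than either sensor alone.
pos, var = fuse_position(10.0, 1.0, 12.0, 1.0)
```

A full system would extend this to 3D state vectors with a Kalman or factor-graph back end, and would down-weight a sensor whose reported variance grows (e.g. GNSS under an overpass), which is the mechanism behind the robustness gain described above.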