Drug-target affinity (DTA) prediction is crucial for understanding molecular interactions and for supporting drug discovery and development. Although various computational methods have been proposed for DTA prediction, their predictive accuracy remains limited, and they do not capture the structural details of drug-target interactions. Taking advantage of increasingly accurate and accessible target structure prediction, we developed a deep learning model, S2DTA, that predicts DTA by fusing sequence and structural knowledge of drugs, targets, and pockets using heterogeneous models based on graph and semantic networks. Experiments showed that more complex feature representations yielded negligible improvements in model performance, whereas integrating heterogeneous models improved predictive accuracy. Compared with three state-of-the-art methods, S2DTA reduced Mean Absolute Error (MAE) by 25.2% and Root Mean Square Error (RMSE) by 20.1%. It also improved the Pearson Correlation Coefficient (PCC), Spearman correlation, Concordance Index (CI), and R2 by at least 19.6%, 17.5%, 8.1%, and 49.4%, respectively. Finally, an interpretability analysis based on a bidirectional self-attention mechanism supports the effectiveness of S2DTA as an accurate tool for predicting DTA. The source data and code are available at https://github.com/dldxzx/S2DTA.
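For readers who want to reproduce the six reported metrics on their own predictions, the following is a minimal sketch using only the Python standard library. It is not taken from the S2DTA repository. The function names are hypothetical, the definitions are the common textbook ones, and R2 is assumed here to be the coefficient of determination (1 minus the ratio of residual to total sum of squares), which may differ from the definition used in the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def concordance_index(y_true, y_pred):
    """Fraction of pairs with different true affinities that are ordered
    correctly by the predictions; tied predictions count as 0.5."""
    score, pairs = 0.0, 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue  # pair is not comparable
            pairs += 1
            dt = y_true[i] - y_true[j]
            dp = y_pred[i] - y_pred[j]
            if dp == 0:
                score += 0.5
            elif (dt > 0) == (dp > 0):
                score += 1.0
    return score / pairs if pairs else float("nan")


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, PCC, Spearman, CI and R2 as a dict."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "PCC": pearson(y_true, y_pred),
        # Spearman is the Pearson correlation of the rank-transformed values.
        "Spearman": pearson(average_ranks(y_true), average_ranks(y_pred)),
        "CI": concordance_index(y_true, y_pred),
        # Assumption: R2 as the coefficient of determination.
        "R2": 1.0 - ss_res / ss_tot,
    }


if __name__ == "__main__":
    # Made-up affinity values, for illustration only.
    measured = [5.0, 6.0, 7.0, 8.0]
    predicted = [5.5, 5.5, 7.5, 8.5]
    for name, value in regression_metrics(measured, predicted).items():
        print(f"{name}: {value:.3f}")
```

Lower values are better for MAE and RMSE, while higher values are better for the remaining four metrics, which is why the abstract reports reductions for the former and increases for the latter.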