Modern real-time systems often involve numerous computational tasks with intricate dependency relationships. Within these systems, data propagate through cause-effect chains from one task to another, making it imperative to minimize end-to-end latency to ensure system safety and reliability. In this paper, we introduce non-preemptive scheduling techniques designed to reduce the worst-case end-to-end latency and/or time disparity of task sets modeled as directed acyclic graphs (DAGs). This problem is challenging because the objective functions are non-continuous and non-convex, which hinders the direct application of standard optimization frameworks. Customized optimization frameworks that aim for optimal solutions may suffer from scalability issues, while general heuristic algorithms often lack theoretical performance guarantees. To address this challenge, we incorporate the "1-opt" concept from the optimization literature (essentially, a solution is 1-opt if its objective value cannot be improved by changing any single variable) into the design of our algorithm. We propose a novel optimization algorithm that effectively balances the trade-off between theoretical guarantees and scalability: we prove that the algorithm produces 1-opt solutions while maintaining polynomial run-time complexity. Through extensive large-scale experiments, we demonstrate that our algorithm reduces the latency metrics by 20% to 40% compared to state-of-the-art methods.
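
One possible formalization of the 1-opt criterion described above is sketched here for illustration; the symbols $f$, $x$, and $\mathcal{X}$ are introduced for this sketch and are not taken from the paper. For a minimization objective $f$ over decision variables $x = (x_1, \dots, x_n)$ with feasible set $\mathcal{X}$, a solution $x^{\star} \in \mathcal{X}$ is 1-opt if
\[
f(x^{\star}) \;\le\; f\bigl(x^{\star}_1, \dots, x^{\star}_{i-1},\, v,\, x^{\star}_{i+1}, \dots, x^{\star}_n\bigr)
\]
for every index $i \in \{1, \dots, n\}$ and every value $v$ for which the perturbed solution remains in $\mathcal{X}$. In other words, no change to any single variable, with all others held fixed, can decrease the objective.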