This paper proposes a 256-bit, speed-area-efficient hardware elliptic curve point-multiplication engine (ECPM-engine) over GF(p) for generic Weierstrass curves. The engine is optimized by a new speed-area-efficient radix-64 Montgomery modular multiplication (R64MMM) and a novel Montgomery ladder scheduling. The R64MMM uses one 129-bit adder and one (64×64+129)-bit multiply-accumulator (64-129-MAC) operating in parallel, trading off speed against area. The novel Montgomery ladder scheduling improves the utilization of the MAC during ECPM operation. In this ECPM-engine, both the MAC utilization in R64MMM operations and the R64MMM utilization in ECPM operations are close to 100%. Implemented with a 90 nm standard cell library, the proposed ECPM-engine consumes 72k gates at a clock frequency of 714 MHz and computes one 256-bit ECPM in 0.14 ms.
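For orientation, the following is a minimal software sketch of word-serial Montgomery modular multiplication that consumes one 64-bit word of an operand per iteration. It illustrates only the general textbook technique and is not the R64MMM datapath itself: it does not model the 129-bit adder, the 64-129-MAC, or their parallel scheduling. The names `mont_setup`, `mont_mul`, and `mod_mul_via_mont`, the 64-bit word width, and the four-word operand size for a 256-bit modulus are assumptions made for illustration.

```python
# Illustrative sketch only: word-serial Montgomery multiplication with
# 64-bit words, using Python big integers in place of a hardware MAC.
# All names and constants here are hypothetical, not taken from the paper.

W = 64                    # assumed word width in bits
MASK = (1 << W) - 1
N_WORDS = 4               # 4 x 64 bits covers a 256-bit modulus


def mont_setup(p):
    """Return -p^{-1} mod 2^W for an odd modulus p (needs Python 3.8+)."""
    return (-pow(p, -1, 1 << W)) & MASK


def mont_mul(a, b, p, p_neg_inv):
    """Return a * b * R^{-1} mod p, with R = 2^(W * N_WORDS) and a, b < p."""
    t = 0
    for i in range(N_WORDS):
        a_i = (a >> (W * i)) & MASK            # next 64-bit word of a
        t += a_i * b                           # accumulate partial product
        m = ((t & MASK) * p_neg_inv) & MASK    # makes the low word of t + m*p zero
        t = (t + m * p) >> W                   # exact division by 2^W
    if t >= p:                                 # t < 2p here, so one subtraction suffices
        t -= p
    return t


def mod_mul_via_mont(a, b, p):
    """Compute (a * b) mod p by going through the Montgomery domain."""
    p_neg_inv = mont_setup(p)
    R = 1 << (W * N_WORDS)
    a_m = (a * R) % p                          # convert operands into Montgomery form
    b_m = (b * R) % p
    c_m = mont_mul(a_m, b_m, p, p_neg_inv)     # equals a * b * R mod p
    return mont_mul(c_m, 1, p, p_neg_inv)      # convert the result back
```

Each loop iteration performs one word-by-operand multiply-accumulate followed by a reduction step that clears the lowest word, which is the kind of work a hardware design would map onto its multiplier and adder resources. With four 64-bit words, a 256-bit multiplication completes in four such iterations plus one conditional subtraction.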