Scanning HTML at Tens of Gigabytes per Second on ARM Processors
Daniel Lemire
Université TÉLUQ

Corresponding Author: [email protected]

Abstract

Modern processors have instructions that process 16 bytes or more at once. These instructions are called SIMD, for single instruction, multiple data. Recent advances have leveraged SIMD instructions to accelerate the parsing of common Internet formats such as JSON and base64. The two major Web browser engines (WebKit and Blink) have adopted SIMD algorithms for parsing HTML on 64-bit ARM processors. During HTML parsing, they quickly identify specific characters with a strategy called vectorized classification. We review their techniques and compare them with a faster alternative. We measure a 20-fold performance improvement in HTML scanning compared to traditional methods on recent ARM processors. Our findings highlight the potential of SIMD-based algorithms for optimizing Web browser performance.
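To make vectorized classification concrete, here is a minimal sketch in C. It is not the WebKit, Blink, or author's code. It assumes, for illustration only, that the scanner looks for four characters ('<', '&', '\r', and the null byte). It uses the common nibble-lookup formulation: the low and high 4-bit halves of each byte index two 16-entry tables whose entries are bitsets of character classes, and a byte is a target when the two bitsets intersect. On a 64-bit ARM processor, each 16-byte table lookup corresponds to one `vqtbl1q_u8` instruction; the sketch emulates the 16 lanes with a scalar loop so that it runs on any machine. The names `classify_block`, `lowest_set_bit`, and `find_first_special` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed target set (illustration only): '<', '&', '\r', '\0'.
   Each target gets its own bit. A byte is a target when the bitset picked
   by its low nibble and the bitset picked by its high nibble intersect. */
static const uint8_t low_nibble_table[16] = {
    [0x0] = 0x08, /* '\0' = 0x00 */
    [0x6] = 0x02, /* '&'  = 0x26 */
    [0xC] = 0x01, /* '<'  = 0x3C */
    [0xD] = 0x04, /* '\r' = 0x0D */
};
static const uint8_t high_nibble_table[16] = {
    [0x0] = 0x08 | 0x04, /* '\0' and '\r' */
    [0x2] = 0x02,        /* '&' */
    [0x3] = 0x01,        /* '<' */
};

/* Classify one 16-byte block and return a 16-bit mask (bit i set when
   block[i] is a target). With NEON, each table lookup below would be a
   single vqtbl1q_u8 over all 16 lanes; here the lanes are emulated. */
static uint16_t classify_block(const uint8_t block[16]) {
    uint16_t mask = 0;
    for (int i = 0; i < 16; i++) {
        uint8_t lo = low_nibble_table[block[i] & 0x0F];
        uint8_t hi = high_nibble_table[block[i] >> 4];
        if ((lo & hi) != 0) {
            mask |= (uint16_t)(1u << i);
        }
    }
    return mask;
}

/* Position of the lowest set bit; the caller guarantees mask != 0. */
static size_t lowest_set_bit(uint16_t mask) {
    size_t k = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        k++;
    }
    return k;
}

/* Return the index of the first target byte in data, or len if none. */
size_t find_first_special(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        uint16_t mask = classify_block(data + i);
        if (mask != 0) {
            return i + lowest_set_bit(mask);
        }
    }
    if (i < len) {
        /* Copy the partial tail into a padded block. Pad with 'a' rather
           than zero, because '\0' is itself a target. */
        uint8_t tail[16];
        memset(tail, 'a', sizeof tail);
        memcpy(tail, data + i, len - i);
        uint16_t mask = classify_block(tail);
        if (mask != 0) {
            return i + lowest_set_bit(mask);
        }
    }
    return len;
}
```

For example, `find_first_special((const uint8_t *)"hello <b>", 9)` returns 6. The tables are chosen so that exactly the four assumed characters match: a space (0x20), for instance, selects bit 0x08 from the low table and bit 0x02 from the high table, and the empty intersection rejects it. In a real NEON implementation, the per-lane loop and the mask extraction would be replaced by vector instructions, which is where the speedup comes from.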

16 Sep 2024: Submitted to Software: Practice and Experience
18 Sep 2024: Submission Checks Completed
18 Sep 2024: Assigned to Editor
18 Sep 2024: Review(s) Completed, Editorial Evaluation Pending
20 Sep 2024: Reviewer(s) Assigned
02 Nov 2024: Editorial Decision: Revise Minor