Where does the HTML come from in the dataset?

Updated July 08, 2024 21:50

The HTML included in the dataset is converted from wikitext (the original markup of Wikimedia projects) and optimized for parsing. It is generated by Parsoid, the Wikimedia project that converts wikitext to HTML. More information on the structure of this HTML can be found in the Parsoid DOM specs.
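As a rough illustration of what "optimized for parsing" can mean in practice, the sketch below pulls internal wiki links out of a small HTML fragment using only the Python standard library. It assumes that internal links are marked with `rel="mw:WikiLink"`, as in Parsoid output; the sample fragment is hand-written for illustration and is not taken from the dataset, and the names `WikiLinkCollector` and `extract_wiki_links` are hypothetical helpers, not part of any Wikimedia API.

```python
from html.parser import HTMLParser


class WikiLinkCollector(HTMLParser):
    """Collect href values of <a> elements marked rel="mw:WikiLink"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, so split before testing.
        rel_tokens = (attr_map.get("rel") or "").split()
        if "mw:WikiLink" in rel_tokens and attr_map.get("href"):
            self.links.append(attr_map["href"])


def extract_wiki_links(html):
    """Return the hrefs of all internal wiki links found in an HTML string."""
    collector = WikiLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hand-written sample (an assumption for illustration, not dataset content).
SAMPLE_HTML = (
    '<p>See <a rel="mw:WikiLink" href="./Wikitext" title="Wikitext">wikitext</a> '
    'and <a rel="mw:ExtLink" href="https://example.org">an external site</a>.</p>'
)

print(extract_wiki_links(SAMPLE_HTML))
```

Because the link type is carried in an attribute rather than inferred from the URL, the filter is a simple attribute check; the same approach should extend to other annotations described in the DOM specs.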
