worm-scraper

Scrapes the web serial Worm and its sequel Ward into an eBook format

OTHER License

Downloads
193
Stars
190
Committers
8

worm-scraper - 6.0.0 Latest Release

Published by domenic 3 months ago

This release now requires Node.js v20.x.

It includes various spot fixes for Worm, spotted by @ChatBotMatt in #41 and fixed by @vardrop in #42.

There are also the usual under-the-hood dependency updates.

worm-scraper - 5.3.0

Published by domenic about 2 years ago

This release fixes a couple of instances of missing capitalization for "Number Man", one of which was noticed by @LordXamon in #38.

worm-scraper - 5.2.0

Published by domenic over 2 years ago

This release contains a small fix to Worm Interlude 2, due to @atbenedict in #37.

It also contains under-the-hood dependency updates.

worm-scraper - 5.1.0

Published by domenic over 2 years ago

This release contains various spot fixes for Worm. They are due to @s-arambillete in #35.

worm-scraper - 5.0.0

Published by domenic over 2 years ago

This release now requires Node.js v16. (Previously using Node.js v16+ would give an error.)

Otherwise it only includes under-the-hood dependency updates, with no impact on the produced EPUBs.

worm-scraper - 4.12.1

Published by domenic almost 4 years ago

This release fixes some problems trying to scrape Ward due to recent parahumans.net website updates. No actual source text appears to have changed, but some minor coding changes to the website broke the scraper temporarily.

worm-scraper - 4.12.0

Published by domenic almost 4 years ago

This release's fixes are due to contributions from @CrackedP0t in #24, on their Worm read-through.

General content fixes, applied to both Worm and Ward:

  • Fixed misspellings of "scot-free".
  • Fixed misspellings of "changed tack".
  • Fixed various situations where spaces were inadvertently inserted, including in the middle of words.

Worm-specific content fixes:

  • Spot fixes for the whole book.

worm-scraper - 4.11.0

Published by domenic almost 4 years ago

This release focuses on fixes to the produced EPUBs to make them valid. Previously certain chapters, or parts of the metadata, were broken, which would lead to either those chapters or the entire book being unreadable, depending on the e-reader software in question.

Thanks to @Es7evam for pointing these issues out, in #22!

  • Fixed invalid XHTML in Worm Interlude 8 (Bonus), Worm Cockroaches 28.2, and Ward Sundown 17.6.
  • Replaced <img>s referencing a WordPress-hosted skeptical face with a textual 🤨 emoji.
  • Fixed various issues with the covers' metadata and XHTML.
  • Removed the compression of the mimetype file in the generated EPUB, since apparently that's disallowed per the EPUB spec.
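The EPUB container format expects the mimetype file to be the first entry in the ZIP archive and to be stored without compression. Below is a minimal sketch of how a generated file could be checked for this, assuming the archive bytes are already in a Node.js Buffer. `isMimetypeStored` is a hypothetical helper name for illustration, not a function from worm-scraper.

```javascript
// Hypothetical helper (not part of worm-scraper): checks that the first ZIP
// entry is an uncompressed file named "mimetype".
function isMimetypeStored(zipBytes) {
  const LOCAL_FILE_HEADER = 0x04034b50; // bytes "PK\x03\x04" read as little-endian
  if (zipBytes.length < 30 || zipBytes.readUInt32LE(0) !== LOCAL_FILE_HEADER) {
    return false; // too short, or does not start with a ZIP local file header
  }
  const compressionMethod = zipBytes.readUInt16LE(8); // 0 = stored, 8 = deflate
  const nameLength = zipBytes.readUInt16LE(26);
  const name = zipBytes.toString("ascii", 30, 30 + nameLength);
  return compressionMethod === 0 && name === "mimetype";
}
```

For a real file, the bytes could be read with Node's `fs.readFileSync` and passed to the helper. This only inspects the first local file header; it is not a full EPUB validator.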

worm-scraper - 4.10.0

Published by domenic almost 4 years ago

Program improvements:

  • Added a new cover for Worm.
  • Changed the author metadata of the produced EPUBs from "wildbow" to "Wildbow".
  • Added cover credits in this repository.
  • Allowed running the scraper with no commands, i.e. now you can type just worm-scraper instead of worm-scraper download convert scaffold zip.
  • Slightly improved the output of various steps. The download step in particular is now less verbose.
  • Updated under-the-hood dependencies.

worm-scraper - 4.9.0

Published by domenic almost 4 years ago

General content fixes, applied to both Worm and Ward:

  • Hyphenated self-preservation, vat-grown, shell-shocked, a just-in-case, dog-tired, one-sided, medium-sized, teary-eyed, and worst-case scenario.
  • Hyphenated second-guess, built-in, face-to-face, and fight-or-flight when appropriate.
  • Fixed hyphenation code to work even when the phrase is capitalized.
  • Removed over-capitalization of judo, aikido, and tae kwon do.
  • De-italicized some commas when two words are italicized in a row.
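One way a hyphenation fix can apply regardless of capitalization is to match case-insensitively and reuse the matched letters. The sketch below illustrates that idea only; `hyphenatePhrase` is a hypothetical helper, and the scraper's actual code may work differently.

```javascript
// Hypothetical helper: hyphenates every occurrence of a space-separated
// phrase, matching case-insensitively and keeping the original letters.
function hyphenatePhrase(text, phrase) {
  // Escape regex metacharacters so the phrase is matched literally.
  const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`\\b${escaped}\\b`, "gi");
  // Only the spaces change, so "Self preservation" keeps its capital S.
  return text.replace(pattern, (match) => match.replace(/ /g, "-"));
}

console.log(hyphenatePhrase("Self preservation won out.", "self preservation"));
```

Because the replacement is built from the matched text rather than from a fixed string, sentence-initial and lowercase occurrences are both handled by the same rule.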

Ward-specific content fixes:

  • Spot fixes through Last 20.end (i.e., the end of the book).
  • Always capitalize "Uncle Neil" and "Aunt Fleur".
  • De-capitalized "flock" since lowercase was much more prevalent.
  • De-capitalized "giants", but always capitalized "Mathers Giant", "Mother Giant", and "Goddess Giant".

worm-scraper - 4.8.0

Published by domenic almost 4 years ago

General content fixes, applied to both Worm and Ward:

  • Hyphenated self-esteem, self-loathing, self-harm, level-headed, and clear-cut.
  • Hyphenated hand-to-hand when used as an adjective.
  • Removed hyphens from "high five" and "fist bump", except where they were used as verbs.
  • Removed over-capitalization of "parahumans".
  • Removed more non-breaking spaces, which would manifest as lines wrapping strangely or as sentences followed by too many spaces.
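As a rough illustration of the non-breaking space cleanup, the sketch below replaces U+00A0 with a regular space and collapses any resulting runs of spaces. `normalizeSpaces` is a hypothetical name, and collapsing every run of spaces is an assumption that suits ordinary prose but not preformatted text.

```javascript
// Hypothetical helper: swaps non-breaking spaces (U+00A0) for regular spaces,
// then collapses runs of two or more spaces into one.
function normalizeSpaces(text) {
  return text.replace(/\u00A0/g, " ").replace(/ {2,}/g, " ");
}

console.log(normalizeSpaces("It was over.\u00A0 She left."));
```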

Ward-specific content fixes:

  • Spot fixes through Infrared 19.4.
  • Standardized on always capitalizing "Titan" and "Titans" after Sundown 17.y (and before when used as part of a name, e.g. "Kronos Titan").
  • Fixed capitalization of "Stranger Titan" and "the Stranger"; they were being erroneously de-capitalized by the existing PRT designation de-capitalization code.
  • Fixed misspellings of "Tattletale" as "Tatteltale".
  • Always capitalize the "Aunt" in "Aunt Sarah".
  • Always capitalize "Fragile One".
  • Always capitalize "Machine Army".

worm-scraper - 4.7.0

Published by domenic almost 4 years ago

General content fixes, applied to both Worm and Ward:

  • Standardized on "Jotun", replacing some instances of "Jotunn".
  • Standardized on "Juliette", replacing some instances of "Juliet".
  • Standardized on "Dragon-craft" and "Dragon-mech", replacing some instances that were missing the hyphen or capitalization.
  • Standardized on "A.I." instead of "AI".
  • Fixed the possessive of Marquis.
  • Hyphenated spelled-out numbers from 21 through 99.
  • Hyphenated "self-conscious" and derivatives.
  • Hyphenated compound words ending in "-haired".
  • Hyphenated compound words ending in "-dimensional".
  • Hyphenated "one-on-one".
  • Hyphenated "day-to-day" when appropriate.
  • Removed hyphenation around "hundred" and "percent".
  • Always capitalize "Nazi" and "English".
  • Capitalized more instances of "Earth" when referring to the planet.
  • Removed over-capitalization of "english muffin" and "french toast".
  • Removed over-capitalization of "church".
  • Removed over-capitalization of "corona pollentia", "radiata", and "gemma".
  • Fixed capitalization and apostrophes for "’Cage" (as in Birdcage).
  • Fixed a few incomplete or misplaced ellipses.
  • Fixed end-of-line commas that should be periods.
  • Removed extra spaces before closing quotation marks.
  • Removed italicization from commas.
  • Italicized question marks in single-question-word sentences.
  • Fixed various instances of opening quotes being used where it should be an apostrophe.
  • Replaced some hyphen-minuses with em dashes when they preceded a question mark.
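The rule for spelled-out numbers from 21 through 99 can be sketched as a single regular expression that pairs a tens word with a ones word. This is an illustrative sketch, not the scraper's code; `hyphenateNumbers` and the constant names are hypothetical.

```javascript
// Hypothetical sketch: hyphenates spelled-out numbers such as "twenty one".
const TENS = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"];
const ONES = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"];

// Matches a tens word, a single space, and a ones word, in any capitalization.
const NUMBER_PAIR = new RegExp(`\\b(${TENS.join("|")}) (${ONES.join("|")})\\b`, "gi");

function hyphenateNumbers(text) {
  // $1 and $2 reuse the matched words, so the original capitalization survives.
  return text.replace(NUMBER_PAIR, "$1-$2");
}

console.log(hyphenateNumbers("She was twenty one at the time."));
```

A pattern like this can misfire when a tens word and a ones word merely sit next to each other (for example "of the twenty, one of them..." without the comma), so any real use would still need a review pass.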

Ward-specific content fixes:

  • Spot fixes through From Within 16.z.
  • Standardized on "Crock o’ Shit", replacing some instances of "Crock o’Shit" or "Croc o’Shit".
  • Fixed the possessive of Semiramis.
  • Always capitalize "Dauntless Titan".
  • Always capitalize "the Bunker".
  • Always capitalize "U-turn".
  • Fixed capitalization and apostrophes for "’Lace" (as in Anelace).
  • Removed over-capitalization of "aunt" and "uncle".
  • Removed over-capitalization of season names.
  • Removed over-capitalization of "math".
  • Undid the conversion of color-based team names (like "Team Green-Black") to en dashes; they're more like compound adjectives, so we restore their hyphen-minuses.

worm-scraper - 4.6.1

Published by domenic almost 4 years ago

Fixed a problem introduced in v4.6.0 where chapters with text conversations (example: Ward Heavens 12.f) would end up with ill-formed XML, causing at least some eBook readers to fail to display them.

worm-scraper -

Published by domenic almost 4 years ago

Program improvements:

  • Improved the conversion step's resilience in the face of busy filesystems (e.g. due to antivirus scanners).
  • The conversion step now displays the amount of time it takes.

General content fixes, applied to both Worm and Ward:

  • Standardized on the style "Case Fifty-Three", instead of the other styles used (which include, but are not limited to, "Case-53", "case 53", "case-fifty-three", "Case Fifty Three", ...). Similarly for other PRT cases.
  • Standardized on "Mrs. Yamada", replacing many instances of "Ms."
  • Removed over-capitalization of "university".
  • Fixed a variety of missing periods between sentences.
  • Fixed many erroneously-italicized exclamation points, closing quotation marks, and commas.
  • Fixed backward closing single quotation marks.
  • Fixed a large variety of incorrect closing double quotation marks.
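To illustrate the "Case Fifty-Three" standardization, the sketch below collapses the variants listed above into the one chosen style. It covers only case 53 and is not the scraper's implementation; `standardizeCase53` is a hypothetical name.

```javascript
// Hypothetical sketch: rewrites "Case-53", "case 53", "case-fifty-three",
// "Case Fifty Three", and similar variants as "Case Fifty-Three".
const CASE_53 = /\bcase[ -](?:53|fifty[ -]three)(s?)\b/gi;

function standardizeCase53(text) {
  // $1 carries over a trailing plural "s" if one was present.
  return text.replace(CASE_53, "Case Fifty-Three$1");
}

console.log(standardizeCase53("They found a case 53 near the docks."));
```

The pattern cannot tell the PRT term from ordinary phrases such as "in case 53 people show up", so it is best read as a starting point rather than a safe blanket replacement.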

Ward-specific content fixes:

  • Spot fixes through Heavens 12.4.
  • Standardized on "Amias", replacing a few instances of "Amais".
  • Capitalized "Heartbroken" when it's used as a proper noun.
  • Fixed a couple of instances of "Hardboil" to be "Hard Boil".
  • Fixed instances of "Warden’s" which should be "Wardens’".
  • Always lowercased "headquarters" when talking about "the Wardens’ headquarters".
  • Fixed the capitalization and apostrophe placement for the truncated names "’Piece", "’Joint", and "’Tend".
  • Fixed double periods.
  • Fixed inconsistently-bolded colons when denoting text conversation senders.

worm-scraper - 4.5.0

Published by domenic almost 4 years ago

Program improvements:

  • Fixed the output of incorrect text replacement warnings to no longer get garbled with the progress bar. (You generally won't see this, unless you are developing this scraper, or if wildbow updates the source text.)

General content fixes, applied to both Worm and Ward:

  • Restored the author's original scene breaks, which were "■" for Worm, and either "⊙", "☽", or "⊙ ⊙ ⊙ ⊙ ⊙" for Ward. (One instance of "⊙⊙" was assumed erroneous and corrected to a single "⊙".) Previously "■" and "⊙" were being replaced with horizontal rules.
  • Changed a few instances of "T-shirt" to "t-shirt"; the latter is overwhelmingly more common.
  • Fixed missing spaces after commas.
  • Fixed a variety of misplaced, backward, or extraneous quotation marks.
  • Correctly hyphenated "X-year-old"; usually it was "X year old", but all of "X-year-old", "X year-old", and "X-year old" also appeared.
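The "X-year-old" fix can be sketched as one pattern that accepts either a space or a hyphen on each side of "year". This is a sketch under stated assumptions: X is limited to digits or spelled-out numbers up to ninety-nine, and `hyphenateYearOld` and the constant names are hypothetical.

```javascript
// Hypothetical sketch: normalizes "X year old", "X year-old", and "X-year old"
// to "X-year-old" when X is a number in digits or words.
const UNITS = "one|two|three|four|five|six|seven|eight|nine";
const NUMBER = [
  "\\d+",
  "ten|eleven|twelve",
  "(?:thir|four|fif|six|seven|eigh|nine)teen",
  `(?:twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety)(?:-(?:${UNITS}))?`,
  UNITS,
].join("|");
const YEAR_OLD = new RegExp(`\\b(${NUMBER})[ -]year[ -]old\\b`, "gi");

function hyphenateYearOld(text) {
  return text.replace(YEAR_OLD, "$1-year-old");
}

console.log(hyphenateYearOld("a sixteen year old girl"));
```

Plural forms such as "sixteen years old" are left alone because the pattern requires the singular "year". A predicate use like "the puppy is one year old" would still be hyphenated, which is a known limitation of this sketch.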

Ward-specific content fixes:

  • Spot fixes through Polarize 10.x.
  • Fixed a few instances where "the Pharmacist" was not capitalized, even after transitioning from a profession to a cape name.

worm-scraper - 4.4.0

Published by domenic almost 4 years ago

General content fixes, applied to both Worm and Ward:

  • Always capitalize "Earths" when referring to other worlds.

Worm-specific content fixes:

  • Always capitalize "The Clairvoyant". The previous fixup pass missed cases where his name started a sentence.

Ward-specific content fixes:

  • Spot fixes through Beacon 8.12.
  • Always capitalize "the Megalopolis".
  • Fixed instances of a hyphen-minus followed by a comma (i.e. -,), mostly replacing them with em dashes.

worm-scraper - 4.3.0

Published by domenic almost 4 years ago

Program improvements:

  • Used parallelism during the conversion process, so it should be significantly faster on multi-core computers (i.e. all modern computers).
  • Introduced a progress bar during the conversion process, instead of printing out each chapter filename as it is converted.
  • Removed an unnecessary dependency (xmlserializer) and updated other dependencies.

General content fixes, applied to both Worm and Ward:

  • Removed the capitalization of the "master" PRT designation, like all the others.
  • Fixed a few instances of "P.R.T." to be "PRT", which is overwhelmingly more common.
  • Fixed the common misspelling of "shoulder blade" as "shoulderblade".
  • Fixed the common misspelling of "preemptive(ly)" as "pre-emptive(ly)".
  • Fixed a variety of uncapitalized sentences.
  • Fixed more cases of extra spaces after sentences.
  • Fixed missing commas and periods at the end of quotations.
  • Fixed various dash issues.
  • Fixed Ward's indentation (for blockquote-type paragraphs) to match Worm's, at 30px, instead of mostly being 40px but sometimes 30px.

Ward-specific content fixes:

  • Spot fixes through Torch 7.x.
  • Fixed a few places where the letters "tv" were over-capitalized by the converter, e.g. in "outvoted".
  • Ensured all train station names are capitalized (e.g. "Norwalk station" became "Norwalk Station").
  • Settled on lowercasing "kiss and kill"; it was inconsistent.
  • Ensured that all instances of "Patrol", when discussing the proper noun and its derivatives, are capitalized. This partially reverses the change made in v4.2.0 to standardize on "patrol block".
  • Replaced the hyphen-minus with an en dash for another joint name, G–N.
  • Fixed missing periods at the end of sentences.

worm-scraper - 4.2.0

Published by domenic about 4 years ago

General fixes, applied to both Worm and Ward:

  • Capitalized "Dad" and "Mom" when used as names.
  • Capitalized a couple of "Birdcage" instances.
  • Added a hyphen to all instances of "able-bodied".
  • Fixed various erroneous repeated words.
  • Fixed extra spaces after periods.
  • Fixed missing spaces after periods.
  • Fixed some hyphen-minuses that should be em dashes, at the beginning of italicized quotes.
  • Converted hyphen-minuses to en dashes for joint names.
  • Fixed some periods that were over-italicized. (This isn't noticeable, but it bugs me.)

Worm-specific fixes:

  • Fixed the possessive of "Chuckles" to be "Chuckles’s" instead of "Chuckles’".

Ward-specific fixes:

  • Standardized on "patrol block" instead of "Patrol block" or "Patrol Block". All three were in use, but the first was the most widely used.
  • Fixed the over-capitalization of "clairvoyants", which was introduced by worm-scraper for Worm but backfired for Ward.
  • Fixed some instances of "Cedar point" and "Hollow point" to be "Cedar Point" and "Hollow Point", respectively.
  • Fixed some instances of "Resound" to be "ReSound".
  • Fixed the possessive of various new-in-Ward names ending with "s".
  • Spot fixes through Shadow 5.8.

worm-scraper - 4.1.0

Published by domenic about 4 years ago

Fixed all of the "[email protected]" instances which showed up in Ward.

Fixed some misplaced bolding in Ward.

Spot fixes for Ward arcs 1 and 2.

worm-scraper - 4.0.1

Published by domenic about 4 years ago

Fixed the minimum Node.js version described in the README and in the package.json to be v12.10.0. This minimum version requirement has been in place since v3.0.0 of worm-scraper, but previously those places erroneously noted the minimum as v10.0.0.