Published by EliotJones about 5 years ago
This release fixes a major regression in 0.0.7 which broke consuming documents via streams. It also adds new features:
- Docstrum (Doc Spectrum) algorithm for page segmentation.
- Docstrum and RecursiveXYCut implement the IPageSegmenter interface, which now returns a list of TextBlocks. XYLeaf and XYNode are now internal.
- TextEdgesExtractor is a new class which can be used to detect shared alignment in sections of text.
- Color property. This is one of the types implementing IColor. These are GrayColor, RGBColor and CMYKColor; other color spaces are not currently supported and default to GrayColor.Black.
- PdfDocument now has a TryGetXmpMetadata(out XmpMetadata metadata) method which will retrieve the XML XMP Metadata object from the document if one is present.
Published by EliotJones about 5 years ago
This release primarily focuses on more bug-fixing to improve stability of extracting text content. The main new features are full support for encrypted documents, Document Layout Analysis tools and early-access path information.
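As a rough illustration of the layout analysis and rotation features in this release, the sketch below opens a document, prints each page's rotation and extracts words with the nearest-neighbour extractor. It assumes the `UglyToad.PdfPig` NuGet package, a hypothetical input file `document.pdf`, and the `NearestNeighbourWordExtractor.Instance` singleton and `Rotation.Value` member as they appear in later releases; namespaces and member names may differ in this exact version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;
// Assumption: later releases split this namespace into sub-namespaces.
using UglyToad.PdfPig.DocumentLayoutAnalysis;

public static class LayoutExample
{
    public static void Main(string[] args)
    {
        // Hypothetical input path; pass a real PDF file as the first argument.
        string path = args.Length > 0 ? args[0] : "document.pdf";

        using (PdfDocument document = PdfDocument.Open(path))
        {
            foreach (Page page in document.GetPages())
            {
                // Rotation is reported in degrees: 0, 90, 180 or 270.
                Console.WriteLine($"Page {page.Number}: rotation {page.Rotation.Value}");

                // Pass the alternative extractor to GetWords instead of using the default.
                List<Word> words = page
                    .GetWords(NearestNeighbourWordExtractor.Instance)
                    .ToList();

                Console.WriteLine($"  {words.Count} words found");
            }
        }
    }
}
```

Segmentation with RecursiveXYCut is left out here because its return type changed between releases.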
- Fixes a bug in DefaultWordExtractor where the Letters collection on all words would be empty.
- HexToken based strings.
- DocumentLayoutAnalysis namespace supports nearest-neighbour word extraction and recursive X-Y cut document segmentation. RecursiveXYCut.GetBlocks implements the Recursive X-Y cut algorithm (https://en.wikipedia.org/wiki/Recursive_X-Y_cut). NearestNeighbourWordExtractor can be provided to Page.GetWords for a different word extraction technique.
- %%EOF end of file marker.
- Page now contains a Rotation property indicating if the page is rotated at the top level. Valid values for rotation are 0, 90, 180 and 270. The currently reported PageSize does not take rotation into account yet. This also adds support for properly rotating letters and page content.
- Page.ExperimentalAccess.GetPointSize(Letter letter) now reports the point size with an updated calculation which handles rotated letters.
- PdfPath information from the page's content stream. Early access to path/geometry information parsed from the page's content. Use Page.ExperimentalAccess.Paths to access lines, rectangles, curves, etc. declared by the page.
Published by EliotJones over 5 years ago
This release focuses on stability improvements and has been tested on far more document types than previous releases. The two main new features are support for full framework versions of .NET back to .NET 4.5, making this library available to more users, and initial support for encrypted documents using the most basic form of document encryption.
The release may contain a bug in System Font loading which has not been replicated but may make the library crash on some systems. Please file a bug report if you encounter an error on this package version.
- page.Operations.
- PdfPageBuilder directly using builder.Advanced.Operations.
- ParsingOptions.
Published by EliotJones almost 6 years ago
Adds new document creation and provides access to per-page annotations.
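A minimal sketch of what document creation might look like, assuming the `PdfDocumentBuilder` API and namespaces as documented for later releases of the library (they may not match this version exactly). The output file name `new.pdf` is hypothetical.

```csharp
using System.IO;
using UglyToad.PdfPig.Content;
using UglyToad.PdfPig.Core;
using UglyToad.PdfPig.Fonts.Standard14Fonts;
using UglyToad.PdfPig.Writer;

public static class CreateExample
{
    public static void Main()
    {
        PdfDocumentBuilder builder = new PdfDocumentBuilder();

        // Add a single A4 page to the new document.
        PdfPageBuilder page = builder.AddPage(PageSize.A4);

        // Fonts are registered with the document builder before use.
        PdfDocumentBuilder.AddedFont font =
            builder.AddStandard14Font(Standard14Font.Helvetica);

        // Draw text at 12pt; the point is given in PDF page coordinates.
        page.AddText("Hello World!", 12, new PdfPoint(25, 700), font);

        // Build the document in memory and write it to a hypothetical path.
        byte[] documentBytes = builder.Build();
        File.WriteAllBytes("new.pdf", documentBytes);
    }
}
```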
Published by EliotJones almost 6 years ago
Published by EliotJones over 6 years ago
The first non pre-release version.
Published by EliotJones over 6 years ago
Fixes an issue where the only encoding present is embedded in the font program.
Supports reading from streams.
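Reading from a stream could look roughly like the sketch below. It assumes the `PdfDocument.Open(Stream)` overload as found in later releases and a hypothetical input file `document.pdf`; the stream is expected to be seekable.

```csharp
using System;
using System.IO;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;

public static class StreamExample
{
    public static void Main(string[] args)
    {
        // Hypothetical input path; pass a real PDF file as the first argument.
        string path = args.Length > 0 ? args[0] : "document.pdf";

        // A FileStream is seekable, so it can be handed to the parser directly.
        using (Stream stream = File.OpenRead(path))
        using (PdfDocument document = PdfDocument.Open(stream))
        {
            Console.WriteLine($"Pages: {document.NumberOfPages}");

            // Page numbers are 1-indexed.
            Page first = document.GetPage(1);
            Console.WriteLine(first.Text);
        }
    }
}
```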
Published by EliotJones almost 7 years ago
The initial alpha release.