A tool to parse recipe ingredients into structured data
MIT License
Revert upgrade to NLTK 3.8.2 after 3.8.2 was removed from PyPI.
Published by strangetom 2 months ago
Require NLTK >= 3.8.2 due to change in POS tagger weights format.
Published by strangetom 2 months ago
[!WARNING]
This version requires NLTK >=3.8.2
NLTK 3.8.2 changes the file format (from pickle to json) of the weights used by the part of speech tagger used in this project, to address some security concerns. This patch updates the NLTK resource checks performed when ingredient-parser is imported to check for the new json files, and downloads them if they are not present.
This version requires NLTK>=3.8.2.
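The check-then-download behaviour described above can be sketched roughly as follows. This is only an illustration of the pattern, not the package's actual code: `ensure_resources`, the `weights.json` file name and the stand-in downloader are all hypothetical, and the real check goes through NLTK rather than plain file paths.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def ensure_resources(
    data_dir: Path,
    required: Iterable[str],
    download: Callable[[str], None],
) -> list[str]:
    """Fetch each required file missing from data_dir; return the names fetched."""
    fetched = []
    for name in required:
        if not (data_dir / name).exists():
            download(name)  # only called for files that are absent
            fetched.append(name)
    return fetched


# Usage with a stand-in downloader that just creates an empty file.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "weights.pickle").touch()  # old-format file already present

    def fake_download(name: str) -> None:
        (data_dir / name).touch()

    # The json file is missing on the first call, present on the second.
    first_run = ensure_resources(data_dir, ["weights.json"], fake_download)
    second_run = ensure_resources(data_dir, ["weights.json"], fake_download)
```

The old pickle file being present does not satisfy the check, so the first call downloads the json file and the second call does nothing.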
Published by strangetom 4 months ago
"1 cup plus 1 tablespoon" or "1 cup minus 1 tablespoon". Previously the phrase "plus/minus 1 tablespoon" would be returned in the comment. Now the whole phrase is captured as a CompositeAmount object.
pint.Unit would be returned, caused by pint interpreting the unit as something else, e.g. "pinch" -> "pico-inch".
Published by strangetom 5 months ago
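To show why capturing the whole "plus/minus" phrase as one amount is useful, here is a minimal sketch that resolves such a phrase to a single total. It does not use the package's CompositeAmount class; `total_tablespoons` and the `subtractive` argument are hypothetical names, and the table assumes US customary measures (1 cup = 16 tablespoons).

```python
# Tablespoons per unit, US customary (assumption for this sketch).
TBSP_PER_UNIT = {"cup": 16, "tablespoon": 1}


def total_tablespoons(first, second, subtractive=False):
    """Combine two (quantity, unit) pairs into a total number of tablespoons."""
    a = first[0] * TBSP_PER_UNIT[first[1]]
    b = second[0] * TBSP_PER_UNIT[second[1]]
    return a - b if subtractive else a + b


# "1 cup plus 1 tablespoon" and "1 cup minus 1 tablespoon"
plus_total = total_tablespoons((1, "cup"), (1, "tablespoon"))
minus_total = total_tablespoons((1, "cup"), (1, "tablespoon"), subtractive=True)
```

If the second part of the phrase were left in the comment, as it was previously, the totals above could not be computed from the amount alone.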
Refactor package structure to make it more suitable for expansion to other languages.
Note: There aren't any plans to support other languages yet.
"for the dressing" and "for garnish".
"token." and "token" as different cases to learn.
parse_ingredient for cases where none of the tokens are labelled as NAME. This will select name as the token with the highest confidence of being labelled NAME, even though a different label has a high confidence for that token. This can be disabled by setting expect_name_in_output=False in parse_ingredient.
Published by strangetom 6 months ago
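The NAME fallback described for parse_ingredient in the release above can be sketched as below. This is an illustration under assumptions, not the package's implementation: `select_name`, the per-token `name_scores` list and the non-NAME labels in the example are hypothetical. Only the NAME label and the expect_name_in_output flag come from the notes.

```python
def select_name(tokens, labels, name_scores, expect_name_in_output=True):
    """Return the tokens labelled NAME, falling back to the best NAME candidate.

    name_scores[i] is the model's confidence that tokens[i] is a NAME,
    regardless of which label was finally chosen for that token.
    """
    named = [tok for tok, lab in zip(tokens, labels) if lab == "NAME"]
    if named or not expect_name_in_output or not tokens:
        return named
    # No token was labelled NAME: pick the one most likely to be a NAME.
    best = max(range(len(tokens)), key=lambda i: name_scores[i])
    return [tokens[best]]


tokens = ["2", "tbsp", "chopped"]
labels = ["QTY", "UNIT", "PREP"]  # hypothetical labels, none of them NAME
name_scores = [0.01, 0.05, 0.30]

with_fallback = select_name(tokens, labels, name_scores)
without_fallback = select_name(tokens, labels, name_scores, expect_name_in_output=False)
```

With the fallback enabled the sketch returns the single most NAME-like token. With it disabled the name stays empty.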
Fix incorrect python version specifier in package which was preventing pip in Python 3.12 downloading the latest version.
Published by strangetom 7 months ago
Add github actions to run tests (#7, @boxydog)
Add pre-commit for use with development (#10, @boxydog)
python train.py gridsearch to iterate over specified training algorithms and hyper-parameters.
--detailed argument to output detailed information about model performance on test data. (#9, @boxydog)
Integration of pint library for units
By default, units in IngredientAmount objects will be returned as pint.Unit objects (where possible). This enables the easy conversion of amounts between different units. This can be disabled by setting string_units=True in the parse_ingredient function calls.
For units that have US customary and Imperial versions with the same name (e.g. cup), setting imperial_units=True in the parse_ingredient function calls will return the imperial version. The default is US customary.
This only applies to units in pint's unit registry (basically all common, standardised units). If the unit can't be found, then the string is returned as previously.
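To illustrate why the US customary versus Imperial choice matters, here is a small sketch that converts cups to millilitres under each system. It deliberately avoids pint so it runs with the standard library only. `cups_to_ml` and `CUP_IN_ML` are hypothetical names, and the imperial cup is taken as half an imperial pint, which is an assumption of this sketch.

```python
# Millilitres per cup in each system.
# US customary cup = 1/16 US gallon; imperial cup assumed to be 1/2 imperial pint.
CUP_IN_ML = {
    "us_customary": 236.5882365,
    "imperial": 284.130625,
}


def cups_to_ml(cups: float, imperial_units: bool = False) -> float:
    """Convert cups to millilitres, defaulting to US customary like the parser."""
    system = "imperial" if imperial_units else "us_customary"
    return cups * CUP_IN_ML[system]


us_ml = cups_to_ml(2)
imperial_ml = cups_to_ml(2, imperial_units=True)
```

The same "2 cups" differs by almost 100 ml between the two systems, which is why the flag exists.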
Additions to IngredientAmount object:
quantity_max field for handling upper limit of ranges. If the quantity is not a range, this will default to the same as the quantity field.
1-2
1x
float where possible
PreProcessor improvements
"1 tsp Chinese five-spice", "five-spice" is now kept as written instead of being replaced by two tokens: "5 spice".
"1 pound to 2 pound" is now returned as "1-2 pound".
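The quantity and quantity_max behaviour for ranges can be sketched like this. `AmountSketch` and `parse_quantity` are hypothetical stand-ins rather than the package's IngredientAmount. The sketch only handles plain "low-high" strings and single numbers.

```python
from dataclasses import dataclass


@dataclass
class AmountSketch:
    quantity: float
    quantity_max: float


def parse_quantity(text: str) -> AmountSketch:
    """Split a 'low-high' range; a single number uses the same value for both."""
    low, sep, high = text.partition("-")
    quantity = float(low)
    quantity_max = float(high) if sep else quantity
    return AmountSketch(quantity, quantity_max)


range_amount = parse_quantity("1-2")
single_amount = parse_quantity("3")
```

For "1-2" the two fields differ. For "3" quantity_max simply mirrors quantity, matching the default described above.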
Published by strangetom 9 months ago
other field from ParsedIngredient return from parse_ingredient function.
text field to IngredientAmount. This is auto-generated when the object is created and provides a human readable string for the amount, e.g. "100 g".
"14 ounce (400 g)", any flags set for one of the related amounts are applied to all the related amounts.
parse_ingredient to discard isolated stop words that appear in the name, comment and preparation fields.
IngredientAmount.amount elements are now ordered to match the order in which they appear in the sentence.
"1 lb 2 oz" is now considered to be a single CompositeIngredientAmount instead of two separate IngredientAmount.
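As a rough picture of an auto-generated text field and of grouping "1 lb 2 oz" into one composite, consider the sketch below. `AmountSketch` and `CompositeSketch` are hypothetical classes, not the package's IngredientAmount or CompositeIngredientAmount, and the text formatting is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AmountSketch:
    quantity: float
    unit: str
    text: str = field(init=False)  # filled in automatically, not passed by the caller

    def __post_init__(self):
        # ":g" drops a trailing ".0" so 100 renders as "100", 3.5 as "3.5"
        self.text = f"{self.quantity:g} {self.unit}"


@dataclass
class CompositeSketch:
    amounts: list[AmountSketch]
    text: str = field(init=False)

    def __post_init__(self):
        # The composite reads as its parts in sentence order.
        self.text = " ".join(a.text for a in self.amounts)


simple = AmountSketch(100, "g")
composite = CompositeSketch([AmountSketch(1, "lb"), AmountSketch(2, "oz")])
```

Here `simple.text` is "100 g" and `composite.text` is "1 lb 2 oz", so the two parts travel together as one amount.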
"1 tablespoon plus 1 teaspoon".
Published by strangetom 11 months ago
Published by strangetom 12 months ago
ParsedIngredient.preparation field instead of the comment field as previously.
Published by strangetom about 1 year ago
IngredientText object containing the text and confidence.
IngredientAmount object containing the quantity, unit, confidence and flags for whether the amount is approximate or for a singular item of the ingredient.
Example of the output at this release
>>> parse_ingredient("50ml/2fl oz/3½tbsp lavender honey (or other runny honey if unavailable)")
ParsedIngredient(
    name=IngredientText(
        text='lavender honey',
        confidence=0.998829),
    amount=[
        IngredientAmount(
            quantity='50',
            unit='ml',
            confidence=0.999189,
            APPROXIMATE=False,
            SINGULAR=False),
        IngredientAmount(
            quantity='2',
            unit='fl oz',
            confidence=0.980392,
            APPROXIMATE=False,
            SINGULAR=False),
        IngredientAmount(
            quantity='3.5',
            unit='tbsps',
            confidence=0.990711,
            APPROXIMATE=False,
            SINGULAR=False)
    ],
    comment=IngredientText(
        text='(or other runny honey if unavailable)',
        confidence=0.973682
    ),
    other=None,
    sentence='50ml/2fl oz/3½tbsp lavender honey (or other runny honey if unavailable)'
)
Published by strangetom about 1 year ago
ParsedIngredient dataclass instead of a dict.
show_model_card() that will open the model card in the default application for markdown files.
As a result of these updates the model performance has improved to:
Sentence-level results:
Total: 12030
Correct: 10776
Incorrect: 1254
-> 89.58% correct
Word-level results:
Total: 75146
Correct: 72329
Incorrect: 2817
-> 96.25% correct
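The percentages above follow directly from the counts. A quick check, using a hypothetical helper `percent_correct`:

```python
def percent_correct(correct: int, total: int) -> float:
    """Share of correct items as a percentage, rounded to two decimals."""
    return round(100 * correct / total, 2)


# Counts taken from the results listed above.
sentence_level = percent_correct(10776, 12030)
word_level = percent_correct(72329, 75146)
```

In both cases the correct and incorrect counts add up to the stated total.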
Published by strangetom over 1 year ago
Correct minimum python version to 3.10 due to use of type hints introduced in 3.10.
Published by strangetom over 1 year ago
"tsp." becomes "tsp"
Published by strangetom over 1 year ago
Published by strangetom almost 2 years ago
Published by strangetom about 2 years ago
Incremental changes:
Published by strangetom about 2 years ago
Incremental changes:
Published by strangetom about 2 years ago
Initial release of package.
There are probably a bunch of errors to fix and improvements to make since this is my first attempt at building a Python package.