Torchmetrics - Machine learning metrics for distributed, scalable PyTorch applications.
APACHE-2.0 License
Published by Borda 5 months ago
In Torchmetrics v1.4, we are happy to introduce a new domain of metrics to the library: segmentation metrics. Segmentation metrics are used to evaluate how well segmentation algorithms perform, e.g., algorithms that take in an image and decide, pixel by pixel, what kind of object each pixel belongs to. These kinds of algorithms are necessary in applications such as self-driving cars. Segmentation metrics are closely related to classification metrics, but for now Torchmetrics expects the input to be formatted differently; see the documentation for more info. For now, `MeanIoU` and `GeneralizedDiceScore` have been added to the subpackage, with many more to follow in upcoming releases of Torchmetrics. We are happy to receive any feedback on metrics to add in the future or on the user interface for the new segmentation metrics.
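As a rough, hedged sketch of the interface (the exact expected input format is described in the documentation and may differ from the one-hot layout assumed here):

```python
import torch
from torchmetrics.segmentation import GeneralizedDiceScore, MeanIoU

num_classes = 3
# Random integer masks, converted to the one-hot layout (N, C, H, W) that the
# new segmentation metrics are assumed to accept in this sketch.
pred_labels = torch.randint(0, num_classes, (4, 16, 16))
true_labels = torch.randint(0, num_classes, (4, 16, 16))
preds = torch.nn.functional.one_hot(pred_labels, num_classes).movedim(-1, 1)
target = torch.nn.functional.one_hot(true_labels, num_classes).movedim(-1, 1)

miou = MeanIoU(num_classes=num_classes)
gds = GeneralizedDiceScore(num_classes=num_classes)
print(miou(preds, target))  # mean intersection-over-union across classes
print(gds(preds, target))   # generalized Dice score
```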
Torchmetrics v1.4 also adds new metrics to the classification and image subpackages and includes multiple bug fixes and other quality-of-life improvements. We refer to the changelog for the complete list of changes.
- `SensitivityAtSpecificity` metric to classification subpackage (#2217)
- `QualityWithNoReference` metric to image subpackage (#2288)
- `MeanIoU` (#1236)
- `GeneralizedDiceScore` (#1090)
- `PanopticQuality` metric (#2381)
- `pretty-errors` for improving error prints (#2431)
- `torch.float` weighted networks for FID and KID calculations (#2483)
- `zero_division` argument to selected classification metrics (#2198)
- `__getattr__` and `__setattr__` of `ClasswiseWrapper` more general (#2424)
- `ERGAS` metric (#2498)
- `BootStrapper` wrapper not working with `kwargs` provided argument (#2503)
- `MeanAveragePrecision` when requested (#2501)
- `binary_average_precision` when only negative samples are provided (#2507)

@baskrahmer, @Borda, @ChristophReich1996, @daniel-code, @furkan-celik, @i-aki-y, @jlcsilva, @NielsRogge, @oguz-hanoglu, @SkafteNicki, @ywchan2005
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.3.0...v1.4.0
Published by Borda 7 months ago
- `top_k>1` and `average="macro"` for classification metrics (#2423)
- `PrecisionRecallCurve.plot` methods (#2437)

Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.3.1...v1.3.2
@Borda, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda 8 months ago
- `LPIPS` metric (#2326)
- `MultitaskWrapper` not being able to be logged in lightning when using metric collections (#2349)
- `Perplexity` metric (#2346)
- `FeatureShare` not being moved to the correct device (#2348)
- `MeanAveragePrecision` with custom max det thresholds (#2367)

Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.3.0...v1.3.1
@Borda, @fschlatt, @JonasVerbickas, @nsmlzl, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda 9 months ago
- `SacreBLEU` metric (#2068)
- `MultiTaskWrapper` directly with Lightning's `log_dict` method (#2213)
- `FeatureShare` wrapper to share submodules containing feature extractors between metrics (#2120)
- `SpatialDistortionIndex` (#2260)
- `CriticalSuccessIndex` (#2257)
- Spatial Correlation Coefficient (#2248)
- `average` argument to multiclass versions of `PrecisionRecallCurve` and `ROC` (#2084)
- `extended_summary=True` in `MeanAveragePrecision` (#2212)
- `RetrievalAUROC` metric (#2251)
- `aggregate` argument to retrieval metrics (#2220)
- `segmentation.utils` for future segmentation metrics (#2105)
- `PrecisionRecallCurve` to be consistent with scikit-learn (#2183)
- `metric._update_called` (#2141)
- `specicity_at_sensitivity` in favour of `specificity_at_sensitivity` (#2199)
- `Running` metrics (#2256)
- `FID` metric (#2277)

Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.2.0...v1.3.0
@Borda, @HoseinAkbarzadeh, @matsumotosan, @miskfi, @oguz-hanoglu, @SkafteNicki, @stancld, @ywchan2005
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda 11 months ago
- `NoTrainInceptionV3` is being initialized without `torch-fidelity` not being installed (#2143)
- v2.1 (#2142)
- `SpectralAngleMapper` and `UniversalImageQualityIndex` to be tensors (#2089)
- `arange` and repeat for deterministic bincount (#2184)
- `lpips` third-party package as dependency of `LearnedPerceptualImagePatchSimilarity` metric (#2230)
- `LearnedPerceptualImagePatchSimilarity` metric (#2144)
- `UniversalImageQualityIndex` metric (#2222)
- `MeanAveragePrecision` with `pycocotools` backend when too few `max_detection_thresholds` are provided (#2219)
- `LearnedPerceptualImagePatchSimilarity` functional metric (#2234)
- `Metric._reduce_states(...)` when using `dist_sync_fn="cat"` (#2226)
- `CosineSimilarity` where 2d is expected but 1d input was given (#2241)
- `MetricCollection` when using compute groups and `compute` is called more than once (#2211)

Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.2.0...v1.2.1
@Borda, @jankng, @kyle-dorman, @SkafteNicki, @tanguymagne
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda about 1 year ago
Torchmetrics v1.2 is out now! The latest release includes 11 new metrics within a new subdomain: Clustering.
In this blog post, we briefly explain what clustering is, why it is useful to measure, and which metrics have been newly added, with code samples.
Clustering is an unsupervised learning technique. The term unsupervised here refers to the fact that we do not have ground-truth targets as we do in classification. The primary goal of clustering is to discover hidden patterns or structures within data without prior knowledge about the meaning or importance of particular features. Thus, clustering is a form of data exploration, in contrast to supervised learning, where the goal is "just" to predict which class a data point belongs to.
The key goal of clustering algorithms is to split data into clusters/sets where data points from the same cluster are more similar to each other than any other points from the remaining clusters. Some of the most common and widely used clustering algorithms are K-Means, Hierarchical clustering, and Gaussian Mixture Models (GMM).
An objective quality evaluation/measure is required regardless of the clustering algorithm or internal optimization criterion used. In general, we can divide all clustering metrics into two categories: extrinsic metrics and intrinsic metrics.
Extrinsic metrics are characterized by requiring some ground-truth labeling, even though they are used for an unsupervised method. This may seem counter-intuitive at first since, by the definition of clustering, we do not use such ground-truth labeling. However, most clustering algorithms are still developed on datasets with labels available, so these metrics use this fact to their advantage.
In contrast, intrinsic metrics do not need any ground truth information. These metrics estimate inter-cluster consistency (cohesion of all points assigned to a single set) compared to other clusters (separation). This is often done by comparing the distance in the embedding space.
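As an illustrative, hedged sketch, the snippet below contrasts the two families with one metric from each: `RandScore` needs reference labels (extrinsic), while `CalinskiHarabaszScore` works directly on the embeddings (intrinsic):

```python
import torch
from torchmetrics.clustering import CalinskiHarabaszScore, RandScore

# Predicted cluster assignments and (for the extrinsic case) reference labels.
preds = torch.tensor([0, 0, 1, 1, 2, 2])
target = torch.tensor([1, 1, 0, 0, 2, 2])

# Extrinsic: compares the predicted partition against the ground-truth labels.
print(RandScore()(preds, target))

# Intrinsic: judges cohesion vs. separation from the data itself, so it takes
# the embeddings plus the predicted assignments and no ground truth.
embeddings = torch.randn(6, 8)  # six points in an 8-dimensional feature space
print(CalinskiHarabaszScore()(embeddings, preds))
```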
`MeanAveragePrecision`, the most widely used metric for object detection in computer vision, now supports two new arguments: `average` and `backend`.
The `average` argument controls averaging over multiple classes. By the core definition, the default is `macro` averaging, where the metric is calculated for each class separately and then averaged together. This will continue to be the default in Torchmetrics, but now we also support the setting `average="micro"`. Under this setting, every object is essentially considered to belong to the same class, and the returned value is therefore calculated simultaneously over all objects.
The second argument, `backend`, is important, as it indicates which computational backend will be used for the internal computations. Since `MeanAveragePrecision` is not a simple metric to compute, and we value the correctness of our metrics, we rely on a third-party library to do the internal computations. By default, we rely on users having the official pycocotools installed, but with the new argument we also support other backends.
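A small, hedged sketch of both new arguments (the alternative backend mentioned in the comment, faster_coco_eval, requires installing the corresponding package):

```python
import torch
from torchmetrics.detection import MeanAveragePrecision

# average="micro" pools all objects into one class; backend selects the COCO
# implementation used internally ("pycocotools" is the default, an alternative
# such as "faster_coco_eval" needs the matching package installed).
metric = MeanAveragePrecision(average="micro", backend="pycocotools")

preds = [{
    "boxes": torch.tensor([[10.0, 10.0, 50.0, 50.0]]),
    "scores": torch.tensor([0.9]),
    "labels": torch.tensor([0]),
}]
target = [{
    "boxes": torch.tensor([[12.0, 11.0, 48.0, 52.0]]),
    "labels": torch.tensor([0]),
}]
metric.update(preds, target)
print(metric.compute()["map"])
```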
- `MutualInformationScore` (#2008)
- `RandScore` (#2025)
- `NormalizedMutualInfoScore` (#2029)
- `AdjustedRandScore` (#2032)
- `CalinskiHarabaszScore` (#2036)
- `DunnIndex` (#2049)
- `HomogeneityScore` (#2053)
- `CompletenessScore` (#2053)
- `VMeasureScore` (#2053)
- `FowlkesMallowsIndex` (#2066)
- `AdjustedMutualInfoScore` (#2058)
- `DaviesBouldinScore` (#2071)
- `backend` argument to `MeanAveragePrecision` (#2034)

Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.1.0...v1.2.0
@matsumotosan, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda about 1 year ago
- `BootStrapper` when very few samples were evaluated that could lead to crash (#2052)
- `RecallAtFixedPrecision` for large batch sizes (#2042)
- `MetricCollection` used with custom metrics have `prefix`/`postfix` attributes (#2070)

@GlavitsBalazs, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda about 1 year ago
- `average` argument to `MeanAveragePrecision` (#2018)
- `PearsonCorrCoef` is updated on single samples at a time (#2019)
- `MetricCollection` when used with multiple metrics that return dicts with same keys (#2027)
- `class_metrics=True` resulting in wrong values (#1924)
- `higher_is_better`, `is_differentiable` for some metrics (#2028)

@adamjstewart, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda about 1 year ago
In version v1.1 of Torchmetrics, five new metrics have been added in total, bringing the total number of metrics up to 128! In particular, we have two new exciting metrics for evaluating your favorite generative models for images.
Introduced in the famous StyleGAN paper back in 2018, the perceptual path length metric is used to quantify how smoothly a generator manages to interpolate between points in its latent space.
Why does the smoothness of your generative model's latent space matter? Assume you find a point in your latent space that generates an image you like, but you would like to see if you could find a better one by slightly changing the latent point it was generated from. If your latent space is not smooth, this becomes very hard, because even small changes to the latent point can lead to large changes in the generated image.
CLIP image quality assessment (CLIPIQA) is a very recently proposed metric in this paper. The metric builds on the OpenAI CLIP model, which is a multi-modal model for connecting text and images. The core idea behind the metric is that different properties of an image can be assessed by measuring how similar the CLIP embedding of the image is to the respective CLIP embeddings of a positive and a negative prompt for that given property.
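A minimal, hedged sketch of how the metric can be called, assuming the optional transformers/piq dependencies are installed and the images are scaled to the chosen `data_range`:

```python
import torch
from torchmetrics.multimodal import CLIPImageQualityAssessment

# Score two random RGB images for the built-in "quality" prompt pair; higher
# values mean the image embedding sits closer to the positive prompt.
metric = CLIPImageQualityAssessment(prompts=("quality",), data_range=1.0)
imgs = torch.rand(2, 3, 224, 224)  # values in [0, 1] to match data_range
print(metric(imgs))
```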
`VisualInformationFidelity` has been added to the image package. First proposed in this paper, it can be used to automatically assess the quality of images in a perceptual manner.
`EditDistance` has been added to the text package. It is a very classical text metric that simply measures the number of characters that need to be substituted, inserted, or deleted to transform the predicted text into the reference text.
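For instance, the following minimal sketch (using the default character-level settings) averages the edit distance over a small batch of prediction/reference pairs:

```python
from torchmetrics.text import EditDistance

metric = EditDistance()  # character-level, mean reduction by default
preds = ["torch metrics", "hello wrld"]
target = ["torchmetrics", "hello world"]
# One deletion in the first pair, one insertion in the second -> mean of 1.0
print(metric(preds, target))
```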
`SourceAggregatedSignalDistortionRatio` has been added to the audio package. The metric was originally proposed in this paper and is an improvement over the classical Signal-to-Distortion Ratio (SDR) metric (also found in Torchmetrics), providing more stable gradients when training models for source separation.
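A minimal, hedged sketch of the expected call, assuming inputs of shape (batch, sources, time):

```python
import torch
from torchmetrics.audio import SourceAggregatedSignalDistortionRatio

# Four mixtures, each separated into two sources of 8000 samples.
preds = torch.randn(4, 2, 8000)
target = torch.randn(4, 2, 8000)
metric = SourceAggregatedSignalDistortionRatio()
print(metric(preds, target))  # aggregated SDR in dB; higher is better
```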
- `VisualInformationFidelity` to image package (#1830)
- `EditDistance` to text package (#1906)
- `top_k` argument to `RetrievalMRR` in retrieval package (#1961)
- `"segm"` and `"bbox"` detection in `MeanAveragePrecision` at the same time (#1928)
- `PerceptualPathLength` to image package (#1939)
- `MeanSquaredError` (#1937)
- `extended_summary` to `MeanAveragePrecision` such that precision, recall, iou can be easily returned (#1983)
- `ClipScore` if long captions are detected and truncate (#2001)
- `CLIPImageQualityAssessment` to multimodal package (#1931)
- `metric_state` to all metrics for users to investigate currently stored tensors in memory (#2006)

Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.0.0...v1.1.0
@bojobo, @lucadiliello, @quancs, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda about 1 year ago
- `MeanAveragePrecision` if too many detections are observed (#1978)
- `multidim_average="samplewise"` in classification metrics (#1977)

@borda, @SkafteNicki^n, @Vivswan
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda about 1 year ago
- `PearsonCorrCoeff` if input has a very small variance for its given dtype (#1926)
- `Metric` (#1963)
- `CalibrationError` where calculations for double precision input was performed in float precision (#1919)
- `prefix`/`postfix` arguments in `MetricCollection` and `ClasswiseWrapper` being duplicated (#1918)
- `score` argument (#1948)

@borda, @SkafteNicki^n
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 1 year ago
- `MetricCollection` together with aggregation metrics (#1896)
- `max_fpr` in `AUROC` metric when only one class is present (#1895)
- `IntersectionOverUnion` metric (#1892)
- `MeanMetric` and broadcasting of weights when Nans are present (#1898)
- `MeanAveragePrecision` (#1913)

@fansuregrin, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 1 year ago
We are happy to announce that the first major release of Torchmetrics, version v1.0, is publicly available. We have worked hard on a couple of new features for this milestone release, and for v1.0.0 we have also managed to implement over 100 metrics in torchmetrics.
The big new feature of v1.0 is built-in plotting support. As the old saying goes: "A picture is worth a thousand words", and within machine learning this is definitely also true in many cases.
Metrics are one area that, in some cases, is definitely better showcased in a figure than as a list of floats. The only requirement for getting started with the plotting feature is installing matplotlib: either install with `pip install matplotlib` or `pip install torchmetrics[visual]` (the latter option also installs Scienceplots and uses that as the default plotting style).
The basic interface is the same for any metric. Just call the new `.plot` method:
```python
metric = AnyMetricYouLike()
for i in range(num_updates):
    metric.update(preds[i], target[i])
fig, ax = metric.plot()
```
The `plot` method by default does not require any arguments and will automatically call `metric.compute` internally on whatever metric states have been accumulated.
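As a concrete, hedged example (the metric choice and output path are just placeholders), per-class accuracy can be accumulated over a few batches and drawn in a single call:

```python
import torch
from torchmetrics.classification import MulticlassAccuracy

metric = MulticlassAccuracy(num_classes=5, average=None)  # per-class accuracy
for _ in range(10):
    preds = torch.randn(32, 5).softmax(dim=-1)   # fake model outputs
    target = torch.randint(0, 5, (32,))          # fake labels
    metric.update(preds, target)
fig, ax = metric.plot()        # compute() is called internally
fig.savefig("accuracy.png")    # or display/log the figure as you prefer
```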
- `prefix` and `postfix` arguments to `ClasswiseWrapper` (#1866)
- `compute_with_cache` to control caching behaviour after `compute` method (#1754)
- `ComplexScaleInvariantSignalNoiseRatio` for audio package (#1785)
- `Running` wrapper for calculate running statistics (#1752)
- `RelativeAverageSpectralError` and `RootMeanSquaredErrorUsingSlidingWindow` to image package (#816)
- `SpecificityAtSensitivity` Metric (#1432)
- `.plot()` method (#1328, #1481, #1480, #1490, #1581, #1585, #1593, #1600, #1605, #1610, #1609, #1621, #1624, #1623, #1638, #1631, #1650, #1639, #1660, #1682, #1786)
- `.plot()` method (#1434)
- `classes` to output from `MAP` metric (#1419)
- `MinkowskiDistance` to regression package (#1362)
- `pairwise_minkowski_distance` to pairwise package (#1362)
- `PanopticQuality` (#929, #1527)
- `PSNRB` metric (#1421)
- `ClassificationTask` Enum and use in metrics (#1479)
- `ignore_index` option to `exact_match` metric (#1540)
- `top_k` to `RetrievalMAP` (#1501)
- `torch.cumsum` operator (#1499)
- `.plot()` method (#1485)
- `data_range` (#1606)
- `ModifiedPanopticQuality` metric to detection package (#1627)
- `PrecisionAtFixedRecall` metric to classification package (#1683)
- `IntersectionOverUnion`, `GeneralizedIntersectionOverUnion`, `CompleteIntersectionOverUnion`, `DistanceIntersectionOverUnion`
- `MultitaskWrapper` to wrapper package (#1762)
- `RelativeSquaredError` metric to regression package (#1765)
- `MemorizationInformedFrechetInceptionDistance` metric to image package (#1580)
- `permutation_invariant_training` to allow using a `'permutation-wise'` metric function (#1794)
- `update_count` and `update_called` from private to public methods (#1370)
- `EnumStr` raising `ValueError` for invalid value (#1479)
- `PrecisionRecallCurve` with large number of samples (#1493)
- `__iter__` method from raising `NotImplementedError` to `TypeError` by setting to `None` (#1538)
- `FID` metric will now raise an error if too few samples are provided (#1655)
- `torch.float64` (#1628)
- `LPIPS` implementation to no more rely on third-party package (#1575)
- `scipy` to `torch` (#1708)
- `PearsonCorrCoeff` to be more robust in certain cases (#1729)
- `MeanAveragePrecision` to `pycocotools` backend (#1832)
- `MetricTracker` for `MultioutputWrapper` and nested structures (#1608)
- `PearsonCorrCoef` (#1649)
- `jsonargparse` and `LightningCLI` (#1651)
- `MultiOutputWrapper` (#1675)
- `MSSSIM` (#1674)
- `max_det_threshold` in MAP detection (#1712)
- `register_buffer` (#1728)
- `MeanAveragePrecision` for `iou_type="segm"` (#1763)
- `prefix` and `postfix` in nested `MetricCollection` (#1773)
- `ax` plotting logging in `MetricCollection` (#1783)
- `RougeScore` (#1789)
- `CompositionalMetric` (#1761)
- `SpectralDistortionIndex` metric (#1808)
- `MatthewsCorrCoef` (#1812, #1863)
- `PearsonCorrCoef` (#1819)
- `average="macro"` in classification metrics (#1821)
- `ignore_index = num_classes + 1` in Multiclass-jaccard (#1860)

@alexkrz, @AndresAlgaba, @basveeling, @Bomme, @Borda, @Callidior, @clueless-skywatcher, @Dibz15, @EPronovost, @fkroeber, @ItamarChinn, @marcocaccin, @martinmeinke, @niberger, @Piyush-97, @quancs, @relativityhd, @shenoynikhil, @shhs29, @SkafteNicki, @soma2000-lang, @srishti-git1110, @stancld, @twsl, @ValerianRey, @venomouscyanide, @wbeardall
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 1 year ago
- `R2Score` with the near constant target (#1576)
- `dtype` conversion when the metric is submodule (#1583)
- `top_k>1` and `ignore_index!=None` in `StatScores` based metrics (#1589)
- `PearsonCorrCoef` when running in DDP mode but only on a single device (#1587)
- `MAP` when big areas are calculated (#1607)

@borda, @FarzanT, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/metrics/compare/v0.11.3...v0.11.4
Published by Borda over 1 year ago
- `byte` input (#1521)
- `ignore_index` in `MulticlassJaccardIndex` (#1386)

@SkafteNicki, @vincentvaroquauxads
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/metrics/compare/v0.11.2...v0.11.3
Published by Borda over 1 year ago
- `_bincount` function (#1471)
- `MetricTracker` wrapper (#1472)
- `multilabel` in `ExactMatch` (#1474)

@7shoe, @borda, @SkafteNicki, @ValerianRey
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/metrics/compare/v0.11.1...v0.11.2
Published by Borda over 1 year ago
- `maximize` parameter at the initialization of `MetricTracker` (#1428)
- `SSIM` metric (#1454)
- `nltk.punkt` in `RougeScore` if a machine is not online (#1456)
- `MultioutputWrapper` (#1460)
- `dtype` checking in `PrecisionRecallCurve` for `target` tensor (#1457)

@borda, @SkafteNicki, @stancld
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/metrics/compare/v0.11.0...v0.11.1