azurehpc-health-checks

Health checks for Azure N- and H-series VMs.

MIT License


azurehpc-health-checks - AZ Health Checks v0.4.2 Latest Release

Published by rafsalas19 6 months ago

What's Changed

  • Docker image now uses MCR
  • HC44 now supported
  • Bug Fixes
azurehpc-health-checks - AZ Health Checks v0.4.2

Published by rafsalas19 6 months ago

What's Changed

  • Docker image now available through MCR: "mcr.microsoft.com/aznhc/aznhc-nv"
  • New feature: append an additional conf file to the conf file in use via a command-line argument
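
For readers who script their setup, pulling the image named above might look like the following minimal sketch. The image path is the one quoted in these notes; no tag is given there, so none is specified here. The helper names `pull_command` and `pull_image` are hypothetical and not part of AzNHC.

```python
import shutil
import subprocess

# Image path as quoted in the release notes; no tag is stated there.
IMAGE = "mcr.microsoft.com/aznhc/aznhc-nv"


def pull_command(image: str = IMAGE) -> list[str]:
    """Return the argv for a plain `docker pull` of the given image."""
    return ["docker", "pull", image]


def pull_image(image: str = IMAGE) -> int:
    """Run `docker pull` and return its exit code (hypothetical helper)."""
    if shutil.which("docker") is None:
        # Docker must already be installed on the node.
        raise RuntimeError("docker executable not found on PATH")
    return subprocess.run(pull_command(image), check=False).returncode


if __name__ == "__main__":
    print(" ".join(pull_command()))
```

Separating the command construction from its execution lets the command be inspected or logged without Docker being present.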
azurehpc-health-checks - AZ Health Checks v0.4.1

Published by rafsalas19 6 months ago

What's Changed

  • Introduction of Docker NHC
  • Increased visibility into test results
  • Increased logging
  • Bug fixes
azurehpc-health-checks - AZ Health Checks v0.2.9

Published by rafsalas19 7 months ago

  • Adding additional logging and Kusto functionality
  • Bug fixes
azurehpc-health-checks - AZ Health Checks v0.2.9

Published by rafsalas19 8 months ago

  • Adding additional logging and Kusto functionality
  • Bug fixes
azurehpc-health-checks - AZ Health Checks v0.4.0

Published by rafsalas19 8 months ago

This is a prerelease of the Docker version of AzNHC.

The functionality remains the same. The major differences are the following:

  • No installation needed
  • Docker is now a prerequisite
  • The only setup step is pulling the Docker image

The run script and checks behave the same way.

azurehpc-health-checks - AZ Health Checks v0.2.9

Published by rafsalas19 8 months ago

  • Adding additional logging and Kusto functionality
azurehpc-health-checks - AZ Health Checks v0.2.9

Published by rafsalas19 8 months ago

  • Additional logging for NHC results
  • Kusto modifications for better logging
azurehpc-health-checks - AZ Health Checks v0.2.9

Published by rafsalas19 8 months ago

  • Additional logging for NHC results
azurehpc-health-checks - AZ Health Checks v0.2.8

Published by rafsalas19 9 months ago

What's Changed

  • Adding AMD GPU SKU support
  • New feature: extend tests by adding a secondary conf file
  • Bug Fixes
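
The idea of extending a test set with a secondary conf file can be pictured as simple concatenation. The sketch below is an assumption about the general approach, not the project's implementation: `append_conf` is a hypothetical helper, and the check names are placeholders written in the `hostmask || check` line style used by NHC conf files.

```python
import tempfile
from pathlib import Path


def append_conf(base_conf: Path, extra_conf: Path, out_conf: Path) -> Path:
    """Write base_conf followed by extra_conf into out_conf (hypothetical helper)."""
    base_text = base_conf.read_text()
    extra_text = extra_conf.read_text()
    # Ensure the base ends with a newline so the first appended check
    # does not fuse with the last base check.
    if base_text and not base_text.endswith("\n"):
        base_text += "\n"
    out_conf.write_text(base_text + extra_text)
    return out_conf


def demo() -> list[str]:
    """Merge two tiny placeholder conf files and return the merged lines."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "base.conf"
        extra = Path(tmp) / "extra.conf"
        merged = Path(tmp) / "merged.conf"
        base.write_text("* || check_example_a")     # placeholder check, no trailing newline
        extra.write_text("* || check_example_b\n")  # placeholder check
        return append_conf(base, extra, merged).read_text().splitlines()


if __name__ == "__main__":
    print(demo())
```

Writing to a separate output file leaves the base conf untouched, so the default test set stays reusable.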
azurehpc-health-checks - AZ Health Checks v0.2.8

Published by rafsalas19 9 months ago

What's Changed

  • Adding AMD GPU SKU support
  • Bug Fixes
azurehpc-health-checks - AZ Health Checks v0.2.7

Published by rafsalas19 9 months ago

What's Changed

  • NCv3, NCv4, NCv5, NDv2 support
  • Support for smaller HBv3 sizes
  • NVIDIA's nvbandwidth tool added to measure NVLink and GPU bandwidth
  • Refreshed install script
azurehpc-health-checks - AZ Health Checks v0.2.6

Published by rafsalas19 12 months ago

What's Changed

  • Bug fixes
  • Refactoring of IB write tests
    • NDv5 SKU IB tests no longer communicate between devices. Each HCA device loops back to itself, which prevents IB traffic from leaving the node
    • Renaming tests for clarity
  • Documentation update
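
The loopback change to the IB write tests can be illustrated as a change in how endpoints are paired. This is a minimal sketch under stated assumptions: the device names are placeholders, and `neighbor_pairs` is only an illustrative contrast, since these notes do not say how devices were paired before the refactor.

```python
def neighbor_pairs(devices: list[str]) -> list[tuple[str, str]]:
    """Illustrative cross-device pairing: each device targets the next one."""
    count = len(devices)
    return [(devices[i], devices[(i + 1) % count]) for i in range(count)]


def loopback_pairs(devices: list[str]) -> list[tuple[str, str]]:
    """Loopback pairing: each device is both endpoints of its own test."""
    return [(dev, dev) for dev in devices]


def is_loopback_only(pairs: list[tuple[str, str]]) -> bool:
    """True when no pair involves two different devices."""
    return all(src == dst for src, dst in pairs)


if __name__ == "__main__":
    hcas = ["mlx5_ib0", "mlx5_ib1", "mlx5_ib2"]  # placeholder device names
    print(loopback_pairs(hcas))
    print(is_loopback_only(neighbor_pairs(hcas)))
```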
azurehpc-health-checks - v0.2.6

Published by rafsalas19 about 1 year ago

What's Changed

  • Bug fixes
  • Refactoring of IB write tests
    • NDv5 SKU IB tests no longer communicate between devices. Each HCA device loops back to itself, which prevents IB traffic from leaving the node
    • Renaming tests for clarity
  • Documentation update

Full Changelog: https://github.com/Azure/azurehpc-health-checks/compare/v0.2.5...v0.2.6

azurehpc-health-checks - Az Health Checks v0.2.5

Published by rafsalas19 about 1 year ago

What's Changed

  • Distributed NHC functionality
    • Adds ability to launch NHC in a distributed fashion
    • Adds ability to launch NHC with Slurm
  • Changes to support Azure CycleCloud usage
  • Added accelerated network checks
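
As a rough picture of launching checks across nodes with Slurm, the sketch below builds an `srun` command that runs one task per listed node. It is not the project's launcher: the script path and node names are hypothetical, and only the standard `srun` options `--nodes`, `--nodelist`, and `--ntasks-per-node` are assumed.

```python
def srun_command(nodes: list[str], script: str) -> list[str]:
    """Build an srun argv that runs `script` once on each listed node."""
    if not nodes:
        raise ValueError("at least one node is required")
    return [
        "srun",
        f"--nodes={len(nodes)}",
        f"--nodelist={','.join(nodes)}",
        "--ntasks-per-node=1",  # one health-check run per node
        script,
    ]


if __name__ == "__main__":
    # Hypothetical node names and script path, for illustration only.
    print(" ".join(srun_command(["node001", "node002"], "./run-checks.sh")))
```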

Full Changelog: https://github.com/Azure/azurehpc-health-checks/compare/v0.2.4...v.0.2.5

azurehpc-health-checks - Az Health Checks v0.2.4

Published by rafsalas19 about 1 year ago

What's Changed

  • Bug fixes
azurehpc-health-checks - Az Health Checks v0.2.3

Published by rafsalas19 over 1 year ago

  • Bug fixes
  • Run script now selects the SKU and runs the appropriate tests
  • Addition of other HBv4/Hx SKUs
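
Selecting tests by SKU can be sketched as a lookup from the VM size name to a conf file. The mapping below is an assumption made for illustration: the `<sku>.conf` naming scheme and the `conf_name_for_sku` helper are hypothetical and may not match how the run script actually resolves its tests.

```python
def conf_name_for_sku(vm_size: str) -> str:
    """Map an Azure VM size name to a hypothetical per-SKU conf file name."""
    name = vm_size.strip().lower()
    prefix = "standard_"
    # Azure size names usually carry a "Standard_" prefix; drop it if present.
    if name.startswith(prefix):
        name = name[len(prefix):]
    if not name:
        raise ValueError("empty VM size name")
    return f"{name}.conf"


if __name__ == "__main__":
    print(conf_name_for_sku("Standard_ND96asr_v4"))
```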