FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, FEDML Nexus AI (https://fedml.ai) is your generative AI platform at scale.
APACHE-2.0 License
Backed by FEDML Nexus AI: Next-Gen Cloud Services for LLMs & Generative AI (https://nexus.fedml.ai)
FedML Documentation: https://doc.fedml.ai
FedML Homepage: https://fedml.ai/
FedML Blog: https://blog.fedml.ai/
FedML Medium: https://medium.com/@FedML
FedML Research: https://fedml.ai/research-papers/
Join the Community:
Slack: https://join.slack.com/t/fedml/shared_invite/zt-havwx1ee-a1xfOUrATNfc9DFqU~r34w
Discord: https://discord.gg/9xkW8ae6RV
FEDML® stands for Foundational Ecosystem Design for Machine Learning. FEDML Nexus AI is the next-gen cloud service for LLMs & Generative AI. It helps developers launch complex model training, deployment, and federated learning anywhere, on decentralized GPUs, multi-clouds, edge servers, and smartphones, easily, economically, and securely.
Tightly integrated with the FEDML open source library, FEDML Nexus AI provides holistic support across three interconnected AI infrastructure layers: user-friendly MLOps, a well-managed scheduler, and high-performance ML libraries for running any AI job across GPU clouds.
A typical workflow is shown in the figure above. When a developer wants to run a pre-built job from Studio or the Job Store, FEDML® Launch swiftly pairs the job with the most economical GPU resources, auto-provisions them, and runs the job, eliminating complex environment setup and management. While the job runs, FEDML® Launch orchestrates the compute plane across different cluster topologies and configurations, enabling complex AI jobs of any kind: model training, deployment, or even federated learning. FEDML® Open Source is the unified and scalable machine learning library for running these AI jobs anywhere, at any scale.
In the MLOps layer of FEDML Nexus AI
In the scheduler layer of FEDML Nexus AI
In the Compute layer of FEDML Nexus AI
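As a sketch of the scheduler layer in action, a pre-built job can be described in a launch YAML and handed to FEDML Launch, which matches it to GPU resources and provisions them. The field names and values below are illustrative assumptions modeled on the job format in the FedML documentation, not a verbatim schema:

```yaml
# hello_job.yaml -- illustrative FEDML Launch job description (field names assumed)
workspace: hello_world          # local folder packaged and shipped with the job
job: |
  echo "Starting the job ..."
  python3 hello_world.py
computing:
  minimum_num_gpus: 1           # ask the scheduler for at least one GPU
  maximum_cost_per_hour: $1.75  # upper bound on the matched GPU price
  resource_type: A100-80G       # preferred GPU type
```

The job would then be submitted with a command like `fedml launch hello_job.yaml`, after which the scheduler pairs it with the cheapest qualifying resource and streams logs back.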
FedML embraces and thrives through open source. We welcome all kinds of contributions from the community. Kudos to all of our amazing contributors!
FedML has adopted Contributor Covenant.
Published by fedml-alex 12 months ago
FEDML Open Source: The unified and scalable ML library for large-scale distributed training, model serving, and federated learning
https://github.com/FedML-AI/FedML
Published by fedml-alex about 1 year ago
Published by FedML-AI-admin over 1 year ago
At FedML, our mission is to remove the friction and pain points of converting your ML & AI models from R&D into production-scale distributed and federated training & serving via our no-code MLOps platform.
FedML is happy to announce release 0.8.4. This release is filled with new capabilities, bug fixes, and enhancements. A key announcement is the launch of FedLLM, which simplifies and reduces the costs of training and serving large language models. You can read more about it in our blog post.
[CoreEngine/MLOps] Launched FedLLM (Federated Large Language Model) for training and serving GitHub Blog
[CoreEngine] Deployed Helm Charts to our repository for packaging and ease of deploying on Kubernetes https://github.com/FedML-AI/FedML/blob/master/installation/install_on_k8s/fedml-edge-client-server/fedml-server-deployment-latest.tgz https://github.com/FedML-AI/FedML/blob/master/installation/install_on_k8s/fedml-edge-client-server/fedml-client-deployment-latest.tgz
[Documents] Refactored the devops and installation structures (devops for internal pipelines, installation for external users). https://github.com/FedML-AI/FedML/tree/master/installation
[DevOps] Deployed a new fedml-light Docker image and related documentation. DockerHub GitHub doc
[DevOps] Built the light Docker image for deployment to a Kubernetes cluster, and refined the Kubernetes-related installation sections in the documentation. https://hub.docker.com/r/fedml/fedml-edge-client-server-light/tags
[CoreEngine] Added support for multiple simultaneous training jobs when using our open source MLOps commands.
[CoreEngine] Improved training health monitoring so that a failed status is properly reported.
[CoreEngine] Added APIs for enabling, disabling, and querying client agent status. The APIs are as follows:
curl -X POST http://localhost:40800/fedml/api/v2/disableAgent -d '{}'
curl -X POST http://localhost:40800/fedml/api/v2/enableAgent -d '{}'
curl -X POST http://localhost:40800/fedml/api/v2/queryAgentStatus -d '{}'
[CoreEngine] Created distinct device IDs when running multiple Docker containers to simulate multiple clients or silos on one machine; the device ID is now the product ID plus a random ID.
[CoreEngine] Fixed a device assignment issue in get_torch_device in the distributed training mode.
[Serving] Fixed the exceptions that occurred when recovering at startup after upgrading.
[CoreEngine] Fixed the device ID issue when running in Docker on macOS.
[App] Fixed the issue in the FedProx + SAGE graph regression and graph classification apps.
[App] Fixed an issue with the heart disease app failing when running in MLOps.
[App] Fixed an issue with the heart disease app’s performance curve.
[App/Android] Enhanced Android starting/stopping mechanism and fixed the following issues:
Fixed status displays after stopping the run.
When stopping a run during an unfinished round, the MNN process now remains in the IDLE state (it previously went OFFLINE).
When stopping after a round is done, training now stops properly.
Fixed an incorrect Python server TAG in the logs, so the server is now easy to find in the logs.
[Serving] Tested the inference backend and checked the response after the model deployment finishes.
[CoreEngine/Serving] Set the GPU option based on CUDA availability when running the inference backend, and optimized the MQTT connection checking.
[CoreEngine] Stored model caches in the user home directory when running federated learning.
[CoreEngine] Added the device ID to the monitor message when processing inference requests.
[CoreEngine] Reported runner exceptions, and ignored exceptions when the bootstrap section is missing from fedml_config.yaml.
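The client agent endpoints listed above can be exercised from Python as well as curl. A minimal sketch, assuming only what the curl examples show: the agent listens on local port 40800, accepts a POSTed JSON body, and replies with JSON (the helper names here are illustrative, not part of the FedML API):

```python
import json
import urllib.request

AGENT_BASE = "http://localhost:40800/fedml/api/v2"  # default local agent port per the curl examples

def agent_url(action):
    """Build the endpoint URL for a client-agent action, e.g. 'queryAgentStatus'."""
    return f"{AGENT_BASE}/{action}"

def call_agent(action, payload=None):
    """POST a JSON body (empty by default) to the agent endpoint and decode the JSON reply."""
    data = json.dumps(payload or {}).encode("utf-8")
    req = urllib.request.Request(
        agent_url(action),
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (requires a running local agent):
# status = call_agent("queryAgentStatus")
```

The three actions map one-to-one onto the curl commands: `call_agent("disableAgent")`, `call_agent("enableAgent")`, and `call_agent("queryAgentStatus")`.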
Published by FedML-AI-admin over 1 year ago
Published by FedML-AI-admin over 1 year ago
Published by fedml-alex over 1 year ago
Train, deploy, monitor, and improve machine learning models anywhere (edge/cloud), powered by collaboration on combined data, models, and computing resources.
For more detailed instructions, please refer to https://doc.fedml.ai/
Published by chaoyanghe over 2 years ago