Today, we’re announcing the general availability of Amazon SageMaker HyperPod task governance, a new capability to centrally manage and maximize GPU and AWS Trainium utilization across generative AI model development tasks, such as training, fine-tuning, and inference. Customers tell us that they’re rapidly increasing investment in generative AI projects, but they face challenges in efficiently allocating limited compute resources. The lack of dynamic, centralized governance for resource allocation leads to inefficiencies, with some projects underutilizing resources while others stall. This situation burdens administrators with constant replanning, causes delays for data scientists and developers, and results in untimely delivery of AI innovations and cost overruns due to inefficient use of resources. With SageMaker HyperPod task governance, you can accelerate time…
New Amazon Q Developer agent capabilities include generating documentation, code reviews, and unit tests
Last year at AWS re:Invent, we previewed Amazon Q Developer, a generative AI–powered assistant for designing, building, testing, deploying, and maintaining software across integrated development environments (IDEs) such as Visual Studio, Visual Studio Code, JetBrains IDEs, Eclipse (preview), JupyterLab, Amazon EMR Studio, and AWS Glue Studio. You can also use Amazon Q Developer in the AWS Management Console, AWS Console Mobile Application, Amazon CodeCatalyst, AWS Support, and the AWS website, or through Slack and Microsoft Teams with AWS Chatbot. Continuing that rapid pace of innovation, we announced the general availability of Amazon Q Developer in April and added more capabilities, such as support for the AWS Command Line Interface (AWS CLI), Amazon SageMaker Studio, and AWS CloudShell, as well as inline chat for seamless…
Build faster, more cost-efficient, highly accurate models with Amazon Bedrock Model Distillation (preview)
Today, we’re announcing the availability of Amazon Bedrock Model Distillation in preview, which automates the process of creating a distilled model for your specific use case: it generates responses from a large foundation model (FM), called a teacher model, and fine-tunes a smaller FM, called a student model, on the generated responses. It uses data synthesis techniques to improve the responses from the teacher model. Amazon Bedrock then hosts the final distilled model for inference, giving you a faster, more cost-efficient model with accuracy close to that of the teacher model for your use case. Customers are excited to use the most powerful and accurate FMs on Amazon Bedrock for their generative AI applications. But for some use cases, the latency associated…
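The teacher–student workflow described above can be illustrated with a minimal, hypothetical sketch. This is not the Amazon Bedrock API; the `teacher_model` function and `StudentModel` class are toy stand-ins that show the two steps the service automates: synthesizing responses with a large model, then fine-tuning a smaller model on them.

```python
# Conceptual sketch of model distillation (toy stand-ins, not Bedrock APIs).

def teacher_model(prompt: str) -> str:
    # Stand-in for a large foundation model generating a high-quality response.
    return prompt.upper()

class StudentModel:
    """Toy student model; fine_tune() stands in for real fine-tuning."""
    def __init__(self):
        self.memory = {}

    def fine_tune(self, examples):
        # Fit the student to mimic the teacher's responses.
        for prompt, response in examples:
            self.memory[prompt] = response

    def generate(self, prompt: str) -> str:
        return self.memory.get(prompt, "")

# Step 1 (data synthesis): the teacher generates responses for a prompt set.
prompts = ["summarize report", "classify ticket"]
synthetic_data = [(p, teacher_model(p)) for p in prompts]

# Step 2 (distillation): fine-tune the smaller student on those responses.
student = StudentModel()
student.fine_tune(synthetic_data)

print(student.generate("summarize report"))  # prints "SUMMARIZE REPORT"
```

In the managed service, both steps (and hosting the resulting distilled model for inference) are handled by Amazon Bedrock; the sketch only shows the shape of the technique.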
New Amazon EC2 P5en instances with NVIDIA H200 Tensor Core GPUs and EFAv3 networking
Today, we’re announcing the general availability of Amazon Elastic Compute Cloud (Amazon EC2) P5en instances, powered by NVIDIA H200 Tensor Core GPUs and custom 4th Generation Intel Xeon Scalable processors with an all-core turbo frequency of 3.2 GHz (max core turbo frequency of 3.8 GHz), available only on AWS. These processors offer 50 percent higher memory bandwidth and up to four times the throughput between CPU and GPU with PCIe Gen5, which helps boost performance for machine learning (ML) training and inference workloads. P5en, with up to 3,200 Gbps of third-generation Elastic Fabric Adapter (EFAv3) networking using Nitro v5, shows up to 35% lower latency compared to P5, which uses the previous generation of EFA and Nitro. This helps improve…