Generative AI for Developers
Generative AI has introduced a new wave of developer tools, frameworks, and applications. This rapidly expanding ecosystem helps developers train massive multimodal models, fine-tune them for specific use cases, and quantize and deploy them everywhere from data centers to the smallest embedded devices. Developers building generative AI applications need an accelerated computing platform with full-stack optimizations, from chips and systems software to acceleration libraries and application development frameworks. With NVIDIA-hosted model APIs and prebuilt inference microservices for deploying models anywhere, it’s easy to get started.
NVIDIA Full-Stack Generative AI Software Ecosystem
NVIDIA offers a full-stack accelerated computing platform purpose-built for generative AI workloads. The platform is both deep and wide, offering a combination of hardware, software, and services—all built by NVIDIA and its broad ecosystem of partners—so developers can deliver cutting-edge solutions.
Building applications for specific use cases and domains requires user-friendly APIs, efficient fine-tuning techniques, and, in the context of LLM applications, integration with robust third-party apps, vector databases, and guardrailing systems. NVIDIA offers hosted API endpoints and prebuilt inference microservices for deploying the latest AI models anywhere, enabling developers to quickly build custom generative AI applications.
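Hosted endpoints like these typically follow the familiar OpenAI-style chat-completions pattern. The sketch below shows that pattern using only the Python standard library; the endpoint URL and model name are illustrative placeholders, so check NVIDIA's API catalog for the actual values available to your account.

```python
import json
import os
import urllib.request

# Placeholder values for illustration -- consult the provider's API catalog
# for the real endpoint URL and the model names you have access to.
ENDPOINT = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL = "meta/llama3-8b-instruct"


def build_chat_request(prompt, api_key):
    """Build an OpenAI-style chat-completions request for a hosted endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    key = os.environ.get("NVIDIA_API_KEY")
    if key:  # only send the request when an API key is configured
        with urllib.request.urlopen(build_chat_request("Hello!", key)) as resp:
            print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request format is OpenAI-compatible, the same code shape works whether the model runs behind a hosted API or a self-hosted inference microservice.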
Our software stack powers partners like OpenAI, Cohere, Google Vertex AI, and Azure ML, allowing developers to use generative AI API endpoints. For domain-specific customization or augmenting applications with databases, NVIDIA’s ecosystem includes Hugging Face, LangChain, LlamaIndex, and Milvus, in addition to NVIDIA NeMo™.
To deploy safe, trustworthy models, NeMo provides simple tools for evaluating trained and fine-tuned models, including GPT and its variants. Developers can also add programmable guardrails with NeMo Guardrails to control the output of LLM applications, such as implementing controls to avoid discussing politics and tailoring responses based on user requests.
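In NeMo Guardrails itself, rails are declared in Colang/YAML configuration files rather than hand-written code. As a stdlib-only illustration of the underlying idea, the sketch below shows a topical input rail: screen the user's message against a blocked topic (here, politics) before it ever reaches the LLM. All names and patterns are illustrative, not the NeMo Guardrails API.

```python
import re

# Illustrative sketch of a topical input rail. Real NeMo Guardrails rails are
# declared in Colang/YAML config, not hand-rolled regexes like these.
BLOCKED = {
    "politics": re.compile(r"\b(election|senator|political party)\b", re.I),
}
REFUSAL = "Sorry, I can't discuss {topic}. Is there something else I can help with?"


def apply_input_rail(user_message):
    """Return a canned refusal if the message hits a blocked topic, else None."""
    for topic, pattern in BLOCKED.items():
        if pattern.search(user_message):
            return REFUSAL.format(topic=topic)
    return None  # message may proceed to the LLM


def guarded_generate(user_message, llm):
    """Run the input rail first; only call the model if the message passes."""
    refusal = apply_input_rail(user_message)
    return refusal if refusal is not None else llm(user_message)
```

A full guardrails system layers additional rails on the model's output and on tool calls, but the control point is the same: a programmable check between the user and the model.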
MLOps and LLMOps tools further assist in evaluating LLM models. NVIDIA NeMo can be integrated with LLMOps tools such as Weights & Biases and MLFlow. Developers can also use NVIDIA Triton™ Inference Server to analyze model performance and standardize AI model deployment.
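Tools like Triton's performance analyzer measure latency percentiles and throughput against a live model server. As a stdlib-only sketch of what that kind of profiling captures, the snippet below times a callable standing in for an inference call; the function names and the stand-in "model" are illustrative.

```python
import statistics
import time


def profile(fn, requests, warmup=3):
    """Measure per-request latency (ms) and throughput for a callable."""
    for r in requests[:warmup]:  # warm up caches before timing
        fn(r)
    latencies = []
    start = time.perf_counter()
    for r in requests:
        t0 = time.perf_counter()
        fn(r)
        latencies.append((time.perf_counter() - t0) * 1000.0)
    elapsed = time.perf_counter() - start
    return {
        "p50_ms": statistics.median(latencies),
        "p95_ms": sorted(latencies)[int(0.95 * (len(latencies) - 1))],
        "throughput_rps": len(requests) / elapsed,
    }


# Usage with a stand-in "model" (replace with a real inference call):
stats = profile(lambda prompt: prompt.upper(), ["hello"] * 100)
```

Tracking p95 latency alongside throughput matters for LLM serving, where a few slow generations can dominate user-perceived responsiveness even when the median looks healthy.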
Accelerating generative AI computations requires libraries and compilers designed specifically for the needs of LLMs. Some of the most popular include XLA, Megatron-LM, CUTLASS, CUDA®, NVIDIA® TensorRT™-LLM, RAFT, and cuDNN.
Building large-scale models often requires thousands of GPUs, and inference is done on multi-node, multi-GPU configurations to overcome memory and bandwidth limitations. This requires software that can carefully orchestrate the different generative AI workloads on accelerated infrastructure. Popular management and orchestration tools include Kubernetes, Slurm, Nephele, and NVIDIA Base Command™.
NVIDIA-accelerated computing platforms provide the infrastructure to power these applications in the most cost-optimized way, whether they’re run in a data center, the cloud, or on local desktops and laptops. Powerful platforms and technologies include NVIDIA DGX™ platform, NVIDIA HGX™ systems, NVIDIA RTX™ systems, and NVIDIA Jetson™.
Build With Generative AI
Developers can choose to engage with the NVIDIA AI platform at any layer of the stack, from infrastructure, software, and models to applications, either directly through NVIDIA products or through a vast ecosystem of offerings.
Start With State-of-the-Art Foundation Models
Try the latest models, including Llama 3, Stable Diffusion, NVIDIA’s Nemotron-3 8B family, and more.
Experience AI Foundation Models
Deploy AI Models Across Platforms
Quickly deploy AI models using easy-to-use inference microservices.
Deploy With NVIDIA NIM
Connect Generative AI Models to Knowledge Bases
Use retrieval-augmented generation (RAG) to connect LLMs to the latest information.
Try a RAG Example on GitHub
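The essence of RAG is simple: retrieve the documents most relevant to the user's question, then feed them to the LLM as context. A production pipeline would use an embedding model and a vector database such as Milvus; in the sketch below, plain word overlap stands in for both, and the sample documents are invented for illustration.

```python
# Minimal RAG sketch. A real pipeline would use an embedding model and a
# vector database; simple word overlap stands in for both here.
def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query and return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(query, documents):
    """Augment the user's question with retrieved context for the LLM."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "NIM microservices package models for deployment anywhere.",
    "TensorRT-LLM optimizes LLM inference on NVIDIA GPUs.",
    "Jetson brings AI computing to embedded devices.",
]
prompt = build_prompt("What optimizes LLM inference?", docs)
# `prompt` would then be sent to the model endpoint of your choice.
```

Because the model answers from retrieved context rather than from its training data alone, the same pipeline can surface information the model has never seen, such as private or freshly published documents.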
Train and Customize Generative AI for Every Industry
Build custom generative AI models for industries, including gaming, healthcare, automotive, industrial, and more.
Customize With NVIDIA NeMo
Best Practices for LLM Application Development
Tune in to hands-on sessions with NVIDIA experts to learn about state-of-the-art models, customization and optimization techniques, and how to run your own LLM apps.
Benefits
End-to-End Accelerated Stack
Accelerates every layer of the stack, from infrastructure to the app layer, with offerings from DGX Cloud to NeMo.
High Performance
Delivers real-time performance with GPU optimizations, including quantization-aware training, layer and tensor fusion, and kernel tuning.
Ecosystem Integrations
Tightly integrates with leading generative AI frameworks. For example, NVIDIA NeMo's connectors enable the use of NVIDIA AI Foundation models and TensorRT-LLM optimizations within the LangChain framework for RAG agents.
NVIDIA NIM Agent Blueprints Learning Library
Multimodal PDF Data Extraction for Enterprise RAG
Use NeMo Retriever NIM™ microservices to unlock highly accurate insights from massive volumes of enterprise data.
Generative Virtual Screening for Drug Discovery
Search and optimize a library of small molecules to identify chemical structures that bind to a target protein.
Digital Humans for Customer Service
Bring applications to life with an AI-powered digital avatar to transform customer service experiences.
Access Exclusive NVIDIA Resources
The NVIDIA Developer Program gives you free access to the latest AI models for development with NVIDIA NIM™. It also includes training, documentation, how-to guides, expert forums, support from peers and domain experts, and guidance on the right hardware for tackling the biggest challenges.
Get Generative AI Training and Certification
Elevate your technical skills in generative AI and LLMs with NVIDIA Training’s comprehensive learning paths, covering fundamental to advanced topics, featuring hands-on training, and delivered by NVIDIA experts. Showcase your skills and advance your career by getting certified by NVIDIA.
Connect With NVIDIA Experts
Have questions as you’re getting started? Explore our NVIDIA Developer Forum for AI to get your questions answered or explore insights from other developers.
Build Your Custom Generative AI With NVIDIA Partners
For generative AI startups, NVIDIA Inception provides access to the latest developer resources, preferred pricing on NVIDIA software and hardware, and exposure to the venture capital community. The program is free and available to tech startups of all stages.
Latest News
Explore what’s new and learn about our latest breakthroughs.
Shining Brighter Together: Google’s Gemma Optimized to Run on NVIDIA GPUs
Gemma, Google's new family of lightweight, state-of-the-art open language models with 2 billion and 7 billion parameters, is optimized with NVIDIA TensorRT-LLM and can run anywhere, reducing costs and speeding up innovative work for domain-specific use cases.
NVIDIA Reveals Gaming, Creating, Generative AI, Robotics Innovations at CES
At CES, NVIDIA released the TensorRT-LLM library for Windows, announced NVIDIA Avatar Cloud Engine (ACE) microservices with generative AI models for digital avatars, and unveiled a partnership with iStock by Getty Images, a generative AI service powered by NVIDIA Edify.
Amgen to Build Generative AI Models for Novel Human Data Insights and Drug Discovery
Amgen, an early adopter of NVIDIA BioNeMo™, uses it to accelerate drug discovery and development with generative AI models. The company plans to deploy NVIDIA DGX SuperPOD™ to train state-of-the-art models in days rather than months.
Get Started With Generative AI
Scale Your Business Applications With Generative AI
Experience, prototype, and deploy AI with production-ready APIs that run anywhere.
Enterprise-Ready Generative AI With NVIDIA AI Enterprise
The NVIDIA AI Enterprise subscription includes production-grade software, accelerating enterprises to the leading edge of AI with easy-to-deploy microservices, enterprise support, security, and API stability.