More details about the mighty Phi-3 family of SLMs. This is a key development, as GenAI development becomes less about a single model and more about continuously optimizing and orchestrating, with LLMs routing requests to other LLMs and SLMs to balance complexity, latency, cost, and more! https://lnkd.in/gGQYqNKt
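A minimal sketch of what such routing can look like in practice; the complexity_score heuristic, threshold, and model names below are illustrative assumptions, not anything from the linked post:

```python
# Route cheap/simple queries to an SLM and hard ones to a larger LLM,
# trading capability against latency and cost.
def complexity_score(prompt: str) -> float:
    # Stand-in heuristic; in practice this could be a small classifier
    # or an LLM call that grades the request.
    return min(1.0, len(prompt.split()) / 200)

def route(prompt: str) -> str:
    # Simple threshold router: below it, the SLM is good enough.
    if complexity_score(prompt) < 0.5:
        return "phi-3-mini"   # fast, cheap SLM (hypothetical model id)
    return "gpt-4o"           # slower, costlier frontier LLM

print(route("Summarize this paragraph."))  # -> phi-3-mini
```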
-
Last week, with the announcements of GPT-4o and Google I/O, huge bets are on multi-modality agents. Today, we are excited to introduce Multi Agent Flow, powered by LangChain's LangGraph 🕸
A multi-agent flow consists of a team of agents that collaborate to complete a task delegated by a supervisor. Results are significantly better for long-running tasks. Here's why:
⚒ Dedicated prompt and tools for each agent
🔄 Reflective loop for auto-correction
🌐 Separate LLMs for different agents
Multi Agent Flow supports:
- Function Calling LLMs (Claude, Mistral, Gemini, OpenAI)
- Multi Modality (image, speech & files coming soon)
- API
- Prompt input variables
Available now in v1.8.0
Repo: https://lnkd.in/dsph3WMU
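For readers curious what the supervisor pattern looks like in code, here is a minimal sketch using LangGraph's core graph API; the state schema and stub agent functions are hypothetical stand-ins, not Multi Agent Flow's actual implementation:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class FlowState(TypedDict):
    task: str
    result: str
    next_agent: str

def supervisor(state: FlowState) -> FlowState:
    # Decide which worker acts next; in practice this is an LLM call.
    done = bool(state["result"])
    return {**state, "next_agent": "end" if done else "researcher"}

def researcher(state: FlowState) -> FlowState:
    # Placeholder worker: a real agent would use its own prompt,
    # tools, and (possibly separate) LLM.
    return {**state, "result": f"findings for: {state['task']}"}

graph = StateGraph(FlowState)
graph.add_node("supervisor", supervisor)
graph.add_node("researcher", researcher)
graph.set_entry_point("supervisor")
# Route on the supervisor's decision; workers report back to it.
graph.add_conditional_edges(
    "supervisor",
    lambda s: s["next_agent"],
    {"researcher": "researcher", "end": END},
)
graph.add_edge("researcher", "supervisor")
app = graph.compile()

print(app.invoke({"task": "summarize GPT-4o news", "result": "", "next_agent": ""}))
```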
-
🚀 See how Handshake cut their LLM GPU costs by 50% with Anyscale. Discover how they:
💰 Reduced LLM GPU costs by 50% or more
📈 Seamlessly scaled large language models (LLMs) without compromising performance
⏱ Enhanced operational efficiency, enabling faster development cycles
Check out the full story here: https://lnkd.in/gSiKJDaY
How Handshake Saves 50% on LLM GPU Costs with Anyscale
anyscale.com
-
Chains vs. group-chat has been a big differentiator between Flowise, AutoGen, and other frameworks. Building logic to progress through a chain and repeat steps if needed adds complexity that is easily overcome by agents collaborating in a group-chat. The downside is that you shift control from traditional programmatic steps to prompt engineering for making group decisions. For some workflows this is fine, but others require a more rigid progression of steps (e.g. CI/CD pipelines). A hybrid approach is the best of both worlds.
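A minimal sketch of that hybrid idea: rigid programmatic stages, with one stage handed off to a free-form group-chat. The agent lambdas below are hypothetical stand-ins for LLM-backed agents:

```python
from typing import Callable

def group_chat(agents: list[Callable[[str], str]], task: str, rounds: int = 3) -> str:
    # Agents take turns refining a shared answer (prompt-driven control).
    answer = task
    for _ in range(rounds):
        for agent in agents:
            answer = agent(answer)
    return answer

def pipeline(task: str) -> str:
    # Rigid, CI/CD-style progression (programmatic control)...
    spec = f"spec({task})"
    # ...except the design stage, which is resolved collaboratively.
    design = group_chat([lambda t: f"architect({t})",
                         lambda t: f"critic({t})"], spec, rounds=1)
    return f"deploy({design})"

print(pipeline("build feature X"))
```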
-
Efficiency of LLM infrastructure is one of the most important topics for wide-scale LLM adoption, and it spans many subtopics: inference optimization, GPU allocation, scaling up architectures, and more. Here is a very interesting read from Character.AI on how they optimize their inference infrastructure for production loads of 20,000 QPS: https://lnkd.in/gzPPzUYZ
Optimizing AI Inference at Character.AI
research.character.ai
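One concrete example of why this matters at that scale is KV-cache memory, a central lever in LLM serving. A back-of-the-envelope sketch; the model dimensions below are illustrative assumptions, not Character.AI's actual configuration:

```python
# KV-cache size: 2 (keys and values) x layers x kv_heads x head_dim
# x sequence length x batch, times bytes per element (fp16 = 2).
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 32-layer model, 8K context, batch of 16.
full = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=8192, batch=16)
mqa = kv_cache_bytes(layers=32, kv_heads=1, head_dim=128, seq_len=8192, batch=16)
print(f"full MHA cache: {full / 2**30:.1f} GiB, MQA cache: {mqa / 2**30:.2f} GiB")
# -> full MHA cache: 64.0 GiB, MQA cache: 2.00 GiB
```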
-
Watch GPT and Google Gemini go head-to-head in a game of trivia. GPT-5 is reportedly coming this summer, and just last week, Google began making Gemini 1.5 available to all developers. Between those two—plus Claude, Mistral, Llama, Perplexity, and more—it's hard to know which model to use. Sure, you could test them, but human evaluations are expensive. That's why I'm interested in how LLMs themselves can be used for automated evals. In this demo, the questions and assessments are all AI-generated. It's LLMs grading answers given by LLMs to questions written by LLMs. Lots of caveats apply (read the FAQ!), but I had fun building it.
GPT vs. Gemini | Two LLMs, one winner
gptversusgemini.com
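A minimal sketch of the LLMs-grading-LLMs loop described above, assuming a hypothetical llm(model, prompt) completion helper (stubbed here with canned responses so the sketch runs); the prompts and model names are illustrative, not the demo's actual setup:

```python
# Stand-in for a real completion call (e.g., an OpenAI or Anthropic
# client); returns canned text so the sketch runs end to end.
def llm(model: str, prompt: str) -> str:
    return "CORRECT" if "Reply CORRECT" in prompt else f"[{model} output]"

# LLMs write the question, answer it, and grade the answers.
def run_eval_round(contestants: list[str], judge: str = "judge-model") -> dict:
    question = llm(judge, "Write one hard trivia question with a short answer.")
    scores = {}
    for model in contestants:
        answer = llm(model, f"Answer concisely: {question}")
        verdict = llm(judge, f"Question: {question}\nAnswer: {answer}\n"
                             "Reply CORRECT or INCORRECT only.")
        scores[model] = verdict.strip().upper() == "CORRECT"
    return scores

print(run_eval_round(["gpt-4o", "gemini-1.5-pro"]))
```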
-
#Day22 of #100DaysOfGenAI Today, I delved into the concept of LLMOps, which focuses on the operational management of large language models, and explored Vext, a platform designed to simplify the LLM pipeline. Vext provides a suite of tools that make deploying, maintaining, and scaling LLMs more efficient. By streamlining the complexities of LLM workflows, Vext ensures smoother integration of these models into real-world applications. Understanding platforms like Vext is becoming increasingly important as the role of LLMs expands across various industries. This knowledge enhances my ability to work effectively with advanced AI systems, ensuring they operate at peak efficiency in production environments. Vext: https://vextapp.com #LLMOps #MachineLearning #AI #DeepLearning #LLM #Vext #100DaysOfCode
Vext - The LLMOps OS: LLM Pipeline Simplified
vextapp.com
-
"OneGen, a novel solution that unifies the retrieval and generation processes into a single forward pass within an LLM. By integrating autoregressive retrieval tokens into the model, OneGen enables the system to handle both tasks simultaneously without the need for multiple forward passes or separate retrieval and generation models. This innovative approach significantly reduces computational overhead and inference time, enhancing the efficiency of LLMs."
OneGen: An AI Framework that Enables a Single LLM to Handle both Retrieval and Generation Simultaneously
marktechpost.com
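A toy sketch of the single-pass idea: when decoding emits a special retrieval token, the hidden state at that position is reused directly as the dense query, with no second forward pass. Everything below (dimensions, random tensors, token names) is an illustrative stand-in, not OneGen's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
doc_index = rng.normal(size=(100, d))  # precomputed document embeddings
RETRIEVAL_TOKEN = "[RQ]"

def decode_step(token: str) -> np.ndarray:
    # Stand-in for one autoregressive step returning the hidden state.
    return rng.normal(size=d)

generated, retrieved = [], []
for token in ["The", "answer", RETRIEVAL_TOKEN, "is"]:
    hidden = decode_step(token)        # one forward pass per step
    generated.append(token)
    if token == RETRIEVAL_TOKEN:
        # The same hidden state doubles as the retrieval query.
        scores = doc_index @ hidden
        retrieved.append(int(scores.argmax()))
print(generated, retrieved)
```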
-
#Google #DeepMind presents a new hybrid architecture that enables tokens in the LLM to cross-attend to node embeddings from a GNN-based neural algorithmic reasoner (NAR). The resulting model, called TransNAR, demonstrates improvements in OOD reasoning across algorithmic tasks. A quote from the paper on why NARs could be useful: "NARs are capable of holding perfect generalization even on 6× larger inputs than ones seen in the training set, for highly complex algorithmic tasks with long rollouts". The key here is the generalization you get from NARs when they are combined with Transformers. https://lnkd.in/gUvpSWTt Google
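A toy sketch of the cross-attention hookup described above, with token states attending to NAR node embeddings; shapes and random tensors are illustrative stand-ins, not TransNAR's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32
tokens = rng.normal(size=(10, d))  # LLM token hidden states
nodes = rng.normal(size=(6, d))    # GNN/NAR node embeddings

# Queries come from tokens; keys and values from NAR nodes.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
q, k, v = tokens @ Wq, nodes @ Wk, nodes @ Wv
att = np.exp(q @ k.T / np.sqrt(d))
att /= att.sum(axis=-1, keepdims=True)
tokens = tokens + att @ v          # residual cross-attention update
print(tokens.shape)                # (10, 32): token states enriched by the NAR
```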
-
Best Resources to Learn & Understand Evaluating LLMs via #TowardsAI → https://bit.ly/3WAPbs2
Best Resources to Learn & Understand Evaluating LLMs
towardsai.net
-
"Through a series of machine learning innovations, we’ve increased 1.5 Pro’s context window capacity far beyond the original 32,000 tokens for Gemini 1.0. We can now run up to 1 million tokens in production. This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we’ve also successfully tested up to 10 million tokens.' https://lnkd.in/ddrFBMnj
Our next-generation model: Gemini 1.5
blog.google
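A quick sanity check on those figures, using common rule-of-thumb token ratios (my assumptions, not Google's published tokenizer stats):

```python
# How far a 1M-token context stretches, under rough per-unit ratios.
budget = 1_000_000            # Gemini 1.5 Pro production context window
words = budget / 1.4          # ~1.4 tokens per English word (rule of thumb)
code_lines = budget / 33      # ~33 tokens per line of code (rough assumption)
print(f"~{words:,.0f} words, ~{code_lines:,.0f} lines of code")
# -> ~714,286 words, ~30,303 lines: consistent with the quoted
#    "over 700,000 words" and "over 30,000 lines of code".
```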