 
Neetu Pathak, Co-Founder and CEO of Skymel, leads the company in revolutionizing AI inference with its innovative NeuroSplit™ technology. Alongside CTO Sushant Tripathy, she drives Skymel’s mission to enhance AI application performance while reducing computational costs.
NeuroSplit™ is an adaptive inferencing technology that dynamically distributes AI workloads between end-user devices and cloud servers. This approach leverages idle computing resources on user devices, cutting cloud infrastructure costs by up to 60%, accelerating inference speeds, ensuring data privacy, and enabling seamless scalability.
By optimizing local compute power, NeuroSplit™ allows AI applications to run efficiently even on older GPUs, significantly lowering costs while improving user experience.
What inspired you to co-found Skymel, and what key challenges in AI infrastructure were you aiming to solve with NeuroSplit?
The inspiration for Skymel came from the convergence of our complementary experiences. During his time at Google, my co-founder, Sushant Tripathy, was deploying speech-based AI models across billions of Android devices. He discovered that there was an enormous amount of idle compute power available on end-user devices, but most companies couldn’t effectively utilize it because of the complex engineering challenges of accessing these resources without compromising user experience.
Meanwhile, my experience working with enterprises and startups at Redis gave me deep insight into how critical latency was becoming for businesses. As AI applications became more prevalent, it was clear that we needed to move processing closer to where data was being created, rather than constantly shuttling data back and forth to data centers.
That’s when Sushant and I realized the future wasn’t about choosing between local or cloud processing—it was about creating an intelligent technology that could seamlessly adapt between local, cloud, or hybrid processing based on each specific inference request. This insight led us to found Skymel and develop NeuroSplit, moving beyond the traditional infrastructure limitations that were holding back AI innovation.
Can you explain how NeuroSplit dynamically optimizes compute resources while maintaining user privacy and performance?
One of the major pitfalls in local AI inferencing has been its static compute requirements: traditionally, running an AI model demands the same computational resources regardless of the device’s conditions or user behavior. This one-size-fits-all approach ignores the reality that devices have different hardware capabilities, from various chips (GPU, NPU, CPU, XPU) to varying network bandwidth, and that users behave differently in terms of application usage and charging patterns.
NeuroSplit continuously monitors a range of device telemetry, from hardware capabilities to current resource utilization, battery status, and network conditions. We also factor in user behavior patterns, such as how many other applications are running and typical device usage. This comprehensive monitoring allows NeuroSplit to dynamically determine how much inference compute can safely run on the end-user device while optimizing for developers’ key performance indicators.
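To make the idea concrete, here is a minimal sketch of a telemetry-driven policy that picks what share of a model’s compute to run on the device. The `DeviceTelemetry` fields, the `on_device_fraction` function, and every threshold are hypothetical assumptions for illustration; the interview does not describe how NeuroSplit actually weighs these signals, and network conditions are left out here for brevity.

```python
from dataclasses import dataclass


@dataclass
class DeviceTelemetry:
    """Hypothetical snapshot of device conditions (field names are illustrative)."""

    free_accel_memory_mb: float  # free memory on the GPU/NPU
    accel_utilization: float     # current accelerator load, 0.0 to 1.0
    battery_level: float         # 0.0 to 1.0
    is_charging: bool
    foreground_apps: int         # other apps competing for the device


def on_device_fraction(t: DeviceTelemetry, model_memory_mb: float) -> float:
    """Return the share (0.0 to 1.0) of a model's compute to run locally.

    A hand-written heuristic for illustration only, not NeuroSplit's policy.
    """
    # Start from how much of the model fits in free accelerator memory.
    fraction = min(1.0, t.free_accel_memory_mb / model_memory_mb)
    # Leave headroom for work the accelerator is already doing.
    fraction *= 1.0 - t.accel_utilization
    # Halve local work on a low battery unless the device is plugged in.
    if t.battery_level < 0.2 and not t.is_charging:
        fraction *= 0.5
    # Back off further when many other apps are running.
    if t.foreground_apps > 5:
        fraction *= 0.75
    return max(0.0, min(1.0, fraction))


phone = DeviceTelemetry(2000.0, 0.5, 0.8, False, 2)
print(on_device_fraction(phone, model_memory_mb=1000.0))  # prints 0.5
```

The remaining share would go to the cloud. A fuller version would also need to account for network conditions and the developer’s performance targets, both of which the answer above mentions.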
When data privacy is paramount, NeuroSplit ensures raw data never leaves the device, processing sensitive information locally while still maintaining optimal performance. Our ability to smartly split, trim, or decouple AI models allows us to fit 50-100 AI stub models in the memory space of just one quantized model on an end-user device. In practical terms, this means users can run significantly more AI-powered applications simultaneously than with traditional static computation approaches, while still processing sensitive data locally.
What are the main benefits of NeuroSplit’s adaptive inferencing for AI companies, particularly those working with older GPU technology?
NeuroSplit delivers three transformative benefits for AI companies. First, it dramatically reduces infrastructure costs through two mechanisms: companies can utilize cheaper, older GPUs effectively, and our unique ability to fit both full and stub models on cloud GPUs enables significantly higher GPU utilization rates. For example, an application that typically requires multiple NVIDIA A100s at $2.74 per hour can now run on either a single A100 or multiple V100s at just 83 cents per hour.
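The arithmetic behind that example is easy to check. The sketch below assumes the quoted rates are per GPU-hour and uses hypothetical fleet sizes (three A100s before splitting, one A100 or two V100s after), since the interview does not give exact counts.

```python
# Hourly rates quoted in the interview, assumed here to be per GPU-hour.
RATES_PER_HOUR = {"A100": 2.74, "V100": 0.83}


def hourly_cost(fleet: dict[str, int]) -> float:
    """Total hourly cost of a fleet given as {gpu_type: count}."""
    return round(sum(RATES_PER_HOUR[gpu] * n for gpu, n in fleet.items()), 2)


# Hypothetical fleet sizes, chosen only to illustrate the comparison.
before = hourly_cost({"A100": 3})
after_single_a100 = hourly_cost({"A100": 1})
after_two_v100s = hourly_cost({"V100": 2})
print(f"${before:.2f}/h -> ${after_single_a100:.2f}/h or ${after_two_v100s:.2f}/h")
# prints $8.22/h -> $2.74/h or $1.66/h
```

With these assumed counts, the V100 option costs about a fifth of the original fleet; different counts would change the ratio.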
Second, we substantially improve performance by processing initial raw data directly on user devices. This means the data that eventually travels to the cloud is much smaller in size, significantly reducing network latency while maintaining accuracy. This hybrid approach gives companies the best of both worlds: the speed of local processing with the power of cloud computing.
Third, by handling sensitive initial data processing on the end-user device, we help companies maintain strong user privacy protections without sacrificing performance. This is increasingly crucial as privacy regulations become stricter and users more privacy-conscious.
How does Skymel’s solution reduce costs for AI inferencing without compromising on model complexity or accuracy?
First, by splitting individual AI models, we distribute computation between the user devices and the cloud. The first part runs on the end-user’s device, handling 5% to 100% of the total computation depending on available device resources. Only the remaining computation needs to be processed on cloud GPUs.
This splitting means cloud GPUs handle a reduced computational load: if a model originally required a full A100 GPU, after splitting, that same workload might need only 30-40% of the GPU’s capacity. This allows companies to use more cost-effective GPU instances such as the V100.
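A minimal sketch of the splitting idea, assuming its simplest form: cutting a sequential model at a layer boundary so that a prefix runs on the device and the suffix runs in the cloud. The per-layer FLOP counts, the toy layers, and the function names (`choose_split`, `run_split`) are hypothetical; the interview says NeuroSplit can also trim or decouple models, which this sketch does not attempt.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def choose_split(layer_flops: Sequence[float], device_fraction: float) -> int:
    """Return k such that layers [0, k) run on the device.

    Takes the longest prefix whose cumulative FLOPs stay within
    device_fraction of the model's total.
    """
    budget = device_fraction * sum(layer_flops)
    used = 0.0
    k = 0
    for flops in layer_flops:
        if used + flops > budget:
            break
        used += flops
        k += 1
    return k


def run_split(layers: Sequence[Layer], k: int, x: Vector) -> Vector:
    """Run layers[:k] 'on device' and layers[k:] 'in the cloud'."""
    activations = x
    for layer in layers[:k]:  # device side; consumes the raw input when k > 0
        activations = layer(activations)
    # In a real deployment only `activations` would cross the network here.
    for layer in layers[k:]:  # cloud side
        activations = layer(activations)
    return activations


# Toy three-layer "model" with made-up FLOP counts.
layers: List[Layer] = [
    lambda v: [a * 2 for a in v],
    lambda v: [a + 1 for a in v],
    lambda v: [a * 3 for a in v],
]
flops = [10.0, 30.0, 60.0]
k = choose_split(flops, device_fraction=0.4)
print(k, run_split(layers, k, [1.0, 2.0]))  # prints 2 [9.0, 15.0]
```

Because the split point only changes where each layer runs, the output is identical for any `k`; what changes is how much work, and which data, each side sees. Here the device covers 40% of the FLOPs and the cloud the remaining 60%.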
Second, NeuroSplit optimizes GPU utilization in the cloud. By efficiently arranging both full models and stub models (the remaining parts of split models) on the same cloud GPU, we achieve significantly higher utilization rates compared to traditional approaches. This means more models can run simultaneously on the same cloud GPU, further reducing per-inference costs.
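Co-locating full and stub models on shared GPUs is at heart a bin-packing problem. The sketch below uses first-fit decreasing, a standard bin-packing heuristic swapped in for illustration (the interview does not say what NeuroSplit uses), and packs by memory alone; the model names and sizes are hypothetical.

```python
from typing import Dict, List


def pack_first_fit_decreasing(
    models: Dict[str, float], gpu_memory_gb: float
) -> List[List[str]]:
    """Place models (name -> memory in GB) onto identical GPUs.

    Largest models are placed first; each goes on the first GPU with room,
    and a new GPU is opened only when none has enough free memory.
    """
    free: List[float] = []           # free memory per opened GPU
    placement: List[List[str]] = []  # model names per opened GPU
    for name, size in sorted(models.items(), key=lambda kv: kv[1], reverse=True):
        if size > gpu_memory_gb:
            raise ValueError(f"{name} ({size} GB) exceeds a single GPU")
        for i, room in enumerate(free):
            if size <= room:
                free[i] -= size
                placement[i].append(name)
                break
        else:
            free.append(gpu_memory_gb - size)
            placement.append([name])
    return placement


# Two full models and three stubs (remaining parts of split models).
models = {"full_chat": 30.0, "full_vision": 20.0,
          "stub_a": 8.0, "stub_b": 8.0, "stub_c": 6.0}
print(pack_first_fit_decreasing(models, gpu_memory_gb=40.0))
# prints [['full_chat', 'stub_a'], ['full_vision', 'stub_b', 'stub_c']]
```

Five models fit on two 40 GB GPUs here instead of one GPU per model. A production scheduler would also have to account for compute and latency contention, not just memory.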
What distinguishes Skymel’s hybrid (local + cloud) approach from other AI infrastructure solutions on the market?
The AI landscape is at a fascinating inflection point. While Apple, Samsung, and Qualcomm are demonstrating the power of hybrid AI through their ecosystem features, these remain walled gardens. But AI shouldn’t be limited by which end-user device someone happens to use.
NeuroSplit is fundamentally device-agnostic, cloud-agnostic, and neural network-agnostic. This means developers can finally deliver consistent AI experiences regardless of whether their users are on an iPhone, an Android device, or a laptop, and whether they’re using AWS, Azure, or Google Cloud.
Think about what this means for developers. They can build their AI application once and know it will adapt intelligently across any device, any cloud, and any neural network architecture. No more building different versions for different platforms or compromising features based on device capabilities.
We’re bringing enterprise-grade hybrid AI capabilities out of walled gardens and making them universally accessible. As AI becomes central to every application, this kind of flexibility and consistency isn’t just an advantage; it’s essential for innovation.
How does the Orchestrator Agent complement NeuroSplit, and what role does it play in transforming AI deployment strategies?
The Orchestrator Agent (OA) and NeuroSplit work together to create a self-optimizing AI deployment system:
1. Developers set the boundaries:
- Constraints: allowed models, versions, cloud providers, zones, compliance rules
- Goals: target latency, cost limits, performance requirements, privacy needs
2. OA works within these constraints to achieve the goals:
- Decides which models/APIs to use for each request
- Adapts deployment strategies based on real-world performance
- Makes trade-offs to optimize for specified goals
- Can be reconfigured instantly as needs change
3. NeuroSplit executes OA’s decisions:
- Uses real-time device telemetry to optimize execution
- Splits processing between device and cloud when beneficial
- Ensures each inference runs optimally given current conditions
It’s like having an AI system that autonomously optimizes itself within your defined rules and targets, rather than requiring manual optimization for every scenario.
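The division of labor described above can be sketched as a small policy object plus a selection step. Everything here (`Option`, `Policy`, `choose`, the field names, and the example numbers) is a hypothetical illustration of constraints versus goals, not the Orchestrator Agent’s actual interface: constraints and goals both filter the candidates, and the cheapest remaining option wins.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Option:
    """One candidate way to serve a request (all fields hypothetical)."""
    model: str
    provider: str
    p95_latency_ms: float
    cost_per_1k_requests: float
    raw_data_stays_on_device: bool


@dataclass(frozen=True)
class Policy:
    # Constraints: hard boundaries set by the developer.
    allowed_models: FrozenSet[str]
    allowed_providers: FrozenSet[str]
    require_raw_data_on_device: bool
    # Goals: targets the orchestrator must meet.
    max_latency_ms: float
    max_cost_per_1k: float


def choose(options: List[Option], policy: Policy) -> Optional[Option]:
    """Return the cheapest option that satisfies every constraint and goal."""
    eligible = [
        o for o in options
        if o.model in policy.allowed_models
        and o.provider in policy.allowed_providers
        and (o.raw_data_stays_on_device or not policy.require_raw_data_on_device)
        and o.p95_latency_ms <= policy.max_latency_ms
        and o.cost_per_1k_requests <= policy.max_cost_per_1k
    ]
    # Prefer lower cost, then lower latency; None means nothing qualifies.
    return min(eligible, key=lambda o: (o.cost_per_1k_requests, o.p95_latency_ms),
               default=None)


options = [
    Option("model-a", "aws", 900.0, 4.0, False),
    Option("model-a-split", "aws", 450.0, 1.5, True),
    Option("model-b", "gcp", 200.0, 0.5, True),
]
strict = Policy(frozenset({"model-a", "model-a-split"}), frozenset({"aws"}),
                True, 500.0, 2.0)
print(choose(options, strict).model)  # prints model-a-split
```

Reconfiguring is just swapping the policy: allowing `model-b` on `gcp`, for instance, changes the choice without touching the list of options.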
In your opinion, how will the Orchestrator Agent reshape the way AI is deployed across industries?
It solves three critical challenges that have been holding back AI adoption and innovation.
First, it allows companies to keep pace with the latest AI advancements effortlessly. With the Orchestrator Agent, you can instantly leverage the newest models and techniques without reworking your infrastructure. This is a major competitive advantage in a world where AI innovation is moving at breakneck speeds.
Second, it enables dynamic, per-request optimization of AI model selection. The Orchestrator Agent can intelligently mix and match models from the huge ecosystem of options to deliver the best possible results for each user interaction. For example, a customer service AI could use a specialized model for technical questions and a different one for billing inquiries, delivering better results for each type of interaction.
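A toy version of the customer-service example: route each message to a specialized model by category. The keyword matching and the model IDs are hypothetical stand-ins; the interview does not say how the Orchestrator Agent classifies requests, and a learned classifier would be a more robust choice than keyword lists.

```python
# Hypothetical model IDs for each category of request.
ROUTES = {
    "technical": "tech-support-model",
    "billing": "billing-model",
}
DEFAULT_MODEL = "general-model"

# Naive keyword lists standing in for a real request classifier.
KEYWORDS = {
    "technical": ("error", "crash", "install", "bug"),
    "billing": ("invoice", "refund", "charge", "payment"),
}


def route(message: str) -> str:
    """Pick a model ID for one message; the first matching category wins."""
    text = message.lower()
    for category, words in KEYWORDS.items():
        if any(word in text for word in words):
            return ROUTES[category]
    return DEFAULT_MODEL


print(route("The app crashes during install"))  # prints tech-support-model
print(route("Why was I charged twice?"))        # prints billing-model
```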
Third, it maximizes performance while minimizing costs. The Agent automatically balances between running AI on the user’s device or in the cloud based on what makes the most sense at that moment. When privacy is important, it processes data locally. When extra computing power is needed, it leverages the cloud. All of this happens behind the scenes, creating a smooth experience for users while optimizing resources for businesses.
But what truly sets the Orchestrator Agent apart is how it enables businesses to create next-generation hyper-personalized experiences for their users. Take an e-learning platform: with our technology, they can build a system that automatically adapts its teaching approach based on each student’s comprehension level. When a user searches for “machine learning,” the platform doesn’t just show generic results; it can instantly assess their current understanding and customize explanations using concepts they already know.
Ultimately, the Orchestrator Agent represents the future of AI deployment: a shift from static, monolithic AI infrastructure to dynamic, adaptive, self-optimizing AI orchestration. It’s not just about making AI deployment easier; it’s about making entirely new classes of AI applications possible.
What kind of feedback have you received so far from companies participating in the private beta of the Orchestrator Agent?
The feedback from our private beta participants has been great! Companies are thrilled to discover they can finally break free from infrastructure lock-in, whether to proprietary models or hosting services. The ability to future-proof any deployment decision has been a game-changer, eliminating those dreaded months of rework when switching approaches.
Our NeuroSplit performance results have been nothing short of remarkable, and we can’t wait to share the data publicly soon. What’s particularly exciting is how the very concept of adaptive AI deployment has captured imaginations. The idea that AI can deploy itself sounds futuristic, not something people expected to see yet, so the technological leap alone has them excited about the possibilities and the new markets it might create.
With the rapid advancements in generative AI, what do you see as the next major hurdles for AI infrastructure, and how does Skymel plan to address them?
We’re heading toward a future that most haven’t fully grasped yet: there won’t be a single dominant AI model, but billions of them. Even if we create the most powerful general AI model imaginable, we’ll still need personalized versions for every person on Earth, each adapted to unique contexts, preferences, and needs. That’s at least 8 billion models, based on the world’s population.
This marks a revolutionary shift from today’s one-size-fits-all approach. The future demands intelligent infrastructure that can handle billions of models. At Skymel, we’re not just solving today’s deployment challenges; our technology roadmap is already building the foundation for what’s coming next.
How do you envision AI infrastructure evolving over the next five years, and what role do you see Skymel playing in this evolution?
The AI infrastructure landscape is about to undergo a fundamental shift. While today’s focus is on scaling generic large language models in the cloud, the next five years will see AI becoming deeply personalized and context-aware. This isn’t just about fine-tuning; it’s about AI that adapts to specific users, devices, and situations in real time.
This shift creates two major infrastructure challenges. First, the traditional approach of running everything in centralized data centers becomes unsustainable both technically and economically. Second, the increasing complexity of AI applications means we need infrastructure that can dynamically optimize across multiple models, devices, and compute locations.
At Skymel, we’re building infrastructure that specifically addresses these challenges. Our technology enables AI to run wherever it makes the most sense, whether that’s on the device where data is being generated, in the cloud where more compute is available, or intelligently split between the two. More importantly, it adapts these decisions in real time based on changing conditions and requirements.
Looking ahead, successful AI applications won’t be defined by the size of their models or the amount of compute they can access. They’ll be defined by their ability to deliver personalized, responsive experiences while efficiently managing resources. Our goal is to make this level of intelligent optimization accessible to every AI application, regardless of scale or complexity.
Thank you for the great interview. Readers who wish to learn more should visit Skymel.
The post Neetu Pathak, Co-Founder and CEO of Skymel – Interview Series appeared first on Unite.AI.