Top discussion (from HN)

roh26it 530d ago

What are the trade-offs you've made to achieve this?

wrs 529d ago

https://console.ncompass.tech/models has no models on it, just a "Get in Touch" button.

handfuloflight 529d ago

What is Groq (rate limited) missing that you aren't?

HyprMusic 529d ago

Since you're calling out your support for underserved models, can I request you support some SOTA embeddings models? Support for embeddings is poor from other providers with only a handful of outdated models and poor latency.

ttul 529d ago

Unrelated: During the dot-com boom, there was a company called nCompass Labs that developed one of the first content management systems (https://en.wikipedia.org/wiki/NCompass_Labs_Inc). Microsoft bought them in 2001. Their product was, "a plug-in for hosting ActiveX controls in Netscape Navigator named ScriptActive." ActiveX itself was a novelty, using C++ templates to define reusable and _downloadable_ web components. All of this crap was happily replaced with JavaScript frameworks in later years. Yes, back in the early-2000s, your browser might literally download executable code just to render a custom button.

JackC 529d ago

Random idea -- I think it would be cool for hosts that advertise efficiency to have a dashboard that shows total tokens per watt-hour (or whatever usage:energy metric) graphed over time for each model they host, taking into account as much of their infra as possible.
This would:
- let you boast about your cool proprietary optimizations
- naturally get better over time just from applying public algorithmic improvements
- show up hosts that refuse to do the same
- give you a good incentive to keep on top of your own efficiency and competitiveness over time
- be a good response to users who vaguely know that AI takes "a lot" of energy -- it's actually gotten a lot better, but how much better?
Happy to chat if it would help to have a neutral academic voice involved.

darknoon 529d ago

One vote for image inputs here. I would love a fine-tuned qwen-2-vl-72b on demand, but most of the solutions are "talk to us" level expensive. I'm assuming you beat the price or convenience of a replicate / modal solution?

Oras 529d ago

1. Why do you have a limited number of models publicly? Do you have to configure each one manually?
2. I don't see the 50% cheaper option. According to your pricing page, 16B+ models will cost $0.90, which is the same price for Together.ai and fireworks.ai

michael0x11 528d ago

Interesting approach to model serving - the 2-4x lower TTFT compared to vLLM is impressive, but I'd be curious to see detailed benchmarks across different batch sizes and model architectures to validate those performance claims. The no rate limits policy is bold but could get expensive fast if you're not doing some clever GPU utilization under the hood.

swyx 528d ago

this sounds like black magic, kudos to you. i'd love to chat, dm me on https://twitter.com/swyx if you'd find it useful to chat with someone like me.

This page summarizes a public Hacker News story. Full discussion and updates live on news.ycombinator.com.