Eleven engagements.
Real results.
Websites, e-commerce platforms, AI agents, and marketing campaigns — with real metrics. Names are abstracted where confidentiality requires.
The problem. A high-growth B2B SaaS was generating 1,400 inbound leads per month from content, ads, and partner referrals. Five SDRs were spending 60% of their time qualifying leads, half of which were obviously unfit. High-intent leads were sitting in the queue for hours before getting touched.
What we built
A Slack-native triage agent that ingests every new lead in real time. It enriches the contact and account against Apollo and a custom intent feed, scores against an ICP rubric we built with their VP Sales, and either books a meeting directly via Calendly (for clearly qualified leads) or escalates to an SDR with a 3-bullet brief and recommended outreach.
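As a rough illustration of the score-then-route step, here is a minimal Python sketch. The rubric criteria, point values, and the booking threshold are all hypothetical placeholders, not the rubric built with the client's VP Sales, and the enrichment and Calendly calls are left out.

```python
from dataclasses import dataclass

# Hypothetical rubric: criterion name -> points. Not the client's actual rubric.
ICP_RUBRIC = {"company_size_fit": 40, "industry_fit": 30, "intent_signal": 30}
MAX_SCORE = sum(ICP_RUBRIC.values())
BOOK_THRESHOLD = 80  # assumed cut-off for "clearly qualified"


@dataclass
class Lead:
    email: str
    company_size_fit: bool
    industry_fit: bool
    intent_signal: bool


def icp_score(lead: Lead) -> int:
    """Add up the points for every rubric criterion the lead meets."""
    return sum(points for name, points in ICP_RUBRIC.items() if getattr(lead, name))


def route_lead(lead: Lead) -> dict:
    """Book clearly qualified leads; hand everything else to an SDR with a brief."""
    score = icp_score(lead)
    if score >= BOOK_THRESHOLD:
        return {"action": "book_meeting", "score": score}
    met = [name for name in ICP_RUBRIC if getattr(lead, name)]
    missed = [name for name in ICP_RUBRIC if not getattr(lead, name)]
    brief = [  # the 3-bullet brief that travels with the escalation
        f"ICP score: {score}/{MAX_SCORE}",
        f"Criteria met: {', '.join(met) or 'none'}",
        f"Criteria missed: {', '.join(missed) or 'none'}",
    ]
    return {"action": "escalate_to_sdr", "score": score, "brief": brief}
```

Keeping the rubric as plain data rather than prompt text means it can be re-tuned against labeled outcomes without touching the routing logic.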
The hard parts
The interesting engineering wasn't the LLM; it was the eval set. We built a labeled corpus of 800 historical leads with the actual sales outcome (won, lost, no-fit, etc.) and used it to tune the scoring rubric until the agent matched human reviewer judgment in 91% of cases. We rebuild and re-run those evals every time the prompt or model changes.
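The agreement figure is simple arithmetic over the labeled corpus. The sketch below shows one way to compute it and to use it as a regression check. The function names are hypothetical, and treating 91% as a pass/fail gate is an assumption for illustration, not a description of the actual eval harness.

```python
def agreement_rate(agent_labels, human_labels):
    """Fraction of leads where the agent's label equals the human reviewer's."""
    if len(agent_labels) != len(human_labels):
        raise ValueError("label lists must be the same length")
    if not human_labels:
        raise ValueError("eval set is empty")
    matches = sum(a == h for a, h in zip(agent_labels, human_labels))
    return matches / len(human_labels)


def passes_gate(rate, minimum=0.91):
    """Assumed policy: block a prompt or model change that drops below the bar."""
    return rate >= minimum
```

For example, `agreement_rate(["fit", "no-fit", "fit", "fit"], ["fit", "no-fit", "no-fit", "fit"])` gives 0.75, which would fail the assumed gate.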
What's running now
The agent has triaged 4,200 leads over 90 days. SDRs went from 60% of their time on qualification to 16%, redeployed into pipeline acceleration. Demo conversion on agent-booked meetings is 2.4× higher than the prior baseline. They moved into a $5K/month retainer at the end of the included 30-day support window.
The problem. A growing DTC brand was getting buried in support tickets — order status, returns, sizing, product questions. Their support team was a mix of full-time and seasonal contractors, with high variance in reply quality and slow response times during peak.
What we built
A Zendesk-integrated copilot that does RAG over their help center, past tickets, and live order data from Shopify. For straightforward tickets (order status, return requests, sizing questions) it drafts and sends replies autonomously. For nuanced or sensitive tickets, it drafts a reply that an agent reviews, edits, and approves in one click.
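The split between autonomous replies and reviewed drafts comes down to a small routing rule. This is a simplified Python sketch: the category names and the `sensitive` flag are hypothetical labels, and the classification, retrieval, and Zendesk calls are omitted.

```python
# Hypothetical category labels for the straightforward ticket types named above.
AUTONOMOUS_CATEGORIES = frozenset({"order_status", "return_request", "sizing"})


def handling_mode(category: str, sensitive: bool = False) -> str:
    """Send routine tickets automatically; queue everything else for review."""
    if category in AUTONOMOUS_CATEGORIES and not sensitive:
        return "auto_send"
    return "draft_for_review"
```

Defaulting unknown categories to review keeps a new or misclassified ticket type from being answered without a human in the loop.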
The hard parts
The product catalog has 1,200 SKUs with frequent updates, and the help center had drift — old policies, contradictory articles, missing edge cases. Before the agent could ship, we ran a knowledge-base audit and surfaced 47 documents that needed updating. The customer ops team rewrote those during weeks 2–3. Without that, the agent would have hallucinated confidently. With it, it doesn't.
What's running now
68% of tickets are fully deflected without a human in the loop. For the remaining 32%, the agent's drafted reply is accepted (with edits) in 84% of cases. Support headcount has stayed flat through 3× volume growth over two peak seasons. CSAT improved 6 points. They moved into a $6K/month retainer.
The problem. A mid-size firm was spending a great deal of associate time on first-pass contract review: NDAs, vendor agreements, master services agreements. The work was repetitive and the markups were highly templated, but partners couldn't approve a fully automated solution without an audit trail.
What we built
A document review agent trained on five years of the firm's own redlines. It reads incoming contracts, identifies clauses that deviate from firm-standard positions, and drafts a redline that an associate edits and finalizes. Every suggestion is logged with the source document and reasoning, so partners can audit any decision retroactively.
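To make the audit trail concrete, here is a minimal Python sketch of one logged suggestion. The field names and the in-memory log are assumptions for illustration; the firm's actual storage and schema are not described here.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class RedlineSuggestion:
    contract_id: str
    clause: str
    suggested_text: str
    source_document: str  # the prior redline the suggestion is based on
    reasoning: str
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class AuditLog:
    """Append-only log; each entry is stored as one JSON line."""

    def __init__(self):
        self._lines = []

    def record(self, suggestion: RedlineSuggestion) -> None:
        self._lines.append(json.dumps(asdict(suggestion), sort_keys=True))

    def for_contract(self, contract_id: str) -> list:
        """Return every logged suggestion for one contract, oldest first."""
        entries = (json.loads(line) for line in self._lines)
        return [e for e in entries if e["contract_id"] == contract_id]
```

Serializing each entry at write time means a later audit reads exactly what was logged, even if the suggestion object changes shape in a future version.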
The hard parts
Legal sign-off was the entire problem. We spent three weeks of the engagement just building the audit trail and reviewing it with the firm's COO and managing partner. The agent's outputs are intentionally never sent to a client without human review — the win is associate speed, not headcount. That framing made the build defensible.
What's running now
Associates save 4.5 hours per matter on first-pass review. Throughput is up ~35% across the contract-heavy practice. Partner sign-off on the audit-trail workflow held up under a malpractice insurer's review. They moved into a $4K/month retainer focused on extending the agent to new contract types.
The problem. A growing marketplace had a moderation queue that couldn't keep up. New listings sat unreviewed for 18+ hours during peak, hurting conversion and seller experience. Fraud and policy violations were getting through because human moderators were rushing.
What we built
A multimodal agent that reviews each new listing's text and images against the marketplace's policies. Clean listings auto-approve in under five seconds. Borderline cases route to a human moderator with the agent's reasoning attached. Every moderator override feeds back into the eval set, and the agent re-tunes weekly.
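The approve-or-escalate rule and the override feedback loop can be sketched in a few lines of Python. The confidence threshold, the `Verdict` fields, and the `EvalSet` class are hypothetical; the multimodal review itself and the weekly re-tune are out of scope for this sketch.

```python
from dataclasses import dataclass, field

AUTO_APPROVE_CONFIDENCE = 0.95  # hypothetical threshold for a "clean" listing


@dataclass
class Verdict:
    listing_id: str
    decision: str      # "approve" or "reject"
    confidence: float  # 0.0 to 1.0
    reasoning: str     # shown to the moderator on borderline cases


def route_listing(verdict: Verdict) -> str:
    """Auto-approve only confident approvals; everything else goes to a human."""
    if verdict.decision == "approve" and verdict.confidence >= AUTO_APPROVE_CONFIDENCE:
        return "auto_approve"
    return "human_review"


@dataclass
class EvalSet:
    examples: list = field(default_factory=list)

    def record_review(self, verdict: Verdict, moderator_decision: str) -> bool:
        """Store the case if the moderator overrode the agent; return True if stored."""
        overridden = moderator_decision != verdict.decision
        if overridden:
            self.examples.append({
                "listing_id": verdict.listing_id,
                "agent_decision": verdict.decision,
                "label": moderator_decision,
                "agent_reasoning": verdict.reasoning,
            })
        return overridden
```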
The hard parts
Policy itself was a moving target. We worked with the Trust & Safety lead to formalize 23 separate policy categories into a structured spec that captured what the policy documents alone couldn't. The agent's first-week performance was 78% agreement with moderators; by week four it was 94%. The improvement came almost entirely from the eval feedback loop, not from prompt engineering.
What's running now
31K listings reviewed in the first overnight batch test. Now running on every new listing in real time. 94% agreement with moderator decisions on hold-out test sets. 19 hours/day of moderator time saved, redeployed into investigation, policy refinement, and seller education. They moved into an $8K/month retainer.
The problem. A mid-market SaaS had a Salesforce instance that had degraded over five years of growth — duplicate accounts, stale contact data, deals stuck in stages with no recent activity. Forecasting was unreliable. AEs were spending Monday mornings cleaning their own records.
What we built
A scheduled agent that runs nightly across the CRM. It detects and merges duplicate contacts and accounts using firmographic and identity matching. It re-enriches stale records via Clearbit and Apollo. It flags deals with no activity in 21+ days for AE follow-up. It writes a weekly summary to the RevOps lead.
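The stale-deal check is the simplest of the nightly steps. A minimal Python sketch, assuming each deal is a dict with hypothetical `id` and `last_activity` keys (the real job would read these from Salesforce):

```python
from datetime import date, timedelta

STALE_AFTER_DAYS = 21  # "no activity in 21+ days"


def stale_deals(deals, today):
    """Return the ids of deals whose last activity was 21 or more days ago."""
    cutoff = today - timedelta(days=STALE_AFTER_DAYS)
    return [deal["id"] for deal in deals if deal["last_activity"] <= cutoff]
```

Passing `today` in as an argument, rather than calling `date.today()` inside the function, keeps the check easy to test and to re-run for a past date.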
The hard parts
Duplicate merging is the kind of thing that can do real damage. We built in a quarantine mode where every proposed merge was human-reviewed for the first three weeks before the agent gained autonomous merge authority — and even now, AEs can roll back any merge with one click. The trust came from never doing irreversible things without a clear undo.
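One way to get both the quarantine period and the one-click undo is to snapshot both records before every merge. The Python sketch below is an illustration under that assumption; the `MergeJournal` class, its method names, and the "kept record wins, dropped record fills gaps" merge rule are hypothetical, not the production design.

```python
import copy


class MergeJournal:
    """Applies record merges, keeping a snapshot so each one can be undone."""

    def __init__(self, records, quarantine=True):
        self.records = records        # record id -> dict of fields
        self.quarantine = quarantine  # True while every merge needs human review
        self.pending = []             # proposed (keep_id, drop_id) pairs
        self._snapshots = {}          # merge id -> pre-merge copies of both records
        self._next_id = 1

    def propose(self, keep_id, drop_id):
        """Queue the merge in quarantine mode; otherwise apply it right away."""
        if self.quarantine:
            self.pending.append((keep_id, drop_id))
            return None
        return self._apply(keep_id, drop_id)

    def approve(self, keep_id, drop_id):
        """A reviewer signs off on a queued merge."""
        self.pending.remove((keep_id, drop_id))
        return self._apply(keep_id, drop_id)

    def _apply(self, keep_id, drop_id):
        merge_id = self._next_id
        self._next_id += 1
        self._snapshots[merge_id] = {
            keep_id: copy.deepcopy(self.records[keep_id]),
            drop_id: copy.deepcopy(self.records[drop_id]),
        }
        # Kept record wins; the dropped record only fills in missing fields.
        merged = dict(self.records[drop_id])
        merged.update({k: v for k, v in self.records[keep_id].items() if v is not None})
        self.records[keep_id] = merged
        del self.records[drop_id]
        return merge_id

    def rollback(self, merge_id):
        """Restore both records exactly as they were before the merge."""
        self.records.update(self._snapshots.pop(merge_id))
```

Switching `quarantine` to `False` is the equivalent of granting autonomous merge authority: `propose` then applies merges directly, and `rollback` still works for every one of them.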
What's running now
The initial cleanup found 38% of records had stale or missing critical fields and surfaced $1.2M in revisitable pipeline that had gone cold. AE Monday pipeline reviews shortened from 60 minutes to 25. They're now expanding the engagement into a Custom Platform that combines hygiene with intelligent next-best-action recommendations.
The problem. A fast-growing apparel brand had 3,200 SKUs on Shopify, most with supplier-written descriptions — inconsistent tone, missing SEO keywords, no brand voice. Seasonal launches kept adding SKUs faster than the two-person content team could handle. Product pages were their top organic entry point, but conversion was leaking because the copy wasn't doing its job.
What we built
A Shopify-integrated content agent trained on three years of the brand's best-performing copy. It ingests product attributes, images, and category context, then produces a description, SEO title, meta description, and six bullet points per SKU. A brand guidelines module runs a self-review pass before each output is staged for the team to review and publish.
The hard parts
Voice consistency at 3,200 SKUs is where most off-the-shelf tools fail. We spent two weeks building a brand voice eval set — 120 rated examples across tone, specificity, and prohibited language — and tuned the agent against it before running the full catalog. Without that foundation, the output would have been generic. With it, the content director accepted 94% of first drafts with minor edits.
What's running now
All 3,200 SKUs were rewritten in two weeks. New SKUs are now content-ready on day of launch, not two weeks after. Organic product page traffic improved 22% in the following quarter. The eight months of copywriter time freed up was redeployed into campaign and editorial work.
The problem. A B2B SaaS in a competitive vertical had a one-person content team that was the bottleneck for everything: blog, email newsletters, LinkedIn, and sales enablement. The content director was spending 70% of her time writing and 30% on strategy. The pipeline stalled whenever she was unavailable.
What we built
A content pipeline agent integrated into their Notion workflow. The content director submits a brief (topic, audience, goal, key points) and the agent researches the topic, pulls the competitive landscape, and produces a publish-ready blog post, email version, and three LinkedIn variants. A brand voice module — trained on two years of their highest-performing content — runs before delivery.
The hard parts
The brand voice was the product. We spent the first two weeks of the engagement doing a content audit: rating 200 articles against a rubric for specificity, tone, depth, and differentiation. That corpus became the eval set. The agent's first outputs scored 71% on brand alignment; after two weeks of tuning they scored 92%, which the content director called "better than most freelancers I've worked with."
What's running now
The team went from 4 posts per month to 16 with no new headcount. The content director now spends her time on editing, strategy, and new formats. Organic traffic grew 34% in four months. They've since extended the agent to produce sales one-pagers and case study first drafts from call transcripts.
The problem. Brumm Repair was booking appointments by phone only — no online presence beyond a bare listing. Customers couldn't find them easily, couldn't see services and pricing, and had no way to book outside business hours. Walk-ins were inconsistent and the owner's phone was the single point of failure.
What we built
A custom website built around the customer journey: services, pricing transparency, before/after gallery, and an integrated online booking system with real-time calendar availability. Designed to be unique and on-brand — not a template clone — with a CMS so the owner can update content without us.
The focus
Auto repair is a trust business. The site had to feel credible — honest about pricing, clear about what they specialise in, and easy to use on mobile (where 70%+ of local search happens). We ran Core Web Vitals optimisation from the first commit, not as an afterthought.
What's running now
The site exceeded the client's expectations and has been a meaningful driver of business growth. 65% of appointments now book online. The owner no longer manages a phone queue during peak hours. A local competitor saw the site and asked who built it.
The problem. Stefsotra started as a single-market fashion brand with ambitions to expand across Europe. Their existing setup was a basic Shopify theme that couldn't handle multi-currency checkout, country-specific pricing, or the volume their marketing was starting to generate.
What we built
A high-performance e-commerce platform built for scale from the ground up — multi-currency, multi-language, multi-market pricing rules, and a custom checkout that reduced drop-off at the payment step. We rearchitected the product catalog, built a streamlined admin for the operations team, and wired in their logistics and ERP systems.
The hard parts
Rapid growth is the hardest load to engineer for because you don't know when it hits. We built with horizontal scaling in mind and load-tested against 10× their then-current peak before launch. When their biggest traffic spike arrived, the platform handled it without a single downtime incident.
What's running now
Stefsotra scaled into a multi-country business with multi-million GMV. The platform has remained reliable through continued growth. The team credits the platform's stability and performance as a core factor in being able to scale as fast as they did.
The problem. Sigma Beauty had a visually polished site that was quietly bleeding revenue. Mobile PageSpeed scores were in the 30s, a range where poor Core Web Vitals can weigh on search rankings and users abandon before the page loads. Their product pages were image-heavy and unoptimised, hurting both SEO and conversion.
What we did
A focused performance audit and rebuild — image pipeline restructure, critical CSS extraction, lazy loading, JavaScript bundle analysis and reduction, CDN configuration, and LCP element optimisation. We didn't redesign anything; we made what they had fast without changing the visual experience.
The discipline
Page speed work is easy to do superficially and hard to do correctly. The real wins came from eliminating render-blocking resources, moving to next-gen image formats, and fixing the loading order on above-the-fold elements — not just enabling a cache plugin and calling it done.
What's running now
Mobile PageSpeed went from 38 to 91. The team called the improvement "incredible." Users now get a fast experience across devices, and the site ranks better on the product category queries that drive their organic traffic.
The problem. Mom Crew was a growing community brand with a loyal audience but a site that wasn't converting or loading fast enough. Customer drop-off was high on mobile, acquisition costs were climbing, and the brand's momentum wasn't showing up in the revenue numbers.
What we built
A full-stack engagement covering site rebuild, performance optimisation, and marketing infrastructure. We rebuilt the storefront for speed and conversion — faster load times, better mobile UX, improved checkout flow — then aligned their acquisition channels to the improved funnel so paid traffic could actually perform.
The principle
There's no point running ads to a slow site. We always work from the bottom of the funnel up: fix the conversion rate first, then scale acquisition. That's what keeps CAC from spiralling as you grow. Mom Crew's revenue growth came from fixing both sides simultaneously, not from spending more on ads.
What's running now
Revenue has grown 140% since the partnership began. Customer drop-off is significantly lower. Acquisition costs came down even as volume increased, the opposite of what usually happens when you scale. The team calls it "a game-changer" for the business.
Yours is next.
If any of these patterns rhyme with your business, the Audit is the fastest way to find out whether the same playbook works for you.
Start the conversation →