AI & ML interests

None defined yet.

Recent Activity

fffiloni 
posted an update 1 day ago
view post
Post
1544
✨ PASD Magnify is back on Hugging Face Spaces

fffiloni/PASD

PASD isn’t recent, but still delivers strong results — worth restoring rather than replacing.

Getting it to run again wasn’t a simple dependency issue.
It relied on parts of diffusers that no longer exist, while moving to Gradio 6 forced a much newer HF stack — and I couldn’t modify the original source directly.

Recreating the old environment wasn’t practical.
So I patched the downloaded code at runtime before import and made it compatible with today’s stack.

That ended up being the only approach that held without forking or freezing everything to outdated versions.

If you’ve used it before (or are curious), feel free to give it another try.
fffiloni 
posted an update 10 days ago
view post
Post
2826
✅ Back up and running!

My TIGER app is now fully working again, with fixes and full compatibility with Gradio 6 🚀

It lets you:
- 🎙️ Separate multiple speakers from an audio file
- 🎬 Extract each speaker directly from a video
- 🎧 Split audio into dialog, music, and sound effects (DnR)
- 🎥 Apply DnR separation directly on videos

All powered by lightweight TIGER models for fast and efficient speech separation.

Try it here 👉 fffiloni/TIGER-audio-extraction
fffiloni 
posted an update 11 days ago
view post
Post
2212
AniDoc is back 🎉

I’ve fixed the Space and brought it back to life:
- ✅ Working again after being broken for a while
- ✅ Updated to Gradio 6
- ✅ Compatible with ZeroGPU
- ✅ Output videos now preserve original resolution and FPS

I also added advanced controls so you can experiment more (tracking, seed, motion, sketch).

Try it here: fffiloni/AniDoc
fffiloni 
posted an update 24 days ago
view post
Post
4111
I brought DALL·E mini back to life 🤖🎨

You can try it here:
fffiloni/dalle-mini-reboot

And I also built a batch version using Hugging Face Jobs (up to 50 images per prompt):
fffiloni/dalle-mini-via-jobs

The goal was to stay close to the original JAX/Flax pipeline, while integrating it with modern tooling (Gradio + Jobs).

It ended up being a fun way to revisit this model — still weird, still fun 😄
  • 3 replies
·
Nymbo 
posted an update 27 days ago
view post
Post
6511
We should really have a release date range slider on the /models page. Tired of "trending/most downloaded" being the best way to sort and still seeing models from 2023 on the first page just because they're embedded in enterprise pipelines and get downloaded repeatedly. "Recently Created/Recently Updated" don't solve the discovery problem considering the amount of noise to sift through.

Slight caveat: Trending actually does have some recency bias, but it's not strong/precise enough.
  • 3 replies
·
fffiloni 
posted an update 29 days ago
view post
Post
488
A clearer demo for TADA (now multilingual) 🔊🌍

I improved the public demo for TADA — a generative framework for speech modeling via text–acoustic dual alignment.

TADA models speech as a joint sequence of text tokens and acoustic tokens, using a transformer backbone to keep text and audio synchronized during generation.

The original demo already exposed these mechanisms, but the workflow made the pipeline hard to understand.

This updated demo makes the process clearer:

• load the model
• prepare a reference voice (optionally with transcript or Whisper auto-transcription)
• generate speech conditioned on that reference

It also adds multilingual support.

Presets are included for a few languages, but the model supports more:

English, French, Spanish, German, Arabic, Mandarin Chinese, Italian, Japanese, Polish, Portuguese

Feel free to try different voices, accents, or languages and see how the alignment behaves.

👉 fffiloni/tada-dual-alignment-tts-demo

Paper
TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (2602.23068)
Nymbo 
posted an update 3 months ago
view post
Post
2783
Genuine recommendation: You should really use this AutoHotKey macro. Save the file as macros.ahk and run it. Before sending a prompt to your coding agent, press Ctrl + Alt + 1 and paste your prompt to any regular chatbot. Then send the output to the agent. This is the actual, boring, real way to "10x your prompting". Use the other number keys to avoid repeating yourself over and over again. I use this macro prolly 100-200 times per day. AutoHotKey isn't as new or hype as a lot of other workflows, but there's a reason it's still widely used after 17 years. Don't overcomplicate it.

; Requires AutoHotkey v1.1+

; All macros are `Ctrl + Alt + <variable>`

^!1::
    Send, Please help me more clearly articulate what I mean with this message (write the message in a code block):
return

^!2::
    Send, Please make the following changes:
return

^!3::
    Send, It seems you got cut off by the maximum response limit. Please continue by picking up where you left off.
return


In my experience the past few months, Ctrl + Alt + 1 works best with Instruct models (non-thinking). Reasoning causes some models to ramble and miss the point. I've just been using GPT-5.x for this.
Nymbo 
posted an update 4 months ago
view post
Post
2812
🚨 New tool for the Nymbo/Tools MCP server: The new Agent_Skills tool provides full support for Agent Skills (Claude Skills but open-source).

How it works: The tool exposes the standard discover/info/resources/validate actions. Skills live in /Skills under the same File_System root, and any bundled scripts run through Shell_Command, no new infrastructure required.

Agent_Skills(action="discover")  # List all available skills
Agent_Skills(action="info", skill_name="music-downloader")  # Full SKILL.md
Agent_Skills(action="resources", skill_name="music-downloader")  # Scripts, refs, assets


I've included a music-downloader skill as a working demo, it wraps yt-dlp for YouTube/SoundCloud audio extraction.

Caveat: On HF Spaces, Shell_Command works for most tasks, but some operations (like YouTube downloads) are restricted due to the container environment. For full functionality, run the server locally on your machine.

Try it out ~ https://www.nymbo.net/nymbot
Nymbo 
posted an update 5 months ago
view post
Post
5289
🚀 I've just shipped a major update to the Nymbo/Tools MCP server: the Agent_Terminal, a single "master tool" that cuts token usage by over 90%!

Anthropic found 98.7% context savings using code execution with MCP, Cloudflare published similar findings. This is my open-source implementation of the same idea.

# The Problem

Traditional MCP exposes every tool definition directly to the model. With 12 tools, that's thousands of tokens consumed *before the conversation even starts*. Each tool call also passes intermediate results through the context window — a 10,000-row spreadsheet? That's all going into context just to sum a column.

# The Solution: One Tool to Rule Them All

Agent_Terminal wraps all 12 tools (Web_Search, Web_Fetch, File_System, Generate_Image, Generate_Speech, Generate_Video, Deep_Research, Memory_Manager, Obsidian_Vault, Shell_Command, Code_Interpreter) into a single Python code execution gateway.

Instead of the model making individual tool calls, it writes Python code that orchestrates the tools directly:

# Search for Bitcoin price
result = Web_Search("current price of bitcoin", max_results=3)
print(result)


Don't know what tools are available? The agent can discover them at runtime:

print(search_tools('image'))  # Find tools by keyword
print(usage('Generate_Image'))  # Get full docs for a specific tool


The individual direct tool calls are all still there, but they can be disabled if using the Agent_Terminal. Try it now - https://www.nymbo.net/nymbot
  • 1 reply
·