[FEEDBACK] Daily Papers
Note that this is not a post for adding new papers; it is for feedback on the Daily Papers community update feature.
How to submit a paper to the Daily Papers, like @akhaliq (AK)?
- Submitting is available to paper authors
- Only recent papers (less than 7 days old) can be featured on the Daily Papers
- Drop the arXiv ID in the form at https://huggingface.co/papers/submit
- Add media (images, videos) to the paper when relevant
- You can start a discussion to engage with the community
Please check out the documentation.
We are excited to share our recent work on MLLM architecture design titled "Ovis: Structural Embedding Alignment for Multimodal Large Language Model".
Paper: https://arxiv.org/abs/2405.20797
Github: https://github.com/AIDC-AI/Ovis
Model: https://huggingface.co/AIDC-AI/Ovis-Clip-Llama3-8B
Data: https://huggingface.co/datasets/AIDC-AI/Ovis-dataset
We are excited to share our work titled "Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models": https://arxiv.org/abs/2406.12644
We propose DASH (Distributed Accelerated SHampoo), a faster and more accurate version of Distributed Shampoo.
To make it faster, we stack the blocks extracted from the preconditioners into a 3D tensor, which is inverted efficiently using batched matmuls via iterative procedures.
To make it more accurate, we adopt an iterative method from numerical linear algebra called Newton-DB, which is more accurate than the Coupled Newton iteration currently implemented in Distributed Shampoo.
These iterative procedures usually require the largest eigenvalue of the input matrix to be upper-bounded by 1, which is ensured by scaling the input matrix. In theory, one should divide by the true largest eigenvalue of the matrix, but computing it is expensive in Distributed Shampoo. Before our work, the simplest scaling used the Frobenius norm, which is usually much larger than the largest eigenvalue.
Since we work with all blocks in parallel in stacked form, our implementation can run power iteration to estimate the largest eigenvalue of every block in one shot. Why is this better?
When we scale the input matrix by its Frobenius norm, the spectrum is shifted towards zero. We show that iterative procedures require more steps to converge for small eigenvalues than for large ones. Therefore, scaling by an approximation of the largest eigenvalue is desirable; in our DASH implementation this estimate is cheap to compute, which leads to faster training and more accurate models.
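The one-shot eigenvalue estimation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' actual DASH code: the function name `batched_power_iteration` and the step count are my own choices, and it only shows the idea of running power iteration on all stacked blocks simultaneously via batched matmuls, then comparing the result against Frobenius-norm scaling.

```python
import numpy as np

def batched_power_iteration(blocks, num_steps=50):
    """Estimate the largest eigenvalue of each block in a (B, n, n) stack
    of symmetric PSD matrices, using one batched matmul per step."""
    B, n, _ = blocks.shape
    v = np.ones((B, n, 1)) / np.sqrt(n)               # shared starting vector
    for _ in range(num_steps):
        v = blocks @ v                                 # batched matmul over all blocks
        v = v / np.linalg.norm(v, axis=1, keepdims=True)
    # Rayleigh quotient v^T A v gives the eigenvalue estimate per block
    return (v.transpose(0, 2, 1) @ blocks @ v).reshape(B)

# Random symmetric PSD blocks standing in for preconditioner blocks
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 32, 32))
blocks = A @ A.transpose(0, 2, 1)

lam_est = batched_power_iteration(blocks)
lam_true = np.linalg.eigvalsh(blocks)[:, -1]           # exact largest eigenvalues
fro = np.linalg.norm(blocks, axis=(1, 2))              # Frobenius norms

print("max relative error:", np.max(np.abs(lam_est - lam_true) / lam_true))
print("Frobenius norm >= largest eigenvalue for all blocks:", np.all(fro >= lam_true))
```

Because the Frobenius norm always upper-bounds the largest eigenvalue (often by a wide margin), dividing by it pushes the whole spectrum towards zero, which is exactly the regime where the iterative inverse-root procedures converge slowly.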
If you want to find out more, check out:
Paper: https://huggingface.co/papers/2602.02016
Code: https://github.com/IST-DASLab/DASH
Hi, @akhaliq , @Kramp , @AdinaY
https://arxiv.org/abs/2603.05438 also returns {"error":"Arxiv paper not found"} when submitted.
Can you take a look?
Thank you!
Hi @kdwon - The paper is now on the Daily Papers page: https://huggingface.co/papers/2603.05438
Feel free to claim it with your HF account, and start communicating with the community.
I'm also getting the same "Arxiv paper not found" error as above for our new paper:
https://arxiv.org/abs/2603.10055
Would you be able to help with this? Thanks so much!
Hi @hanseungwook - I've submitted the paper to Daily Papers here: https://huggingface.co/papers/2603.10055. Feel free to claim it with your HF account.