| --- |
| tags: |
| - sentence-transformers |
| - sentence-similarity |
| - information-retrieval |
| - semantic-search |
| widget: |
| - source_sentence: >- |
| Descrivi dettagliatamente il processo chimico e fisico che avviene durante |
| la preparazione di un impasto per crostata |
| sentences: |
| - >- |
| ## La Magia Chimica e Fisica nell'Impasto della Crostata: Un Viaggio Dagli |
| Ingredienti Secchi al Trionfo del Forno |
| |
|
|
| La preparazione di una crostata, apparentemente un gesto semplice e |
| familiare, cela in realtà un affascinante balletto di reazioni chimiche e |
| trasformazioni fisiche... |
| - >- |
| ## L'Arte Effimera: Creare un Dolce Paesaggio Invernale |
| |
|
|
| Immergiamoci nel cuore pulsante della pasticceria festiva, dove l'arte |
| culinaria si fonde con la creatività artistica... |
| - >- |
| Le piattaforme di comunicazione digitale, con la loro ubiquità crescente, si |
| configurano come un'arma a doppio taglio nel panorama sociale |
| contemporaneo... |
| pipeline_tag: sentence-similarity |
| library_name: sentence-transformers |
| language: |
| - it |
| license: apache-2.0 |
| --- |
| |
| <p align="center"> |
| <img src="benchmark.png" style="max-width: 1024px; width: 100%; height: auto;"/> |
| </p> |
| <h1 style="font-size: 48px; text-align: center;">Ita-Search 🇮🇹</h1> |
|
|
| # Fine-tuned Qwen3-Embedding for Italian Semantic Retrieval |
|
|
| This model is a fine-tuned version of [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) for Italian semantic retrieval, with particular emphasis on Italian query understanding and document ranking. |
|
|
| ## Model Description |
|
|
| - **Model Type**: Dense embedding model for semantic retrieval |
| - **Base Model**: [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) |
| - **Output Dimensionality**: 1,024-dimensional dense vectors |
| - **Maximum Sequence Length**: 32,768 tokens |
| - **Primary Language**: Italian |
| - **Similarity Function**: Cosine similarity |
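
For readers unfamiliar with the similarity function listed above, here is a minimal pure-Python sketch of cosine similarity. The function name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions; in practice the library computes this over the model's 1,024-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors stand in for real embeddings (assumption for illustration).
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # identical direction -> 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal -> 0.0
```

Because the score depends only on direction, not vector length, two embeddings of different magnitude that point the same way still score 1.0.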
|
|
| ## Capabilities |
|
|
| ### Italian Semantic Retrieval |
| The model performs strongly at matching Italian queries to Italian documents and is particularly effective in technical and academic domains. |
|
|
| ### Domain Coverage |
| Trained on diverse Italian knowledge domains including: |
| - **Medical & Health Sciences**: Diagnostic imaging, clinical procedures, medical terminology |
| - **STEM Fields**: Physics, computer science, geology, engineering |
| - **Professional Domains**: Finance, law, agriculture, software development |
| - **Educational Content**: Historical studies, culinary arts, general knowledge |
|
|
| ### Query Understanding |
| Enhanced comprehension of: |
| - Conversational and informal Italian query patterns |
| - Technical terminology in Italian across domains |
| - Italian semantic concepts and nuances |
| - Complex multi-faceted questions in Italian |
|
|
| ## Training Data |
|
|
| The model was fine-tuned on a curated corpus of Italian semantic data, featuring high-quality triplets designed to capture semantic nuances across multiple domains. The dataset emphasizes: |
|
|
| - **Hard negative mining**: Strategic inclusion of semantically related but incorrect documents |
| - **Italian language focus**: Broad representation of Italian language patterns |
| - **Domain diversity**: Coverage of academic, professional, and conversational contexts in Italian |
| - **Quality curation**: Manual review and automated filtering for coherence and relevance |
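
To make the idea of a triplet with a hard negative concrete, the sketch below shows one possible representation and a generic margin-based triplet loss. This is an assumption for illustration only: the card does not state which loss was used in training, and the `Triplet` class, the `triplet_margin_loss` function, the margin value, and the similarity scores are all hypothetical. The texts are borrowed (and shortened) from the widget example in this card's metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One example: a query, a relevant document, and a related-but-wrong document."""
    query: str
    positive: str
    hard_negative: str


def triplet_margin_loss(sim_positive: float, sim_negative: float, margin: float = 0.2) -> float:
    """Zero once the positive outscores the negative by at least `margin`."""
    return max(0.0, margin - (sim_positive - sim_negative))


# Texts taken from this card's widget example; pairing them as a triplet is illustrative.
example = Triplet(
    query=(
        "Descrivi dettagliatamente il processo chimico e fisico che avviene "
        "durante la preparazione di un impasto per crostata"
    ),
    positive="## La Magia Chimica e Fisica nell'Impasto della Crostata...",
    hard_negative="## L'Arte Effimera: Creare un Dolce Paesaggio Invernale...",
)

# Hypothetical similarity scores, chosen only to exercise the function.
print(triplet_margin_loss(sim_positive=0.80, sim_negative=0.75))  # small gap: loss is positive
print(triplet_margin_loss(sim_positive=0.80, sim_negative=0.30))  # large gap: loss is zero
```

A hard negative is useful precisely because it scores close to the positive, so it keeps producing a nonzero loss until the model learns to separate the two.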
|
|
| ## Usage |
|
|
| ### Basic Retrieval |
| ```python |
| from sentence_transformers import SentenceTransformer |
| |
| model = SentenceTransformer("DeepMount00/Ita-Search") |
| |
| # Italian query-document matching |
| query = "Come si distingue una faglia trascorrente da una normale?" |
| documents = [ |
| "Le faglie trascorrenti sono caratterizzate da movimento orizzontale...", |
| "Le faglie normali si verificano a causa di stress estensionale...", |
| "Le strategie di gestione del portafoglio di investimenti..." |
| ] |
| |
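| # Encode with the prompts the model was tuned on (see Prompt Templates) |
| query_embedding = model.encode(query, prompt="Represent this search query for finding relevant passages: ") |
| doc_embeddings = model.encode(documents, prompt="Represent this passage for retrieval: ") |
| # Cosine similarity between the query and each document; higher means more relevant |
| similarities = model.similarity(query_embedding, doc_embeddings) |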
| ``` |
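
Once similarity scores are available, ranking is a simple sort. The sketch below is a minimal, library-free illustration; the helper name `rank_documents` and the score values are hypothetical. With the snippet above, the scores would most likely come from something like `similarities[0].tolist()`, assuming `model.similarity` returns a 1 x N tensor for a single query.

```python
def rank_documents(documents, scores, top_k=None):
    """Return (document, score) pairs sorted from most to least similar."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


documents = [
    "Le faglie trascorrenti sono caratterizzate da movimento orizzontale...",
    "Le faglie normali si verificano a causa di stress estensionale...",
    "Le strategie di gestione del portafoglio di investimenti...",
]
# Placeholder scores for illustration, not real model output.
scores = [0.71, 0.64, 0.12]

for doc, score in rank_documents(documents, scores, top_k=2):
    print(f"{score:.2f}  {doc}")
```

Passing `top_k` keeps only the best matches, which is the usual shape of a retrieval result list.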
|
|
| ### Prompt Templates |
| The model is optimized for specific prompt templates: |
| - **Queries**: `"Represent this search query for finding relevant passages: "` |
| - **Documents**: `"Represent this passage for retrieval: "` |
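
The effect of these templates can be sketched without loading the model. In sentence-transformers, a string passed as `prompt=` to `encode` is prepended to each input text before tokenization; the helper `with_prompt` and the constant names below are hypothetical and only mimic that prepending step.

```python
QUERY_PROMPT = "Represent this search query for finding relevant passages: "
DOCUMENT_PROMPT = "Represent this passage for retrieval: "


def with_prompt(prompt, texts):
    """Prepend a prompt to one text or a list of texts, returning a list."""
    if isinstance(texts, str):
        texts = [texts]
    return [prompt + text for text in texts]


print(with_prompt(QUERY_PROMPT, "Come si distingue una faglia trascorrente da una normale?"))
print(with_prompt(DOCUMENT_PROMPT, ["Le faglie normali si verificano a causa di stress estensionale..."]))
```

Note the trailing space in each template: it separates the instruction from the text, so it should be kept when copying the prompts.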
|
|
| ## Applications |
|
|
| - **Italian information retrieval systems** |
| - **Academic and technical document search in Italian** |
| - **Italian question-answering platforms** |
| - **Educational content recommendation for Italian speakers** |
| - **Professional knowledge base systems in Italian** |
|
|
| ## Limitations |
|
|
| - **Language coverage**: Optimized specifically for Italian; retrieval quality in other languages may be lower |
| - **Domain specificity**: Performance may vary on highly specialized domains not represented in training |
|
|
|
|
| ## Acknowledgments |
|
|
| This work builds upon the Qwen3-Embedding architecture and advances in contrastive learning for dense retrieval. We acknowledge the contributions of the Qwen team and the sentence-transformers community. |
|
|
| --- |
|
|
| **License**: Apache 2.0 (as declared in the model metadata), inherited from the base Qwen/Qwen3-Embedding-0.6B model. |