SentenceTransformer based on reasonir/ReasonIR-8B

This is a sentence-transformers model finetuned from reasonir/ReasonIR-8B. It maps sentences & paragraphs to a 4096-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: reasonir/ReasonIR-8B
  • Maximum Sequence Length: 131072 tokens
  • Output Dimensionality: 4096 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 131072, 'do_lower_case': False, 'architecture': 'ReasonIRModel'})
  (1): Pooling({'word_embedding_dimension': 4096, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': False})
  (2): Normalize()
)
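
For illustration, the Pooling and Normalize modules above amount to masked mean pooling over the token embeddings followed by L2 normalization. A minimal sketch in plain PyTorch, where token_embeddings and attention_mask stand in for the Transformer module's output (this mirrors, but is not, the library's internal implementation):

import torch
import torch.nn.functional as F

def pool_and_normalize(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 4096); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)                   # zero out padding, sum over tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)                        # number of non-padding tokens
    mean_pooled = summed / counts                                   # pooling_mode_mean_tokens
    return F.normalize(mean_pooled, p=2, dim=1)                     # Normalize(): unit-length vectors

# Dummy shapes matching the model's 4096-dimensional output
print(pool_and_normalize(torch.randn(2, 10, 4096), torch.ones(2, 10)).shape)  # torch.Size([2, 4096])

Because of the final Normalize() step, cosine similarity between two embeddings equals their dot product.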

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("shahafvl/ReasonIR-8B-scientific-challenge-v1.1")
# Run inference
sentences = [
    'Further investigations regarding stochastic fair queuing and new AQM algorithms are seen as desirable. In any case, network infrastructure updates will take time, particularly if the interest of the involved stakeholders is not aligned (as is often the case for network operators when dealing with over-the-top real-time traffic). It is, therefore, imperative that RTCWEB congestion control provides adequate improvement in the absence of any of the aforementioned schemes.',
    'Ensuring efficient and equitable management of network resources in the face of evolving traffic patterns is critical. Beyond the nuances of individual congestion control algorithms, a larger challenge exists: designing end-to-end traffic management mechanisms that can adapt to heterogeneous network conditions, diverse application requirements, and varying degrees of stakeholder alignment. Finding scalable solutions that promote both fairness and performance under these broader real-world constraints remains an open problem.',
    'Testing and evaluating active queue management algorithms often requires costly and complex emulation environments to faithfully reproduce real-world network conditions. The lack of standardized testing frameworks can hinder the comparability and reproducibility of research in this area.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 4096]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.6407, -0.0421],
#         [ 0.6407,  1.0000, -0.0686],
#         [-0.0421, -0.0686,  1.0000]])
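
Since the embeddings are unit-normalized, the model drops directly into a retrieval loop. A minimal semantic-search sketch using sentence_transformers.util (the query and corpus strings below are illustrative, not taken from the training data):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("shahafvl/ReasonIR-8B-scientific-challenge-v1.1")

corpus = [
    "Designing congestion control that stays fair under heterogeneous network conditions remains an open problem.",
    "Privacy constraints limit how much biomedical data can be collected for training deep networks.",
]
query = "How can traffic management adapt to diverse real-world network constraints?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode(query)

# Rank corpus entries by cosine similarity to the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(round(hit["score"], 4), corpus[hit["corpus_id"]])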

Evaluation

Metrics

Triplet

  • cosine_accuracy: 0.9999
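
Cosine accuracy here is the fraction of (anchor, positive, negative) triplets for which the anchor is closer to its positive than to its negative under cosine similarity. A hand-rolled sketch of that computation over precomputed embeddings (it mirrors, but is not, the library's triplet evaluator):

import numpy as np

def cosine_triplet_accuracy(anchors: np.ndarray, positives: np.ndarray, negatives: np.ndarray) -> float:
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    def row_cos(a, b):
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return (a * b).sum(axis=1)  # row-wise cosine similarity

    return float(np.mean(row_cos(anchors, positives) > row_cos(anchors, negatives)))

# Usage with this model (anchor_texts, pos_texts, neg_texts are hypothetical lists of strings):
# acc = cosine_triplet_accuracy(model.encode(anchor_texts), model.encode(pos_texts), model.encode(neg_texts))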

Training Details

Training Dataset

Unnamed Dataset

  • Size: 106,250 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples (all three columns are strings):
    • anchor: min 45 / mean 92.1 / max 185 tokens
    • positive: min 42 / mean 65.17 / max 104 tokens
    • negative: min 25 / mean 40.35 / max 61 tokens
  • Samples (the three training rows shown here share the same anchor and positive; only the negative differs):
    • anchor: The only evidence that factoring is hard consists of our failure so far to find a fast and practical factoring algorithm. ( The polynomial-time factoring algorithms that are based on the use of quantum computers are not considered to be practical and not addressed in this survey.) Interestingly, and to an outsider maybe surprisingly, an entire industry is based on this belief that factoring is hard: the security, LENSTRA i.e., the unbreakability, of one of the most popular public key cryptosystems relies on the supposed difficulty of factoring (cf.
    • positive: The perceived computational difficulty of certain mathematical problems forms the foundation for the security of many cryptographic protocols. This reliance exists despite the absence of formal proofs confirming that these problems are inherently hard to solve efficiently. The broader challenge is to understand whether basing security on presumed hardness is justified, and what the risks are if future advances in algorithms or computing architectures undermine these assumptions.
    • negative (row 1): Developing methods to accurately predict when a factoring algorithm will outperform existing approaches is a significant challenge in the optimization of cryptographic systems. Such prediction mechanisms would help practitioners remain ahead of unforeseen breakthroughs in algorithmic factoring techniques.
    • negative (row 2): A key challenge in algorithm design relates to selecting the most efficient approach for a given hardware architecture. The relative performance of factoring algorithms can vary dramatically between CPUs, GPUs, and specialized hardware, making portability and benchmarking difficult.
    • negative (row 3): Ensuring the robustness of cryptographic systems to implementation flaws, such as side-channel attacks, remains an ongoing challenge distinct from the underlying mathematical difficulty of factoring. Adversaries may exploit vulnerabilities in physical or software implementations regardless of the computational hardness of the underlying problem.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
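
Conceptually, MultipleNegativesRankingLoss treats the i-th positive as the correct "class" for the i-th anchor: cosine similarities between each anchor and all candidates (the in-batch positives plus the hard negatives) are scaled by 20 and passed through a cross-entropy loss. A self-contained sketch of that computation (a simplified illustration, not the library implementation):

import torch
import torch.nn.functional as F

def mnrl(anchors: torch.Tensor, candidates: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    # candidates = [positives; hard negatives]; for anchor i the positive sits at column i,
    # and every other column (other positives and all hard negatives) acts as a negative.
    scores = scale * (F.normalize(anchors, dim=1) @ F.normalize(candidates, dim=1).T)  # scaled cos_sim
    labels = torch.arange(anchors.size(0))
    return F.cross_entropy(scores, labels)

# Toy batch: 4 anchors, their 4 positives followed by 4 hard negatives, in 4096-d space
anchors = torch.randn(4, 4096)
candidates = torch.cat([torch.randn(4, 4096), torch.randn(4, 4096)])
print(mnrl(anchors, candidates))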
    

Evaluation Dataset

Unnamed Dataset

  • Size: 18,750 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples (all three columns are strings):
    • anchor: min 44 / mean 88.85 / max 173 tokens
    • positive: min 45 / mean 64.61 / max 99 tokens
    • negative: min 24 / mean 40.94 / max 73 tokens
  • Samples (the three evaluation rows shown here share the same anchor and positive; only the negative differs):
    • anchor: On the other hand, original data, such as biomedical data, cannot be easily collected due to privacy and security concerns. Therefore, researchers interested in solving the original task are unable to collect a sufficient amount of data to train DNNs. Conventional methods address this problem by applying transfer learning.
    • positive: In many fields, the procurement of large-scale, high-quality datasets is hindered by issues such as privacy, proprietary restrictions, and logistical constraints. As a result, the effectiveness of deep learning models is restricted by limited access to relevant training data. Broadly, the challenge lies in developing machine learning methods that can perform well despite data scarcity caused by various collection barriers, beyond just privacy and security concerns.
    • negative (row 1): Standard transfer learning approaches can result in negative transfer if the source and target domains are insufficiently related, leading to degraded performance rather than improvements in the original task.
    • negative (row 2): When dealing with biomedical data, ensuring that trained models do not inadvertently memorize sensitive information is a major concern, requiring the development of privacy-preserving techniques such as differential privacy or federated learning.
    • negative (row 3): Another challenge arises when the available biomedical datasets are heavily imbalanced, with certain classes being overrepresented and others underrepresented, complicating the training and evaluation of reliable deep neural networks.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • load_best_model_at_end: True
  • push_to_hub: True
  • hub_model_id: shahafvl/ReasonIR-8B-scientific-challenge-v1.1
  • hub_private_repo: True
  • auto_find_batch_size: True
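
The non-default settings above map directly onto SentenceTransformerTrainingArguments. A condensed training sketch under stated assumptions: the tiny in-memory datasets below are placeholders for the real 106,250/18,750-sample anchor/positive/negative datasets, and trust_remote_code=True is assumed to be needed for the custom ReasonIRModel architecture:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("reasonir/ReasonIR-8B", trust_remote_code=True)  # base model (assumption: custom code required)
loss = MultipleNegativesRankingLoss(model, scale=20.0)

# Placeholder datasets with the expected column names
train_dataset = Dataset.from_dict({"anchor": ["q1"], "positive": ["p1"], "negative": ["n1"]})
eval_dataset = Dataset.from_dict({"anchor": ["q2"], "positive": ["p2"], "negative": ["n2"]})

args = SentenceTransformerTrainingArguments(
    output_dir="reasonir-scientific-challenge",  # hypothetical local path
    eval_strategy="steps",
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    bf16=True,
    load_best_model_at_end=True,
    push_to_hub=True,
    hub_model_id="shahafvl/ReasonIR-8B-scientific-challenge-v1.1",
    hub_private_repo=True,
    auto_find_batch_size=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()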

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: shahafvl/ReasonIR-8B-scientific-challenge-v1.1
  • hub_strategy: every_save
  • hub_private_repo: True
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: True
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss hierarchical_eval_cosine_accuracy
0.0376 500 0.1874 - -
0.0753 1000 0.0135 0.0322 0.9974
0.1129 1500 0.0128 - -
0.1506 2000 0.0109 0.0143 0.9997
0.1882 2500 0.0082 - -
0.2259 3000 0.0058 0.0142 0.9990
0.2635 3500 0.0047 - -
0.3012 4000 0.0029 0.0128 0.9998
0.3388 4500 0.0033 - -
0.3764 5000 0.0052 0.0124 0.9999
0.4141 5500 0.0041 - -
0.4517 6000 0.0032 0.0111 0.9999
0.4894 6500 0.0048 - -
0.5270 7000 0.001 0.0103 0.9999
0.5647 7500 0.0034 - -
0.6023 8000 0.0024 0.0101 0.9999
0.6400 8500 0.0017 - -
0.6776 9000 0.0009 0.0101 0.9999
0.7153 9500 0.0017 - -
0.7529 10000 0.0028 0.0101 0.9999
0.7905 10500 0.0012 - -
0.8282 11000 0.0023 0.0100 0.9999
0.8658 11500 0.0016 - -
0.9035 12000 0.0033 0.0100 0.9999
0.9411 12500 0.0015 - -
0.9788 13000 0.0005 0.0100 0.9999

Framework Versions

  • Python: 3.11.9
  • Sentence Transformers: 5.0.0
  • Transformers: 4.54.0
  • PyTorch: 2.7.1+cu126
  • Accelerate: 1.9.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}