SentenceTransformer based on reasonir/ReasonIR-8B

This is a sentence-transformers model finetuned from reasonir/ReasonIR-8B. It maps sentences & paragraphs to a 4096-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: reasonir/ReasonIR-8B
  • Maximum Sequence Length: 131072 tokens
  • Output Dimensionality: 4096 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 131072, 'do_lower_case': False, 'architecture': 'ReasonIRModel'})
  (1): Pooling({'word_embedding_dimension': 4096, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': False})
  (2): Normalize()
)
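
Because the pooling module mean-pools token embeddings (and, with include_prompt: False, excludes prompt tokens from the mean) and the final Normalize() module L2-normalizes the result, cosine similarity reduces to a plain dot product. A minimal sketch checking this, loading the model as in the Usage section below (the input strings are illustrative):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("shahafvl/reasonir-8b-scientific-parent-prompt")
emb = model.encode(["oxide solid electrolytes", "hyperspectral unmixing"])  # illustrative inputs

# Normalize() makes every embedding unit-length, so cosine similarity is just a dot product.
print(np.linalg.norm(emb, axis=1))  # ~[1.0, 1.0]
print(emb @ emb.T)                  # matches model.similarity(emb, emb) up to float precision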

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("shahafvl/reasonir-8b-scientific-parent-prompt")
# Run inference
sentences = [
    'Running document analytics pipelines can be highly time-consuming, particularly as the underlying corpora expand at a rapid pace. In addition, these workloads typically demand substantial storage capacity and main memory. A widely adopted strategy to alleviate the storage pressure is to apply data compression.',
    'Data-intensive analytics workloads are often both computationally expensive and demanding in terms of storage and memory resources, with costs escalating as datasets grow. One prevalent method for alleviating storage and memory pressure is to store data in compressed form.',
    'Endmember variability in spectral unmixing encompasses both extrinsic factors, such as sensor geometry and illumination, and intrinsic factors related to the physical and chemical properties of the materials. While the former can often be approximated with analytical radiative transfer models, the latter is generally too complex to describe explicitly, motivating the adoption of statistical distributions, mixture models, or stochastic processes to model endmember spectra in high-dimensional spaces.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 4096]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8138, 0.0302],
#         [0.8138, 1.0000, 0.0370],
#         [0.0302, 0.0370, 1.0000]])
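
Training applied a prompt to the anchor (query) side only (see Training Hyperparameters below). Whether that prompt is stored in the saved model configuration is not stated here, so passing it explicitly at query time is the safe option. A sketch reusing model and sentences from above, with a hypothetical query:

# Anchor-side prompt from the training configuration.
query_prompt = (
    "Retrieve the broader scientific generalization or context "
    "for the given specific text.\nQuery: "
)
query_embeddings = model.encode(
    ["Compressing large corpora to cut the storage cost of analytics."],  # hypothetical query
    prompt=query_prompt,
)
document_embeddings = model.encode(sentences)  # candidate passages, encoded without a prompt
print(model.similarity(query_embeddings, document_embeddings))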

Evaluation

Metrics

Information Retrieval

  • Dataset: ir_parent_grandparent_combined
  • Evaluated with InformationRetrievalEvaluator

Metric               Value
cosine_accuracy@1    1.0
cosine_accuracy@3    1.0
cosine_accuracy@5    1.0
cosine_accuracy@10   1.0
cosine_precision@1   1.0
cosine_precision@3   1.0
cosine_precision@5   0.987
cosine_precision@10  0.5949
cosine_recall@1      0.1667
cosine_recall@3      0.5001
cosine_recall@5      0.8225
cosine_recall@10     0.9916
cosine_ndcg@10       0.9924
cosine_mrr@10        1.0
cosine_map@100       0.9886
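
Every query retrieves a relevant document at rank 1, and the recall@1 of 0.1667 (≈ 1/6) suggests roughly six relevant documents per query. A sketch of how such an evaluation is wired up with sentence-transformers; the actual queries, corpus, and relevance judgments for this split are not published in this card, so the data below is hypothetical:

from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Hypothetical stand-ins for the real ir_parent_grandparent_combined split.
queries = {"q1": "A specific observation about LLZO stability against Li metal."}
corpus = {
    "d1": "A broader generalization about oxide solid electrolytes.",
    "d2": "An unrelated passage about endmember variability.",
}
relevant_docs = {"q1": {"d1"}}  # query id -> set of relevant corpus ids

ir_evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="ir_parent_grandparent_combined",
)
print(ir_evaluator(model))  # accuracy@k, precision@k, recall@k, ndcg@10, mrr@10, map@100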

Triplet

  • Datasets: orig_vs_parent, orig_vs_grandparent, sib_vs_parent, and sib_vs_grandparent
  • Evaluated with TripletEvaluator

Metric           orig_vs_parent  orig_vs_grandparent  sib_vs_parent  sib_vs_grandparent
cosine_accuracy  0.99            0.01                 0.9896         0.0104
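
TripletEvaluator reports how often the anchor embedding is closer to the positive than to the negative. The near-zero *_vs_grandparent accuracies mirror the near-perfect *_vs_parent scores, which is consistent with the model consistently ranking the parent above the grandparent; how positives and negatives are assigned in those splits is not documented here, so that reading is an inference. A minimal sketch with hypothetical triplets:

from sentence_transformers.evaluation import TripletEvaluator

# Hypothetical triplets; the actual evaluation splits are not published in this card.
triplet_evaluator = TripletEvaluator(
    anchors=["LLZO is stable against Li metal and a high-voltage cathode."],
    positives=["Oxide solid electrolytes show broad electrochemical stability."],
    negatives=["Endmember variability complicates spectral unmixing."],
    name="orig_vs_parent",
)
print(triplet_evaluator(model))  # {'orig_vs_parent_cosine_accuracy': ...}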

Training Details

Training Dataset

Unnamed Dataset

  • Size: 161,986 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    anchor:   string; min 41 tokens, mean 77.92 tokens, max 120 tokens
    positive: string; min 43 tokens, mean 70.07 tokens, max 108 tokens
  • Sample:
    anchor: LLZO is stable against Li metal and against a high voltage cathode. However, oxides generally require high sintering temperatures to remove grain boundaries to achieve the reported conductivity values. They also tend to be brittle, which makes it harder (relative to the sulfides) to maintain solid-solid interfacial contact and also to process.
    positive: Oxide-based solid electrolytes often show excellent electrochemical stability with both lithium metal anodes and high-voltage cathodes, but exploiting their full ionic conductivity generally requires aggressive high-temperature sintering to suppress grain boundary resistance. The resulting brittle ceramic bodies can be challenging to process and to integrate mechanically, particularly when maintaining intimate solid–solid interfacial contact with electrodes is critical.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
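
With these parameters, the loss computes cosine similarities between each anchor and all in-batch positives, scales them by 20, and applies cross-entropy so that each anchor must rank its own positive first (the other positives in the batch serve as negatives). Reconstructed in code, leaving gather_across_devices at its default of False:

from sentence_transformers import util
from sentence_transformers.losses import MultipleNegativesRankingLoss

# In-batch negatives ranking loss with cosine similarity scaled by 20.
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)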
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • learning_rate: 1e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.05
  • bf16: True
  • disable_tqdm: True
  • load_best_model_at_end: True
  • push_to_hub: True
  • hub_model_id: shahafvl/reasonir-8b-scientific-parent-prompt
  • hub_private_repo: False
  • auto_find_batch_size: True
  • prompts: {'anchor': 'Retrieve the broader scientific generalization or context for the given specific text.\nQuery: '}
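
Taken together, the non-default settings imply a training setup roughly like the sketch below. This is a reconstruction, not the author's script; output_dir and train_dataset (a dataset with anchor and positive columns) are assumptions, and model and loss are as defined above:

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="reasonir-8b-scientific-parent-prompt",  # assumed
    num_train_epochs=1,
    learning_rate=1e-5,
    warmup_ratio=0.05,
    bf16=True,
    eval_strategy="steps",
    load_best_model_at_end=True,
    auto_find_batch_size=True,
    push_to_hub=True,
    hub_model_id="shahafvl/reasonir-8b-scientific-parent-prompt",
    # Prompt prepended to the anchor column during training only.
    prompts={"anchor": "Retrieve the broader scientific generalization or context for the given specific text.\nQuery: "},
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # assumed: columns "anchor" and "positive"
    loss=loss,
)
trainer.train()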

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.05
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: True
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: shahafvl/reasonir-8b-scientific-parent-prompt
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: True
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: {'anchor': 'Retrieve the broader scientific generalization or context for the given specific text.\nQuery: '}
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss ir_parent_grandparent_combined_cosine_ndcg@10 orig_vs_parent_cosine_accuracy orig_vs_grandparent_cosine_accuracy sib_vs_parent_cosine_accuracy sib_vs_grandparent_cosine_accuracy
0.0049 100 0.0293 - - - - -
0.0099 200 0.003 - - - - -
0.0148 300 0.0036 - - - - -
0.0198 400 0.0006 - - - - -
0.0247 500 0.0004 - - - - -
0.0296 600 0.0004 - - - - -
0.0346 700 0.0022 - - - - -
0.0395 800 0.0002 - - - - -
0.0444 900 0.0001 - - - - -
0.0494 1000 0.0003 - - - - -
0.0543 1100 0.0028 - - - - -
0.0593 1200 0.002 - - - - -
0.0642 1300 0.0027 - - - - -
0.0691 1400 0.0003 - - - - -
0.0741 1500 0.0021 - - - - -
0.0790 1600 0.0001 - - - - -
0.0840 1700 0.0001 - - - - -
0.0889 1800 0.0018 - - - - -
0.0938 1900 0.0001 - - - - -
0.0988 2000 0.0057 - - - - -
0.1037 2100 0.0028 - - - - -
0.1086 2200 0.0027 - - - - -
0.1136 2300 0.0 - - - - -
0.1185 2400 0.0 - - - - -
0.1235 2500 0.0001 - - - - -
0.1284 2600 0.0045 - - - - -
0.1333 2700 0.0018 - - - - -
0.1383 2800 0.0037 - - - - -
0.1432 2900 0.0088 - - - - -
0.1482 3000 0.0069 - - - - -
0.1531 3100 0.0027 - - - - -
0.1580 3200 0.0001 - - - - -
0.1630 3300 0.0002 - - - - -
0.1679 3400 0.002 - - - - -
0.1728 3500 0.0001 - - - - -
0.1778 3600 0.0023 - - - - -
0.1827 3700 0.0001 - - - - -
0.1877 3800 0.0001 - - - - -
0.1926 3900 0.0001 - - - - -
0.1975 4000 0.0027 - - - - -
0.2025 4100 0.0 - - - - -
0.2074 4200 0.0027 - - - - -
0.2124 4300 0.0027 - - - - -
0.2173 4400 0.0027 - - - - -
0.2222 4500 0.0 - - - - -
0.2272 4600 0.0001 - - - - -
0.2321 4700 0.0 - - - - -
0.2370 4800 0.002 - - - - -
0.2420 4900 0.0069 - - - - -
0.2469 5000 0.0 - - - - -
0.2519 5100 0.0016 - - - - -
0.2568 5200 0.0001 - - - - -
0.2617 5300 0.0 - - - - -
0.2667 5400 0.0022 - - - - -
0.2716 5500 0.0018 - - - - -
0.2766 5600 0.0019 - - - - -
0.2815 5700 0.0 - - - - -
0.2864 5800 0.0 - - - - -
0.2914 5900 0.0018 - - - - -
0.2963 6000 0.0 - - - - -
0.3012 6100 0.0001 - - - - -
0.3062 6200 0.0 - - - - -
0.3111 6300 0.002 - - - - -
0.3161 6400 0.0021 - - - - -
0.3210 6500 0.0 - - - - -
0.3259 6600 0.0032 - - - - -
0.3309 6700 0.002 - - - - -
0.3358 6800 0.0018 - - - - -
0.3408 6900 0.0001 - - - - -
0.3457 7000 0.0018 - - - - -
0.3506 7100 0.0 - - - - -
0.3556 7200 0.0 - - - - -
0.3605 7300 0.0018 - - - - -
0.3655 7400 0.0 - - - - -
0.3704 7500 0.0052 0.9905 0.9900 0.0100 0.9892 0.0108
0.3753 7600 0.0036 - - - - -
0.3803 7700 0.0022 - - - - -
0.3852 7800 0.0 - - - - -
0.3901 7900 0.0 - - - - -
0.3951 8000 0.0025 - - - - -
0.4000 8100 0.0021 - - - - -
0.4050 8200 0.0 - - - - -
0.4099 8300 0.0 - - - - -
0.4148 8400 0.0 - - - - -
0.4198 8500 0.0001 - - - - -
0.4247 8600 0.0 - - - - -
0.4297 8700 0.0018 - - - - -
0.4346 8800 0.0 - - - - -
0.4395 8900 0.0001 - - - - -
0.4445 9000 0.0 - - - - -
0.4494 9100 0.0039 - - - - -
0.4543 9200 0.0042 - - - - -
0.4593 9300 0.0019 - - - - -
0.4642 9400 0.0023 - - - - -
0.4692 9500 0.0 - - - - -
0.4741 9600 0.0 - - - - -
0.4790 9700 0.0019 - - - - -
0.4840 9800 0.0 - - - - -
0.4889 9900 0.0019 - - - - -
0.4939 10000 0.0 - - - - -
0.4988 10100 0.0048 - - - - -
0.5037 10200 0.0 - - - - -
0.5087 10300 0.0018 - - - - -
0.5136 10400 0.0013 - - - - -
0.5185 10500 0.0043 - - - - -
0.5235 10600 0.0001 - - - - -
0.5284 10700 0.0 - - - - -
0.5334 10800 0.0 - - - - -
0.5383 10900 0.0 - - - - -
0.5432 11000 0.0027 - - - - -
0.5482 11100 0.0062 - - - - -
0.5531 11200 0.0001 - - - - -
0.5581 11300 0.0001 - - - - -
0.5630 11400 0.0001 - - - - -
0.5679 11500 0.0048 - - - - -
0.5729 11600 0.0 - - - - -
0.5778 11700 0.0012 - - - - -
0.5827 11800 0.0026 - - - - -
0.5877 11900 0.0037 - - - - -
0.5926 12000 0.0 - - - - -
0.5976 12100 0.0 - - - - -
0.6025 12200 0.0059 - - - - -
0.6074 12300 0.0 - - - - -
0.6124 12400 0.0039 - - - - -
0.6173 12500 0.003 - - - - -
0.6223 12600 0.0 - - - - -
0.6272 12700 0.0 - - - - -
0.6321 12800 0.0 - - - - -
0.6371 12900 0.0036 - - - - -
0.6420 13000 0.0071 - - - - -
0.6469 13100 0.0 - - - - -
0.6519 13200 0.0 - - - - -
0.6568 13300 0.0 - - - - -
0.6618 13400 0.0 - - - - -
0.6667 13500 0.0033 - - - - -
0.6716 13600 0.0 - - - - -
0.6766 13700 0.0 - - - - -
0.6815 13800 0.0001 - - - - -
0.6865 13900 0.0018 - - - - -
0.6914 14000 0.0001 - - - - -
0.6963 14100 0.0036 - - - - -
0.7013 14200 0.0015 - - - - -
0.7062 14300 0.0001 - - - - -
0.7111 14400 0.0074 - - - - -
0.7161 14500 0.0 - - - - -
0.7210 14600 0.0035 - - - - -
0.7260 14700 0.0016 - - - - -
0.7309 14800 0.0018 - - - - -
0.7358 14900 0.0022 - - - - -
0.7408 15000 0.0 0.9924 0.99 0.01 0.9896 0.0104
0.7457 15100 0.0023 - - - - -
0.7507 15200 0.0018 - - - - -
0.7556 15300 0.0 - - - - -
0.7605 15400 0.0 - - - - -
0.7655 15500 0.0036 - - - - -
0.7704 15600 0.0 - - - - -
0.7753 15700 0.0 - - - - -
0.7803 15800 0.0 - - - - -
0.7852 15900 0.0036 - - - - -
0.7902 16000 0.0018 - - - - -
0.7951 16100 0.0019 - - - - -
0.8000 16200 0.003 - - - - -
0.8050 16300 0.0034 - - - - -
0.8099 16400 0.0 - - - - -
0.8149 16500 0.0 - - - - -
0.8198 16600 0.006 - - - - -
0.8247 16700 0.0018 - - - - -
0.8297 16800 0.0029 - - - - -
0.8346 16900 0.0 - - - - -
0.8395 17000 0.0018 - - - - -
0.8445 17100 0.0014 - - - - -
0.8494 17200 0.0032 - - - - -
0.8544 17300 0.0 - - - - -
0.8593 17400 0.0063 - - - - -
0.8642 17500 0.0 - - - - -
0.8692 17600 0.0002 - - - - -
0.8741 17700 0.0026 - - - - -
0.8791 17800 0.0069 - - - - -
0.8840 17900 0.0026 - - - - -
0.8889 18000 0.0061 - - - - -
0.8939 18100 0.0 - - - - -
0.8988 18200 0.0026 - - - - -
0.9037 18300 0.0 - - - - -
0.9087 18400 0.0 - - - - -
0.9136 18500 0.0 - - - - -
0.9186 18600 0.0022 - - - - -
0.9235 18700 0.0 - - - - -
0.9284 18800 0.0 - - - - -
0.9334 18900 0.0016 - - - - -
0.9383 19000 0.0076 - - - - -
0.9433 19100 0.0 - - - - -
0.9482 19200 0.0026 - - - - -
0.9531 19300 0.0 - - - - -
0.9581 19400 0.0 - - - - -
0.9630 19500 0.0018 - - - - -
0.9679 19600 0.002 - - - - -
0.9729 19700 0.0042 - - - - -
0.9778 19800 0.0044 - - - - -
0.9828 19900 0.0 - - - - -
0.9877 20000 0.0 - - - - -
0.9926 20100 0.0 - - - - -
0.9976 20200 0.0018 - - - - -
  • The row at step 15000 (epoch 0.7408) denotes the saved checkpoint; its metrics match those reported under Evaluation.

Framework Versions

  • Python: 3.14.2
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.3
  • PyTorch: 2.9.1+cu128
  • Accelerate: 1.12.0
  • Datasets: 4.4.1
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}