SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
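
As a rough illustration of what modules (1) and (2) above do, the sketch below averages the token vectors that are not padding and then scales the result to unit length. It is a plain-Python toy, not the library's implementation; the function names `mean_pool` and `l2_normalize` and the 2-dimensional input are assumptions made for the example (the real model works on 384-dimensional vectors).

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    return [s / max(count, 1) for s in summed]

def l2_normalize(vector):
    """Scale a vector to unit length, which is the role of Normalize()."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

# Toy input: three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = l2_normalize(mean_pool(tokens, mask))
```

Because the output has unit length, the cosine similarity of two embeddings reduces to their dot product.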

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("krishmajumdar/arxiv-finetuned-v2")
# Run inference
sentences = [
    '<S> the effect of a random phase diffuser on fluctuations of laser light ( scintillations ) is studied . </S> <S> not only spatial but also temporal phase variations introduced by the phase diffuser are analyzed . </S> <S> the explicit dependence of the scintillation index on finite - time phase variations is obtained for long propagation paths . </S> <S> it is shown that for large amplitudes of phase fluctuations , a finite - time effect decreases the ability of phase diffuser to suppress the scintillations . </S>',
    'operators @xmath67 ( their dependence on time is as in vacuum ) . the term for @xmath68 can be obtained from eq . [ twelve ] by putting @xmath69 . substituting both distribution functions into eq . [ eight ] , we obtain @xmath70 @xmath71 @xmath72:\\big>,\\ ] ] where @xmath73 and @xmath74 are solutions of eqs . [ twelve ] with the initial conditions @xmath63 and @xmath75 , respectively . the operators on the right side of eq . [ thirteen ] are related through matching conditions with the amplitudes of the exiting laser radiation ( see ref . @xcite ) by the relation @xmath76 where @xmath77 is the operator of the laser field which is assumed to be a single - mode field and the subscript ( @xmath78 ) means perpendicular to the @xmath28-axis component . the function @xmath79 describes the profile of the laser mode , which is assumed to be gaussian - type function [ @xmath80 . @xmath1 desribes the initial radius of the beam . to account for the effect of the phase diffuser , a factor @xmath81 or @xmath82 should be inserted into the integrand of eq . [ fourteen ] . the quantity @xmath83 is the random phase introduced by the phase diffuser . a similar consideration is applicable to each of four photon operators entering both terms in square brackets of eq . [ thirteen ] . it can be easily seen that the factor @xmath84},\\ ] ] describing the effect of phase screen on the beam , enters implicitly the integrand of eq . [ thirteen ] ( the indices @xmath78 are omitted here for the sake of brevity ) . there are integrations over variables @xmath85 as shown in eq . [ fourteen ] . furthermore , the brackets @xmath16 ,',
    'that the candidate is detected with s / n @xmath136 in the unaffected image and also s / n @xmath137 in the image affected by the bad pixel . hence , we are confident that the source is real and that the photometry from the final drizzled image is robust . the sixth and final candidate is confidently detected at s / n@xmath138 in @xmath46 ( @xmath120 ) , and also in the @xmath38 with s / n = 3.7 . its photometric redshift is sharply peaked at @xmath139 , with a secondary solution at @xmath140 . this candidate is also very compact , with measured half - light radius @xmath141 , and the highest stellarity of the sample ( class_star = 0.91 ) . combining compactness with high stellarity from a high s / n source , a stellar nature ( cool dwarf ) for this source is relatively likely , as we discuss in section [ contamination ] . to translate the results on the search of possible candidates at @xmath3 from the archival borg[z8 ] data into a number density / luminosity function determination , we need to assess both the impact of contamination in our sample , and the effective volume probed by the data . there are multiple classes of lower-@xmath24 sources that may have similar @xmath103 colors to @xmath19 lyman - break galaxies ( lbgs ) , such as galactic stars , intermediate - redshift passive galaxies , and strong line emitters . cool , red stars in the milky way may be possible contaminants of our sample , although typical colors lack a strong @xmath103 drop . at low signal - to - noise ratio , the separation of point - like galactic stars from resolved galaxies using the ` sextractor ` class_star',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.5745, -0.0369],
#         [ 0.5745,  1.0000, -0.0618],
#         [-0.0369, -0.0618,  1.0000]])
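
One possible way to use such a score matrix is to look up, for each text, the closest other text. The sketch below does this in plain Python on the scores printed above; `most_similar` is a hypothetical helper, not part of the Sentence Transformers API. With a real tensor, `similarities.tolist()` should give a nested list of the same shape.

```python
def most_similar(similarities):
    """For each row, return the index of the highest-scoring other row."""
    best = []
    for i, row in enumerate(similarities):
        # Skip the diagonal, which is each text compared with itself.
        candidates = [(score, j) for j, score in enumerate(row) if j != i]
        best.append(max(candidates)[1])
    return best

# Scores copied from the printed tensor above.
scores = [
    [1.0000, 0.5745, -0.0369],
    [0.5745, 1.0000, -0.0618],
    [-0.0369, -0.0618, 1.0000],
]
neighbours = most_similar(scores)
```

Here the first two texts, both about a phase diffuser and laser light, pick each other, while the third text has no close match (its best score is slightly negative).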

Training Details

Training Dataset

Unnamed Dataset

  • Size: 614,911 training samples
  • Columns: abstract and article
  • Approximate statistics based on the first 1000 samples:
               abstract         article
      type     string           string
      min      78 tokens        15 tokens
      mean     278.57 tokens    368.79 tokens
      max      384 tokens       384 tokens
  • Samples:
    Sample 1
      abstract: additive models play an important role in semiparametric statistics . this paper gives learning rates for regularized kernel based methods for additive models . these learning rates compare favourably in particular in high dimensions to recent results on optimal learning rates for purely nonparametric regularized kernel based quantile regression using the gaussian radial basis function kernel , provided the assumption of an additive model is valid . additionally , a concrete example is presented to show that a gaussian function depending only on one variable lies in a reproducing kernel hilbert space generated by an additive gaussian kernel , but does not belong to the reproducing kernel hilbert space generated by the multivariate gaussian kernel of the same variance . * key words and phrases . * additive model , kernel , quantile regression , semiparametric , rate of convergence , support vector machine .
      article: additive models @xcite provide an important family of models for semiparametric regression or classification . some reasons for the success of additive models are their increased flexibility when compared to linear or generalized linear models and their increased interpretability when compared to fully nonparametric models . it is well - known that good estimators in additive models are in general less prone to the curse of high dimensionality than good estimators in fully nonparametric models . many examples of such estimators belong to the large class of regularized kernel based methods over a reproducing kernel hilbert space @xmath0 , see e.g. @xcite . in the last years many interesting results on learning rates of regularized kernel based models for additive models have been published when the focus is on sparsity and when the classical least squares loss function is used , see e.g. @xcite , @xcite , @xcite , @xcite , @xcite , @xcite and the references therein . of course , the lea...
    Sample 2
      abstract: additive models play an important role in semiparametric statistics . this paper gives learning rates for regularized kernel based methods for additive models . these learning rates compare favourably in particular in high dimensions to recent results on optimal learning rates for purely nonparametric regularized kernel based quantile regression using the gaussian radial basis function kernel , provided the assumption of an additive model is valid . additionally , a concrete example is presented to show that a gaussian function depending only on one variable lies in a reproducing kernel hilbert space generated by an additive gaussian kernel , but does not belong to the reproducing kernel hilbert space generated by the multivariate gaussian kernel of the same variance . * key words and phrases . * additive model , kernel , quantile regression , semiparametric , rate of convergence , support vector machine .
      article: e.g. @xcite for the general case and @xcite for additive models . therefore , we will here consider the case of regularized kernel based methods based on a general convex and lipschitz continuous loss function , on a general kernel , and on the classical regularizing term @xmath1 for some @xmath2 which is a smoothness penalty but not a sparsity penalty , see e.g. @xcite . such regularized kernel based methods are now often called support vector machines ( svms ) , although the notation was historically used for such methods based on the special hinge loss function and for special kernels only , we refer to @xcite . in this paper we address the open question , whether an svm with an additive kernel can provide a substantially better learning rate in high dimensions than an svm with a general kernel , say a classical gaussian rbf kernel , if the assumption of an additive model is satisfied . our leading example covers learning rates for quantile regression based on the lipschitz continuo...
    Sample 3
      abstract: additive models play an important role in semiparametric statistics . this paper gives learning rates for regularized kernel based methods for additive models . these learning rates compare favourably in particular in high dimensions to recent results on optimal learning rates for purely nonparametric regularized kernel based quantile regression using the gaussian radial basis function kernel , provided the assumption of an additive model is valid . additionally , a concrete example is presented to show that a gaussian function depending only on one variable lies in a reproducing kernel hilbert space generated by an additive gaussian kernel , but does not belong to the reproducing kernel hilbert space generated by the multivariate gaussian kernel of the same variance . * key words and phrases . * additive model , kernel , quantile regression , semiparametric , rate of convergence , support vector machine .
      article: approach might be to fit both models and compare their risks evaluated for test data . for the same reason we will also not cover sparsity . consistency of support vector machines generated by additive kernels for additive models was considered in @xcite . in this paper we establish learning rates for these algorithms . let us recall the framework with a complete separable metric space @xmath3 as the input space and a closed subset @xmath4 of @xmath5 as the output space . a borel probability measure @xmath6 on @xmath7 is used to model the learning problem and an independent and identically distributed sample @xmath8 is drawn according to @xmath6 for learning . a loss function @xmath9 is used to measure the quality of a prediction function @xmath10 by the local error @xmath11 . _ throughout the paper we assume that @xmath12 is measurable , @xmath13 , convex with respect to the third variable , and uniformly lipschitz continuous satisfying @xmath14 with a finite constant @xmath15 . _ sup...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
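
To make the loss parameters above concrete, here is a minimal plain-Python sketch of the idea behind MultipleNegativesRankingLoss: each anchor's cosine similarities to all positives in the batch are multiplied by `scale` and fed to a cross-entropy whose correct class is the anchor's own positive, so the other positives serve as in-batch negatives. It assumes the first column (abstract) plays the anchor and the second (article) the positive; `cos_sim` and `mnr_loss` are illustrative names, and the library's batched tensor implementation will differ in detail.

```python
import math

def cos_sim(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the match for anchors[i]
    and every other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the row of logits.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum - logits[i]
    return total / len(anchors)

# Toy batch of two pairs with orthogonal embeddings.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = mnr_loss(anchors, positives)        # pairs line up
loss_swapped = mnr_loss(anchors, positives[::-1])  # pairs are crossed
```

With correctly aligned pairs the loss is close to zero; with the pairs crossed it is close to the scale value of 20, which shows how strongly the scale penalizes ranking a wrong text first.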
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • gradient_accumulation_steps: 2
  • warmup_ratio: 0.05
  • save_only_model: True
  • fp16: True
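
The batch settings above can be tied to the step counts in the training logs with a little arithmetic. The sketch below assumes a single training device (the card does not state the device count) and mirrors the usual rounding: the last partial batch is kept, and one optimizer step is taken per `gradient_accumulation_steps` batches.

```python
import math

train_samples = 614_911   # "Size" under Training Dataset
per_device_batch = 32     # per_device_train_batch_size
grad_accum = 2            # gradient_accumulation_steps
epochs = 3                # num_train_epochs
warmup_ratio = 0.05       # warmup_ratio
num_devices = 1           # assumption: not stated in the card

batches_per_epoch = math.ceil(train_samples / (per_device_batch * num_devices))
steps_per_epoch = batches_per_epoch // grad_accum
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
effective_batch = per_device_batch * grad_accum * num_devices

print(steps_per_epoch, total_steps, warmup_steps, effective_batch)
```

Under that assumption there are 9,608 optimizer steps per epoch, which agrees with the logged epoch fractions (for example, step 28800 is logged at epoch 2.9975). Gradient accumulation raises the effective batch to 64 samples per optimizer step, but each forward pass still sees 32 pairs, so the number of in-batch negatives per anchor is unchanged.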

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.05
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: True
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0104 100 0.8589
0.0208 200 0.5171
0.0312 300 0.4745
0.0416 400 0.4498
0.0520 500 0.4105
0.0624 600 0.394
0.0729 700 0.3896
0.0833 800 0.3788
0.0937 900 0.3561
0.1041 1000 0.3662
0.1145 1100 0.3419
0.1249 1200 0.3256
0.1353 1300 0.3337
0.1457 1400 0.335
0.1561 1500 0.3255
0.1665 1600 0.3099
0.1769 1700 0.3092
0.1873 1800 0.2985
0.1978 1900 0.2931
0.2082 2000 0.2977
0.2186 2100 0.2918
0.2290 2200 0.2856
0.2394 2300 0.2835
0.2498 2400 0.2689
0.2602 2500 0.2743
0.2706 2600 0.2504
0.2810 2700 0.2423
0.2914 2800 0.2717
0.3018 2900 0.2653
0.3122 3000 0.2543
0.3226 3100 0.256
0.3331 3200 0.2555
0.3435 3300 0.2485
0.3539 3400 0.243
0.3643 3500 0.2339
0.3747 3600 0.2447
0.3851 3700 0.2311
0.3955 3800 0.2245
0.4059 3900 0.2276
0.4163 4000 0.2243
0.4267 4100 0.2225
0.4371 4200 0.2391
0.4475 4300 0.2162
0.4580 4400 0.2194
0.4684 4500 0.2291
0.4788 4600 0.2307
0.4892 4700 0.2141
0.4996 4800 0.2124
0.5100 4900 0.2306
0.5204 5000 0.2075
0.5308 5100 0.2055
0.5412 5200 0.2294
0.5516 5300 0.2165
0.5620 5400 0.2165
0.5724 5500 0.1957
0.5828 5600 0.1971
0.5933 5700 0.1935
0.6037 5800 0.2077
0.6141 5900 0.1931
0.6245 6000 0.1987
0.6349 6100 0.1983
0.6453 6200 0.1889
0.6557 6300 0.1894
0.6661 6400 0.195
0.6765 6500 0.1936
0.6869 6600 0.1811
0.6973 6700 0.1835
0.7077 6800 0.2028
0.7182 6900 0.1904
0.7286 7000 0.1853
0.7390 7100 0.1646
0.7494 7200 0.1904
0.7598 7300 0.181
0.7702 7400 0.176
0.7806 7500 0.1746
0.7910 7600 0.1846
0.8014 7700 0.1706
0.8118 7800 0.1692
0.8222 7900 0.1696
0.8326 8000 0.171
0.0104 100 0.2682
0.0208 200 0.1698
0.0312 300 0.1492
0.0416 400 0.1597
0.0520 500 0.1421
0.0624 600 0.1412
0.0729 700 0.1367
0.0833 800 0.1407
0.0937 900 0.1276
0.1041 1000 0.1352
0.1145 1100 0.1307
0.1249 1200 0.1188
0.1353 1300 0.1211
0.1457 1400 0.1203
0.1561 1500 0.1131
0.1665 1600 0.1077
0.1769 1700 0.1061
0.1873 1800 0.1064
0.1978 1900 0.1016
0.2082 2000 0.1066
0.2186 2100 0.1077
0.2290 2200 0.1009
0.2394 2300 0.1048
0.2498 2400 0.0925
0.2602 2500 0.1054
0.2706 2600 0.0873
0.2810 2700 0.082
0.2914 2800 0.0976
0.3018 2900 0.097
0.3122 3000 0.0876
0.3226 3100 0.0959
0.3331 3200 0.0931
0.3435 3300 0.0903
0.3539 3400 0.0854
0.3643 3500 0.0841
0.3747 3600 0.0914
0.3851 3700 0.0809
0.3955 3800 0.0798
0.4059 3900 0.0847
0.4163 4000 0.0784
0.4267 4100 0.0837
0.4371 4200 0.092
0.4475 4300 0.0794
0.4580 4400 0.0811
0.4684 4500 0.0844
0.4788 4600 0.092
0.4892 4700 0.0743
0.4996 4800 0.0839
0.5100 4900 0.0939
0.5204 5000 0.0789
0.5308 5100 0.0769
0.5412 5200 0.0936
0.5516 5300 0.085
0.5620 5400 0.0857
0.5724 5500 0.0731
0.5828 5600 0.0766
0.5933 5700 0.078
0.6037 5800 0.0812
0.6141 5900 0.0731
0.6245 6000 0.0783
0.6349 6100 0.075
0.6453 6200 0.0734
0.6557 6300 0.0725
0.6661 6400 0.0796
0.6765 6500 0.0748
0.6869 6600 0.0722
0.6973 6700 0.0705
0.7077 6800 0.0831
0.7182 6900 0.0787
0.7286 7000 0.0779
0.7390 7100 0.0641
0.7494 7200 0.0795
0.7598 7300 0.0712
0.7702 7400 0.0698
0.7806 7500 0.068
0.7910 7600 0.0729
0.8014 7700 0.0693
0.8118 7800 0.0719
0.8222 7900 0.0735
0.8326 8000 0.073
0.8430 8100 0.1425
0.8535 8200 0.1422
0.8639 8300 0.1336
0.8743 8400 0.1448
0.8847 8500 0.1421
0.8951 8600 0.143
0.9055 8700 0.1299
0.9159 8800 0.1337
0.9263 8900 0.138
0.9367 9000 0.1417
0.9471 9100 0.1266
0.9575 9200 0.1187
0.9679 9300 0.1454
0.9784 9400 0.1322
0.9888 9500 0.137
0.9992 9600 0.1452
1.0096 9700 0.0936
1.0200 9800 0.0986
1.0304 9900 0.1021
1.0408 10000 0.1004
1.0512 10100 0.0954
1.0616 10200 0.1004
1.0720 10300 0.0974
1.0824 10400 0.0939
1.0928 10500 0.1039
1.1032 10600 0.111
1.1137 10700 0.0993
1.1241 10800 0.0975
1.1345 10900 0.0939
1.1449 11000 0.1042
1.1553 11100 0.0984
1.1657 11200 0.1008
1.1761 11300 0.0977
1.1865 11400 0.0881
1.1969 11500 0.0971
1.2073 11600 0.0909
1.2177 11700 0.0938
1.2281 11800 0.0933
1.2386 11900 0.1035
1.2490 12000 0.0931
1.2594 12100 0.1053
1.2698 12200 0.1043
1.2802 12300 0.0935
1.2906 12400 0.0928
1.3010 12500 0.0969
1.3114 12600 0.0901
1.3218 12700 0.0992
1.3322 12800 0.0978
1.3426 12900 0.0901
1.3530 13000 0.0835
1.3634 13100 0.0914
1.3739 13200 0.0922
1.3843 13300 0.0923
1.3947 13400 0.0917
1.4051 13500 0.089
1.4155 13600 0.0903
1.4259 13700 0.0913
1.4363 13800 0.093
1.4467 13900 0.0909
1.4571 14000 0.0906
1.4675 14100 0.0903
1.4779 14200 0.0946
1.4883 14300 0.0933
1.4988 14400 0.0898
1.5092 14500 0.088
1.5196 14600 0.0961
1.5300 14700 0.0887
1.5404 14800 0.0858
1.5508 14900 0.0878
1.5612 15000 0.092
1.5716 15100 0.0857
1.5820 15200 0.0878
1.5924 15300 0.0856
1.6028 15400 0.0887
1.6132 15500 0.0837
1.6236 15600 0.0832
1.6341 15700 0.083
1.6445 15800 0.0906
1.6549 15900 0.0844
1.6653 16000 0.085
1.6757 16100 0.0837
1.6861 16200 0.0826
1.6965 16300 0.0867
1.7069 16400 0.0902
1.7173 16500 0.0864
1.7277 16600 0.0882
1.7381 16700 0.0894
1.7485 16800 0.0902
1.7590 16900 0.0813
1.7694 17000 0.0821
1.7798 17100 0.0863
1.7902 17200 0.0828
1.8006 17300 0.0902
1.8110 17400 0.0831
1.8214 17500 0.0765
1.8318 17600 0.0806
1.8422 17700 0.0793
1.8526 17800 0.0842
1.8630 17900 0.0828
1.8734 18000 0.085
1.8838 18100 0.0803
1.8943 18200 0.0772
1.9047 18300 0.0865
1.9151 18400 0.0847
1.9255 18500 0.0835
1.9359 18600 0.0818
1.9463 18700 0.0757
1.9567 18800 0.0772
1.9671 18900 0.0854
1.9775 19000 0.0813
1.9879 19100 0.0844
1.9983 19200 0.0793
2.0087 19300 0.0668
2.0192 19400 0.0647
2.0296 19500 0.0702
2.0400 19600 0.0703
2.0504 19700 0.0641
2.0608 19800 0.0768
2.0712 19900 0.0632
2.0816 20000 0.0633
2.0920 20100 0.0608
2.1024 20200 0.0684
2.1128 20300 0.0618
2.1232 20400 0.063
2.1336 20500 0.0625
2.1440 20600 0.0631
2.1545 20700 0.0681
2.1649 20800 0.0584
2.1753 20900 0.0655
2.1857 21000 0.0651
2.1961 21100 0.0699
2.2065 21200 0.0704
2.2169 21300 0.0686
2.2273 21400 0.0655
2.2377 21500 0.063
2.2481 21600 0.0657
2.2585 21700 0.0694
2.2689 21800 0.066
2.2794 21900 0.0677
2.2898 22000 0.0617
2.3002 22100 0.0612
2.3106 22200 0.06
2.3210 22300 0.0572
2.3314 22400 0.0642
2.3418 22500 0.0601
2.3522 22600 0.0581
2.3626 22700 0.0702
2.3730 22800 0.0614
2.3834 22900 0.0631
2.3938 23000 0.0586
2.4042 23100 0.0638
2.4147 23200 0.0584
2.4251 23300 0.068
2.4355 23400 0.0681
2.4459 23500 0.0616
2.4563 23600 0.0604
2.4667 23700 0.0618
2.4771 23800 0.0603
2.4875 23900 0.0643
2.4979 24000 0.0639
2.5083 24100 0.0656
2.5187 24200 0.0578
2.5291 24300 0.0613
2.5396 24400 0.061
2.5500 24500 0.0578
2.5604 24600 0.059
2.5708 24700 0.0586
2.5812 24800 0.0532
2.5916 24900 0.0547
2.6020 25000 0.0596
2.6124 25100 0.0614
2.6228 25200 0.0547
2.6332 25300 0.056
2.6436 25400 0.0578
2.6540 25500 0.0611
2.6644 25600 0.0605
2.6749 25700 0.062
2.6853 25800 0.0601
2.6957 25900 0.0618
2.7061 26000 0.055
2.7165 26100 0.0614
2.7269 26200 0.0553
2.7373 26300 0.0587
2.7477 26400 0.0629
2.7581 26500 0.0559
2.7685 26600 0.0559
2.7789 26700 0.0533
2.7893 26800 0.0591
2.7998 26900 0.0526
2.8102 27000 0.0548
2.8206 27100 0.0562
2.8310 27200 0.0577
2.8414 27300 0.0611
2.8518 27400 0.0565
2.8622 27500 0.0627
2.8726 27600 0.0604
2.8830 27700 0.0578
2.8934 27800 0.0564
2.9038 27900 0.0591
2.9142 28000 0.0566
2.9246 28100 0.0541
2.9351 28200 0.0544
2.9455 28300 0.0598
2.9559 28400 0.0592
2.9663 28500 0.0559
2.9767 28600 0.0578
2.9871 28700 0.055
2.9975 28800 0.0509
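
In the table above the step counter goes back to 100 once, right after step 8000; the card does not say why. When plotting or summarizing the log it may help to split it at that point. The following sketch parses rows in the `epoch step loss` layout and starts a new segment whenever the step decreases; `parse_log` and `split_runs` are hypothetical helpers written for this example.

```python
def parse_log(text):
    """Turn 'epoch step loss' rows into (epoch, step, loss) tuples."""
    rows = []
    for line in text.strip().splitlines():
        epoch, step, loss = line.split()
        rows.append((float(epoch), int(step), float(loss)))
    return rows

def split_runs(rows):
    """Start a new segment whenever the step counter goes down."""
    runs = [[]]
    for row in rows:
        if runs[-1] and row[1] < runs[-1][-1][1]:
            runs.append([])
        runs[-1].append(row)
    return runs

# A few rows copied from the table above, around the point where the step restarts.
sample = """
0.8222 7900 0.1696
0.8326 8000 0.171
0.0104 100 0.2682
0.0208 200 0.1698
"""
runs = split_runs(parse_log(sample))
```

Applied to the full table, the same functions would yield two segments that can be inspected separately.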

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.4.2
  • Tokenizers: 0.22.1
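
To check a local environment against the versions listed above, something like the following standard-library sketch could be used. The distribution names in `EXPECTED` are the usual PyPI names for these libraries and are an assumption here; `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; package names are assumed PyPI names.
EXPECTED = {
    "sentence-transformers": "5.2.0",
    "transformers": "4.57.1",
    "torch": "2.8.0+cu126",
    "accelerate": "1.12.0",
    "datasets": "4.4.2",
    "tokenizers": "0.22.1",
}

def compare_versions(expected):
    """Map each package to (installed_or_None, expected, matches)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted, installed == wanted)
    return report

report = compare_versions(EXPECTED)
for name, (installed, wanted, ok) in report.items():
    print(f"{name}: installed={installed} expected={wanted} match={ok}")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions are simply the ones used for training.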

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model Size

  • Parameters: 22.7M
  • Tensor type: F32 (Safetensors)