SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/all-MiniLM-L6-v2
Maximum Sequence Length: 384 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
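
The Pooling and Normalize modules listed above can be illustrated with a minimal sketch in plain Python. It assumes the token embeddings arrive as a list of vectors together with an attention mask; the names `mean_pool` and `l2_normalize` are illustrative and are not part of the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    count = max(len(kept), 1)  # guard against an all-padding input
    return [sum(vec[i] for vec in kept) / count for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]

# Toy input: 3 tokens of dimension 2; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)      # mean of the first two tokens only
embedding = l2_normalize(pooled)      # unit-length sentence embedding
```

Because the final module normalizes every embedding to unit length, the cosine similarity between two embeddings is equal to their dot product.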

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("krishmajumdar/arxiv-finetuned-v2")
# Run inference
sentences = [
    '<S> the effect of a random phase diffuser on fluctuations of laser light ( scintillations ) is studied . </S> <S> not only spatial but also temporal phase variations introduced by the phase diffuser are analyzed . </S> <S> the explicit dependence of the scintillation index on finite - time phase variations is obtained for long propagation paths . </S> <S> it is shown that for large amplitudes of phase fluctuations , a finite - time effect decreases the ability of phase diffuser to suppress the scintillations . </S>',
    'operators @xmath67 ( their dependence on time is as in vacuum ) . the term for @xmath68 can be obtained from eq . [ twelve ] by putting @xmath69 . substituting both distribution functions into eq . [ eight ] , we obtain @xmath70 @xmath71 @xmath72:\\big>,\\ ] ] where @xmath73 and @xmath74 are solutions of eqs . [ twelve ] with the initial conditions @xmath63 and @xmath75 , respectively . the operators on the right side of eq . [ thirteen ] are related through matching conditions with the amplitudes of the exiting laser radiation ( see ref . @xcite ) by the relation @xmath76 where @xmath77 is the operator of the laser field which is assumed to be a single - mode field and the subscript ( @xmath78 ) means perpendicular to the @xmath28-axis component . the function @xmath79 describes the profile of the laser mode , which is assumed to be gaussian - type function [ @xmath80 . @xmath1 desribes the initial radius of the beam . to account for the effect of the phase diffuser , a factor @xmath81 or @xmath82 should be inserted into the integrand of eq . [ fourteen ] . the quantity @xmath83 is the random phase introduced by the phase diffuser . a similar consideration is applicable to each of four photon operators entering both terms in square brackets of eq . [ thirteen ] . it can be easily seen that the factor @xmath84},\\ ] ] describing the effect of phase screen on the beam , enters implicitly the integrand of eq . [ thirteen ] ( the indices @xmath78 are omitted here for the sake of brevity ) . there are integrations over variables @xmath85 as shown in eq . [ fourteen ] . furthermore , the brackets @xmath16 ,',
    'that the candidate is detected with s / n @xmath136 in the unaffected image and also s / n @xmath137 in the image affected by the bad pixel . hence , we are confident that the source is real and that the photometry from the final drizzled image is robust . the sixth and final candidate is confidently detected at s / n@xmath138 in @xmath46 ( @xmath120 ) , and also in the @xmath38 with s / n = 3.7 . its photometric redshift is sharply peaked at @xmath139 , with a secondary solution at @xmath140 . this candidate is also very compact , with measured half - light radius @xmath141 , and the highest stellarity of the sample ( class_star = 0.91 ) . combining compactness with high stellarity from a high s / n source , a stellar nature ( cool dwarf ) for this source is relatively likely , as we discuss in section [ contamination ] . to translate the results on the search of possible candidates at @xmath3 from the archival borg[z8 ] data into a number density / luminosity function determination , we need to assess both the impact of contamination in our sample , and the effective volume probed by the data . there are multiple classes of lower-@xmath24 sources that may have similar @xmath103 colors to @xmath19 lyman - break galaxies ( lbgs ) , such as galactic stars , intermediate - redshift passive galaxies , and strong line emitters . cool , red stars in the milky way may be possible contaminants of our sample , although typical colors lack a strong @xmath103 drop . at low signal - to - noise ratio , the separation of point - like galactic stars from resolved galaxies using the ` sextractor ` class_star',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.5745, -0.0369],
#         [ 0.5745,  1.0000, -0.0618],
#         [-0.0369, -0.0618,  1.0000]])
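
As a small illustration of how the similarity matrix can be read, the sketch below picks the closest other sentence for each input. The scores are copied from the example output above; `nearest_neighbor` is a hypothetical helper, not a library function. With real model output, the tensor would presumably first be converted with `similarities.tolist()`.

```python
# Similarity scores copied from the example output above.
similarities = [
    [1.0000, 0.5745, -0.0369],
    [0.5745, 1.0000, -0.0618],
    [-0.0369, -0.0618, 1.0000],
]

def nearest_neighbor(scores, index):
    """Return the index of the highest-scoring item other than `index` itself."""
    candidates = [j for j in range(len(scores)) if j != index]
    return max(candidates, key=lambda j: scores[index][j])

neighbors = [nearest_neighbor(similarities, i) for i in range(len(similarities))]
print(neighbors)  # [1, 0, 0]
```

The first two inputs (an abstract and an article passage on laser scintillation) select each other, while the third, unrelated passage scores close to zero against both.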

Training Details

Training Dataset

Unnamed Dataset

Size: 614,911 training samples
Columns: abstract and article
Approximate statistics based on the first 1000 samples:
	abstract	article
type	string	string
details	min: 78 tokens mean: 278.57 tokens max: 384 tokens	min: 15 tokens mean: 368.79 tokens max: 384 tokens

Samples:

abstract	article
additive models play an important role in semiparametric statistics . this paper gives learning rates for regularized kernel based methods for additive models . these learning rates compare favourably in particular in high dimensions to recent results on optimal learning rates for purely nonparametric regularized kernel based quantile regression using the gaussian radial basis function kernel , provided the assumption of an additive model is valid . additionally , a concrete example is presented to show that a gaussian function depending only on one variable lies in a reproducing kernel hilbert space generated by an additive gaussian kernel , but does not belong to the reproducing kernel hilbert space generated by the multivariate gaussian kernel of the same variance . * key words and phrases . * additive model , kernel , quantile regression , semiparametric , rate of convergence , support vector machine .	additive models @xcite provide an important family of models for semiparametric regression or classification . some reasons for the success of additive models are their increased flexibility when compared to linear or generalized linear models and their increased interpretability when compared to fully nonparametric models . it is well - known that good estimators in additive models are in general less prone to the curse of high dimensionality than good estimators in fully nonparametric models . many examples of such estimators belong to the large class of regularized kernel based methods over a reproducing kernel hilbert space @xmath0 , see e.g. @xcite . in the last years many interesting results on learning rates of regularized kernel based models for additive models have been published when the focus is on sparsity and when the classical least squares loss function is used , see e.g. @xcite , @xcite , @xcite , @xcite , @xcite , @xcite and the references therein . of course , the lea...
additive models play an important role in semiparametric statistics . this paper gives learning rates for regularized kernel based methods for additive models . these learning rates compare favourably in particular in high dimensions to recent results on optimal learning rates for purely nonparametric regularized kernel based quantile regression using the gaussian radial basis function kernel , provided the assumption of an additive model is valid . additionally , a concrete example is presented to show that a gaussian function depending only on one variable lies in a reproducing kernel hilbert space generated by an additive gaussian kernel , but does not belong to the reproducing kernel hilbert space generated by the multivariate gaussian kernel of the same variance . * key words and phrases . * additive model , kernel , quantile regression , semiparametric , rate of convergence , support vector machine .	e.g. @xcite for the general case and @xcite for additive models . therefore , we will here consider the case of regularized kernel based methods based on a general convex and lipschitz continuous loss function , on a general kernel , and on the classical regularizing term @xmath1 for some @xmath2 which is a smoothness penalty but not a sparsity penalty , see e.g. @xcite . such regularized kernel based methods are now often called support vector machines ( svms ) , although the notation was historically used for such methods based on the special hinge loss function and for special kernels only , we refer to @xcite . in this paper we address the open question , whether an svm with an additive kernel can provide a substantially better learning rate in high dimensions than an svm with a general kernel , say a classical gaussian rbf kernel , if the assumption of an additive model is satisfied . our leading example covers learning rates for quantile regression based on the lipschitz continuo...
additive models play an important role in semiparametric statistics . this paper gives learning rates for regularized kernel based methods for additive models . these learning rates compare favourably in particular in high dimensions to recent results on optimal learning rates for purely nonparametric regularized kernel based quantile regression using the gaussian radial basis function kernel , provided the assumption of an additive model is valid . additionally , a concrete example is presented to show that a gaussian function depending only on one variable lies in a reproducing kernel hilbert space generated by an additive gaussian kernel , but does not belong to the reproducing kernel hilbert space generated by the multivariate gaussian kernel of the same variance . * key words and phrases . * additive model , kernel , quantile regression , semiparametric , rate of convergence , support vector machine .	approach might be to fit both models and compare their risks evaluated for test data . for the same reason we will also not cover sparsity . consistency of support vector machines generated by additive kernels for additive models was considered in @xcite . in this paper we establish learning rates for these algorithms . let us recall the framework with a complete separable metric space @xmath3 as the input space and a closed subset @xmath4 of @xmath5 as the output space . a borel probability measure @xmath6 on @xmath7 is used to model the learning problem and an independent and identically distributed sample @xmath8 is drawn according to @xmath6 for learning . a loss function @xmath9 is used to measure the quality of a prediction function @xmath10 by the local error @xmath11 . _ throughout the paper we assume that @xmath12 is measurable , @xmath13 , convex with respect to the third variable , and uniformly lipschitz continuous satisfying @xmath14 with a finite constant @xmath15 . _ sup...

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
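
To make the loss parameters concrete, here is a simplified pure-Python sketch of the idea behind MultipleNegativesRankingLoss: each anchor is scored against every positive in the batch with cosine similarity, the scores are multiplied by `scale`, and cross-entropy treats the matching pair as the correct class. This is an illustration under those assumptions, not the library's batched tensor implementation; `cos_sim` and `mnr_loss` are names chosen for this example. For this dataset, the abstract column presumably plays the anchor role and the article column the positive.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy 2-D "embeddings": aligned pairs give a low loss, swapped pairs a high one.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = mnr_loss(anchors, [[0.9, 0.1], [0.1, 0.9]])
swapped_loss = mnr_loss(anchors, [[0.1, 0.9], [0.9, 0.1]])
```

A larger `scale` sharpens the softmax, so small differences in cosine similarity translate into larger differences in the loss.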

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 32
gradient_accumulation_steps: 2
warmup_ratio: 0.05
save_only_model: True
fp16: True
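
The batch settings above determine how many optimizer steps one epoch takes. The arithmetic below is a sketch that assumes a single training device, which this card does not state; `num_epochs` and the dataset size are taken from elsewhere in the card.

```python
import math

num_samples = 614_911     # training set size reported in this card
per_device_batch = 32     # per_device_train_batch_size
grad_accum = 2            # gradient_accumulation_steps
num_epochs = 3            # num_train_epochs
warmup_ratio = 0.05       # warmup_ratio
num_devices = 1           # assumption: not stated in this card

# Samples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum * num_devices

# Dataloader batches per epoch, then optimizer steps per epoch.
batches_per_epoch = math.ceil(num_samples / (per_device_batch * num_devices))
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)

total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

Under the single-device assumption this gives 9,608 steps per epoch, so step 100 corresponds to epoch 100 / 9608 ≈ 0.0104, which agrees with the first row of the training logs.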

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 2
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 3
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.05
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: True
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}
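
The combination of `lr_scheduler_type: linear`, `warmup_ratio: 0.05`, and `learning_rate: 5e-05` describes a learning rate that rises linearly during warmup and then decays linearly to zero. The function below is a minimal sketch of that shape; `linear_schedule_lr` is an illustrative name, and the step counts in the example are hypothetical round numbers rather than values from this training run.

```python
def linear_schedule_lr(step, total_steps, warmup_steps, peak_lr=5e-05):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

# Hypothetical run of 1000 steps with 5% warmup (50 steps).
total, warmup = 1000, 50
lr_start = linear_schedule_lr(0, total, warmup)      # 0.0 at the first step
lr_peak = linear_schedule_lr(50, total, warmup)      # peak once warmup ends
lr_mid = linear_schedule_lr(525, total, warmup)      # halfway through the decay
lr_end = linear_schedule_lr(1000, total, warmup)     # 0.0 at the last step
```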

Training Logs

Epoch	Step	Training Loss
0.0104	100	0.8589
0.0208	200	0.5171
0.0312	300	0.4745
0.0416	400	0.4498
0.0520	500	0.4105
0.0624	600	0.394
0.0729	700	0.3896
0.0833	800	0.3788
0.0937	900	0.3561
0.1041	1000	0.3662
0.1145	1100	0.3419
0.1249	1200	0.3256
0.1353	1300	0.3337
0.1457	1400	0.335
0.1561	1500	0.3255
0.1665	1600	0.3099
0.1769	1700	0.3092
0.1873	1800	0.2985
0.1978	1900	0.2931
0.2082	2000	0.2977
0.2186	2100	0.2918
0.2290	2200	0.2856
0.2394	2300	0.2835
0.2498	2400	0.2689
0.2602	2500	0.2743
0.2706	2600	0.2504
0.2810	2700	0.2423
0.2914	2800	0.2717
0.3018	2900	0.2653
0.3122	3000	0.2543
0.3226	3100	0.256
0.3331	3200	0.2555
0.3435	3300	0.2485
0.3539	3400	0.243
0.3643	3500	0.2339
0.3747	3600	0.2447
0.3851	3700	0.2311
0.3955	3800	0.2245
0.4059	3900	0.2276
0.4163	4000	0.2243
0.4267	4100	0.2225
0.4371	4200	0.2391
0.4475	4300	0.2162
0.4580	4400	0.2194
0.4684	4500	0.2291
0.4788	4600	0.2307
0.4892	4700	0.2141
0.4996	4800	0.2124
0.5100	4900	0.2306
0.5204	5000	0.2075
0.5308	5100	0.2055
0.5412	5200	0.2294
0.5516	5300	0.2165
0.5620	5400	0.2165
0.5724	5500	0.1957
0.5828	5600	0.1971
0.5933	5700	0.1935
0.6037	5800	0.2077
0.6141	5900	0.1931
0.6245	6000	0.1987
0.6349	6100	0.1983
0.6453	6200	0.1889
0.6557	6300	0.1894
0.6661	6400	0.195
0.6765	6500	0.1936
0.6869	6600	0.1811
0.6973	6700	0.1835
0.7077	6800	0.2028
0.7182	6900	0.1904
0.7286	7000	0.1853
0.7390	7100	0.1646
0.7494	7200	0.1904
0.7598	7300	0.181
0.7702	7400	0.176
0.7806	7500	0.1746
0.7910	7600	0.1846
0.8014	7700	0.1706
0.8118	7800	0.1692
0.8222	7900	0.1696
0.8326	8000	0.171
0.0104	100	0.2682
0.0208	200	0.1698
0.0312	300	0.1492
0.0416	400	0.1597
0.0520	500	0.1421
0.0624	600	0.1412
0.0729	700	0.1367
0.0833	800	0.1407
0.0937	900	0.1276
0.1041	1000	0.1352
0.1145	1100	0.1307
0.1249	1200	0.1188
0.1353	1300	0.1211
0.1457	1400	0.1203
0.1561	1500	0.1131
0.1665	1600	0.1077
0.1769	1700	0.1061
0.1873	1800	0.1064
0.1978	1900	0.1016
0.2082	2000	0.1066
0.2186	2100	0.1077
0.2290	2200	0.1009
0.2394	2300	0.1048
0.2498	2400	0.0925
0.2602	2500	0.1054
0.2706	2600	0.0873
0.2810	2700	0.082
0.2914	2800	0.0976
0.3018	2900	0.097
0.3122	3000	0.0876
0.3226	3100	0.0959
0.3331	3200	0.0931
0.3435	3300	0.0903
0.3539	3400	0.0854
0.3643	3500	0.0841
0.3747	3600	0.0914
0.3851	3700	0.0809
0.3955	3800	0.0798
0.4059	3900	0.0847
0.4163	4000	0.0784
0.4267	4100	0.0837
0.4371	4200	0.092
0.4475	4300	0.0794
0.4580	4400	0.0811
0.4684	4500	0.0844
0.4788	4600	0.092
0.4892	4700	0.0743
0.4996	4800	0.0839
0.5100	4900	0.0939
0.5204	5000	0.0789
0.5308	5100	0.0769
0.5412	5200	0.0936
0.5516	5300	0.085
0.5620	5400	0.0857
0.5724	5500	0.0731
0.5828	5600	0.0766
0.5933	5700	0.078
0.6037	5800	0.0812
0.6141	5900	0.0731
0.6245	6000	0.0783
0.6349	6100	0.075
0.6453	6200	0.0734
0.6557	6300	0.0725
0.6661	6400	0.0796
0.6765	6500	0.0748
0.6869	6600	0.0722
0.6973	6700	0.0705
0.7077	6800	0.0831
0.7182	6900	0.0787
0.7286	7000	0.0779
0.7390	7100	0.0641
0.7494	7200	0.0795
0.7598	7300	0.0712
0.7702	7400	0.0698
0.7806	7500	0.068
0.7910	7600	0.0729
0.8014	7700	0.0693
0.8118	7800	0.0719
0.8222	7900	0.0735
0.8326	8000	0.073
0.8430	8100	0.1425
0.8535	8200	0.1422
0.8639	8300	0.1336
0.8743	8400	0.1448
0.8847	8500	0.1421
0.8951	8600	0.143
0.9055	8700	0.1299
0.9159	8800	0.1337
0.9263	8900	0.138
0.9367	9000	0.1417
0.9471	9100	0.1266
0.9575	9200	0.1187
0.9679	9300	0.1454
0.9784	9400	0.1322
0.9888	9500	0.137
0.9992	9600	0.1452
1.0096	9700	0.0936
1.0200	9800	0.0986
1.0304	9900	0.1021
1.0408	10000	0.1004
1.0512	10100	0.0954
1.0616	10200	0.1004
1.0720	10300	0.0974
1.0824	10400	0.0939
1.0928	10500	0.1039
1.1032	10600	0.111
1.1137	10700	0.0993
1.1241	10800	0.0975
1.1345	10900	0.0939
1.1449	11000	0.1042
1.1553	11100	0.0984
1.1657	11200	0.1008
1.1761	11300	0.0977
1.1865	11400	0.0881
1.1969	11500	0.0971
1.2073	11600	0.0909
1.2177	11700	0.0938
1.2281	11800	0.0933
1.2386	11900	0.1035
1.2490	12000	0.0931
1.2594	12100	0.1053
1.2698	12200	0.1043
1.2802	12300	0.0935
1.2906	12400	0.0928
1.3010	12500	0.0969
1.3114	12600	0.0901
1.3218	12700	0.0992
1.3322	12800	0.0978
1.3426	12900	0.0901
1.3530	13000	0.0835
1.3634	13100	0.0914
1.3739	13200	0.0922
1.3843	13300	0.0923
1.3947	13400	0.0917
1.4051	13500	0.089
1.4155	13600	0.0903
1.4259	13700	0.0913
1.4363	13800	0.093
1.4467	13900	0.0909
1.4571	14000	0.0906
1.4675	14100	0.0903
1.4779	14200	0.0946
1.4883	14300	0.0933
1.4988	14400	0.0898
1.5092	14500	0.088
1.5196	14600	0.0961
1.5300	14700	0.0887
1.5404	14800	0.0858
1.5508	14900	0.0878
1.5612	15000	0.092
1.5716	15100	0.0857
1.5820	15200	0.0878
1.5924	15300	0.0856
1.6028	15400	0.0887
1.6132	15500	0.0837
1.6236	15600	0.0832
1.6341	15700	0.083
1.6445	15800	0.0906
1.6549	15900	0.0844
1.6653	16000	0.085
1.6757	16100	0.0837
1.6861	16200	0.0826
1.6965	16300	0.0867
1.7069	16400	0.0902
1.7173	16500	0.0864
1.7277	16600	0.0882
1.7381	16700	0.0894
1.7485	16800	0.0902
1.7590	16900	0.0813
1.7694	17000	0.0821
1.7798	17100	0.0863
1.7902	17200	0.0828
1.8006	17300	0.0902
1.8110	17400	0.0831
1.8214	17500	0.0765
1.8318	17600	0.0806
1.8422	17700	0.0793
1.8526	17800	0.0842
1.8630	17900	0.0828
1.8734	18000	0.085
1.8838	18100	0.0803
1.8943	18200	0.0772
1.9047	18300	0.0865
1.9151	18400	0.0847
1.9255	18500	0.0835
1.9359	18600	0.0818
1.9463	18700	0.0757
1.9567	18800	0.0772
1.9671	18900	0.0854
1.9775	19000	0.0813
1.9879	19100	0.0844
1.9983	19200	0.0793
2.0087	19300	0.0668
2.0192	19400	0.0647
2.0296	19500	0.0702
2.0400	19600	0.0703
2.0504	19700	0.0641
2.0608	19800	0.0768
2.0712	19900	0.0632
2.0816	20000	0.0633
2.0920	20100	0.0608
2.1024	20200	0.0684
2.1128	20300	0.0618
2.1232	20400	0.063
2.1336	20500	0.0625
2.1440	20600	0.0631
2.1545	20700	0.0681
2.1649	20800	0.0584
2.1753	20900	0.0655
2.1857	21000	0.0651
2.1961	21100	0.0699
2.2065	21200	0.0704
2.2169	21300	0.0686
2.2273	21400	0.0655
2.2377	21500	0.063
2.2481	21600	0.0657
2.2585	21700	0.0694
2.2689	21800	0.066
2.2794	21900	0.0677
2.2898	22000	0.0617
2.3002	22100	0.0612
2.3106	22200	0.06
2.3210	22300	0.0572
2.3314	22400	0.0642
2.3418	22500	0.0601
2.3522	22600	0.0581
2.3626	22700	0.0702
2.3730	22800	0.0614
2.3834	22900	0.0631
2.3938	23000	0.0586
2.4042	23100	0.0638
2.4147	23200	0.0584
2.4251	23300	0.068
2.4355	23400	0.0681
2.4459	23500	0.0616
2.4563	23600	0.0604
2.4667	23700	0.0618
2.4771	23800	0.0603
2.4875	23900	0.0643
2.4979	24000	0.0639
2.5083	24100	0.0656
2.5187	24200	0.0578
2.5291	24300	0.0613
2.5396	24400	0.061
2.5500	24500	0.0578
2.5604	24600	0.059
2.5708	24700	0.0586
2.5812	24800	0.0532
2.5916	24900	0.0547
2.6020	25000	0.0596
2.6124	25100	0.0614
2.6228	25200	0.0547
2.6332	25300	0.056
2.6436	25400	0.0578
2.6540	25500	0.0611
2.6644	25600	0.0605
2.6749	25700	0.062
2.6853	25800	0.0601
2.6957	25900	0.0618
2.7061	26000	0.055
2.7165	26100	0.0614
2.7269	26200	0.0553
2.7373	26300	0.0587
2.7477	26400	0.0629
2.7581	26500	0.0559
2.7685	26600	0.0559
2.7789	26700	0.0533
2.7893	26800	0.0591
2.7998	26900	0.0526
2.8102	27000	0.0548
2.8206	27100	0.0562
2.8310	27200	0.0577
2.8414	27300	0.0611
2.8518	27400	0.0565
2.8622	27500	0.0627
2.8726	27600	0.0604
2.8830	27700	0.0578
2.8934	27800	0.0564
2.9038	27900	0.0591
2.9142	28000	0.0566
2.9246	28100	0.0541
2.9351	28200	0.0544
2.9455	28300	0.0598
2.9559	28400	0.0592
2.9663	28500	0.0559
2.9767	28600	0.0578
2.9871	28700	0.055
2.9975	28800	0.0509
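
In the table above, the step counter returns to 100 after first reaching 8000; the card does not say why. When plotting or averaging the loss, it may help to separate the rows at that point. The sketch below uses a hypothetical helper, `split_on_step_reset`, applied to a handful of rows copied from the table in their original order.

```python
def split_on_step_reset(rows):
    """Split (epoch, step, loss) rows into segments, starting a new
    segment whenever the step number does not increase."""
    segments = []
    for row in rows:
        if not segments or row[1] <= segments[-1][-1][1]:
            segments.append([])
        segments[-1].append(row)
    return segments

# A few rows copied from the table above, in their original order.
rows = [
    (0.8222, 7900, 0.1696),
    (0.8326, 8000, 0.1710),
    (0.0104, 100, 0.2682),
    (0.0208, 200, 0.1698),
    (2.9975, 28800, 0.0509),
]

segments = split_on_step_reset(rows)
print([len(s) for s in segments])  # [2, 3]
```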

Framework Versions

Python: 3.12.12
Sentence Transformers: 5.2.0
Transformers: 4.57.1
PyTorch: 2.8.0+cu126
Accelerate: 1.12.0
Datasets: 4.4.2
Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}