Can preoperative parameters predict successful sperm retrieval and live birth in couples undergoing testicular sperm extraction and intracytoplasmic sperm injection for azoospermia?

We aimed to determine whether the success of TESE and live birth following TESE-ICSI can be predicted from readily available preoperative parameters for couples with azoospermia. This was a cohort study of couples who attended the fertility service of an NHS hospital (2009-2019) in whom the male partner was diagnosed with azoospermia and required conventional TESE with multiple biopsies to obtain sperm. Of 414 men included, 223 had successful TESE and, of those, 178 used the sperm in one or more ICSI cycles. Predictive models were developed using logistic regression. We assessed model performance by internally validated concordance statistics and calibration plots. Successful sperm retrieval was defined as the presence of motile sperm which survived the freeze-thaw process, and live birth was defined as delivery after 34 weeks of gestation. Successful TESE was associated with higher male age and lower FSH. The TESE model discriminated well with a c statistic of 0.81 (0.77-0.85). Live birth was associated with lower maternal age, earlier ICSI cycle, and lower testicular volume. The live birth model also discriminated well with a c statistic of 0.70 (0.64-0.76). These results support the pragmatic counselling of couples diagnosed with azoospermia about the chances of success of the TESE procedure and of biological parenthood prior to surgical intervention. The models help to discriminate between men who have a high or low chance of successful TESE and to identify couples who have a higher chance of achieving a live birth after successful TESE. This will allow couples to make a better assessment of the balance of risk versus benefit before committing to surgical intervention.


Background
Azoospermia is diagnosed when no sperm is identified after two separate semen analyses in accordance with the WHO laboratory manual [1]. Approximately 10-15% of men with subfertility have azoospermia (constituting 1% of the general male population) [2,3]. Men with azoospermia undergo systematic evaluation to determine the likely aetiology. Workup involves clinical assessment with a detailed history, general physical and genital examination, non-invasive tests including hormonal, genetic and chromosomal, as well as radiological investigations. The majority of couples with azoospermia require a surgical sperm retrieval (SSR) procedure, which can be carried out by a variety of techniques, in combination with intracytoplasmic sperm injection (ICSI), to become biological parents. Obstructive lesions of the seminal tract are common in azoospermic men with a normal testicular volume and hormone profile. SSR techniques recommended for this cohort include microsurgical epididymal sperm aspiration (MESA), percutaneous sperm aspiration (PESA) or testicular sperm extraction (TESE). In the cohort of men with a non-obstructive picture, the European Association of Urology guidance recommends TESE [4]. However, SSR, and particularly TESE, is associated with complications including infection, haematoma formation and devascularisation, which can diminish the testicular parenchyma and lead to low testosterone levels in future.
Multiple previous studies have aimed to identify predictors of successful SSR and positive pregnancy outcomes [32-39] in couples undergoing SSR and ICSI. A few studies have aimed to translate knowledge about these associations into models that predict the success of SSR [14,24,30,31] and the success of ICSI following SSR [36,40], to enhance counselling about the likelihood of success and to support couples in making informed choices about their subsequent management.
Many of the studies testing specific associations rely on post-operative parameters, including testicular histological diagnoses [6, 8-10, 12, 15-18, 21, 24, 26, 27, 32-35]. Although this may be useful in counselling couples who require a further surgical procedure to retrieve sperm, histological diagnosis is in itself an invasive procedure and is not available prior to more invasive measures such as TESE.
We aimed to use our NHS cohort data to develop and internally validate pragmatic preoperative models for couples who present to the fertility clinic with azoospermia to determine if these factors can predict success. This then aims to provide individualised counselling for success of the TESE procedure in retrieving sperm and the subsequent chances of live birth after successful retrieval.

Study design and population
The cohort included 10 years of data from a single NHS Reproductive Medicine Unit (April 2009 until April 2019). The data for couples who underwent an open conventional TESE procedure with multiple biopsies as part of their management for azoospermia were extracted from a structured clinical database. All patients had undergone a robust clinical assessment including history, physical and radiological examination followed by endocrine and genetic analysis, with consistent methods of assessment throughout the study period. As per the hospital policy, if this assessment indicated an obstructive aetiology which was secondary to epididymal obstruction, patients were offered a PESA procedure. If this clinical assessment indicated other causes of obstruction or a non-obstructive aetiology, patients were referred for TESE. Microsurgical vasectomy reversal is not available in NHS units. Unsuccessful PESA patients were also then referred for TESE. We identified 418 patients who had a TESE procedure during the study period. Men with a history of vasectomy are not eligible for NHS-funded treatment. Microscopic TESE was not available at our unit during the 10-year study period; patients were referred to another NHS unit if they elected to have this intervention. Most patients with Klinefelter's syndrome were referred for micro-TESE.
Ages were recorded on the day of TESE. For female partners, demographic data and cycle data including pregnancy outcomes were obtained from the clinical research database. Anti-Mullerian hormone (AMH) data for the measurement of ovarian reserve were not used as they were collected using three different assays across the 10-year period. Additionally, the unit did not use antral follicle count (AFC) for ovarian reserve assessment during the entire study period. Four men with Klinefelter's syndrome elected to proceed with conventional TESE and these were excluded from subsequent analysis. No patients with AZFa or AZFb microdeletions were included as the chances of finding sperm are known to be very low amongst these men. We sought to increase the ability to predict outcomes in couples with intermediate risk, and inclusion of these discriminants would therefore serve only to artificially increase the predictive performance of the model without adding clinical benefit.
We report outcomes from a single NHS unit. The clinical commissioning groups (CCGs) within the geographical area provide varying free NHS ICSI funding from one to three cycles and this influenced the number of ICSI treatment cycles couples had.
The TESE technique and biopsy processing in the laboratory, and ART technique were carried out as per the unit protocols and are detailed in the supplementary materials.

Outcome measures
Successful SSR was defined as the presence of motile sperm which survived the freeze-thaw process as all TESE samples are frozen and used in an ICSI cycle at a later date to eliminate the unnecessary risks of ovarian stimulation when no sperm are identified, as per our unit policy. Live birth was defined as delivery after 34 weeks of gestation (one live birth could represent a singleton or multiple pregnancy).

Statistical analysis, model building and internal validation
Baseline characteristics were examined by outcome, with means and standard deviations, medians and interquartile ranges, and frequencies and proportions presented for normal and non-normal continuous variables and categorical variables, respectively. Data were missing on 62 couples for male testosterone value and on 30 couples for male LH value, representing 15.0% and 7.2% of couples, respectively. These values were singly imputed using Bayesian stochastic regression.
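The single stochastic imputation step can be illustrated with a minimal sketch. It is not the authors' Stata procedure: it uses synthetic data, a single hypothetical predictor (FSH), and a plain (non-Bayesian) stochastic regression in which the regression coefficients are fixed at their least-squares estimates, whereas a Bayesian version would also draw the coefficients from their posterior. The function name and all simulated values are assumptions for illustration only.

```python
import numpy as np

def stochastic_regression_impute(y, X, rng):
    """Fill NaNs in y with OLS predictions from X plus a random residual draw."""
    y = np.asarray(y, dtype=float).copy()
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    missing = np.isnan(y)
    # Fit ordinary least squares on the complete cases only.
    beta, *_ = np.linalg.lstsq(design[~missing], y[~missing], rcond=None)
    resid = y[~missing] - design[~missing] @ beta
    dof = max(int((~missing).sum()) - design.shape[1], 1)
    sigma = np.sqrt(resid @ resid / dof)
    # Imputed value = prediction + normal noise with the residual SD.
    n_missing = int(missing.sum())
    y[missing] = design[missing] @ beta + rng.normal(0.0, sigma, n_missing)
    return y

rng = np.random.default_rng(0)
n = 414  # cohort size reported in the text
fsh = rng.gamma(shape=2.0, scale=8.0, size=n)          # synthetic predictor
testosterone = 15.0 - 0.1 * fsh + rng.normal(0, 3, n)  # synthetic outcome
testosterone_obs = testosterone.copy()
# Blank out 62 values, matching the number of missing testosterone results.
testosterone_obs[rng.choice(n, size=62, replace=False)] = np.nan

imputed = stochastic_regression_impute(testosterone_obs, fsh, rng)

# Proportions of missing data quoted in the text, as percentages.
pct_missing_testosterone = round(100 * 62 / n, 1)
pct_missing_lh = round(100 * 30 / n, 1)
print(pct_missing_testosterone, pct_missing_lh)
```

Adding the residual draw, rather than imputing the bare prediction, preserves the variability of the imputed variable, which is the point of the stochastic variant.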
The univariable associations between candidate predictors and the outcomes of successful SSR and live birth were examined using logistic regression. Candidate predictors were selected for inclusion in the multivariable model on the basis of theoretical considerations, with substantively guided forward (p≤0.15) and backward (p≤0.15) selection of candidate predictors. We examined nonlinear associations between continuous predictors and outcome. The outcome of live birth was initially modelled on the identified predictors using multilevel mixed-effects logistic regression to account for the non-independence of correlated cycles; however, as the fixed effects were near-identical and a likelihood-ratio test comparing the mixed-effects model with single-level logistic regression was not significant, we present the results of the single-level logistic regression. To account for overfitting, we internally validated both the model selection process and model performance using the bootstrap procedure with 200 replicates before constructing the receiver operating characteristic (ROC) curve to assess the discriminative ability of the models. The internal calibration of the models was checked using the 'calibrationbelt' third-party Stata package [41]. Statistical analyses were completed using Stata version 15 and SPSS version 16.
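As an illustration of bootstrap internal validation, the following sketch computes an optimism-corrected concordance statistic with 200 replicates. It is a simplified stand-in for the analysis described above, not a reproduction of it: the data are synthetic, the two standardised predictors and their coefficients are hypothetical, the logistic fit is a hand-written Newton-Raphson routine, and the variable selection step is not repeated within each replicate.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Newton-Raphson logistic regression; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible.
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta

def c_statistic(score, y):
    """Proportion of (event, non-event) pairs ranked correctly; ties count 0.5."""
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def optimism_corrected_c(X, y, n_boot=200, seed=0):
    """Apparent c statistic minus the mean bootstrap optimism."""
    rng = np.random.default_rng(seed)
    apparent = c_statistic(X @ fit_logistic(X, y), y)
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))  # resample with replacement
        beta_b = fit_logistic(X[idx], y[idx])
        c_boot = c_statistic(X[idx] @ beta_b, y[idx])  # on the bootstrap sample
        c_orig = c_statistic(X @ beta_b, y)            # on the original data
        optimism.append(c_boot - c_orig)
    return apparent, apparent - float(np.mean(optimism))

# Synthetic cohort: hypothetical standardised predictors and coefficients.
rng = np.random.default_rng(1)
n = 414
age = rng.normal(0, 1, n)
fsh = rng.normal(0, 1, n)
true_logit = 0.2 + 0.4 * age - 1.2 * fsh
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)
X = np.column_stack([np.ones(n), age, fsh])

apparent, corrected = optimism_corrected_c(X, y)
print(f"apparent c = {apparent:.3f}, optimism-corrected c = {corrected:.3f}")
```

With few predictors relative to the number of events, the correction is expected to be small; it grows as more candidate predictors are screened.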

Results
Over the 10-year period, 414 azoospermic men underwent conventional TESE, of whom 223 (53.9%) had successful sperm retrieval. Table 1 summarises the baseline characteristics of couples in whom TESE was successful, compared with those with unsuccessful retrieval. The female partners of men undergoing successful TESE were also older. The univariable associations are shown in Table 3. FSH and LH were found to be highly correlated and LH was dropped as a candidate predictor. In multivariable analysis, male age, male FSH and male testosterone were selected in the sperm retrieval prediction model. The association between both male and female ages and success appeared to be linear, whereas the association between FSH and success appeared curvilinear, with inclusion of the quadratic term. The multivariable model is shown in Table 3. The discriminative performance of the model on the data was good, with a bootstrapped AUROC of 0.81 (0.77-0.85). The model was internally well calibrated, as expected (Fig. 1).
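To show how a logistic model with a linear age term and a quadratic FSH term turns into an individual predicted probability, here is a minimal sketch. The coefficients below are entirely hypothetical placeholders chosen for illustration; they are not the fitted values reported in Table 3, and testosterone is omitted for brevity.

```python
import math

# Hypothetical coefficients for illustration only (NOT the fitted model).
HYPOTHETICAL_COEF = {
    "intercept": -0.5,
    "age": 0.04,       # per year of male age
    "fsh": -0.12,      # linear FSH term
    "fsh_sq": 0.0015,  # quadratic FSH term (curvilinear association)
}

def predicted_probability(age_years, fsh, coef=HYPOTHETICAL_COEF):
    """Inverse-logit of the linear predictor with a quadratic FSH term."""
    linear_predictor = (
        coef["intercept"]
        + coef["age"] * age_years
        + coef["fsh"] * fsh
        + coef["fsh_sq"] * fsh ** 2
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Observed overall retrieval rate from the text: 223 of 414 men.
overall_rate_pct = round(100 * 223 / 414, 1)

p_low_fsh = predicted_probability(age_years=35, fsh=5)
p_high_fsh = predicted_probability(age_years=35, fsh=25)
print(overall_rate_pct, round(p_low_fsh, 2), round(p_high_fsh, 2))
```

The quadratic term lets the fall in predicted probability flatten at higher FSH values instead of declining at a constant rate on the log-odds scale.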

Pregnancy outcomes
One hundred seventy-eight couples underwent ICSI treatment after TESE (113 couples had one cycle, 45 couples had two cycles, 17 couples had three cycles and 3 couples had four cycles; Table 2). The total number of live births during the study was 91 from 88 couples, where one live birth represented either a singleton or multiple pregnancy. Some women had a live birth in more than one cycle. Differences in male hormones between couples undergoing ICSI who achieved live birth and those who did not were less marked; however, median male testosterone was greater in those who achieved live birth (Table 3). In multivariable analysis, female age, cycle number, male FSH and male testicular volume were selected as predictors of live birth, and the odds ratios with 95% confidence intervals are shown in Table 3. We combined the cycle 3 and cycle 4 groups due to the small number who underwent cycle 4. As the proportion of live births did not differ significantly between cycles 1 and 2, and as the continuous parameterization did not significantly contribute to the model, only cycle number ≥3 was included. No polynomial terms contributed to the model significantly. The discriminative performance of the model on the data was good, with a bootstrapped AUROC of 0.70 (0.64-0.76). The model was internally well calibrated, with less reliable prediction at the upper end of its predictive range of 0.20-0.73 (Fig. 2).
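The cycle counts above can be tallied, and the cycle-number predictor encoded, as in the following sketch. It assumes each couple's cycles are numbered consecutively from 1; the derived totals are simple arithmetic on the reported figures rather than numbers stated in the text, and the function name is hypothetical.

```python
# Couples grouped by the total number of ICSI cycles they underwent (from the text).
couples_by_total_cycles = {1: 113, 2: 45, 3: 17, 4: 3}

n_couples = sum(couples_by_total_cycles.values())
n_cycles = sum(k * v for k, v in couples_by_total_cycles.items())

# Number of cycles performed at each cycle number (1st, 2nd, 3rd, 4th attempt).
cycles_at_number = {
    c: sum(v for k, v in couples_by_total_cycles.items() if k >= c)
    for c in range(1, 5)
}

def late_cycle_indicator(cycle_number):
    """Binary predictor used in place of a continuous cycle number: 1 if cycle >= 3."""
    return int(cycle_number >= 3)

n_late_cycles = sum(
    count for c, count in cycles_at_number.items() if late_cycle_indicator(c)
)
print(n_couples, n_cycles, cycles_at_number, n_late_cycles)
```

The small count of cycles at number 3 or above is consistent with the decision to merge the cycle 3 and cycle 4 groups into a single indicator.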

Discussion
Fertility treatment for couples with azoospermia involves multiple stages including SSR, controlled ovarian stimulation, embryo transfer and luteal phase support. Each of these stages is associated with risks and therefore models which can help to predict the chances of success following their initial clinical assessment would be beneficial to facilitate each couple's informed decision-making. We have developed two models based on routine data which could be utilised in the fertility clinic. The first model would enable the differentiation between couples with a low and high chance of success of TESE prior to surgical intervention. Following successful TESE, the second model can be utilised to support the counselling of couples prior to ovarian stimulation and TESE-ICSI about the chances of biological parenthood. To our knowledge, this is the first pre-ovarian stimulation model to be published in the literature.
The success of TESE in our cohort was 53.9%, which is in keeping with other similar cohorts reported in the literature (42.3-53.2%) [10-12, 14, 18, 28, 32]. Our model demonstrated that higher male age and lower male FSH were predictive of successful sperm retrieval with TESE. Discrimination of the model was good, with a concordance statistic of 0.81, and there was evidence that the model demonstrates excellent calibration and is suitable for personalised prediction. This suggests that the model can distinguish between those men with azoospermia who had a good prognosis for surgical sperm retrieval using TESE and those who had a poor prognosis. Currently, there is only one other predictive model (PM) for the success of SSR in men with azoospermia which reports model performance (both calibration and discrimination) and model validation [14]. Similarly, this model demonstrated that higher male age and lower male FSH are predictive of success of SSR [14].
Unlike in previous studies, we chose the starting point for our PM for live birth for couples undergoing TESE-ICSI as prior to ovarian stimulation, using parameters in our prognostic model that are known before the start of treatment. We found that live birth was associated with lower maternal age, testicular volume <15 ml and a lower TESE-ICSI cycle number (<3). After model development using multivariable logistic regression analysis, we found a good c-statistic of 0.70, with adequate internal calibration but with less certainty at the upper predictive range. Nevertheless, model calibration suggested that the model would still be clinically useful. The partners of men with azoospermia may have no identifiable subfertility and consequently represent a different patient group. Existing ART PMs may not be applicable to couples with subfertility due to azoospermia. Other PMs in the literature for live birth with TESE-ICSI also report that lower female age was a key predictive factor in its success [36,40]. Previous meta-analyses have demonstrated that increasing female age is predictive of lower pregnancy chances after IVF/ICSI in unselected populations of subfertile women [42]. As expected, this important predictor was relevant in the context of azoospermia, as it is a key marker of ovarian reserve.
A previously published PM based upon Human Fertilisation and Embryology Authority (HFEA) data demonstrated that an increasing number of previously unsuccessful IVF cycles was associated with a lower chance of live birth [43]. Our data demonstrating a reduced chance of live birth after three or more cycles support this.
An interesting finding from our study was that testicular volume <15 ml was predictive of live birth, the biological explanation for which is unclear. These data suggest that azoospermic men with a slightly lower testicular volume may, once the obstacle of retrieval is overcome, have sperm more likely to achieve subsequent live birth. It could be that the cut-off used to define lower testicular volume in our study is too broad. We also hypothesise that topographical variations in testicular pathology can occur independently of testicular volume; in men with slightly smaller testes, a larger percentage of the overall testis may be biopsied, giving a greater chance of finding sperm.
Although externally unvalidated, we have taken steps to facilitate generalisability, specifying parsimonious, theoretically driven models based on current literature and on broadly available predictors. As patient characteristics alone cannot fully account for the complexity of prediction in reproductive health, centre-specific models enable more locally relevant prediction. Prediction is an ongoing process and for male factor subfertility, we recommend further development and validation of models including the refitting of models for local use.
Our study was also limited to routinely available candidate predictors. Consideration of candidate variables involves a balance between what is clinically available, and therefore clinically relevant in practice, and what may have predictive value but is not clinically relevant in practice at the time of the study. We did not routinely obtain inhibin B concentration, which is known to be indicative of the number of Sertoli cells [44]. This has been studied in explanatory models, with a higher serum inhibin B concentration being associated with increased SSR success [9,32,45,46]. Whilst inhibin B concentration may prove a useful counselling tool, its absence from our routine practice precluded its inclusion in this analysis and would limit its use in a PM elsewhere. Many studies aimed at identifying predictors of successful SSR utilise histological diagnosis of testicular biopsies [6, 8-10, 12, 15-18, 21, 24, 26, 27, 32-35]. Although this is not available pre-operatively, it can provide useful information for counselling couples after SSR about the chances of further success if needed. A limitation of our study is that we do not have histological diagnoses available for all of our cohort, so this parameter could not be considered for use. Some of the data were collected retrospectively; however, we do not believe this introduced bias, as the variables were measured according to the same clinical protocols and by the same clinical team in this single unit over the study period, and variables that were not measured in this way were not considered. The eligible population included all couples in whom a diagnosis of azoospermia was made and who underwent conventional TESE where there was material uncertainty in the chance of successful retrieval. As the outcome is clearly naturally time-ordered, again we do not believe the chronology of data collection introduced bias. We defined successful SSR as the presence of motile sperm which survived the freeze-thaw process.
During the study, all patients had TESE samples frozen and checked for motile sperm which survived the freeze-thaw process, as per the unit policy. This is to ensure that couples are not exposed to the risks of ovarian stimulation before it is known that sperm has been retrieved. However, this may limit the generalisability of our findings, as we are aware that other units coordinate ovarian stimulation with SSR and consequently may include patients where rare sperm are found in some fresh TESE procedures. We believe our TESE success PM offers good performance based on a pragmatic range of predictors. The live birth PM was limited by the lack of data on known preoperative predictors of outcomes for IVF/ICSI, including female aetiology, duration of subfertility, type of subfertility and markers of ovarian reserve [42]. Although our model offers insight into prediction in the azoospermic population, which is an area of need, future routine data could be incorporated to improve the theoretical basis of the model.

Conclusions
We developed internally validated PMs for success of TESE and for live birth following TESE-ICSI in azoospermic couples demonstrating that success can be predicted using preoperative parameters. These models indicate the prognostic value of routine predictors to guide local counselling of this patient group. They provide insight into a more personalised prediction for azoospermic couples, and we recommend future research into the development and validation of locally applicable models and research into their implementation.