How V. Talanov’s Diagnostic Questionnaires Are Made
1. At the very first stage, people with known socionic types (i.e., types they declared themselves, provided they were highly confident in their type and had extensive socionic experience) answered the questionnaire questions. For each question, the answers were averaged within each of the 16 declared type groups. As a result, a type profile (consisting of 16 numbers) is obtained for each questionnaire item. It is then easy to convert this into a trait profile, since each of the 15 Reinin traits splits the 16 types into two opposite poles of eight types each, so every trait weight is computed from the answers of all respondents rather than a single type group (a sketch of this conversion is given below). The trait profile is also more reliable: if the average for a single socionic type was obtained, say, from the responses of 100 subjects with that declared type, then not 100 but all 1,600 people (representatives of all 16 types) participate in the calculation of each trait weight. Therefore, trait values initially carry less error than type values.
Moreover, to increase the reliability of the results, respondents who declared a given type are not all counted with equal weight; each declaration carries its own weight, computed by a complex, empirically derived formula that takes into account the respondent's socionic experience, their confidence in their type, and whether the declared type at least approximately matches the type already predicted for them by the questionnaire (using diagnostic coefficients for these questions that were obtained not from the current survey but earlier, in previous studies).
In addition, declared types that fall not merely outside the positive zone but into the negative diagnostic zone of the questionnaire for that type (i.e., types that were evidently declared arbitrarily – about 10% of cases) automatically receive a weight of zero and are excluded as clearly erroneous declarations.
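The internal formulas are not published, so the following is only a minimal Python sketch of the arithmetic described in this step: answers to one item are averaged within each declared-type group using individual respondent weights (zero for the clearly erroneous declarations), and the resulting 16-number type profile is folded into a 15-number trait profile via the pole structure of the Reinin dichotomies. The pole matrix, weights and data here are illustrative placeholders, not Talanov's actual coefficients.

```python
import numpy as np

N_TYPES, N_TRAITS = 16, 15

def reinin_poles():
    """Illustrative +1/-1 pole assignments: the 15 Reinin dichotomies are the 15
    non-empty XOR combinations of the 4 basic dichotomies, so with types encoded
    as 4-bit numbers each row splits the 16 types into two poles of eight.
    Which row corresponds to which named trait depends on the encoding."""
    basis = np.array([[1.0 if (t >> b) & 1 else -1.0 for t in range(N_TYPES)]
                      for b in range(4)])
    return np.array([basis[[b for b in range(4) if (s >> b) & 1]].prod(axis=0)
                     for s in range(1, 16)])

REININ_POLES = reinin_poles()

def type_profile(answers, declared_type, weights):
    """Weighted mean answer to one item within each of the 16 declared-type groups."""
    profile = np.zeros(N_TYPES)
    for t in range(N_TYPES):
        mask = declared_type == t
        if weights[mask].sum() > 0:          # weight 0 = erroneous declaration, ignored
            profile[t] = np.average(answers[mask], weights=weights[mask])
    return profile

def trait_profile(type_prof):
    """Fold the 16-number type profile into 15 trait weights:
    mean over one pole minus mean over the other."""
    plus = REININ_POLES > 0
    return np.array([type_prof[plus[i]].mean() - type_prof[~plus[i]].mean()
                     for i in range(N_TRAITS)])

# Toy data: 1600 respondents answering one item on a -3..+3 scale.
rng = np.random.default_rng(0)
answers = rng.integers(-3, 4, size=1600).astype(float)
declared = rng.integers(0, N_TYPES, size=1600)
weights = rng.uniform(0.2, 1.0, size=1600)       # confidence/experience-based weights
weights[rng.random(1600) < 0.10] = 0.0           # ~10% arbitrary declarations dropped

print(np.round(trait_profile(type_profile(answers, declared, weights)), 3))
```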
2. At the second stage, the entire database of questionnaires and answers is recalculated using the diagnostic coefficients derived in stage 1 from the declared-type trait profiles. To diagnose the value of each of the 15 socionic traits for a respondent, the correlation between their answers and the diagnostic coefficients of that trait across the corresponding questions is calculated. The result is the respondent's trait profile, expressed as correlation coefficients.
These correlations do not follow a normal (Gaussian) distribution, but for the convenience of further mathematical processing we want them to. This is achieved by applying the so-called Fisher transformation (z = arctanh r) to all correlations – it brings the distribution of correlation coefficients close to normal (sketched below).
Since all respondents now have specific diagnosed values for the 15 Reinin traits (i.e., have their own trait profile, from which their type and functional profiles are also derived), direct correlations can be measured between their answers to any questionnaire question and their 15 Reinin trait values.
Thus, each questionnaire item is assigned a newly formed second-approximation trait profile (again in the form of Fisher-transformed correlations).
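Both correlation steps above – diagnosing a respondent's trait values from the item coefficients, and then re-deriving an item's trait profile from the respondents' trait values – reduce to the same operation: a Pearson correlation followed by the Fisher transformation. A minimal sketch, with my own naming and toy data:

```python
import numpy as np

def fisher_corr_profile(vector, matrix):
    """Pearson correlation of `vector` (n,) with each column of `matrix` (n, k),
    passed through the Fisher transformation; returns k Fisher-transformed correlations."""
    v = vector - vector.mean()
    m = matrix - matrix.mean(axis=0)
    r = (v @ m) / (np.linalg.norm(v) * np.linalg.norm(m, axis=0))
    return np.arctanh(np.clip(r, -0.999999, 0.999999))

rng = np.random.default_rng(1)
item_coeffs = rng.normal(size=(200, 15))    # stand-in stage-1 coefficients: 200 items x 15 traits
answers = rng.integers(-3, 4, size=200).astype(float)

# The respondent's second-stage trait profile (15 Fisher-transformed correlations):
print(np.round(fisher_corr_profile(answers, item_coeffs), 3))
```

The item-side, second-approximation profile is obtained the same way, by correlating one item's answer column with the respondents-by-traits matrix.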
3. A single questionnaire item can be used in several different questionnaires. Therefore, the diagnostic trait coefficients obtained for that item in each individual questionnaire are then aggregated and averaged – with a weight proportional to the number of respondents in each questionnaire.
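For step 3, a minimal sketch (numbers and names are illustrative only) of pooling an item's trait coefficients across the questionnaires in which it appears, weighting each questionnaire by its number of respondents:

```python
import numpy as np

# Trait profiles of the same item from three different questionnaires
# (showing 3 of the 15 traits), plus each questionnaire's respondent count.
profiles = np.array([
    [0.21, -0.05, 0.33],   # questionnaire A
    [0.18, -0.11, 0.29],   # questionnaire B
    [0.25, -0.02, 0.35],   # questionnaire C
])
n_respondents = np.array([1200, 800, 3000])

pooled = np.average(profiles, axis=0, weights=n_respondents)
print(np.round(pooled, 3))
```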
4. If, after the previous step, we examine the intercorrelation matrix of the trait loadings of the 12,000 questions for which trait profiles were built, we will see that it contains many correlations between different traits that are excessively large in absolute value.
Small correlations can be quite justified. For example, intuition, ethics, and questimity are expected to correlate slightly with each other – they all tend toward the right hemisphere.
But among the observed correlations in the matrix there are many that are overly large, or even strange and inappropriate – clearly parasitic.
What can cause parasitic correlations? For example, systematic “skewness” in the socionic types declared by respondents in the training sample. One reason is that some socionic types are trendier than others.
Another potential cause of parasitic correlations is sample heterogeneity in the number of representatives of each type. Although this heterogeneity is corrected mathematically, the correction is not perfect.
If, for instance, our sample contains many LSI and few LII, then, given the continuous transitions between psychotypes, those diagnosed as LII within their "native Robespierre zone" will show a bias toward LSI traits.
This is the so-called statistical skew effect due to significant type heterogeneity in the sample.
In practice, for instance, representatives of the LSE type are quite rare in online questionnaire samples, while SLE and LSI are more common.
So, in the group of LSE who did make it into the sample, there will inevitably be a shift toward traits of beta sensory types.
Similarly, the rare ESE group in online samples is “contaminated” with traits of ESI, SEE, and EIE (artificially boosting their decisiveness), and the rare LIE – with traits of SLE and ILE, and so on.
These statistical skews, in addition to the varying “trendiness” of declared types, are another serious reason for the appearance of parasitic non-zero intercorrelations between traits in the sample.
But it turns out that all these “statistical diseases” can be dealt with, even without delving deeply into their diagnostics or using customized treatments (though those also exist and can be used if desired).
The point is that, in addition to many specific and not always reliable methods, there is also a universal method of general artifact correction, and this is what we use:
A large sample of trait profiles from respondents of many questionnaires is taken (e.g., from 15,000 respondents). The sample is mathematically adjusted for unequal representation of different TIMs (as much as mathematically possible – all trait profiles are multiplied by the square root of the TIM’s representation ratio in the sample; this is a known and valid method for correcting sample inhomogeneity).
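The text does not spell out what exactly the "representation ratio" is, so the sketch below takes one plausible reading: each centered trait profile is multiplied by the square root of (target share 1/16) / (observed share of its TIM), which makes the ordinary covariance of the scaled data equal to the reweighted covariance. All names and data are mine, not Talanov's.

```python
import numpy as np

def reweight_profiles(profiles, tims, n_types=16):
    """profiles: (n_respondents, 15) trait profiles; tims: (n_respondents,) TIM index.
    Multiplies each centered profile by sqrt(target_share / observed_share) of its TIM."""
    counts = np.bincount(tims, minlength=n_types)
    observed_share = counts / counts.sum()
    ratio = (1.0 / n_types) / np.where(observed_share > 0, observed_share, np.inf)
    centered = profiles - profiles.mean(axis=0)
    return centered * np.sqrt(ratio[tims])[:, None]

rng = np.random.default_rng(2)
profiles = rng.normal(size=(15000, 15))                      # stand-in respondent trait profiles
p = np.linspace(1, 3, 16); p /= p.sum()                       # deliberately unequal TIM shares
tims = rng.choice(16, size=15000, p=p)

adjusted = reweight_profiles(profiles, tims)
corr = np.corrcoef(adjusted, rowvar=False)                   # trait intercorrelations after correction
print(np.round(corr[:3, :3], 2))
```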
This correction does reduce part of the parasitic correlations in the intercorrelation matrix of traits in this sample, but it still doesn’t lead to high-quality orthogonalization of traits.
The resulting matrix is still far from orthogonality and closely resembles a similar matrix built from question trait profiles instead of respondent profiles (the coefficients of these two matrices are closely correlated).
Therefore, if we correctly orthogonalize the matrix obtained from respondent profiles, we can derive the orthogonalizing transformation of traits, which can also be applied to previously obtained trait loadings of questionnaire items.
An orthogonalizing transformation means, for example, that a new vertness trait is expressed as a weighted sum of all 15 previous Reinin traits – the coefficient for the old vertness is by far the largest, while the small non-zero coefficients for the other 14 traits rotate the vertness axis by a small angle.
Finding the needed orthogonalizing transformation means finding all these small additional coefficients.
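Mechanically, applying such a transformation is just a matrix product. The sketch below (my notation, random stand-in data) rotates both the respondents' trait profiles and the items' trait loadings with the same 15 x 15 matrix T, in which each row is dominated by its own diagonal coefficient:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative transformation: identity plus small off-diagonal admixtures,
# i.e. each new trait = old trait + small contributions of the other 14.
T = np.eye(15) + 0.03 * rng.normal(size=(15, 15)) * (1 - np.eye(15))

respondent_profiles = rng.normal(size=(15000, 15))   # stand-in trait profiles
item_loadings = rng.normal(size=(12000, 15))         # stand-in question trait loadings

new_respondent_profiles = respondent_profiles @ T.T  # new trait k = sum_j T[k, j] * old trait j
new_item_loadings = item_loadings @ T.T              # the same rotation applied to item loadings
```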
The entire challenge lies in what it means to “ORTHOGONALIZE THE MATRIX PROPERLY.”
The transformation is not unique. Some versions reduce the agreement between questionnaire diagnoses and declared types, while others preserve or even increase this agreement.
Therefore, the orthogonalizing transformation is found under the control of several special criteria.
What are these criteria?
FIRST, minimizing all axis rotations (this means that the correlation of new profiles with the old ones should be as high as possible).
SECOND, for some traits, residual correlation is acceptable if it is theoretically justified.
THIRD (and most importantly), the criterion of maximizing the match between declared types and final type diagnoses (resulting from the trait profiles after transformation) is set.
The transformation itself is a 15x15 matrix, in which each new trait is a linear combination of the previous traits with fixed coefficients (these 225 coefficients form the matrix).
Selecting these coefficients optimally while satisfying all the control criteria can only be done by a neural network; a neural network set up in Excel is used for this.
The trick lies in selecting a proper unified criterion to guide the neural network and integrate all the control conditions. This criterion is our know-how; developing it took several months.
The rest is done by the neural network over several work cycles, and thankfully, this only has to be done once for the entire database, not for each new questionnaire.
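The actual search is run by a neural network under a proprietary combined criterion, so the following is only a stand-in: a generic numerical optimization (scipy, not a neural network) of the 225 coefficients under a placeholder objective that penalizes off-diagonal trait correlations and large axis rotations. The third requirement, agreement with declared types, is omitted here because it needs the declared-type data.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
profiles = rng.normal(size=(2000, 15))            # stand-in respondent trait profiles
profiles[:, 1] += 0.5 * profiles[:, 0]            # inject a parasitic correlation

def objective(t_flat, lam=3.0):
    T = t_flat.reshape(15, 15)
    C = np.corrcoef(profiles @ T.T, rowvar=False)          # new trait intercorrelations
    ortho_penalty = np.sum((C - np.eye(15)) ** 2)          # push toward orthogonality
    rotation_penalty = np.sum((T - np.eye(15)) ** 2)       # keep axis rotations small
    return ortho_penalty + lam * rotation_penalty

res = minimize(objective, np.eye(15).ravel(), method="L-BFGS-B",
               options={"maxiter": 200})
T_opt = res.x.reshape(15, 15)
print(objective(np.eye(15).ravel()), objective(res.x))     # penalty before vs. after
```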
What do we get in the end? A linear transformation of all traits, largely orthogonalizing them, maintaining high correlations with the old traits (average correlation above 0.95), and, most importantly, preserving the high agreement between questionnaire diagnoses and declared types.
Next, we can either recalculate all question trait loadings using the corrected respondent trait profiles or, more easily, directly apply the same orthogonalizing transformation to the previously obtained question trait profiles. The result is almost identical either way.
5. As a result, we get a library of trait loadings for 12,000 questionnaire items, practically free from artifact influence.
Now we need to form a questionnaire from these items.
First comes the selection of questions for the diagnostic questionnaire. The selection criteria are:
First, there must be a roughly equal number of questions with maximum loadings on each of the 15 traits and on each of their individual poles.
Second, the intercorrelation matrix of the trait loadings of all selected questions should be as close to orthogonal as possible – the sum of squares of its off-diagonal elements should be as close to zero as possible (sketched below).
Third, among the questions for each trait, there should be approximately equal representation of items with positive and negative contributions to social dissimulation (this is important for the adequate later correction of the test results for the social dissimulation factor – essentially a factor of self-esteem level and self-criticism).
This selection is done using a special support program that enables semi-automatic processing of the database using an integral selection and filtering criterion.
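A minimal sketch of the second selection criterion (names and data are mine): measure how orthogonal a candidate set of items is by the sum of squared off-diagonal elements of the intercorrelation matrix of their trait loadings, and use that number when filtering the selection.

```python
import numpy as np

def orthogonality_penalty(item_loadings):
    """Sum of squared off-diagonal elements of the trait intercorrelation matrix
    of the selected items; 0 would mean perfectly orthogonal trait loadings."""
    C = np.corrcoef(item_loadings, rowvar=False)
    return np.sum(C ** 2) - np.trace(C ** 2)   # drop the diagonal (all ones)

rng = np.random.default_rng(5)
library = rng.normal(size=(12000, 15))               # stand-in loadings of the item library
selected = rng.choice(12000, size=300, replace=False)
print(orthogonality_penalty(library[selected]))
```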
6. The previous stage alone is not enough to form a high-quality questionnaire.
First, the intercorrelation matrix of trait loadings of the selected items is still far from ideal (perfect selection is not possible – nor necessary, as we’ll see).
Second, we need the questionnaire to compute correlations between respondent answers and our pre-obtained trait and type loadings.
Respondent answers are normally distributed. But the calculated correlations between responses and loadings will be comparable across types only if the loadings are also normally or quasi-normally distributed – i.e., the kurtosis of the distributions need not be zero, but it must be the same across all 16 sets of type loadings.
If kurtoses differ, some types (with negative kurtosis) will stand out and yield disproportionately high correlation peaks, while others (with positive kurtosis) will be underestimated.
In short – we must equalize the kurtosis of all type loadings, or we’ll again get artifact-induced agreement loss.
This task is again handled by a special program using Excel’s neural network.
Additional weight coefficients ranging from 0.85 to 1.15 are assigned to all items so that the final intercorrelation matrix becomes even more orthogonal and the kurtoses across all 16 sets of type loadings become equalized.
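A sketch (using scipy's kurtosis; variable names and data are mine) of the quantity being equalized in this step: the excess kurtosis of each of the 16 sets of type loadings, which the per-item weights in the 0.85 to 1.15 range are tuned to bring to a common value.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(6)
type_loadings = rng.normal(size=(300, 16))        # stand-in: 300 selected items x 16 types
item_weights = rng.uniform(0.85, 1.15, size=300)  # the adjustable per-item weights

weighted = type_loadings * item_weights[:, None]
excess_kurtosis = kurtosis(weighted, axis=0)       # one value per set of type loadings
print(np.round(excess_kurtosis, 2))
print("spread to be minimized:", round(float(excess_kurtosis.max() - excess_kurtosis.min()), 2))
```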
7. Think that’s it? No – there’s one final stage in the mathematical preparation of the questionnaire.
Again, a neural network (this time anew for each questionnaire) searches for an additional orthogonalizing transformation of trait axes of question loadings, under a guiding criterion that includes two requirements:
First – minimal additional axis rotations.
Second – the final intercorrelation matrix of the trait loadings of all our questions must become identical to the matrix obtained from the respondents' trait profiles after the optimizing and orthogonalizing transformation of their traits (see the result of step 4).
Only after completing this last step is any new questionnaire with all its diagnostic coefficients finally ready for use.
The actual processing of your answers and generation of the final report involves no neural networks at all – it is simply based on the correlations between your answers (slightly adjusted by corrective factors) and the precomputed diagnostic coefficients for the questions – namely, the trait and type diagnostic coefficients that we previously calculated for each questionnaire item using the multi-stage combined mathematical methods described above.
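Finally, a hedged sketch of what the report-generation stage amounts to as described: plain correlations between the answer vector and the precomputed diagnostic coefficients, once per trait and once per type, with the highest type correlation as the headline diagnosis. It reuses the same answer-to-coefficient correlation as the stage-2 sketch; all names are mine, and the actual corrective factors are not public, so they are not modeled here.

```python
import numpy as np

def diagnose(answers, trait_coeffs, type_coeffs):
    """answers: (n_items,) answers, assumed already adjusted by the corrective factors;
    trait_coeffs: (n_items, 15); type_coeffs: (n_items, 16).
    Returns the trait profile, the type profile and the index of the best-fitting type."""
    a = answers - answers.mean()
    def corr_with(coeffs):
        c = coeffs - coeffs.mean(axis=0)
        return (a @ c) / (np.linalg.norm(a) * np.linalg.norm(c, axis=0))
    trait_profile = corr_with(trait_coeffs)
    type_profile = corr_with(type_coeffs)
    return trait_profile, type_profile, int(np.argmax(type_profile))

rng = np.random.default_rng(7)
answers = rng.integers(-3, 4, size=300).astype(float)
traits, types, best = diagnose(answers, rng.normal(size=(300, 15)), rng.normal(size=(300, 16)))
print("best-fitting type index:", best)
print(np.round(types, 2))
```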