Word maps with constants on symmetric groups

We study word maps with constants on symmetric groups. Even though there are mixed identities of bounded length that are valid for all symmetric groups, we show that no such identities hold in a metric sense. Moreover, we prove that word maps with constants and non-trivial content that are short enough have an image of positive diameter only depending on the length of the word. Finally, we also show that every self-map $G \to G$ on a finite non-abelian simple group is actually a word map with constants from $G$.


Introduction
Recently, there has been increasing interest in word maps and laws on finite, algebraic, and topological groups [2,3,6-12,14,16,18-22,24,29,32]. Here, every word $w \in F_r = \langle x_1, \dots, x_r \rangle$ induces a word map $w \colon G^r \to G$ on every group $G$ by substitution, where $F_r$ denotes the free group on the $r$ generators $x_1, \dots, x_r$ and we set $w(g_1, \dots, g_r)$ to be the image of $w$ under the unique homomorphism $F_r \to G$ which maps $x_i \mapsto g_i$ ($i = 1, \dots, r$). The word $w$ is called an identity or a law for $G$ iff $w(g_1, \dots, g_r) = 1_G$ for all choices of the $g_i \in G$ ($i = 1, \dots, r$). It is an interesting question to determine the length of the shortest non-trivial law of a given finite group $G$; this was studied in [4,30]. A bit less restrictively, one can ask when the image of a word map is small in a metric sense, see for example [18,24,31].
In this article we study word maps with constants. A word with constants in $G$ is an element of the free product $F_r * G$. As before, we get an associated map $G^r \to G$ by replacing the variables with elements from $G$. The word $w$ is called a mixed identity or a law with constants iff $w(g_1, \dots, g_r) = 1_G$ for all choices of the $g_i \in G$ ($i = 1, \dots, r$).
Following [16,27], for $w \in F_r * G$, we study the augmentation $\varepsilon(w) \in F_r$, which replaces all constants by the neutral element. We call $\varepsilon(w) \in F_r$ the content of $w$. It has been observed in different circumstances [16,27] that word maps with constants which have a non-trivial content tend to have a large image. We prove a corresponding result for symmetric groups with respect to the normalized Hamming metric. For $\sigma \in S_n$, we write $|\sigma|$ for the number of points moved by $\sigma$, and we measure distances and diameters in $S_n$ by $d(\sigma, \tau) = |\sigma^{-1}\tau|$. We say that $w \in F_r * G$ is strong if, when written in normal form, removing the constants does not lead to any cancellation among the variables; in particular, if $w$ is strong, then either the content is non-trivial or the word is a constant from $G$.
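The Hamming quantities used throughout can be sketched in a few lines of Python. This is only an illustration under stated assumptions: permutations of $\{0, \dots, n-1\}$ are stored as tuples, $|\sigma|$ is read as the number of moved points (as suggested by the fixed-point count in the proof of Lemma 1), and the function names are hypothetical.

```python
def hamming_norm(sigma):
    """Number of points moved by sigma (assumed meaning of |sigma|)."""
    return sum(1 for point, image in enumerate(sigma) if image != point)


def hamming_distance(sigma, tau):
    """Number of points on which sigma and tau differ."""
    return sum(1 for a, b in zip(sigma, tau) if a != b)


def normalized_hamming_distance(sigma, tau):
    """Hamming distance divided by n, so that values lie in [0, 1]."""
    return hamming_distance(sigma, tau) / len(sigma)


def diameter(subset):
    """Largest Hamming distance between two elements of a finite set of permutations."""
    items = list(subset)
    return max((hamming_distance(a, b) for a in items for b in items), default=0)
```

With these conventions a constant word map has an image of diameter zero, which is the situation considered in Corollary 1.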
There are some obvious mixed identities for $S_n$. For example, if $\tau$ is a transposition, then $[x, \tau]^6$ is such an identity: for arbitrary $g \in S_n$, the commutator $[g, \tau] = \tau^{-g}\tau$ is the product of two transpositions and so either a 3-cycle, the product of two disjoint transpositions, or trivial. Note that the word with constants $[x, \tau]^6$ has trivial content. It is one consequence of our main result, Theorem 1, that this must be the case for any mixed identity on $S_n$ of small length. However, there are long words with non-trivial content that are identities for $S_n$, for example $x^{n!}$. We show that a non-trivial word map with constants coming from a short word can have a small image (i.e. an image of small diameter) only if constants with small support are involved.
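The mixed identity $[x, \tau]^6$ can be checked by brute force for small $n$. The following is a minimal sketch, assuming permutations are stored as tuples acting on the right and that $\tau$ swaps the points 0 and 1; all function names are hypothetical.

```python
from itertools import permutations


def compose(p, q):
    """Right-action product: apply p first, then q."""
    return tuple(q[i] for i in p)


def inverse(p):
    inv = [0] * len(p)
    for point, image in enumerate(p):
        inv[image] = point
    return tuple(inv)


def power(p, k):
    result = tuple(range(len(p)))
    for _ in range(k):
        result = compose(result, p)
    return result


def commutator(g, h):
    """[g, h] = g^{-1} h^{-1} g h."""
    return compose(compose(inverse(g), inverse(h)), compose(g, h))


def is_mixed_identity(n, exponent=6):
    """Check whether [x, tau]^exponent is trivial on all of S_n."""
    identity = tuple(range(n))
    tau = (1, 0) + tuple(range(2, n))  # the transposition swapping 0 and 1
    return all(
        power(commutator(g, tau), exponent) == identity
        for g in permutations(range(n))
    )
```

The exponent 6 cannot be lowered to 2 or 3 once $n \geq 4$, since both 3-cycles and products of two disjoint transpositions occur as values of $[g, \tau]$.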
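Every $w \in F_r * G$ can be written uniquely in the normal form $w = c_0 x_{i(1)}^{\varepsilon(1)} c_1 \cdots c_{l-1} x_{i(l)}^{\varepsilon(l)} c_l$ with $\varepsilon(j) = \pm 1$, $i(j) \in \{1, \dots, r\}$ and $c_j \in G$, where $i(j) = i(j+1)$ and $\varepsilon(j) = -\varepsilon(j+1)$ implies that $c_j \neq 1_G$ for $j = 1, \dots, l-1$. Then we set $\ell(w) = l$ and call it the length of $w$. Note that $\ell$ is just the word length where we give elements from $G$ length zero.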
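The notions of content, length and strong word can be illustrated by a small Python sketch. The encoding is an assumption made only for this example: a word in normal form is a list of tokens, where a variable letter $x_i^{\pm 1}$ is a pair `(i, ±1)` and a non-trivial constant is a string label; the function names are hypothetical.

```python
def free_reduce(letters):
    """Freely reduce a list of (variable index, exponent) pairs, exponent +1 or -1."""
    reduced = []
    for index, sign in letters:
        if reduced and reduced[-1] == (index, -sign):
            reduced.pop()  # cancel x_i^{sign} against the preceding x_i^{-sign}
        else:
            reduced.append((index, sign))
    return reduced


def variable_letters(word):
    """Drop the constants (strings) and keep the variable letters (tuples)."""
    return [token for token in word if isinstance(token, tuple)]


def content(word):
    """Augmentation: replace all constants by the identity and reduce."""
    return free_reduce(variable_letters(word))


def length(word):
    """Word length with constants counted as zero (word assumed to be in normal form)."""
    return len(variable_letters(word))


def is_strong(word):
    """Strong means that deleting the constants causes no cancellation of variables."""
    return len(content(word)) == length(word)
```

For instance, the commutator $[x_1, \tau] = x_1^{-1} \tau x_1 \tau$ is encoded as `[(1, -1), "tau", (1, 1), "tau"]`; it has length 2, trivial content, and is not strong.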
Our main result is the following theorem.
Theorem 1. Let $w \in F_r * S_n$ and consider the associated word map with constants $w \colon S_n^r \to S_n$. Then the following holds:
More precisely, if $w \notin S_n$ is arbitrary and the inequality in (ii) is violated, then a constant of size at most $2(\operatorname{diam}(w(S_n^r)) + 1)\ell(w)$ can be found such that the removal of that constant leads to cancellation of variables.
Corollary 1. Assume that $w \in F_r * S_n$ ($n \geq 2$) is strong and induces a constant map $S_n^r \to S_n$. Then either $w \in S_n$ or $\ell(w) \geq n/2$.
The shortest known strong mixed identities are the laws constructed in [17]; they are of length $\exp(C \log(n)^4 \log\log(n))$. It is an interesting open problem to close this gap.
The second part of Theorem 1 implies the following corollary, which is of independent interest.
Corollary 2. A metric ultraproduct of symmetric groups $S_n$ equipped with the normalized Hamming metric, with $n$ tending to infinity along the chosen ultrafilter, does not satisfy any non-trivial mixed identity.
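For completeness, let us briefly recall the notion of the metric ultraproduct of a family of metric groups $(G_n, d_n)_{n \in \mathbb{N}}$ with respect to an ultrafilter $\mathcal{U}$ on $\mathbb{N}$. Here $d_n$ is a bi-invariant metric on $G_n$ ($n \in \mathbb{N}$). The metric ultraproduct is defined as the quotient $\prod_{n \in \mathbb{N}} G_n / N_{\mathcal{U}}$, where $N_{\mathcal{U}} = \{(g_n)_n : \lim_{n \to \mathcal{U}} d_n(g_n, 1_{G_n}) = 0\}$ is the normal subgroup of null sequences, and it is itself a metric group when equipped with the limit metric.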
For more on metric ultraproducts of symmetric groups, see [5,26,28]. Note that the previous corollary is in contrast to the usual ultraproduct of symmetric groups, which satisfies the mixed identity $[x, \tau]^6$, where $\tau$ is an ultraproduct of transpositions as discussed above.
Despite the restrictions on images of word maps with constants given by the length of the words and the structure of the constants involved, we prove in the Appendix that every self-map of a non-abelian simple group is a word map with constants.

Basic definitions
In this short section, we make some basic definitions which are needed later. Let $G$ be a group, the group of possible constants. As above, write the fixed word $w \in F_r * G$ uniquely in the form $w = c_0 x_{i(1)}^{\varepsilon(1)} c_1 \cdots c_{l-1} x_{i(l)}^{\varepsilon(l)} c_l$, where $\varepsilon(j) = \pm 1$, $i(j) \in \{1, \dots, r\}$ ($j = 1, \dots, l$), and $c_j \in G$ ($j = 0, \dots, l$) are such that there is no cancellation.
We define the sets of indices $J_0(w), J_+(w), J_-(w) \subseteq \{1, \dots, l-1\}$ by $J_0(w) := \{j : i(j) \neq i(j+1)\}$, $J_+(w) := \{j : i(j) = i(j+1),\ \varepsilon(j) = \varepsilon(j+1)\}$ and $J_-(w) := \{j : i(j) = i(j+1),\ \varepsilon(j) = -\varepsilon(j+1)\}$. Note that $J_0(w), J_+(w), J_-(w)$ partition the set $\{1, \dots, l-1\}$. We call $J_-(w)$ the set of critical indices of $w$, since the removal of a constant $c_j$ for $j \in J_-(w)$ leads to cancellation among the variables. The constant $c_j$ is then called a critical constant. We call $v$ an elementary reduction of $w \in F_r * G$ if it is obtained from $w$ by deleting a critical constant and reducing the outcome.
Now, we focus on word maps with constants in symmetric groups. Keep the notation from above and set $G := S_n$. For simplicity, we assume that $c_0 = 1_G = \mathrm{id}$. Keep the definition of the sets $J_-(w)$, $J_+(w)$, and $J_0(w)$. Define $w_j := x_{i(1)}^{\varepsilon(1)} c_1 \cdots x_{i(j)}^{\varepsilon(j)} c_j$ to be the $j$th prefix of $w$ ($j = 0, \dots, l$). Note that $w_l = w$ by construction. For a reduced word $v \in F_r * S_n$, write $|v|_i$ for its $i$-length, i.e. the number of occurrences of $x_i$ and $x_i^{-1}$ in its reduced representation ($i = 1, \dots, r$). Let $w(S_n^r) \subseteq S_n$ denote the image of the word map $w \colon S_n^r \to S_n$ induced by $w$.
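The classification of the indices $1, \dots, l-1$ can be sketched in Python as follows. The sketch assumes that the three sets are distinguished exactly as in the three cases of the proof of Lemma 1 ($i(j) \neq i(j+1)$; equal index and equal sign; equal index and opposite sign), and that the variable letters of $w$ are given as a list of pairs $(i(j), \varepsilon(j))$; the function names are hypothetical.

```python
def index_sets(letters):
    """Split {1, ..., l-1} into (J_0, J_plus, J_minus).

    letters[j - 1] is the pair (i(j), eps(j)) for j = 1, ..., l.
    """
    j_zero, j_plus, j_minus = set(), set(), set()
    for j in range(1, len(letters)):
        (i_j, eps_j), (i_next, eps_next) = letters[j - 1], letters[j]
        if i_j != i_next:
            j_zero.add(j)      # different variables meet at position j
        elif eps_j == eps_next:
            j_plus.add(j)      # same variable, same sign
        else:
            j_minus.add(j)     # same variable, opposite sign: critical index
    return j_zero, j_plus, j_minus


def i_length(letters, i):
    """Number of occurrences of x_i and its inverse."""
    return sum(1 for index, _ in letters if index == i)
```

A word is strong precisely when the third set is empty, so this classification also locates the constants whose removal would cause cancellation.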

Proof of the main result
We are now ready to state the main technical lemma.
Lemma 1. In this setting, assume that $l > 0$ and let $d \geq 1$ be an integer such that the following hold: Then, we have that $\operatorname{diam}(w(S_n^r)) \geq d$.
To prove this lemma and the results thereafter, we need some additional graph-theoretical terminology. By a directed graph $\mathcal{G}$, we mean an object that consists of a set of vertices $V(\mathcal{G})$ and a set of edges $E(\mathcal{G})$ together with maps $\alpha_{+1}, \alpha_{-1} \colon E(\mathcal{G}) \to V(\mathcal{G})$. For an edge $e$ of $\mathcal{G}$, we call $\alpha_{+1}(e)$ its source and $\alpha_{-1}(e)$ its target; also, for $\varepsilon \in \{\pm 1\}$, we call $\alpha_{\varepsilon}(e)$ the $\varepsilon$-source and $\alpha_{-\varepsilon}(e)$ the $\varepsilon$-target of $e$. Let $S$ be a set of labels. By an edge-labeling of $\mathcal{G}$ by $S$ we mean a mapping $\lambda \colon E(\mathcal{G}) \to S$. If $\mathcal{G}$ is equipped with such a labeling, we call it a directed $S$-edge-labeled graph. In this case, an edge $e$ of $\mathcal{G}$ which is labeled by $s \in S$ is called an $s$-arrow. In this situation, if we exhibit a particular vertex $v$ of $\mathcal{G}$, we call $(\mathcal{G}, v)$ a pointed directed $S$-edge-labeled graph.
We will also make use of the following notation for partially defined maps: If a : A → B and b : B → C are such maps, then their composition ab : A → C is the partially defined map given by x.ab = (x.a).b if x.a exists and (x.a).b exists, and is undefined otherwise.
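The composition rule for partially defined maps can be made concrete with dictionaries. This is a minimal sketch, assuming a partial map is stored as a Python `dict` whose keys are the points on which it is defined; the function names are hypothetical.

```python
def compose_partial(a, b):
    """Composition ab of partial maps: x.ab = (x.a).b whenever both steps are defined."""
    return {x: b[y] for x, y in a.items() if y in b}


def invert_partial(a):
    """Inverse of a partial injective map; raises if the map is not injective."""
    inverse = {y: x for x, y in a.items()}
    if len(inverse) != len(a):
        raise ValueError("map is not injective")
    return inverse
```

In the proof of Lemma 1 the partial injective maps encoded by the arrows of a partial Schreier graph are of this kind, and a prefix of the word is evaluated by composing them together with the constants.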
Here comes the proof of the previous lemma.
Proof of Lemma 1. We have to prove that, under the assumptions of the lemma, there exist two elements $\sigma, \tau \in w(S_n^r)$ such that $d(\sigma, \tau) \geq d$. So let $\tau \in w(S_n^r)$ be an arbitrary element. In the following, we construct $\sigma \in w(S_n^r)$ such that $d(\sigma, \tau) \geq d$. Let $S_n$ act on the $n$-element set $\Omega$ and fix $d$ arbitrary distinct points $\omega_1, \dots, \omega_d \in \Omega$ (this is certainly possible by condition (iv) in the lemma, since $n \geq d|w|_{i(l)} + 1 > d$ as $l \geq 1$). Now we inductively construct a family $(\mathcal{G}_k^j, \omega_k^j)_{j,k}$ of pointed directed $S$-edge-labeled graphs on the fixed vertex set $V(\mathcal{G}_k^j) := \Omega$, where $S = \{x_1, \dots, x_r\}$ is the set of generators of $F_r$ ($j = 0, \dots, l$, $k = 1, \dots, d$). The graphs $\mathcal{G}_k^j$ will be constructed in the following lexicographic order: $\mathcal{G}_1^0, \dots, \mathcal{G}_1^l, \mathcal{G}_2^0, \dots, \mathcal{G}_2^l, \dots, \mathcal{G}_d^0, \dots, \mathcal{G}_d^l$. Each such graph $\mathcal{G}_k^j$ will be a partial Schreier graph of $F_r$ (i.e. a graph that can be completed to a Schreier graph of $F_r$) which is obtained from its predecessor by adding a single $x_{i(j)}$-arrow $e_k^j$, starting from the empty graph $\mathcal{G}_1^0$, i.e. the graph with $E(\mathcal{G}_1^0) = \emptyset$. Therefore, there will be precisely $((k-1)|w|_i + |w_j|_i)$-many $x_i$-arrows in $\mathcal{G}_k^j$, which encode partial injective maps $\pi_{k,i}^j \colon \Omega \to \Omega$ ($i = 1, \dots, r$). With this notation, the points $\omega_k^j$ will be chosen in such a way that $\omega_k^j = \omega_k.w_j(\pi_{k,1}^j, \dots, \pi_{k,r}^j)$ and $\omega_k.w_l(\pi_{k,1}^l, \dots, \pi_{k,r}^l) \neq \omega_k.\tau$ for $j = 0, \dots, l$ and $k = 1, \dots, d$, i.e. if $j \geq 1$ we have $\omega_k^j = \omega_k^{j-1}.(\pi_{k,i(j)}^j)^{\varepsilon(j)} c_j$. Hence, setting $\pi_1, \dots, \pi_r \in S_n$ to be extensions of $\pi_{d,1}^l, \dots, \pi_{d,r}^l$, the points $\omega_k^j$ ($j = 0, \dots, l$) will be precisely the trajectory of $\omega_k$ under the prefixes of $w(\pi_1, \dots, \pi_r)$. Thus, setting $\sigma := w(\pi_1, \dots, \pi_r)$, we will have that $\omega_k.\sigma \neq \omega_k.\tau$ for all $k = 1, \dots, d$, so that $d(\sigma, \tau) \geq d$ as desired.
We are left to carry out the construction of the family $(\mathcal{G}_k^j, \omega_k^j)_{j,k}$ of pointed $S$-edge-labeled graphs. Recall that we start with $\mathcal{G}_1^0$ being the empty graph, hence $E(\mathcal{G}_1^0) = \emptyset$, and $\omega_1^0 := \omega_1, \dots, \omega_d^0 := \omega_d$. Assume that we are to construct $(\mathcal{G}_k^j, \omega_k^j)$ out of the previous data. If $j = 0$, there is nothing to do. So assume $j \geq 1$ and we are to add the $x_{i(j)}$-arrow $e_k^j$. We assume by induction that we are already given an admissible 'starting point' $\alpha_{\varepsilon(j)}(e_k^j) = \omega_k^{j-1}$ of our edge $e_k^j$, i.e. $\omega_k^{j-1} \neq \alpha_{\varepsilon(j)}(e)$ for any $x_{i(j)}$-arrow $e \in E(\mathcal{G}_k^{j-1})$. Our task is to find an admissible 'end point' $\alpha_{-\varepsilon(j)}(e_k^j)$ of $e_k^j$. This means that we have to ensure that the following conditions (a)-(d) are satisfied:
Let us briefly explain this: (a) means that $e_k^j$ will not have the same $\varepsilon(j)$-target as any $x_{i(j)}$-arrow in $\mathcal{G}_k^{j-1}$, ensuring that $\mathcal{G}_k^j$ remains a partial Schreier graph. (b) is necessary, since the $\varepsilon(1)$-sources of the $x_{i(1)}$-arrows $e_m^1$ are already fixed to be $\omega_m$ from the beginning (for all $m = 1, \dots, d$), so these cannot be used by another $x_{i(1)}$-arrow as its $\varepsilon(1)$-source. (c) is necessary to ensure that $e_k^{j+1}$ will have a valid 'starting point' $\omega_k^j = \alpha_{\varepsilon(j+1)}(e_k^{j+1})$ (when $j < l$) which is not already in use as the $\varepsilon(j+1)$-source of another $x_{i(j+1)}$-arrow in $\mathcal{G}_k^j$. Finally, (d) ensures that $\omega_k.\sigma \neq \omega_k.\tau$.
Now we count the number of possibilities to choose the $\varepsilon(j)$-target $\alpha_{-\varepsilon(j)}(e_k^j)$ of $e_k^j$ according to (a)-(d): As $\mathcal{G}_k^{j-1}$ is a partial Schreier graph, no two of its $x_{i(j)}$-arrows have the same $\varepsilon(j)$-target. Hence there are precisely $n - (k-1)|w|_{i(j)} - |w_{j-1}|_{i(j)}$ possible $\varepsilon(j)$-targets for $e_k^j$ that satisfy (a). In the following three cases, we assume that $j < l$, so that (d) is irrelevant.
Case (i): $i(j) \neq i(j+1)$. This means that $j \in J_0(w)$. Then $\mathcal{G}_k^j$ has precisely $(k-1)|w|_{i(j+1)} + |w_j|_{i(j+1)}$ many $x_{i(j+1)}$-arrows, none of which is $e_k^j$, so (c) rules out as many vertices for the $\varepsilon(j)$-target of $e_k^j$. In the worst case, the condition in (b) is satisfied, so that (b) rules out at most $d - k$ more vertices. In total, we get at least $n - (k-1)(|w|_{i(j)} + |w|_{i(j+1)}) - |w_{j-1}|_{i(j)} - |w_j|_{i(j+1)} - (d-k)$ possible choices for the $\varepsilon(j)$-target of $e_k^j$, which is a positive number by assumption (i) in the lemma.
Case (ii): $i(j) = i(j+1)$ and $\varepsilon(j) = \varepsilon(j+1)$. This means that $j \in J_+(w)$. Considering (c), there are precisely $(k-1)|w|_{i(j)} + |w_{j-1}|_{i(j)}$ many $x_{i(j+1)} = x_{i(j)}$-arrows $e$ different from $e_k^j$ in $\mathcal{G}_k^j$ (namely the ones in $\mathcal{G}_k^{j-1}$), which admit as many vertices as their $\varepsilon(j+1) = \varepsilon(j)$-sources, and the condition $\alpha_{-\varepsilon(j)}(e_k^j).c_j \neq \alpha_{\varepsilon(j)}(e_k^j)$ rules out the vertex $\alpha_{\varepsilon(j)}(e_k^j).c_j^{-1} = \omega_k^{j-1}.c_j^{-1}$. In the worst case, the condition in (b) is satisfied, so that (b) rules out at most $d - k$ further vertices. In total, we get at least $n - 2((k-1)|w|_{i(j)} + |w_{j-1}|_{i(j)}) - 1 - (d-k)$ possible choices, which is positive by assumption (ii) in the lemma.
Case (iii): $i(j) = i(j+1)$ and $\varepsilon(j) = -\varepsilon(j+1)$. This means that $j \in J_-(w)$. As in the previous case, (c) splits into two parts: The $x_{i(j+1)} = x_{i(j)}$-arrows different from $e_k^j$ in $\mathcal{G}_k^j$ rule out $(k-1)|w|_{i(j)} + |w_{j-1}|_{i(j)}$ vertices for the $\varepsilon(j)$-target of $e_k^j$, and the condition $\alpha_{-\varepsilon(j)}(e_k^j).c_j \neq \alpha_{-\varepsilon(j)}(e_k^j)$ rules out the $n - |c_j|$ fixed points of $c_j$. In the worst case, the condition in (b) is satisfied, and (b) rules out $d - k$ further vertices. Hence we have at least $|c_j| - 2((k-1)|w|_{i(j)} + |w_{j-1}|_{i(j)}) - (d-k)$ possible choices, which is positive by assumption (iii) in the lemma.
Case (iv): $j = l$. Then (c) is irrelevant, (b) can rule out at most $d - k$ vertices for $\alpha_{-\varepsilon(l)}(e_k^l)$ if $(*)$ holds, and (d) rules out one further vertex. Hence in total we have at least $n - (k-1)|w|_{i(l)} - |w_{l-1}|_{i(l)} - (d-k) - 1$ vertices remaining. This is a positive number by assumption (iv) of the lemma. Hence the proof is complete.
The four assumptions in Lemma 1 can obviously be strengthened to a single inequality. Note that $|w|_\infty$ is obviously bounded by the length $\ell(w)$. We will use this inequality in the form $|w|_{\mathrm{crit}} \leq 2(\operatorname{diam}(w(S_n^r)) + 1)\ell(w)$, where $|w|_{\mathrm{crit}}$ denotes the size of the smallest critical constant of $w$ (and $|w|_{\mathrm{crit}} = n$ if there is none). We interpret this inequality as follows: If the diameter of the word image is small, then either there is a small critical constant or the word length is large.
Remark 2. We suppose that a similar lemma can be proved for families like $\mathrm{PSL}_n(p)$ for a fixed prime $p$ and $n$ growing, leading to results comparable to our main theorem. However, we note that, for example, for the family $\mathrm{PSL}_2(q)$ with $q$ growing, the length of the shortest mixed identity grows linearly in $q$. Note that in this case, the fact that the length of the shortest mixed identity must tend to infinity is already a consequence of the results in [9]. This will be the subject of further work.
Proof of Theorem 1. The bound in (ii) directly follows from the inequality $|w|_{\mathrm{crit}} \leq 2(\operatorname{diam}(w(S_n^r)) + 1)\ell(w)$, since when $w$ is strong we have $|w|_{\mathrm{crit}} = n$ as there is no critical constant. The assertion about the existence of small critical constants is also an immediate consequence of the inequality above.
In order to prove (i), we study the effect of the removal of critical constants in more detail. Let $v$ be an elementary reduction of $w \in F_r * G$, i.e. it is obtained from $w$ by deleting a critical constant and reducing the outcome. Note that removing the smallest critical constant $c$ does not change the diameter of the word image by more than $2|c| = 2|w|_{\mathrm{crit}}$, so that given a small diameter, we can try to iterate elementary reductions. Let $w$ have non-trivial content and let $w = w_0, \dots, w_m$ be a chain of words, where $w_i$ is an elementary reduction of $w_{i-1}$ ($i \geq 1$) by a smallest critical constant, so that there is no such reduction for $w_m$, i.e. $w_m$ is strong. Note that $w_i$ no longer denotes the $i$th prefix of $w$. Then ℓ(w Hence, we obtain by induction on $i = 0, \dots, m$, since $\ell(w_i) \leq \ell(w)$. By assumption, $w_m$ is not a constant from $G = S_n$. Then by the above inequality as $m \leq \ell(w)/2$ and $\ell(w) \geq 1$. This is the bound in (i) of Theorem 1.
We now turn to metric ultraproducts of symmetric groups and Corollary 2. Assume that $l \geq 1$ and let $(w_n)_n$ with $w_n \in F_r * S_n$ be a sequence of reduced words of the same structure (i.e. with the same $i(j)$ and $\varepsilon(j)$), but with different constants $c_{j,n}$ ($j = 1, \dots, l$, $n \in \mathbb{N}$, $n \geq 2$). Let $\mathcal{U}$ be an ultrafilter on $\mathbb{N}$ such that $\operatorname{diam}(w_n(S_n^r))/n \to_{\mathcal{U}} 0$ as $n \to_{\mathcal{U}} \infty$, i.e. $(w_n)_n$ induces a constant map on the metric ultraproduct of the $S_n$'s equipped with the normalized Hamming norm $|g|_H = |g|/n$. We claim that then there is an index $j \in \{1, \dots, l-1\}$ with $i(j) = i(j+1)$ and $\varepsilon(j) = -\varepsilon(j+1)$ such that $|c_{j,n}|/n \to_{\mathcal{U}} 0$ as $n \to_{\mathcal{U}} \infty$. In particular, $w$ is trivial.
Proof of Corollary 2. By the above, $2|w|_\infty(\operatorname{diam}(w_n(S_n^r)) + 1) \geq |w_n|_{\mathrm{crit}}$, so that $|w_n|_{\mathrm{crit}}/n \to_{\mathcal{U}} 0$ as $n \to_{\mathcal{U}} \infty$ by the assumption. Hence, by finiteness, there exists an index $j \in J_-(w)$ such that $|c_{j,n}|/n \to_{\mathcal{U}} 0$ as $n \to_{\mathcal{U}} \infty$. In particular, starting with a potential mixed identity in the metric ultraproduct, the previous argument implies that a critical constant must be trivial. This is a contradiction and the proof is complete.

Appendix
It is well-known that every map $\mathbb{F}_q \to \mathbb{F}_q$ is realized by a polynomial from $\mathbb{F}_q[X]$ of degree less than $q$; in fact, this is a characterization of fields among finite rings, see [15]. We prove the analogous result for non-abelian simple groups. We denote by the covering number $\operatorname{cn}(G)$ of $G$ the minimal $m$ such that $C^{*m} = G$ for all non-trivial conjugacy classes $C$. Various bounds on $\operatorname{cn}(G)$ for finite simple groups $G$ can be found in the literature, see for example the seminal work of Liebeck-Shalev [23] and the references therein.
Proof. The evaluation map $\mathrm{ev} \colon F_1 * G \to G^G$ is a homomorphism. If $g, h \in G$ are distinct and $\pi_{\{g,h\}} \colon G^G \to G^{\{g,h\}}$ is the projection onto $g$ and $h$, then $\pi_{\{g,h\}} \circ \mathrm{ev}$ is surjective: Indeed, by Goursat's lemma and since $G$ is assumed to be simple, it is enough to show that its image is not the graph of an automorphism. However, this cannot be the case, since the constants $G$ map to the diagonal subgroup, and $x$ maps to a non-diagonal element. This implies that $\mathrm{ev}$ must be surjective too.
By using iterated commutators, one can give a more explicit construction: We prove that for any subset $S \subseteq G$ of size $m \leq 2^e$ and an element $g \in G \setminus S$, there is a commutator word $w_{g,S} \in F_1 * G$ of length $4^e$ such that the induced map $w_{g,S} \colon G \to G$ satisfies $w_{g,S}(s) = 1_G$ for all $s \in S$ and $w_{g,S}(g) \neq 1_G$.
Then setting $e := \lceil \log_2(|G| - 1) \rceil$, one can take $S = G \setminus \{g\}$, so that $w_{g,S}(s) = 1_G$ for $s \in S$ and $w_{g,S}(g) \neq 1_G$ as desired. Now we can multiply together conjugates of the map $w_{g,S}$ to get a map $\delta_{g,h} \in F_1 * G$ such that $\delta_{g,h}(s) = 1_G$ for $s \in S = G \setminus \{g\}$ and $\delta_{g,h}(g) = h$ for a chosen $h \in G$. Then $w = \prod_{g \in G} \delta_{g,f(g)} = f$ as maps $w, f \colon G \to G$, where $f$ is arbitrary. So $f$ is a word map.
Conversely, if any map $f \colon G \to G$ is a word map, then for $g \neq 1_G$, the map $\delta_{g,h} \colon G \to G$ given by $\delta_{g,h}(g) = h$ and $\delta_{g,h}(x) = 1_G$ for $x \neq g$ is induced by some word $w \in F_1 * G$. Since $w(1_G) = \delta_{g,h}(1_G) = 1_G$, we have that $w(g) = \delta_{g,h}(g) = h \in \langle\langle g \rangle\rangle$, the normal closure of $g$. But since $h$ was arbitrary, we have $\langle\langle g \rangle\rangle = G$, so that $G$ is simple, since $g \neq 1_G$ was arbitrary as well.
By [1], Chapter 3, we have $\operatorname{cn}(A_n) = \lfloor n/2 \rfloor$ for $n \geq 6$. Thus, we get the following estimate for alternating groups.
Corollary 3. For $n \geq 5$, every map $A_n \to A_n$ can be represented by a word map with constants of length $O((n!)^3 n)$.
Remark 5. There are at most $3^l |G|^{l+1}$ words with constants of length at most $l$ in $F_1 * G = \langle x \rangle * G$. So in order that all self-maps $G \to G$ can be represented as word maps coming from words of length at most $l$, we must have $3^l |G|^{l+1} \geq |G|^{|G|}$. This implies a lower bound, linear in $|G|$, on the minimal length of words with constants that can represent all possible maps.
Remark 6. It was pointed out to us by Ben Steinberg and Anton Klyachko that the qualitative aspect of Theorem 2 is a well-known result [25]. Similar but weaker bounds have been obtained in [13]. Using methods similar to the ones in the proof of Theorem 2, one can show that every map $G^r \to G$ arises as a word map with constants of length $O(|G|^{r+2} r^2 \operatorname{cn}(G))$. Indeed, one can use the maps of length at most $O(|G|^2)$ to get maps that see just one coordinate of $G^r$. Taking $\lceil \log_2(r) \rceil$ commutators of those maps, one gets a map that sees just one element of $G^r$. This then has length $O(r^2 |G|^2)$. Hence, we get a bound $O(|G|^{r+2} r^2 \operatorname{cn}(G))$.
This is exactly the bound in [13] for A n , but the authors do not write it out for other non-abelian simple groups.
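The counting argument of Remark 5 can be made numerically explicit. The sketch below assumes that the inequality in question is $3^l |G|^{l+1} \geq |G|^{|G|}$, i.e. that the number of available words must be at least the number of self-maps of $G$; taking logarithms, this is equivalent to $l \geq (|G| - 1)\log|G| / \log(3|G|)$. The function name is hypothetical.

```python
def minimal_feasible_length(order):
    """Smallest l with 3**l * order**(l + 1) >= order**order.

    Uses exact integer arithmetic; `order` plays the role of |G|.
    """
    target = order ** order  # number of self-maps of a set with `order` elements
    l = 0
    while 3 ** l * order ** (l + 1) < target:
        l += 1
    return l
```

For example, `minimal_feasible_length(60)` gives the counting lower bound for a group of order 60 such as $A_5$. This is only a necessary condition on the length and says nothing about whether words of that length actually suffice.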

Remark 4. In the theorem, the term $\operatorname{cn}(G)$ can also be replaced by the covering diameter $\operatorname{cd}(G)$ of $G$, which is the minimal number $m$ such that $(\{1_G\} \cup C \cup C^{-1})^{*m} = G$ for all non-trivial conjugacy classes $C$ of $G$. Clearly, $\operatorname{cd}(G) \leq \operatorname{cn}(G)$.
Theorem 2. A non-abelian finite group $G$ is simple if and only if every map $G \to G$ is a word map with constants. Every such map can be represented by a word of length at most $O(|G|^3 \operatorname{cn}(G))$.
Remark 3. An abelian group $G$ has the above property (i.e. each map $G \to G$ comes from a word with constants) if and only if $|G| \leq 2$.