On complementizers and embedded gapping in English, Spanish and Polish

This paper examines two sequences which display gapping under two different embedding configurations in English, Spanish and Polish. I claim that the different distribution of the finite complementizer in these configurations and across these three languages provides further evidence for the idea that gapping is not a uniform phenomenon, and that different structures may correlate with different heights at which coordination can take place in gapping.


Introduction
Gapping is a phenomenon in which the verb in the rightmost conjunct of a sentence coordination structure is elided under identity with the verb in the leftmost conjunct (1a), which I will refer to as the antecedent clause. Examples (1b) and (1c) show that ellipsis may target elements other than the main verb, like complements or adjuncts, even if these elements do not appear to form a constituent (1c):
(1) a. Linda studies psychology, and her brother studies biology.
b. I will travel to Sri Lanka in the summer, and my neighbour will travel to Sri Lanka in autumn.
c. I will travel to Sri Lanka in the summer, and my neighbour will travel to Israel in the summer.
For the purposes of this paper, the examples in (1) will be referred to as canonical gapping, which can be defined as gapping occurring in matrix clauses. Very broadly speaking, the various existing analyses of canonical gapping differ along two main questions: (i) what formal mechanism is responsible for the gap in the second conjunct?; and (ii) at what height does coordination take place in gapping? With respect to the first question, I will assume that ellipsis involves deletion of syntactic material at PF (i.a. Sag 1976). Following standard practice, I represent elided material in strikethrough text, as illustrated in (1). With respect to the second question, two main analyses have been put forth, which are typically referred to as low and high coordination accounts. Low coordination analyses (Coppock 2001, Lin 2002, Johnson 2009) posit that coordination in gapping holds at the level of the VP. Under these accounts, the example in (1) would receive the structure in (2). For simplicity, I will represent coordination using non-binary branching; see Zhang (2010).
Alternatively, under high coordination accounts (Neijt 1979, Hartmann 2000, Reich 2006, inter alia), canonical gapping involves coordination of two CPs. Compare (2) to (3):
(3) [CoordP [CP Linda studies psychology] and [CP her brother studies biology]]
2 The representation in (2) is not without its problems. For example, it is unclear why extraction of the preverbal subject from the leftmost VP does not violate the Coordinate Structure Constraint, or how the subject is licensed in the second conjunct; see Johnson (2009) for discussion.
One issue with the representations in (2) and (3) is that the PF deletion operation they display appears to target non-constituents (see the discussion in Fernández-Sánchez 2020: chap. 4). To avoid this, it is customary to assume that remnants, i.e. the elements that survive ellipsis (in (1a), the DP her brother and the NP biology), undergo movement to the left edge of the ellipsis domain. Therefore, as an illustration, the rightmost CP in (3) would be derived along the lines of (4):
It is interesting to see that canonical gapping is a priori compatible with both low and high coordination structures. In this short paper, I focus on two non-canonical gapping configurations in three languages, namely English, Spanish and Polish. These configurations involve gapping in subordination contexts. Non-Canonical Gapping 1 (NCG1) displays an asymmetric coordination structure, where the clause containing the gap is not directly coordinated with its antecedent:
(5) [schema of NCG1; surviving label: clause containing the gap]
Such cases have been argued to be ungrammatical in English (Hankamer 1979), but they have been reported to be fine in Spanish and Polish (Fernández-Sánchez 2016), as well as in English (Wurmbrand 2017) and in other languages like Farsi (Farudi 2013), Georgian and Russian (Erschler 2016). I address NCG1 in section 2. Note that the structure in (5), as opposed to cases of canonical gapping, is incompatible with a low coordination structure, and must be given a high/clausal coordination analysis.
In turn, Non-Canonical Gapping 2 (NCG2) involves cases where the clause containing the gap is directly coordinated with its antecedent clause, just like in canonical gapping (cf. 1); however, in this case, the entire coordination is embedded under one main verb. NCG2 is illustrated in (6). Although (6) is in theory compatible with both a high and a low coordination account, we will see in section 3 that there are reasons to believe that NCG2 involves a low coordination structure, which means that the representation in (6) will not be entirely accurate. The claims in this paper therefore suggest that gapping is not a unified phenomenon, a conclusion in line with previous research (Repp 2009, Centeno 2011).
Before concluding the paper, in section 4 I will tentatively address the syntax of an understudied gapping string which I will take to be a run-of-the-mill case of NCG1 where the embedding predicate is in turn gapped.

The No Embedded Constraint
Hankamer (1979) argued that the clause containing the gap cannot be embedded in English, as in (7a), whereas comparable strings have been reported to be acceptable in Spanish and Polish, as in (8) and (9). Note that data like (8) or (9) can only be accounted for under a high coordination analysis with clausal ellipsis applying in the embedded clause.5 The question is: why would English be different from these languages? Is this a typological split? It is important to mention, however, that English is not that different from Spanish or Polish, despite Hankamer's initial observation: structures like (7a) are possible provided that, as observed by Wurmbrand (2017), no complementizer precedes the remnants:
(10) Alfonse stole the emeralds and I think Mugsy the pearls.6
In order to capture the data, Wurmbrand proposes the following condition:
(11) The Embedded Gapping Constraint
Gapping of embedded clauses is only possible when the embedded clause lacks a CP.
To explain the ungrammaticality of (7a) and the grammaticality of (10), she makes the following assumptions. First, she argues, in line with others (Gallego 2009, Bošković 2014, Aelbrecht 2016), that ellipsis is licensed by phasal heads. Second, she contends that, while there are two phasal domains (thematic and propositional, which roughly correspond to vP/VP and CP respectively), phases should be defined contextually or configurationally. In particular, she proposes that the phase head is the highest head in a phasal domain. Third, she assumes that remnants move to a functional projection (FP) above TP prior to clausal ellipsis, along the lines of (4). Finally, and crucially, she follows Bošković (1997) in claiming that that-less embedded clauses are TPs.
After having established the main features of Wurmbrand's analysis, let us see how she derives the facts. Take the example in (10): the verb think selects for a TP (following her last premise), as illustrated in (12a). In order for clausal ellipsis to apply, remnants move to an FP above the TP to escape the domain of ellipsis. Ellipsis is then licensed by the highest head in the embedded, propositional phase, which in this case is the head of FP, which triggers ellipsis of its complement, i.e. the TP:
5 Of course this does not mean that gapping in these languages must always involve high coordination structures. As an anonymous reviewer mentioned, various authors have developed eclectic accounts of gapping where both high and low coordinations are involved in different gapping strings within the same language; see Repp (2009), Centeno (2011) or Wong (2016). The main claim in this paper is, precisely, that the two configurations under scrutiny here must involve different coordination heights.
6 An anonymous reviewer wonders whether this is truly a case of embedding, or whether (10) involves a run-of-the-mill gapping structure where the antecedent clause and the clause containing the gap are directly coordinated and the sequence "I think" is a parenthetical comment clause (Schneider 2007, Griffiths 2013) which provides an epistemic/evidential qualification over a proposition. First, the equivalent structures in Polish and Spanish are bona fide cases of embedding, as evidenced by the overt complementizer, so one would expect embedding to be possible in English as well. Second, regular fragment answers, which display a very similar syntax to gapping (Reich 2006), can be truly embedded (see Weir 2014); see section 2.
If the complementizer is present, as in (7a), then the verb think selects for a CP complement. In this scenario, it is C and not F that is the highest head in the propositional domain. Consequently, C ought to trigger ellipsis of its complement, which encompasses FP. Under this configuration, remnants would stay trapped within the ellipsis spell-out domain.
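The logic of this derivation can be stated as a small procedure. The Python sketch below is only an illustration under simplifying assumptions that are mine, not Wurmbrand's: an embedded clause is reduced to a top-down list of projection labels, the function name `remnants_survive` is hypothetical, and ellipsis is taken to delete everything below the highest head of the propositional domain.

```python
def remnants_survive(projections, remnant_site="FP"):
    """Toy check of the phase-based licensing logic summarized above.

    `projections` lists the embedded clause top-down, e.g. ["FP", "TP"].
    Assumption: the highest head licenses ellipsis of its complement,
    i.e. of everything below the first projection in the list.
    """
    if remnant_site not in projections:
        return False  # the remnants have no landing site at all
    elided = projections[1:]  # complement of the highest head
    return remnant_site not in elided


# that-less complement: F is the highest head, so only TP is elided and
# the remnants in FP escape ellipsis, as in (10).
print(remnants_survive(["FP", "TP"]))        # True
# Complement with an overt complementizer: C is the highest head, so FP
# falls inside the ellipsis site, as in (7a).
print(remnants_survive(["CP", "FP", "TP"]))  # False
```

Under these assumptions, the presence of C alone decides the outcome.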
Although it is an interesting proposal, Wurmbrand's analysis falls short in empirical coverage, as it cannot explain why, in languages like Spanish or Polish, the complementizer must be present; compare (13) to (8) and (9).
In what follows I claim that NCG1 should be viewed as a case of (embedded) fragment answers, in the sense of Merchant (2004).

Embedded fragments
A question like (14) can be answered in at least two ways: one involves repetition of the presupposed content (14a), and the other involves pronouncing only the focus of the sentence (14b). The latter is what is commonly referred to as a fragment answer:
(14) Who did you see yesterday?
a. Yesterday I saw Mary.
b. Mary.
I follow Merchant's (2004) standard analysis, whereby (14b) is derived from (14a) via clausal ellipsis.7 In particular, this author claims that the fragment undergoes movement to a functional projection above the TP prior to ellipsis:
Importantly for the purposes of this paper, fragments can be embedded, as in (16), from Weir (2014: 221); see fn. 6:
(16) A: Who is responsible for the 9/11 attacks?
B: Well, Michael Moore believes Bush.
What I defend here is that NCG1 can be derived by means of the same mechanism that derives (embedded) fragment answers (16). The difference would be that in NCG1 two remnants undergo movement to FP. This analysis is defended on the basis of two parallelisms between embedded fragments and NCG1: (i) the types of predicates under which the remnants can be embedded, section 2.2.1; and (ii) the presence/absence of the complementizer in various languages, section 2.2.2.

Embedding predicates
While fragment answers can be embedded, it has been noted that not all predicates can embed them (de Cuba and MacDonald 2013, Weir 2014). This is illustrated in these minimal pairs:
Let us assume that the key component here is factivity:8 factive predicates disallow embedded fragments. One possible explanation is that these predicates select for a truncated clausal structure (Vikner 1995, Haegeman 2006) which crucially lacks structural space for remnants to move to prior to ellipsis at PF. The explanation is reminiscent of, and correlates nicely with, the classic findings in Hooper and Thompson (1973), who noted that certain syntactic operations like topicalization cannot target the left periphery of clausal complements to factive predicates:
(19) a. The inspector explained that each part he had examined carefully. (Hooper and Thompson 1973: 474, their (50))
b. * I resent the fact that each part he had to examine carefully. (ibid.: 479, their (109))
7 [...] follows naturally from the fact that (14b) contains an elided verb that assigns accusative to the object.
8 De Cuba and MacDonald (2013) actually claim that it is not factivity that is at stake, but rather the related, yet independently motivated, notion [...]
If NCG1 involves the same structure as embedded fragment answers, we should expect the same restrictions observed in (17) and (18).
The distribution of the complementizer in embedded fragments corresponds crosslinguistically with the distribution of the finite complementizer in NCG1, which, taken along with the facts about embedding, strongly suggests that we are indeed dealing with the same phenomenon.

Non-Canonical Gapping 2
The second embedded gapping string I would like to examine involves cases where both the antecedent and the clause containing the gap are coordinated at the same level, and the coordination appears embedded under a matrix verb. One example is provided in (25).
Contrary to what happens in NCG1, where only a high coordination account is able to explain the data, NCG2 is in principle compatible with both a high and a low coordination analysis (just like any other case of canonical gapping). However, closer scrutiny reveals that a low coordination account fares better with the data.

Embedding predicates
Suppose that the predicate under which coordination is embedded is a factive one. If NCG2 involved clausal ellipsis like NCG1, then we would expect gapping to be unavailable, given that the coordinated clausal complement would lack the relevant projections for remnants to move to. However, gapping in such cases is possible even with factive predicates, as shown in (26).

Absence/presence of the complementizer
Hartmann (2001: 157) pointed out that in sequences like the one we are dealing with, i.e. NCG2, the complementizer that must be absent in English, an observation she attributes to Fiengo (1974):
(29) Jim said that Alan went to the ballgame and (*that) Betsy went to the movies.
In NCG1, the lack of an overt complementizer in English was associated with whatever mechanism disallows complementizers in embedded fragment answers. The lack of the complementizer in sequences like (29), however, cannot be attributed to that same mechanism, for the simple reason that, if a unified account were to be pursued, we would expect the complementizer in Spanish and Polish to be obligatorily overt. This prediction, however, is not borne out: NCG2 must involve a null complementizer in these languages as well:
The fact that NCG2 is incompatible with the complementizer appears to hold for many languages. Hartmann (2001: 158) observes that the same is true in German:
Taken together, the facts presented in sections 3.1 and 3.2 follow naturally if we assume a low coordination approach to gapping. Take (31a) as an example. According to my proposal, it would involve a structure along the following lines (I only represent the embedded sentence for the sake of simplicity):
In essence, the lack of a complementizer follows straightforwardly from the fact that the second conjunct is not clausal, but rather a vP (but see below). The insensitivity to the factivity of the embedding predicate is expected: under a low coordination account, it is irrelevant whether the left periphery of the embedded clause is truncated or not. This is so because, again, coordination takes place at a lower level, so no C-domain is involved.
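The contrast between the two configurations can be summarized as a small decision procedure. The following Python sketch is purely illustrative: the function name `predicted_acceptable`, the boolean encoding of factivity and the per-language table are assumptions introduced here for exposition, and the sketch encodes only the generalizations stated in the text (NCG1 patterns with embedded fragments; NCG2 involves coordination below the C-domain).

```python
# Assumption drawn from the text: whether the remnants of NCG1 (like
# embedded fragments) must be preceded by an overt complementizer.
NCG1_NEEDS_COMPLEMENTIZER = {"English": False, "Spanish": True, "Polish": True}


def predicted_acceptable(config, language, factive, complementizer):
    """Return the acceptability predicted by the generalizations above.

    config: "NCG1" or "NCG2"
    factive: True if the embedding predicate is factive
    complementizer: True if an overt complementizer precedes the remnants
    """
    if config == "NCG1":
        # Clausal ellipsis: a factive predicate leaves no room for the
        # remnants, and the complementizer must match the language.
        if factive:
            return False
        return complementizer == NCG1_NEEDS_COMPLEMENTIZER[language]
    if config == "NCG2":
        # Low coordination: the second conjunct has no C-domain, so the
        # complementizer is excluded and factivity plays no role.
        return not complementizer
    raise ValueError(f"unknown configuration: {config}")


print(predicted_acceptable("NCG1", "English", False, False))  # True
print(predicted_acceptable("NCG1", "Spanish", False, False))  # False
print(predicted_acceptable("NCG2", "Spanish", True, False))   # True
```

The sketch makes the asymmetry explicit: the language and the type of predicate matter only in the NCG1 branch.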
As we saw before, low coordination accounts of gapping assume that coordination holds at the level of the vP. However, as correctly pointed out by a reviewer, the facts presented in this section could still follow from IP-coordination, a solution indeed entertained, but ultimately rejected, by Hartmann (2001) for German. Determining the actual syntactic node at which coordination takes place in NCG2 deserves a more careful examination of the data, a task I leave for further research.
The question that remains is, of course, what it is that bans coordination of two CPs in NCG2. The same reviewer argues that coordination of CPs must be allowed in NCG2, at least in languages like Spanish, because these strings are compatible with gapping involving left dislocated remnants (underlined for expository purposes):
(34) Juan aseguró que el dinero lo había guardado en el banco y las joyas en la caja fuerte.
Juan claimed that the money it had saved in the bank and the jewels in the strongbox
'Juan claimed that the money, he had saved it in the bank, and the jewels in the strongbox.'
Note that under the assumption that left dislocated phrases are in the left periphery of the clause, the DP las joyas ('the jewels') must be in a CP-position. Data like (34), however, should be handled with care. To start with, note that the second conjunct is not, and in fact cannot be, headed by a complementizer, contrary to what would happen if ellipsis had not applied:
(35) a. Juan aseguró que el dinero lo había guardado en el banco y (*que) las joyas en la caja fuerte.
b. Juan aseguró que el dinero lo había guardado en el banco y *(que) las joyas las había guardado en la caja fuerte.
Testing structure with clitic left dislocation is complicated by the fact that, as shown in Fernández-Sánchez (2017), clitic left dislocated phrases often appear in syntactic contexts where it can be shown independently that there is no structural space, a fact that some authors have taken to mean that dislocated phrases should be viewed as parenthetical elements (Fernández-Sánchez 2017, 2020, Fernández-Sánchez and Ott 2020). But leaving these issues aside, note that there is an important asymmetry between NCG1 and NCG2: while the latter may in theory be compatible with two different structures (whatever they are exactly), the former is not, as such cases must necessarily involve clausal coordination. Given this, we could hypothesize that in cases where two potential derivations would yield the same output, the simplest/most economical one is preferred. Such an economy constraint would be similar to Bošković's (1997) Minimal Structure Principle, according to which, of two representations that serve the same function, the one that has fewer projections is to be chosen as the syntactic representation serving that function.
Similar claims have been made in the generative literature (see Collins (2001) and Dalrymple et al. (2015) for discussion). Unfortunately, exploring this falls outside the scope and goals of this paper, so I leave this issue for further research.

Canonical gapping + NCG1
Before concluding the paper, I would like to bring to the fore a construction which, to my knowledge, was first noted in Brucart's (1987) seminal work on ellipsis in Spanish and which involves two verbal gaps separated by the finite complementizer que ('that'):
(37) Pedro aseguró que nevaría en los Alpes, y Juan ___ que ___ en los Pirineos.
Pedro claimed that would snow in the Alps and Juan that in the Pyrenees
'Pedro claimed that it would snow in the Alps, and Juan claimed that it would snow in the Pyrenees.'
(38) Juan confirmó que Susana llegará en avión y Pedro ___ que ___ en coche.
Juan confirmed that Susana will arrive in plane and Pedro that in car
'Juan confirmed that Susana will arrive by plane and Pedro confirmed that she will arrive by car.'
Polish allows this construction as well, but English does not:
(39) * John claimed that Susan would travel by plane and Peter ___ that ___ by car.
(40) Janek powiedział, że Andrzej studiował matematykę a Wojciech ___ że ___ inżynierię.
Janek said that Andrzej studied maths and Wojciech that engineering
'Janek said that Andrzej studied maths, and Wojciech said that he studied engineering.'
Brucart (1987) contends that the two gaps are the result of the same operation, i.e. gapping. He attributes the unavailability of this construction in English to the fact that the rightmost gap is actually a complex object formed by the unpronounced verb preceded by a null pro. Given that English lacks pro, the ungrammaticality of (39) follows. Brucart's explanation would also account for the grammaticality of (40), given that pro is available in the grammar of Polish.
The reason to postulate the existence of pro comes from Jackendoff's (1971) suggestion that gaps must contain remnant material at their left and right edges. However, it is well known that remnants of gapping must be focused constituents (Kuno 1976, i.a.) and it is unclear how pro can be a focused element. Further, note that under the assumption that the structure under scrutiny is unavailable in English because of the lack of pro in this language, we expect this construction to be possible if an overt subject is used. The prediction, however, is not borne out:
(41) * John claimed that Susan would arrive by plane and Pedro __ that Laura __ by car.
I would like to suggest an alternative account of these facts. Descriptively, these examples featuring a double gap can be explained in the following way: the leftmost gap is an instance of canonical gapping, in which the matrix verb is deleted under identity with the matrix verb in the antecedent clause. The rightmost gap is embedded under the gapped main verb; in other words, the rightmost gap is an instance of NCG1. There are reasons to believe this. For example, if we try to use a factive verb as an embedding predicate, the sentence becomes ungrammatical:
(42) * Juan lamenta que el gobierno haya subido el IVA y Pedro lamenta que el gobierno haya subido el impuesto de sucesiones.
Juan regrets that the government has raised the VAT and Pedro that the tax of succession
'Juan regrets that the government has raised VAT and Pedro (regrets) that (the government has raised) the estate tax.'
The ungrammaticality of (42) must be attributed to the rightmost gap. We can see this because the two gaps are independent of each other. (43b) is thus ungrammatical for the same reason that (20b) is:
(43) a. Juan lamenta que el gobierno haya subido el IVA y Pedro lamenta que el gobierno haya subido el impuesto de sucesiones.
b. * Juan lamenta que el gobierno haya subido el IVA y Pedro lamenta que el gobierno haya subido el impuesto de sucesiones.
The question that remains to be addressed is how is it that English disallows this double gap construction. I discuss this in the next section, where I argue that it is the lack of an overt complementizer heading NCG1 in this language that explains the unavailability of double gaps.

The clause-mate condition on gapping
To fully understand why English does not allow this construction, it is important to introduce one locality condition to which gapping is subject: the clause-mate condition on remnants. Empirically, this condition captures the fact that the gap in (44) [...]
The explanation for the clause-mate condition cannot simply be that the gap is restricted to only one instance of a lexical verb. Ross (1970) already noted that the gap can contain more than one verb (46). In light of data like these, the relevant generalization is that the gap cannot contain a finite clause boundary:
(46) a. I want to try to begin to write a novel, and you a play.
b. …and you want to try to begin to write a play.
The clause-mate condition appears to hold crosslinguistically. (47) shows that gapping in Spanish cannot contain a finite clause boundary, whereas (48) illustrates that it may contain a non-finite clausal node. Examples (49) and (50) [...]
The clause-mate condition poses a challenging theoretical question, given that, aside from gapping, it has been argued to hold in many phenomena which involve ellipsis of everything except more than one remnant, like pseudogapping (Jayaseelan 1990), multiple sluicing (Lasnik 2014) or wh-stripping (Ortega-Santos, Yoshida and Nakao 2014), which strongly suggests that there must be a general, cross-constructional explanation.
12 The labels matrix and embedded reading capture the height at which coordination must take place in order to derive the corresponding meanings. Therefore, the embedded reading is obtained by coordination at the level of the embedded clause, and the matrix reading by coordination at the root.
Suppose now that we want to derive a double gap structure in English (51). The matrix verb can undergo ellipsis via canonical gapping (51a). This operation leaves the subject DP remnant and the clausal remnant. Now, to derive the embedded gap (NCG1), the remnants-to-be need to undergo movement to the left edge of that embedded clause (Merchant 2004), as shown in (51b). Crucially, as we have seen before, embedded fragments are never preceded by the complementizer in English. In the absence of a complementizer (51c), the two remnants must be interpreted as clause mates, as per the clause-mate condition on remnants. In other words: the se- [...]
I would like to suggest, thus, that the availability of the double gap construction depends on whether, in a particular language, embedded fragments (and by extension NCG1) are preceded by an overt complementizer. If they are not, the clause-mate condition on remnants will disallow the intended meaning.
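The reasoning behind this suggestion can be sketched as follows. The Python below is a deliberately simplified illustration, not an implementation of any existing analysis: the names `group_remnants` and `double_gap_available` are hypothetical, and the table merely restates the distribution of the complementizer in NCG1 reported in this paper.

```python
# Assumption drawn from the text: whether NCG1 remnants are preceded by
# an overt complementizer in each language.
NCG1_OVERT_COMPLEMENTIZER = {"English": False, "Spanish": True, "Polish": True}


def group_remnants(matrix_remnants, embedded_remnants, language):
    """Group the remnants of a would-be double gap into clauses.

    If the embedded gap is introduced by an overt complementizer, the
    clause boundary is visible and the two sets of remnants stay apart.
    Otherwise the clause-mate condition forces all remnants to be
    interpreted within a single clause.
    """
    if NCG1_OVERT_COMPLEMENTIZER[language]:
        return [list(matrix_remnants), list(embedded_remnants)]
    return [list(matrix_remnants) + list(embedded_remnants)]


def double_gap_available(language):
    """The double gap reading requires two separate clausal domains."""
    return len(group_remnants(["subject"], ["adjunct"], language)) == 2


print(double_gap_available("Spanish"))  # True
print(double_gap_available("Polish"))   # True
print(double_gap_available("English"))  # False
```

On this view, nothing specific to double gaps needs to be stipulated for English: the result follows from the complementizer setting of NCG1 together with the clause-mate condition.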

Conclusions
In this paper I have looked at two structures which involve non-canonical gapping. NCG1, once (wrongly) thought to be ungrammatical (Hankamer 1979), at least in English, must involve clausal coordination, so it is incompatible with low coordination accounts of gapping. In this configuration, the remnants must be headed by an overt complementizer in Spanish and Polish, but in English this complementizer must be null. Focusing on the English data, Wurmbrand (2017) proposes an account based on a flexible theory of phases, but her analysis is incompatible with the Spanish and Polish data. I have argued instead that the distribution of the finite complementizer in these three languages can be explained if we posit that the mechanism deriving NCG1 is the same one that yields embedded fragment answers. This, in turn, allows us to explain why NCG1 is sensitive to the type of embedding predicate.
With respect to NCG2, I have suggested that coordination must be lower than in NCG1. This conclusion is based on the fact that remnants in this configuration are never introduced by a complementizer, even in languages where the complementizer obligatorily heads remnants, as well as on the insensitivity of NCG2 to the type of embedding predicate.
Consequently, this paper shows, in line with others (Repp 2009, Centeno 2011), that gapping is not a unified phenomenon, and that it can result from the interplay between ellipsis and coordination at different points in the structure.
Finally, I have briefly addressed the syntax of a construction which features two gaps separated by the finite complementizer in Spanish and in Polish. I have argued that, while the leftmost gap is the result of canonical gapping, the embedded gap is an instance of NCG1. This construction does not exist in English for the simple reason that in this language NCG1 cannot be headed by a complementizer, and therefore the two gaps end up creating a complex string that cannot be interpreted.