Added support for case-sensitivity flags in attribute
selectors: [attr="value" i] matches the attribute value
ASCII case-insensitively, while [attr="value" s]
explicitly requests the default case-sensitive matching.
The :required and :optional pseudo-classes
are now supported. With the html or xhtml
translator they match form elements (input other than
type="hidden", select, textarea) by the
presence or absence of the required attribute; with the
generic translator they translate to a never-matching
expression, as :checked does.
The :focus-within and :focus-visible
pseudo-classes are now accepted and, like the rest of the
user-action family (:focus, :hover, :active,
...), translate to a never-matching expression, since a static
document has no such dynamic state. They previously raised "The
pseudo-class ... is unknown".
Added support for a leading :scope pseudo-class,
anchoring the selector at the queried node:
querySelectorAll(node, ":scope > a") returns only the
a children of node. css_to_xpath()
translates such selectors with the XPath self axis in
place of the prefix argument (":scope > a" becomes
"self::*/a"). A :scope anywhere else in a selector
cannot be expressed in XPath 1.0 and is rejected with a clear
error.
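The XPath quoted in the entry above can be tried directly to see what the anchoring means. This is a minimal sketch using xml2; the document is a made-up example, and the XPath string is typed by hand rather than produced by selectr, so it runs regardless of the installed selectr version.

```r
library(xml2)

# Hypothetical document: one <a> is a child of the root, one is nested deeper.
doc <- read_xml("<div><a>direct</a><p><a>nested</a></p></div>")

# The entry above states that ":scope > a" becomes "self::*/a".
# Evaluated with the root <div> as the context node, only the child matches.
hits <- xml_find_all(doc, "self::*/a")
xml_text(hits)

# With a selectr release that includes this change, the call
# querySelectorAll(doc, ":scope > a") is expected to select the same node.
```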
:not() may now appear inside the arguments of
functional pseudo-classes (e.g. :not(:not(a)),
:is(:not(.x)), :nth-child(2 of :not(.foo))).
:has() now supports leading combinators in its
arguments (e.g. e:has(> img), e:has(~ p),
e:has(+ p)).
Functional pseudo-class arguments may now be complex
selectors (contain combinators): :is(a b),
:not(a > b), e:has(> a b), and
li:nth-child(2 of ol li) previously produced an
"Expected an argument" error and are now translated.
selectr is now roughly twice as fast, the combined effect of several smaller changes.
The unsupported Selectors 4 column combinator now raises
"The column combinator '||' is not supported" instead of a raw
tokenizer error, and an unknown functional pseudo-class is
reported with the user's hyphenated spelling
(:nth-col(), previously :nth_col()).
Generated XPath expressions now include parentheses only
when precedence requires them (an or-expression joined with
another condition): e[id][title] translates to
e[@id and @title] rather than
e[(@id) and (@title)]. The expressions are semantically
unchanged, but code comparing css_to_xpath() output
against stored strings will need updating.
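One way to check whether stored expectations are affected is to print the current translation and compare it by eye. The sketch below uses the selector from the entry above; the exact string printed depends on the installed selectr version, so no output is shown.

```r
library(selectr)

# Translate the selector from the entry above. Older releases wrap each
# condition in parentheses; releases with this change should not.
xpath <- css_to_xpath("e[id][title]")
print(xpath)

# A version-independent check: both attribute tests appear in the result.
has_both <- grepl("@id", xpath, fixed = TRUE) &&
  grepl("@title", xpath, fixed = TRUE)
has_both
```

Tests that compare against stored strings may be more robust if they evaluate the XPath against a small document and compare the matched nodes instead.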
Attribute blocks, functional pseudo-classes, and strings
left unclosed at the end of a selector are now auto-closed, as
css-syntax requires: [rel=stylesheet and :lang(fr
parse and translate exactly as their closed forms, as does a
string missing its closing quote.
The stringr and methods dependencies have been dropped. R6 is now the only package selectr imports, shrinking the install footprint.
css_to_xpath() now translates each distinct
combination of selector, prefix, and translator only once per
call, so duplicates in a vectorized call are not re-translated.
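The caching is internal, so the visible behaviour of a vectorized call should be unchanged: one result per input, duplicates included. A small sketch with made-up selectors:

```r
library(selectr)

# The first and third selectors are identical on purpose.
selectors <- c("a", "li > a", "a")
xpaths <- css_to_xpath(selectors)

# Still one XPath per input element.
length(xpaths)

# Duplicate inputs yield the same translation.
xpaths[[1]] == xpaths[[3]]
```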
The adjacent sibling combinator no longer emits a
tautological [self::*] predicate when the right-hand side
does not name an element: h1 + *[rel=up] now translates to
h1/following-sibling::*[1][@rel = 'up'].
The minimum required version of R has been increased from
3.0 to 3.3, matching what the code (which uses
startsWith()) has in fact required for some time.
The non-standard extensions inherited from the Python
'cssselect' package, the :contains("text") pseudo-class
and the [attr!=value] attribute operator, have been
removed and now produce an error. Standard alternatives:
:not([attr=value]) for the [attr!=value] operator, and an XPath
contains(., 'text') predicate applied outside of
selectr for :contains().
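The :contains() replacement described above might look like the following sketch. It assumes the xml2 package and a made-up HTML fragment; the text predicate is appended by hand to the XPath that selectr produces for the standard part of the selector.

```r
library(selectr)
library(xml2)

# Hypothetical document with two paragraphs.
doc <- read_html("<p>hello world</p><p>goodbye</p>")

# Translate the standard selector, then add the text test outside of selectr.
xpath <- paste0(css_to_xpath("p"), "[contains(., 'hello')]")
hits <- xml_find_all(doc, xpath)
xml_text(hits)
```

Appending a predicate this way only works when the selector is a single selector (no commas), because the predicate attaches to the last step of the expression.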
With the html or xhtml translator,
:any-link now matches the same elements as :link.
:any-link means ":link or :visited", a superset of
:link, but it previously translated to a never-matching
expression while :link matched, which inverted the subset
relation. (In a static document no link is visited, so the two
pseudo-classes coincide.)
Escape sequences in identifiers, hashes, and strings are
now decoded in a single left-to-right pass, as css-syntax
requires, so an escaped backslash followed by hex digits is no
longer decoded twice: e[foo="x\\79 z"] now matches
x\79 z rather than xyz.
A prefixed wildcard inside a pseudo-class argument
(e.g. :is(svg|*)) now translates to the node test
self::svg:* rather than the never-matching comparison
name() = 'svg:*'.
When :lang() is given an invalid argument after
valid ones (e.g. :lang(en, 5)), the error now reports the
offending argument rather than the first one.
Strings containing a raw newline are now rejected with "Unclosed string", as the CSS grammar requires.
The namespace argument of the querySelector() family
of functions now signals an error when given a list containing an
element that is not a single string; previously the prefix-to-URI
pairing was silently corrupted.
:only-child and :only-of-type now match the
root element, consistent with :first-child:last-child,
which Selectors defines :only-child to be equivalent to.
The HTML translator's :enabled and :disabled
pseudo-classes now match input elements that have no
type attribute (which default to type=text).
:dir() now enforces its CSS Selectors Level 4
argument grammar of exactly one identifier, rejecting the
strings, wildcards, and lists (e.g. :dir(ltr, rtl)) it
previously accepted via the shared :lang() grammar.
Prefixed element names inside pseudo-class arguments
(e.g. :is(svg|g), :nth-child(2 of svg|g)) are now
matched with an XPath name test (self::svg:g, or the path
step .//svg:g for :has()) instead of a
name() string comparison, so the prefix resolves through
the namespace map supplied at evaluation time (URI-based), just
as at the top level of a selector.
The no-namespace form |e now retains its namespace
constraint when the element name cannot be written as an XPath
name test (e.g. a Unicode name such as |é); previously
such names also matched in a default namespace.
The of-type pseudo-classes now work on element names that
cannot be written as an XPath name test
(e.g. é:only-of-type, *|e:first-of-type);
previously these failed with the misleading error
"*:only-of-type is not implemented".
The querySelector() and querySelectorAll()
methods for xml2 documents now accept a named list as the
ns argument, consistent with the methods for XML
documents.
The querySelector() family of functions now signals
an error when the selector argument is not a single
character string; previously all but the first selector were
silently ignored when querying the document.
css_to_xpath() now signals an error when any of its
arguments contain NA values; previously NAs were
removed before recycling, silently shifting the pairing of the
remaining values.
An+B expressions are now matched ASCII
case-insensitively, as required by CSS Syntax, so
:nth-child(2N), :nth-child(ODD), and
:nth-child(EVEN) are no longer rejected.
An+B expressions now only permit whitespace around
the sign that separates the B value, so an invalid
selector such as :nth-child(3 7) is rejected rather than
silently treated as :nth-child(37).
An+B expressions now reject non-integer values
(e.g. :nth-child(1.9), :nth-child(2e1)) rather
than silently truncating them.
Computing the specificity of a :has() selector with
a single argument (e.g. e:has(img)) no longer fails with
"incorrect number of dimensions".
Computing the specificity of a single-argument
:is() or :matches() selector no longer fails, and
the specificity of the compound the pseudo-class is attached to
is no longer dropped: div:is(.foo) now reports (0, 1, 1)
rather than (0, 1, 0).
Nesting :has() inside :has()
(e.g. section:has(article:has(div))) is now rejected, as
required by CSS Selectors Level 4 and matching browsers; sibling
uses such as e:has(a):has(b) remain valid.
:lang() and :dir() no longer accept a lone
-, which is not a valid CSS identifier.
Attribute selectors with an empty value
(e.g. [attr=""]) no longer throw an error for the
= and |= operators.
The HTML translator no longer lowercases attribute
values, only attribute names, so
[data-state="Active"] no longer silently misses matches.
The any namespace selector *|e and the
no namespace selector |e no longer both collapse
to the bare name e: *|e now translates to
*[local-name() = 'e'] so it matches e in any
namespace, and similarly for [*|attr].
Unicode escapes are now supported in identifiers and ID
selectors, not just strings, and are decoded to the characters
they represent: #\31 23 (an ID starting with a digit) no
longer fails to tokenize, and "\E9" matches é
rather than the literal value E9.
The alternatives of :is(), :matches(), and
:where() are now grouped as a single condition:
div.foo:is(.a, .b) translates to foo and (a or b)
rather than foo or a or b, and stacked uses such as
e:is(.a):is(.b) now require both conditions rather than
either.
The universal selector * is no longer silently
dropped when it appears alongside other arguments in a selector
list: :is(div, *) now matches every element,
:not(div, *) matches nothing, and
:nth-child(2 of div, *) counts all siblings.
Wildcard language ranges such as :lang(en-*) now
match under the generic (XML) translator; previously they
translated to lang('en-'), which can never match.
Fixed handling of CSS unicode escapes in attribute values.
This would be observed when the attribute value contained hexadecimal
sequences like (abcdef) where the characters inside the parentheses
were not properly escaped. This fix ensures that such sequences are correctly
translated to their XPath equivalents. Thanks to André Veríssimo for
reporting the issue.
Added support for CSS Selectors Level 4 pseudo-classes
:is(), :where(), and :has(). The :is()
pseudo-class matches elements against a list of selectors, taking the
maximum specificity from its arguments. The :where() pseudo-class
works similarly to :is() but always has zero specificity. The
:has() pseudo-class represents an element if any of the relative
selectors match when anchored against that element, with specificity
calculated from the maximum of its arguments.
Added support for complex selectors in :nth-child() and
:nth-last-child() using the of S syntax
(e.g., :nth-child(2 of .foo)). This allows matching the nth
child that matches a specific selector or selector list.
Extended :not() to accept multiple selectors separated by
commas (e.g., :not(.foo, #bar)), following CSS Selectors Level 4.
Specificity is now calculated as the maximum specificity among all
arguments, rather than the sum.
Added support for additional CSS Selectors Level 4 pseudo-classes:
:any-link, :target-within, and :local-link. These
pseudo-classes do not match any elements in static XML/HTML documents
and translate to XPath expressions that always evaluate to false.
For now, most of the new Level 4 pseudo-classes that depend on dynamic
document state (e.g. :user-valid and :placeholder-shown) are
not implemented; they may be added at a future date as non-matching selectors.
The :lang() and :dir() pseudo-classes now support
multiple comma-separated arguments (e.g., :lang(en, fr, de)).
Added :matches() as a backwards-compatible alias for
:is().
Improved sibling selector translation to use a more compact form.
For the adjacent sibling combinator a + b, the generated XPath now
uses a/following-sibling::*[1][self::b] instead of
a/following-sibling::*[(name() = 'b') and (position() = 1)].
The descendant combinator a b now uses
a//b instead of a/descendant::b for a more concise XPath.
Unfortunately, a similar optimisation cannot be applied in general
to replace descendant-or-self:: with .//a,
as it would prevent root nodes from being matched correctly.
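The more compact translations select the same nodes as before. The sketch below, which assumes xml2 and a made-up document, prints the generated XPath (its exact form depends on the installed version) and then checks the matches for both combinators.

```r
library(selectr)
library(xml2)

# Hypothetical document: two <p> siblings after <h1>, one <p> nested deeper.
doc <- read_xml(
  "<div><h1>Title</h1><p>one</p><p>two</p><section><p>three</p></section></div>"
)

# Inspect the translations; the strings vary between selectr versions.
print(css_to_xpath("h1 + p"))
print(css_to_xpath("div p"))

# Adjacent sibling: only the <p> immediately following <h1>.
adjacent <- querySelectorAll(doc, "h1 + p")
xml_text(adjacent)

# Descendant: every <p> below <div>, at any depth.
nested <- querySelectorAll(doc, "div p")
length(nested)
```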
Improved validation of CSS selector arguments. Better error
messages are now provided when pseudo-elements appear inside functional
pseudo-classes where they are not permitted (e.g., inside :is(),
:matches(), :where(), or :has()).
Enhanced input validation for :lang() and :dir()
pseudo-classes to ensure proper argument formatting and to reject
invalid or empty language tags.
Improved handling of edge cases in selector parsing, including better validation of class selector syntax and more robust handling of null or missing element components.
Simplified method registration for XML and xml2 objects. It is no longer necessary to hook into package load/unload events.
Improve handling of vectors of length > 1 in logical comparison. Contributed by Garrick Aden-Buie.
Minor improvements to error message construction. Contributed by Michael Chirico.
Class selectors did not work when the R.oo package was
attached. R.oo uses the name Class for its base class
object, and selectr used the same name (without exporting it)
to represent a class selector. selectr's class has been
renamed to avoid the clash. Because it was not
exported, this is purely an internal change. Thanks to
Francois Lemaire-Sicre for reporting the issue.
Large rewrite of internals to use the R6 OO system instead of Reference Classes. This does not affect any external facing code as the results should be identical to the previous implementation, which is why this change is marked as minor. Initial and crude performance testing (by running the test suite) indicates that the R6 implementation is approximately twice as fast at generating XPath as the Reference Classes implementation.
The minimum required version of R for selectr has been
increased from 2.15.2 to 3.0 as that is the minimum
required version of R6.
Minor performance enhancements have been made. In addition to
R6 being faster than Reference Classes, string formatting
has been replaced with string concatenation, and dynamic
calling of methods via do.call() has been replaced with direct
method calls.
The issues in previous releases where methods could sometimes be missing should now be resolved. The bug appeared to lie in core Reference Classes code. By switching to R6, this type of issue should no longer be possible.
Improved method registration for XML and xml2 objects. Avoids checks on each use and is only performed once per dependent package load/unload.
In some environments, reference class methods were missing at
runtime. This appears to be due to some internal behaviour in the
methods package, where methods are registered on an object when
the $ operator is used for a field or method. Now, when
a method is missing, it is manually bound to the object.
Enabled partial matching on the translator argument to
css_to_xpath(). Instead of defaulting to a generic translator,
a non-matching argument now results in an error.
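A short sketch of both behaviours described above. The abbreviation "ht" and the invalid name "sgml" are arbitrary values chosen for illustration.

```r
library(selectr)

# "ht" is an unambiguous abbreviation of "html".
short <- css_to_xpath("a", translator = "ht")
full <- css_to_xpath("a", translator = "html")
identical(unname(short), unname(full))

# An unrecognised translator name is an error rather than a silent
# fallback to the generic translator; the error is caught here.
res <- tryCatch(
  css_to_xpath("a", translator = "sgml"),
  error = function(e) "error"
)
res
```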
Introduced many more unit tests via the covr package.
This enabled dead code to be trimmed and also identified areas of code
which needed improvement. Minor enhancements include: tolerate
whitespace within a :not(), more consistent results returned
from parser methods, improvements to argument parsing.
The |= attribute matching operator was not being parsed
correctly for the generic translator.
Handle the scenario where a CSS comment is unclosed. Everything after the comment start is removed (which may or may not result in a valid selector).
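As an illustration of the entry above, an unclosed comment at the end of a selector should leave only the text before it. This sketch assumes the remaining text is itself a valid selector; the comment text is arbitrary.

```r
library(selectr)

# Everything from "/*" onward is discarded, leaving the selector "a".
with_comment <- css_to_xpath("a/* never closed")
plain <- css_to_xpath("a")
identical(with_comment, plain)
```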
Added support for documents from the xml2 package.
selectr also no longer strictly depends on the XML
package. If either the XML or xml2 package is
installed (one of them is required for the querySelector methods
to work), querySelector will work with its documents.
This also enables selectr to be used for translation only.
Improve support for nth-*(an+b) selectors. Ported from cssselect.
Code cleanup contributed by Kun Ren (#1).
Updated DESCRIPTION to include URL and BugReports fields. Also updated the email address.
Fix behaviour for nth-*(an+b) pseudo-class selectors for negative a's. Contributed to cssselect by Paul Tremberth, ported to R.
Escape delimiting characters to support new version of the stringr package. Probably should have been done in the first place. Reported by Hadley Wickham (#5).
Corrected licence to BSD 3 clause. This was the licence in use previously, but has now been made more explicit.
Removed 'Enhances' field because we import functions from XML. This choice is made because XML is a required package, rather than an optional package that can be worked with. This and the previous change have been made to keep up with recent changes in R-devel.
Added a 'CITATION' file which cites a technical report on the package.
show() methods are now available on internal objects,
making interactive extensibility and bug-fixing easier. This is
simply wrapping the repr() methods (mirroring the Python
source) that the same objects have.
Use the session character encoding to determine whether to run unicode tests. Tests break in non-unicode sessions otherwise.
Introduced new functions querySelectorNS() and
querySelectorAllNS() to ease the use of namespaces within a
document. Previously this would have required knowledge of XPath.
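A minimal sketch of the namespace helpers, assuming the xml2 package and a made-up SVG document with a default namespace. The prefix name "svg" is chosen by the caller; it is mapped to the namespace URI through the ns argument and does not need to appear in the document.

```r
library(selectr)
library(xml2)

# Hypothetical document: every element is in the SVG namespace by default.
svg <- '<svg xmlns="http://www.w3.org/2000/svg">
          <circle cx="10" cy="10" r="10"/>
          <circle cx="40" cy="30" r="5"/>
        </svg>'
doc <- read_xml(svg)

# Map a prefix of our choosing to the namespace URI.
ns <- c(svg = "http://www.w3.org/2000/svg")

# Use the CSS namespace syntax "prefix|element" in the selector.
circles <- querySelectorAllNS(doc, "svg|circle", ns)
length(circles)
```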
Fix meaning of :empty: whitespace is not empty.
Use lang() for XML documents with the :lang()
CSS selector.
|ident no longer produces a parsing error, but is now
equivalent to just 'ident'.
Now testing unicode only on non-Windows platforms on package check. Output should still be consistent, just depends on the current charset being unicode.
Initial port of the Python 'cssselect' package. Code is very literally ported, including the test suite.
Wrapped translation functionality into a single function,
css_to_xpath().
Created two convenience functions, querySelector() and
querySelectorAll(). These mirror the behaviour of the same
functions present in a web browser. querySelector() returns a
node, while querySelectorAll() returns a list of nodes.
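The difference between the two functions can be seen in a short sketch. It uses the xml2 package, support for which is described in a later entry of this changelog, and a made-up document.

```r
library(selectr)
library(xml2)

# Hypothetical document with two list items.
doc <- read_xml("<ul><li>first</li><li>second</li></ul>")

# querySelector() returns only the first match.
first_item <- querySelector(doc, "li")
xml_text(first_item)

# querySelectorAll() returns every match.
all_items <- querySelectorAll(doc, "li")
length(all_items)
```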