28 Commits

Author SHA1 Message Date
Rich Felker
6ec82a3b58 fix fnmatch corner cases related to escaping
the FNM_PATHNAME logic for advancing by /-delimited components was
incorrect when the / character was escaped (i.e. \/), and a final \ at
the end of pattern was not handled correctly.
2013-12-01 14:36:22 -05:00
Szabolcs Nagy
da0fcdb8e9 fix the end of string matching in fnmatch with FNM_PATHNAME
a '/' in the pattern could be incorrectly matched against the
terminating null byte in the string, causing an arbitrarily long
sequence of out-of-bounds accesses in fnmatch("/","",FNM_PATHNAME)
2013-12-01 17:32:48 +00:00
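The two fnmatch fixes above can be observed from the caller's side. What follows is a minimal sketch, assuming a POSIX `<fnmatch.h>`; the names `path_match` and `check_pathname_cases` are made up for illustration, and the code checks only the visible behaviour, not how the fixes are implemented.

```c
#include <assert.h>
#include <fnmatch.h>

/* Hypothetical helper: 1 if str matches pat under FNM_PATHNAME, else 0. */
int path_match(const char *pat, const char *str)
{
    return fnmatch(pat, str, FNM_PATHNAME) == 0;
}

/* The cases named in the two commit messages above. */
void check_pathname_cases(void)
{
    /* A '/' in the pattern must not match the terminating null byte. */
    assert(!path_match("/", ""));
    assert(path_match("/", "/"));
    /* An escaped slash (\/) still matches a literal '/'. */
    assert(path_match("a\\/b", "a/b"));
}
```

Calling `check_pathname_cases()` on a conforming libc should return without tripping an assertion.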
Szabolcs Nagy
1e81fa4524 fix allocation sizes in regcomp
sizeof had an incorrect argument in a few places; the size was always
large enough, so the issue was not critical.
2013-10-07 13:25:19 +00:00
Rich Felker
ae4b0b96d6 revert regex "cleanup" that seems unjustified and may break backtracking
it's not clear to me at the moment whether the code that was removed
(and which is now being re-added) is needed, but it's far from being a
no-op, and i don't want to risk breaking regex in this release.
2013-02-01 01:10:59 -05:00
Szabolcs Nagy
f05f59b804 remove unused "params" related code from regex
some structs and functions had references to the params
feature of tre, which the code no longer uses
2013-01-15 01:05:29 +01:00
Szabolcs Nagy
dd95916382 regex: remove an unused local variable from regexec
pos_start local variable is not used in tre_tnfa_run_backtrack
2013-01-14 00:06:49 +01:00
Rich Felker
400c5e5c83 use restrict everywhere it's required by c99 and/or posix 2008
to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.
2012-09-06 22:44:55 -04:00
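The approach described above can be sketched roughly as follows. This is an illustration under assumptions, not the header code from the commit: `DEMO_RESTRICT` and `demo_copy` are hypothetical names (the commit defines `__restrict` itself), and the compiler checks shown are only one plausible way to pick a definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro: pick a spelling the current compiler accepts. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define DEMO_RESTRICT restrict
#elif defined(__GNUC__)
#define DEMO_RESTRICT __restrict
#else
#define DEMO_RESTRICT
#endif

/* Pointer form (*restrict) rather than the array form ([restrict]). */
char *demo_copy(char *DEMO_RESTRICT dst, const char *DEMO_RESTRICT src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
    return dst;
}

void check_demo_copy(void)
{
    char buf[6];
    demo_copy(buf, "hello", sizeof buf);   /* copies the terminator too */
    assert(strcmp(buf, "hello") == 0);
}
```

On a compiler with no restrict support the macro expands to nothing, so the declaration still compiles and only the aliasing hint is lost.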
Rich Felker
8b4c232efe fix regex on arm
TRE has a broken assumption that wchar_t is signed, which is a sane
expectation, but not required by the standard, and false on ARM's ABI.

i leave tre_char_t as wchar_t for now, since a pointer to it is
directly passed to functions that need pointer to wchar_t. it does not
seem to break anything. and since the maximum unicode scalar value is
0x10ffff, just use that explicitly rather than using the max value of
any particular C type.
2012-05-25 10:45:05 -04:00
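A range check that does not depend on the signedness of wchar_t might look like the sketch below. It assumes a wchar_t of at least 32 bits; `DEMO_CHAR_MAX` and `demo_in_range` are hypothetical names, not identifiers from TRE or this commit.

```c
#include <assert.h>
#include <wchar.h>

/* Maximum Unicode scalar value, written out instead of derived from a C type. */
#define DEMO_CHAR_MAX 0x10ffffUL

/* Hypothetical check that gives the same answer for signed and unsigned wchar_t. */
int demo_in_range(wchar_t c)
{
    /* Converting to unsigned long first turns any negative value (possible
       only where wchar_t is signed) into a large number that fails the test. */
    return (unsigned long)c <= DEMO_CHAR_MAX;
}

void check_demo_in_range(void)
{
    assert(demo_in_range(L'a'));
    assert(demo_in_range((wchar_t)0x10ffff));
    assert(!demo_in_range((wchar_t)0x110000));
    assert(!demo_in_range((wchar_t)-1));
}
```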
Rich Felker
13b2945a3c remove some no-op end of string tests from regex parser
these are cruft from the original code which used an explicit string
length rather than null termination. i blindly converted all the
checks to null terminator checks, without noticing that in several
cases, the subsequent switch statement would automatically handle the
null byte correctly.
2012-05-13 17:20:01 -04:00
Rich Felker
e9cddc8e32 another BRE fix: in ^*, * is literal
i don't understand why this has to be conditional on being in BRE
mode, but enabling this code unconditionally breaks a huge number of
ERE test cases.
2012-05-13 17:16:10 -04:00
Rich Felker
952700e8c3 fix error checking for \ at end of regex (this was broken previously)
2012-05-07 17:55:13 -04:00
Rich Felker
1736148210 fix copy and paste error in regex code causing mishandling of \) in BRE
2012-05-07 17:50:32 -04:00
Rich Felker
a5a4778335 fix regex breakage in last commit (failure to handle empty regex, etc.)
2012-05-07 17:43:38 -04:00
Rich Felker
d7a90b35b9 fix ugly bugs in TRE regex parser
1. * in BRE is not special at the beginning of the regex or a
subexpression. this broke ncurses' build scripts.

2. \\( in BRE is a literal \ followed by a literal (, not a literal \
followed by a subexpression opener.

3. the ^ in \\(^ in BRE is a literal ^ only at the beginning of the
entire BRE. POSIX allows treating it as an anchor at the beginning of
a subexpression, but TRE's code for checking if it was at the
beginning of a subexpression was wrong, and fixing it for the sake of
supporting a non-portable usage was too much trouble when just
removing this non-portable behavior was much easier.

this patch also moved lots of the ugly logic for empty atom checking
out of the default/literal case and into new cases for the relevant
characters. this should make parsing faster and make the code smaller.
if nothing else it's a lot more readable/logical.

at some point i'd like to revisit and overhaul lots of this code...
2012-05-07 14:50:49 -04:00
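Points 1 and 2 above can be checked through the public regex API. The sketch below assumes a POSIX `<regex.h>`; `bre_match` and `check_bre_cases` are hypothetical names, and the code tests only the resulting behaviour, not the parser changes themselves.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 on match, 0 on no match, -1 if the BRE fails to compile. */
int bre_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* No REG_EXTENDED flag, so the pattern is compiled as a BRE. */
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

void check_bre_cases(void)
{
    /* 1. '*' is an ordinary character at the start of a BRE ... */
    assert(bre_match("*a", "*a") == 1);
    assert(bre_match("*a", "a") == 0);
    /* ... and at the start of a subexpression. */
    assert(bre_match("^\\(*a\\)$", "*a") == 1);
    /* 2. The regex \\( is a literal backslash followed by a literal '('. */
    assert(bre_match("\\\\(", "\\(") == 1);
}
```

The C string literals double every backslash, so `"\\\\("` is the three-character regex `\\(`.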
Rich Felker
45b38550ee new fnmatch implementation
unlike the old one, this one's algorithm does not suffer from
potential stack overflow issues or pathologically bad performance on
certain patterns. instead of backtracking, it uses a matching
algorithm which I have not seen before (unsure whether I invented or
re-invented it) that runs in O(1) space and O(nm) time. it may be
possible to improve the time to O(n), but not without significantly
greater complexity.
2012-04-28 18:05:29 -04:00
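For readers unfamiliar with non-recursive wildcard matching, here is a minimal sketch of one well-known technique that also runs in O(1) space and O(nm) worst-case time: remember the most recent `*` and restart from it on a mismatch. This is not claimed to be the algorithm from the commit above; it handles only `*` and `?` (no brackets, escapes, or flags), and `wild_match` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if str matches pat, where '*' matches any run of characters
   and '?' matches exactly one character; returns 0 otherwise. */
int wild_match(const char *pat, const char *str)
{
    const char *star = NULL;   /* pattern position just after the last '*' */
    const char *retry = NULL;  /* string position where that '*' stops matching */

    while (*str) {
        if (*pat == '*') {
            star = ++pat;      /* let '*' match nothing for now */
            retry = str;
        } else if (*pat == '?' || *pat == *str) {
            pat++;
            str++;
        } else if (star) {
            pat = star;        /* let the last '*' absorb one more character */
            str = ++retry;
        } else {
            return 0;
        }
    }
    while (*pat == '*')        /* trailing stars match the empty remainder */
        pat++;
    return *pat == '\0';
}

void check_wild_match(void)
{
    assert(wild_match("a*b*c", "aXXbYYc"));
    assert(wild_match("a?c", "abc"));
    assert(wild_match("*", ""));
    assert(!wild_match("a*b", "acd"));
    assert(!wild_match("", "a"));
}
```

Only two saved pointers are kept regardless of input length, which is where the constant space bound comes from; each restart can rescan the pattern tail, giving the O(nm) worst case.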
Rich Felker
2b87a5db82 update fnmatch to POSIX 2008 semantics
an invalid bracket expression must be treated as if the opening
bracket were just a literal character. this is to fix a bug whereby
POSIX left the behavior of the "[" shell command undefined due to it
being an invalid bracket expression.
2012-04-26 12:24:44 -04:00
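The POSIX 2008 rule above is easy to check from a caller. A minimal sketch, assuming a conforming `<fnmatch.h>`; `check_unterminated_bracket` is a hypothetical name.

```c
#include <assert.h>
#include <fnmatch.h>

void check_unterminated_bracket(void)
{
    /* "[" is not a valid bracket expression, so it matches a literal '['. */
    assert(fnmatch("[", "[", 0) == 0);
    /* A well-formed bracket expression keeps its usual meaning. */
    assert(fnmatch("[ab]", "a", 0) == 0);
    assert(fnmatch("[ab]", "[", 0) == FNM_NOMATCH);
}
```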
Rich Felker
b9dd43db04 fix signedness error handling invalid multibyte sequences in regexec
the "< 0" test was always false due to use of an unsigned type. this
resulted in infinite loops on 32-bit machines (adding -1U to a pointer
is the same as adding -1) and crashes on 64-bit machines (offsetting
the string pointer by 4gb-1b when an illegal sequence was hit).
2012-04-14 22:32:42 -04:00
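The class of bug described above can be reproduced in miniature. In this sketch `toy_decode` is a hypothetical stand-in for a decoder that, like `mbrtowc`, reports an invalid sequence as `(size_t)-1`; it is not the regexec code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy decoder (hypothetical): 1 for an ASCII byte, (size_t)-1 for anything else. */
size_t toy_decode(const unsigned char *s)
{
    return *s < 0x80 ? 1 : (size_t)-1;
}

/* Count decodable characters, stepping over invalid bytes one at a time. */
size_t count_valid(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t count = 0;

    while (*s) {
        size_t len = toy_decode(s);
        /* A test written as `len < 0` is always false for an unsigned type,
           so the error value must be compared explicitly. */
        if (len == (size_t)-1) {
            s++;               /* skip the bad byte rather than adding -1 */
            continue;
        }
        s += len;
        count++;
    }
    return count;
}

void check_count_valid(void)
{
    assert(count_valid("abc") == 3);
    assert(count_valid("a\xffz") == 2);   /* 0xff is rejected and skipped */
}
```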
Rich Felker
386b34a07b remove invalid code from TRE
TRE wants to treat + and ? after a +, ?, or * as special; ? means
ungreedy and + is reserved for future use. however, this is
non-conformant. although redundant, these redundant characters have
well-defined (no-op) meaning for POSIX ERE, and are actually _literal_
characters (which TRE is wrongly ignoring) in POSIX BRE mode.

the simplest fix is to simply remove the unneeded nonstandard
functionality. as a plus, this shaves off a small amount of bloat.
2012-04-13 19:50:58 -04:00
Rich Felker
b6dbdc69b6 fix broken regerror (typo) and missing message
2012-04-13 18:40:38 -04:00
Rich Felker
ad47d45e9d upgrade to latest upstream TRE regex code (0.8.0)
the main practical results of this change are
1. the regex code is no longer subject to LGPL; it's now 2-clause BSD
2. most (all?) popular nonstandard regex extensions are supported

I hesitate to call this a "sync" since both the old and new code are
heavily modified. in one sense, the old code was "more severely"
modified, in that it was actively hostile to non-strictly-conforming
expressions. on the other hand, the new code has eliminated the
useless translation of the entire regex string to wchar_t prior to
compiling, and now only converts multibyte character literals as
needed.

in the future i may use this modified TRE as a basis for writing the
long-planned new regex engine that will avoid multibyte-to-wide
character conversion entirely by compiling multibyte bracket
expressions specific to UTF-8.
2012-03-20 19:44:05 -04:00
Rich Felker
d0678b58ab make glob mark symlinks-to-directories with the GLOB_MARK flag
POSIX is unclear on whether it should, but all historical
implementations seem to behave this way, and it seems more useful to
applications.
2012-01-23 19:51:34 -05:00
Rich Felker
787c2648a9 support GLOB_PERIOD flag (GNU extension) to glob function
patch by sh4rm4
2012-01-22 15:49:42 -05:00
Rich Felker
32aea2087a duplicate re_nsub in LSB/glibc ABI compatible location
2011-06-16 16:53:11 -04:00
Rich Felker
da88b16a22 fix handling of d_name in struct dirent
basically there are 3 choices for how to implement this variable-size
string member:
1. C99 flexible array member: breaks using dirent.h with pre-C99 compiler.
2. old way: length-1 string: generates array bounds warnings in caller.
3. new way: length-NAME_MAX string. no problems, simplifies all code.

of course the usable part in the pointer returned by readdir might be
shorter than NAME_MAX+1 bytes, but that is allowed by the standard and
doesn't hurt anything.
2011-06-06 18:04:28 -04:00
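The three choices listed above can be pictured with toy structs. These are hypothetical layouts for illustration, not the real `struct dirent`; the sketch assumes a C99 compiler (needed for choice 1) and sizes the fixed array as `NAME_MAX + 1` to leave room for the terminator.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#ifndef NAME_MAX
#define NAME_MAX 255   /* assumption for systems that leave it undefined */
#endif

struct ent_flex  { long ino; char d_name[]; };              /* 1. flexible array member */
struct ent_len1  { long ino; char d_name[1]; };             /* 2. old way */
struct ent_fixed { long ino; char d_name[NAME_MAX + 1]; };  /* 3. new way */

void check_ent_layouts(void)
{
    /* The name starts at the same offset in all three layouts. */
    assert(offsetof(struct ent_flex, d_name) == offsetof(struct ent_fixed, d_name));
    assert(offsetof(struct ent_len1, d_name) == offsetof(struct ent_fixed, d_name));
    /* Only the fixed form lets sizeof see room for a full name. */
    assert(sizeof(((struct ent_fixed *)0)->d_name) == NAME_MAX + 1);
    assert(sizeof(((struct ent_len1 *)0)->d_name) == 1);
}
```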
Rich Felker
0dc99ac413 safety fix for glob's vla usage: disallow patterns longer than PATH_MAX
this actually inadvertently disallows some valid patterns with
redundant / or * characters, but it's better than allowing unbounded
vla allocation.

eventually i'll write code to move the pattern to the stack and
eliminate redundancy to ensure that it fits in PATH_MAX at the
beginning of glob. this would also allow it to be modified in place
for passing to fnmatch rather than copied at each level of recursion.
2011-06-05 19:29:52 -04:00
Rich Felker
a6c399cf62 eliminate (harmless in this case) vla usage in fnmatch.c
2011-06-05 13:30:56 -04:00
Rich Felker
74f75541ff fix bug in TRE found by clang (typo && instead of &)
2011-04-07 23:13:47 -04:00
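The kind of typo named above is easy to reproduce. The sketch below is a generic illustration with hypothetical names (`DEMO_FLAG_A`, `DEMO_FLAG_B`, `has_flag_b`, `has_flag_b_buggy`), not the TRE code; some compilers warn about the buggy line, which is the point.

```c
#include <assert.h>

#define DEMO_FLAG_A 1
#define DEMO_FLAG_B 2

/* Intended test: is the DEMO_FLAG_B bit set? */
int has_flag_b(int flags)
{
    return (flags & DEMO_FLAG_B) != 0;
}

/* Typo version: && only asks whether both operands are nonzero. */
int has_flag_b_buggy(int flags)
{
    return flags && DEMO_FLAG_B;
}

void check_flag_typo(void)
{
    assert(has_flag_b(DEMO_FLAG_B));
    assert(!has_flag_b(DEMO_FLAG_A));
    assert(has_flag_b_buggy(DEMO_FLAG_A));   /* wrongly reports the bit as set */
}
```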
Rich Felker
0b44a0315b initial check-in, version 0.5.0
2011-02-12 00:22:29 -05:00