Bug with repetitions before regexp capture groups?

I’m trying to use regexp capture groups to slice an integer out of a string, but my capture group is giving me -1 for both begin and end. I couldn’t find any documentation of what that value is supposed to mean. I think it has to do with using a repetition (* or +) before the capture group. Here’s a fairly minimal way to reproduce it. It shows two slightly different regexps, one which works and one which doesn’t.

#require "PrettyPrinter.class.nut:1.0.1"
pp <- PrettyPrinter(null, false);
print <- pp.print.bindenv(pp);

local str = " 1";
local re1 = @"\s(\d+)";  // exactly one whitespace character, then the digits
local re2 = @"\s*(\d+)"; // zero or more whitespace characters, then the digits

server.log("SINGLE");
print(regexp(re1).capture(str));

server.log("REPEATED");
print(regexp(re2).capture(str));

I would expect identical output from both regexps, but the results differ. Here’s what I get (on either agent or device):

SINGLE
[
    {
        "begin": 0,
        "end": 2
    },
    {
        "begin": 1,
        "end": 2
    }
]
REPEATED
[
    {
        "begin": 0,
        "end": 2
    },
    {
        "begin": -1,
        "end": -1
    }
]
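
For anyone hitting the same thing, here is a minimal sketch of how the slice could be guarded in the meantime. It assumes that -1 offsets mean the group's position was not recorded, which is my guess and not documented behaviour. The helper name captureInt is hypothetical; regexp(), capture(), slice() and tointeger() are the standard Squirrel calls.

```squirrel
// Hypothetical helper: returns the integer matched by capture group 1,
// or null if the pattern does not match or the group's offsets are negative.
// Assumption: a negative begin/end means "position not recorded" (undocumented).
function captureInt(pattern, str) {
    local m = regexp(pattern).capture(str);

    // capture() returns null when nothing matches; index 0 is the whole match,
    // index 1 is the first capture group.
    if (m == null || m.len() < 2) return null;

    local g = m[1];
    if (g.begin < 0 || g.end < 0) return null;

    return str.slice(g.begin, g.end).tointeger();
}

local str = " 1";
local single = captureInt(@"\s(\d+)", str);
local repeated = captureInt(@"\s*(\d+)", str);

server.log("SINGLE: " + (single == null ? "no usable capture" : single.tostring()));
server.log("REPEATED: " + (repeated == null ? "no usable capture" : repeated.tostring()));
```

This only avoids slicing with bad offsets. It does not recover the value when the second regexp reports -1.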

Hey Imp, can someone take a look at this? @hugo @peter :pray: My last bug report was kind of esoteric and not really Imp’s problem, but this time it’s for real! Reliable regexes are important and I have now confirmed that this bug is a deviation in behaviour compared to the official Squirrel implementation. Here’s a version of the code that can be run offline in the Squirrel 3.1 interpreter: regexp_test.nut

There are definitely issues with the regexp code, which is why on the agent-side we recommend using regexp2 instead. (The code implementing Squirrel’s regexp2 is too large to fit on the device, sadly.)

However, as you imply, if upstream Squirrel works then it must be something we’ve done and so might be fixable. We’ll take a look.

Peter
