regex - Confusion about lookahead precedence - Stack Overflow


I have the following regex:

[,{:](?!\s*"[\w\s,]+")

And the following test data:

{" hi ":"hallo", hi : hallo, "hi": hallo,hi: "hallo", hi: {  "hallo": wu  },hi: "Hallo Trippel, hier, hallo"}

I am confused why my negative lookahead doesn't match "Hallo Trippel, hier, hallo" as a whole. As far as I can tell, it matches only "Hallo Trippel. That is thrown away, and then it continues with , hier as the next match. See: https://regex101.com/r/w5woLb/1

I know that it is the character class at the beginning of my regex that starts matching there. But I don't understand why my negative lookahead doesn't "consume" everything I want it to consume. It stops as if it were in lazy mode.

Expectation: I expected all matches except the last two.

asked Feb 4 at 15:06 by Tschenkel; edited Feb 6 at 21:21 by Arvind Kumar Avinash

4 Comments
  • Your input looks like JavaScript. Don't use regular expressions to parse code; they are not powerful enough for this job. Use a language parser. – axiac Commented Feb 4 at 15:50
  • @anubhava I expected all matches except the last two. – Tschenkel Commented Feb 4 at 16:41
  • @axiac Thank you for this hint. No, it is not from a JavaScript context. I am working in Python and need to handle data there that has the same structure as my example data. But yes, it really is valid JavaScript code. I don't know if this helps me right now, but it is definitely nice to know. – Tschenkel Commented Feb 4 at 16:49
  • @Tschenkel Are you trying to parse JSON? There are existing libraries for that. – Progman Commented Feb 7 at 20:45

2 Answers


@anubhava I expected all matches except the last two.

It looks like you are trying to match any of the characters ,, { or : that lie outside double quotes, since you don't want to match the two commas inside the last pair of double quotes.

For this purpose you can use this regex:

[,{:](?=(?:(?:[^"]*"){2})*[^"]*$)


Details:

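  • [,{:]: Match any one of the ,, {, : characters
  • (?=...): a lookahead that makes sure an even number of double quotes follows the matched character up to the end of the string, which means the character is not inside a quoted string (this assumes the quotes are balanced and not escaped)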
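
As a rough illustration in Python (the language the asker mentions in the comments), the pattern above can be tried against the test data from the question. The names data, outside_quotes, combined and match_positions are made up for this sketch. The combined pattern is not part of the answer: it is only an assumption about what "all matches except the last two" means, built by keeping the asker's negative lookahead and adding the quote-parity lookahead to it.

```python
import re

# Test data copied from the question.
data = ('{" hi ":"hallo", hi : hallo, "hi": hallo,hi: "hallo", '
        'hi: {  "hallo": wu  },hi: "Hallo Trippel, hier, hallo"}')

# Pattern from the answer: a delimiter followed by an even number of
# double quotes up to the end of the string, i.e. a delimiter outside quotes.
outside_quotes = re.compile(r'[,{:](?=(?:(?:[^"]*"){2})*[^"]*$)')

# Assumption (not from the answer): the asker's original negative lookahead
# plus the quote-parity lookahead, to drop only the matches inside quotes.
combined = re.compile(
    r'[,{:](?!\s*"[\w\s,]+")(?=(?:(?:[^"]*"){2})*[^"]*$)'
)


def match_positions(pattern, text):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


answer_positions = match_positions(outside_quotes, data)
combined_positions = match_positions(combined, data)

print(len(answer_positions))    # 14: every delimiter outside quotes
print(len(combined_positions))  # 8: the asker's matches minus the last two
```

Note that the answer's pattern on its own also matches delimiters that are followed by a quoted string, such as the opening {, so it returns more matches than the asker's original regex did.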

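But I don't understand why my negative lookahead doesn't "consume" everything I want it to consume

Lookahead and lookbehind, collectively called "lookaround", are zero-length assertions: they do not consume characters in the string, they only assert whether a match is possible at the current position. In your regex only the character class [,{:] consumes a character. When the engine is at the : before "Hallo Trippel, hier, hallo", the lookahead does see the whole quoted string, so that : is not matched. The engine then resumes searching at the very next character, not after the closing quote, so it later reaches the commas inside the quotes. Each of those is followed by plain text rather than by a quoted string, so the negative lookahead succeeds and the comma is matched.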
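
The zero-length behaviour can be checked with a small Python sketch that runs the regex from the question over its test data. The variable names (data, original, matches, quoted_start) are chosen here for illustration only.

```python
import re

# Test data and regex copied from the question.
data = ('{" hi ":"hallo", hi : hallo, "hi": hallo,hi: "hallo", '
        'hi: {  "hallo": wu  },hi: "Hallo Trippel, hier, hallo"}')
original = re.compile(r'[,{:](?!\s*"[\w\s,]+")')

matches = list(original.finditer(data))

for m in matches:
    # Each match is exactly one character wide: only the character class
    # consumed input, the lookahead consumed nothing.
    following = data[m.end():m.end() + 10]
    print(m.span(), repr(m.group()), "followed by", repr(following))

# Index of the opening quote of the last quoted string.
quoted_start = data.index('"Hallo Trippel')

print(len(matches))                    # 10
print(matches[-2].start() > quoted_start,
      matches[-1].start() > quoted_start)  # True True
```

Every span printed has a width of one, and the last two matches start after the opening quote of "Hallo Trippel, hier, hallo", which is consistent with the engine continuing its search inside the quoted string.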

