Finding repetitions

Debugger
Posts: 559
Joined: Thu Jan 26, 2017 11:56 am

Finding repetitions

Post by Debugger »

How do I find links (url) that repeat in a text file (on one line)?
NotNull
Posts: 5142
Joined: Wed May 24, 2017 9:22 pm

Re: Finding repetitions

Post by NotNull »

(http[^\s]*)\s.*\1
(marks from first to last)

- or -

(http[^\s]*)\s(?=.*\1)
(marks first)
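
For readers who want to try the two patterns outside an editor, here is a minimal Python sketch. The sample line and the variable names are made up for illustration, and Python's `re` engine may behave differently from the regex engine in a given editor.

```python
import re

# Hypothetical sample line: the same URL appears twice, separated by other text.
line = "see https://example.com/a and later https://example.com/a again"

# Pattern 1: matches from the first occurrence through the last repetition.
span = re.search(r"(http[^\s]*)\s.*\1", line)

# Pattern 2: matches only the first occurrence (plus the whitespace after it);
# the lookahead requires a later repetition on the same line.
first = re.search(r"(http[^\s]*)\s(?=.*\1)", line)

print(span.group(0))   # text from the first URL to the end of the repeated URL
print(first.group(1))  # the first URL only
```

Both patterns expect whitespace right after the first URL, and `.` does not cross line breaks by default, so they only detect repetitions within one line. The backreference `\1` can also match the start of a longer URL that merely shares the same prefix.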
Debugger
Posts: 559
Joined: Thu Jan 26, 2017 11:56 am

Re: Finding repetitions

Post by Debugger »

EmEditor - Both regular expressions do not work.

tuska
Posts: 904
Joined: Thu Jul 13, 2017 9:14 am

Re: Finding repetitions

Post by tuska »

Debugger wrote: Sun Apr 14, 2019 7:15 am EmEditor - Both regular expressions do not work.
That's not true!
[Attachment: Finding repetitions of Weblinks in one row.png]
Debugger
Posts: 559
Joined: Thu Jan 26, 2017 11:56 am

Re: Finding repetitions

Post by Debugger »

That's not true!
That is only half true: it works for your example, but not for my text. :)
A line can contain a single URL or several URLs, with no duplicates within the line, plus other text.
I mean repeating links in the whole text file (MULTILINE)
So the regular expression cannot be valid for my case.

Line 1: https://www.voidtools.com/forum/viewtopic.php?f=7&t=7656&p=25849#p25849
I have no particular literary ambitions. During my long journey I have met many amazing, creative people from the world of cinema, theatre, literature and other professions, each of whom left an indelible mark on my memory and pleasant memories that I sometimes want to share. Everything I write about is a reflection of my experiences, emotions and mental state at a certain stage of life. If my stories touch someone even a little, that is a great happiness for me; otherwise I apologize, and besides, thank you for the time spent on my writings!
===
Line 2: https://www.voidtools.com/forum/viewtopic.php?f=7&t=7656&p=25849#p25849
otherwise I apologize, and besides, thank you for the time spent on my writings!
===
Line 3: Similar
https://i.postimg.cc/2ycdjBSG/Screen-Sh ... -26-PM.jpg
tuska
Posts: 904
Joined: Thu Jul 13, 2017 9:14 am

Re: Finding repetitions

Post by tuska »

This is what I get in EmEditor:
[Attachment: Finding repetitions of Weblinks in several lines.png]
Unfortunately I can't help you with this topic anyway, due to my lack of RegEx knowledge…
Ahh, just seeing that your requirements have changed.
Debugger
Posts: 559
Joined: Thu Jan 26, 2017 11:56 am

Re: Finding repetitions

Post by Debugger »

tuska wrote: Sun Apr 14, 2019 10:39 am Ahh, just seeing that your requirements have changed.
No. It is only formulated in a different way.
NotNull
Posts: 5142
Joined: Wed May 24, 2017 9:22 pm

Re: Finding repetitions

Post by NotNull »

Thanks for testing, @tuska! (I have to admit that I didn't ...)

@debugger: If your URLs don't start with http and/or are not followed by whitespace (\s), you have to adapt the regular expression, of course.
I assumed you would understand that (after gazillion regex questions on this forum).
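
One possible adaptation for repetitions anywhere in the file, sketched in Python. The pattern, sample text and names below are assumptions for illustration and are not taken from any post in this thread; whether a particular editor's regex engine accepts the same syntax is also not confirmed here.

```python
import re

# Hypothetical adaptation: a URL token is "http" followed by non-whitespace.
# (?<!\S) and (?!\S) pin both occurrences to whole whitespace-delimited tokens,
# and [\s\S]*? lets the lookahead cross line breaks.
REPEATED = re.compile(r"(?<!\S)(http\S*)(?=\s[\s\S]*?(?<!\S)\1(?!\S))")

# Made-up sample text: the first URL repeats on a later line, the last one does not.
text = (
    "Line 1: https://example.com/a?x=1#top\n"
    "some text in between\n"
    "Line 2: https://example.com/a?x=1#top\n"
    "Line 3: https://example.com/b\n"
)

# Every URL occurrence that has an identical occurrence somewhere later in the text.
repeated = [m.group(1) for m in REPEATED.finditer(text)]
print(repeated)  # ['https://example.com/a?x=1#top']
```

A URL that occurs three times is reported twice (for its first and second occurrence), and the lookahead rescans the rest of the text for every candidate, so this can get slow on large files.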
NotNull
Posts: 5142
Joined: Wed May 24, 2017 9:22 pm

Re: Finding repetitions

Post by NotNull »

Original question:
Debugger wrote: Sat Apr 13, 2019 9:54 pm How do I find links (url) that repeat in a text file (on one line)?
Changed to:
Debugger wrote: Sun Apr 14, 2019 10:17 am I mean repeating links in the whole text file (MULTILINE)
How's that NOT a different requirement?
Debugger
Posts: 559
Joined: Thu Jan 26, 2017 11:56 am

Re: Finding repetitions

Post by Debugger »

The addresses are not followed by spaces; each address is on a separate line.
The regular expression needs a small modification so that these prefixes are found as well:
http://www.
https://www.
www.

(http|https):\/\/[\w\-_]+(\.[\w]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

But that does not help much, because this pattern alone will not find repeated links. The regex still needs a part that detects the repetition.
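
When each address sits on its own line, an alternative to a single regex is to collect every URL and count the occurrences. This is a different technique from the backreference patterns above, sketched in Python under stated assumptions: the URL shape (`http://`, `https://` or `www.` up to the next whitespace), the function name `find_repeated_urls` and the sample data are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed URL shape: starts with http://, https:// or www. and runs to the next whitespace.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")

def find_repeated_urls(text):
    """Map each URL that occurs more than once to the 1-based line numbers where it appears."""
    seen = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        for url in URL_RE.findall(line):
            seen[url].append(lineno)
    return {url: lines for url, lines in seen.items() if len(lines) > 1}

# Made-up sample: two URLs repeat, one is unique.
sample = """\
https://www.voidtools.com/forum/viewtopic.php?f=7&t=7656&p=25849#p25849
some unrelated text
www.example.com/page
https://www.voidtools.com/forum/viewtopic.php?f=7&t=7656&p=25849#p25849
www.example.com/page
http://www.example.org/unique
"""

for url, lines in find_repeated_urls(sample).items():
    print(url, lines)
```

The comparison is a plain string match, so `www.example.com/page` and `http://www.example.com/page` count as different addresses unless the URLs are normalized first.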
Debugger
Posts: 559
Joined: Thu Jan 26, 2017 11:56 am

Re: Finding repetitions

Post by Debugger »

NotNull wrote: Sat Apr 13, 2019 11:14 pm
(http[^\s]*)\s.*\1
(marks from first to last)

- or -

(http[^\s]*)\s(?=.*\1)
(marks first)

The author (developer) of EmEditor told me that:
The regex does not make any sense
NotNull
Posts: 5142
Joined: Wed May 24, 2017 9:22 pm

Re: Finding repetitions

Post by NotNull »

Then I suggest you ask your EmEditor questions over there from now on, instead of here.

BTW: Did you ask the exact same question? - "How do I find links (url) that repeat in a text file (on one line)?".
(No need to answer that, it is a rhetorical question).
Debugger
Posts: 559
Joined: Thu Jan 26, 2017 11:56 am

Re: Finding repetitions

Post by Debugger »

NotNull - I can ask the author, but he will not answer for a very long time, so I cannot expect help there. I can also ask other forum users who are familiar with the topic. Every tip is valuable.


The only thing I could find with a Google search for "find duplicate url" applies to websites.
It has nothing to do with text files, which makes the task difficult.

I have to accept the fact that some duplicates cannot be found unless you are a wizard who can do everything. Most of my issues are not simple, although I always look for similar queries on Google first.