Compatible with Python 3.4+

Purpose

This library and command line tool compress multiple strings into one regular expression that can later be used to find or match those strings in a larger piece of text.

Installation

As simple as pip install w2re

Example use

Input strings are: is, in, it, if, the, than

As a library:

from w2re import iterable_to_regexp

iterable_to_regexp(['is', 'in', 'it', 'if', 'the', 'than'])

'(?:i[fnst]|th(?:e|an))'

As a command line tool:

echo -e "is\nin\nit\nif\nthe\nthan" | w2re

(?:i[fnst]|th(?:e|an))
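
As a quick sanity check, the generated expression can be tried against its own inputs using only the standard re module. This is a minimal sketch: the pattern is copied by hand from the output above rather than produced by w2re, and the variable names are illustrative.

```python
import re

# Pattern copied from the output shown above (not generated here).
pattern = re.compile(r'(?:i[fnst]|th(?:e|an))')

words = ['is', 'in', 'it', 'if', 'the', 'than']

# Every input word should be matched in full by the combined pattern.
all_match = all(pattern.fullmatch(word) for word in words)

# A word that was not part of the input should not be matched in full.
rejects_other = pattern.fullmatch('this') is None

print(all_match, rejects_other)  # True True
```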

Input text is The Zen of Python

Counting words:

from collections import Counter
from re import findall

from requests import get
from w2re import iterable_to_regexp

Counter(
    findall(
        iterable_to_regexp(['is', 'in', 'it', 'if', 'the', 'than']),
        get('https://raw.githubusercontent.com/python/peps/master/pep-0020.txt').text
    )
).most_common()

[('is', 15), ('it', 12), ('in', 11), ('than', 8), ('the', 7), ('if', 2)]

Features

Collapsing multiple strings from command line input

This is very useful if you need to search for multiple strings and are not sure how to write the correct regexp (or like me, are lazy and write libraries for it instead).

Terminate your input with EOF (Ctrl+D on an empty line in Linux).

w2re
i am searching for this
and this
and this as well

(?:i\ am\ searching\ for\ this|and\ this(?:\ as\ wel{2})?)
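
The resulting expression can then be used with the standard re module to locate the phrases in a larger text. The sketch below assumes the pattern is pasted in by hand from the output above, and the sample text is made up for illustration.

```python
import re

# Pattern copied from the output shown above.
pattern = re.compile(r'(?:i\ am\ searching\ for\ this|and\ this(?:\ as\ wel{2})?)')

# Hypothetical sample text containing two of the phrases.
text = 'note: i am searching for this, and this as well.'

# The optional group lets "and this" extend to "and this as well".
hits = pattern.findall(text)

print(hits)  # ['i am searching for this', 'and this as well']
```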

Collapsing of repeated sequences

echo 'hahaha' | w2re

(?:ha){3}

This unfortunately does not produce a range yet. For example, subsubsection, subsection and section become s(?:ection|ubs(?:ection|ubsection)) rather than the expected (?:sub){0,2}section.
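
The nested form s(?:ection|ubs(?:ection|ubsection)) and the range form (?:sub){0,2}section accept the same three words, so the difference is in compactness, not correctness. A small check with the standard re module, with both patterns written out by hand, illustrates this:

```python
import re

words = ['section', 'subsection', 'subsubsection']

# Nested form, as currently produced according to the text above.
produced = re.compile(r's(?:ection|ubs(?:ection|ubsection))')
# Range form, which would be the more compact equivalent.
ranged = re.compile(r'(?:sub){0,2}section')

produced_ok = all(produced.fullmatch(w) for w in words)
ranged_ok = all(ranged.fullmatch(w) for w in words)

# Neither pattern accepts a third repetition of "sub".
extra = 'subsubsubsection'
both_reject_extra = (produced.fullmatch(extra) is None
                     and ranged.fullmatch(extra) is None)

print(produced_ok, ranged_ok, both_reject_extra)  # True True True
```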

Automatic escaping of regular expressions

echo '* test: ...' | w2re

\*\ test\:\ \.{3}
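
Because the special characters are escaped, the expression matches the input literally and does not treat * or . as regex operators. A minimal check with the standard re module, using the pattern copied from the output above:

```python
import re

# Pattern copied from the output shown above.
pattern = re.compile(r'\*\ test\:\ \.{3}')

# The original input is matched literally.
literal_ok = pattern.fullmatch('* test: ...') is not None

# The escaped dots are not wildcards, so other characters are rejected.
wildcard_rejected = pattern.fullmatch('* test: abc') is None

print(literal_ok, wildcard_rejected)  # True True
```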

Reading words from a file on command line

w2re -i /usr/share/dict/words

Command line filter

head -n 10 /usr/share/dict/words | w2re

A(?:\'s|MD(?:\'s)?|OL(?:\'s)?|WS(?:\'s)?|achen(?:\'s)?)

Reading words from iterable

import w2re

w2re.iterable_to_regexp(['is', 'in', 'it', 'if', 'the', 'than'])

'(?:i[fnst]|th(?:e|an))'

Reading words from stream

import io

import w2re

w2re.stream_to_regexp(io.StringIO('is\nin\nit\nif\nthe\nthan'))

'(?:i[fnst]|th(?:e|an))'

Multiple output formats

`w2re.PythonFormatter`

Standard Python formatted regular expression, based on the re module. This is the default formatter for both the command line tool and the library.

import w2re

w2re.iterable_to_regexp(['is', 'in', 'it', 'if', 'the', 'than'], w2re.PythonFormatter)

'(?:i[fnst]|th(?:e|an))'

`w2re.PythonWordMatchFormatter`

Standard Python formatted regular expression, based on the re module. Suitable for matching whole words rather than arbitrary substrings. Unlike PythonFormatter, it won't match Python in Pythonista.

import w2re

w2re.iterable_to_regexp(['is', 'in', 'it', 'if', 'the', 'than'], w2re.PythonWordMatchFormatter)

'(?:\\W+|\\A)((?:i[fnst]|th(?:e|an)))(?=\\W+|\\Z)'
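
The difference between the two formatters can be seen by running both expressions over the same text with the standard re module. In this sketch both patterns are copied by hand from the outputs above (written as raw strings, so the backslashes are not doubled), and the sample text is made up for illustration.

```python
import re

# Output of PythonFormatter, copied from above.
plain = r'(?:i[fnst]|th(?:e|an))'
# Output of PythonWordMatchFormatter, copied from above as a raw string.
whole_word = r'(?:\W+|\A)((?:i[fnst]|th(?:e|an)))(?=\W+|\Z)'

text = 'this is it'

# The plain pattern also finds "is" inside "this".
plain_hits = re.findall(plain, text)
# The word-match pattern captures only standalone words (group 1).
word_hits = re.findall(whole_word, text)

print(plain_hits)  # ['is', 'is', 'it']
print(word_hits)   # ['is', 'it']
```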

`w2re.BaseFormatter`

Base class for implementation of custom formatters. See the w2re.formatters module.
