fields module
Contains functions and classes related to fields.
Schema class
-
class whoosh.fields.Schema(**fields)
Represents the collection of fields in an index. Maps field names to
FieldType objects which define the behavior of each field.
Low-level parts of the index use field numbers instead of field names
for compactness. This class has several methods for converting between
the field name, field number, and field object itself.
All keyword arguments to the constructor are treated as fieldname = fieldtype
pairs. The fieldtype can be an instantiated FieldType object, or a FieldType
sub-class (in which case the Schema will instantiate it with the default
constructor before adding it).
For example:
    from whoosh.fields import Schema, TEXT, KEYWORD

    s = Schema(content=TEXT,
               title=TEXT(stored=True),
               tags=KEYWORD(stored=True))
-
add(name, fieldtype)
Adds a field to this schema. This is a low-level method; use keyword
arguments to the Schema constructor to create the fields instead.
| Parameters: |
- name – The name of the field.
- fieldtype – An instantiated fields.FieldType object, or a FieldType subclass.
If you pass an instantiated object, the schema will use that as the field
configuration for this field. If you pass a FieldType subclass, the schema
will automatically instantiate it with the default constructor.
|
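The class-or-instance rule described for the constructor and for add() can be illustrated with a small stand-alone sketch. This is not Whoosh's code: ToyFieldType and ToySchema are hypothetical stand-ins, and the type check is only one plausible way to tell a class from an instance.

```python
class ToyFieldType:
    """Hypothetical stand-in for a field configuration (not whoosh.fields.FieldType)."""

    def __init__(self, stored=False):
        self.stored = stored


class ToySchema:
    """Hypothetical schema that accepts field classes or field instances."""

    def __init__(self, **fields):
        self._fields = {}
        for name, fieldtype in fields.items():
            self.add(name, fieldtype)

    def add(self, name, fieldtype):
        # A class is instantiated with its default constructor;
        # an already-configured instance is used as-is.
        if isinstance(fieldtype, type):
            fieldtype = fieldtype()
        if not isinstance(fieldtype, ToyFieldType):
            raise TypeError("%r is not a field type" % (fieldtype,))
        self._fields[name] = fieldtype

    def field_by_name(self, name):
        return self._fields[name]


schema = ToySchema(content=ToyFieldType, title=ToyFieldType(stored=True))
```

Here content ends up with a default-configured field object, while title keeps the stored=True configuration it was given.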
-
analyzer(fieldname)
- Returns the content analyzer for the given fieldname, or None if
the field has no analyzer.
-
field_by_name(name)
Returns the field object associated with the given name.
| Parameter: | name – The name of the field to retrieve. |
-
field_by_number(number)
Returns the field object associated with the given number.
| Parameter: | number – The number of the field to retrieve. |
-
field_names()
- Returns a list of the names of the fields in this schema.
-
fields()
- Yields (fieldname, field_object) pairs for the fields
in this schema.
-
has_vectored_fields()
- Returns True if any of the fields in this schema store term vectors.
-
name_to_number(name)
- Given a field name, returns the field’s number.
-
number_to_name(number)
- Given a field number, returns the field’s name.
-
scorable_fields()
- Returns a list of field numbers corresponding to the fields that
store length information.
-
stored_field_names()
- Returns the names, in order, of fields that are stored.
-
stored_fields()
- Returns a list of field numbers corresponding to the fields that are stored.
-
to_number(id)
- Given a field name or number, returns the field’s number.
-
vectored_fields()
- Returns a list of field numbers corresponding to the fields that are
vectored.
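The relationship between name_to_number(), number_to_name(), and to_number() can be sketched with a toy mapping. ToyFieldNumbers is a hypothetical class, and assigning numbers in list order is an assumption made for the illustration; the documentation above does not say how a Schema assigns field numbers.

```python
class ToyFieldNumbers:
    """Hypothetical two-way mapping between field names and field numbers."""

    def __init__(self, names):
        # Assumption for this sketch: numbers follow the order of the names given.
        self._names = list(names)                                   # number -> name
        self._numbers = {n: i for i, n in enumerate(self._names)}   # name -> number

    def name_to_number(self, name):
        return self._numbers[name]

    def number_to_name(self, number):
        return self._names[number]

    def to_number(self, id):
        # Accept either form and always hand back the number.
        if isinstance(id, int):
            return id
        return self.name_to_number(id)


numbers = ToyFieldNumbers(["content", "tags", "title"])
print(numbers.name_to_number("tags"))   # 1
print(numbers.number_to_name(2))        # title
print(numbers.to_number("title"))       # 2
print(numbers.to_number(2))             # 2
```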
FieldType base class
-
class whoosh.fields.FieldType(format, vector=None, scorable=False, stored=False, unique=False)
Represents a field configuration.
The FieldType object supports the following attributes:
- format (fields.Format): the storage format for the field’s contents.
- vector (fields.Format): the storage format for the field’s vectors
(forward index), or None if the field should not store vectors.
- scorable (boolean): whether searches against this field may be scored.
This controls whether the index stores per-document field lengths for
this field.
- stored (boolean): whether the content of this field is stored for each
document. For example, in addition to indexing the title of a document,
you usually want to store the title so it can be presented as part of
the search results.
- unique (boolean): whether this field’s value is unique to each document,
as with a ‘path’ or ‘ID’ field. IndexWriter.update_document() uses
fields marked as ‘unique’ to find the previous version of a document
being updated.
The constructor for the base field type simply lets you supply your
own configured field format, vector format, and scorable and stored
values. Subclasses may configure some or all of this for you.
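To make the role of the unique attribute concrete, here is a toy model of an update keyed on unique fields. ToyIndex is hypothetical and is not the IndexWriter API; in particular, replacing the matching document is an assumption about what "update" means here.

```python
class ToyIndex:
    """Hypothetical in-memory document list keyed on 'unique' fields."""

    def __init__(self, unique_fields):
        self.unique_fields = set(unique_fields)
        self.docs = []

    def add_document(self, **fields):
        self.docs.append(dict(fields))

    def update_document(self, **fields):
        # Assumed behavior: drop any earlier document that has the same
        # value in a unique field, then add the new version.
        for name in self.unique_fields:
            if name in fields:
                self.docs = [d for d in self.docs if d.get(name) != fields[name]]
        self.add_document(**fields)


index = ToyIndex(unique_fields=["path"])
index.add_document(path="/a", title="old title")
index.add_document(path="/b", title="other")
index.update_document(path="/a", title="new title")
```

After the update, the toy index holds one document for "/a" (the new version) and the untouched "/b" document.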
-
clean()
- Clears any cached information in the field and any child objects.
-
index(value)
- Returns an iterator of (termtext, frequency, encoded_value) tuples.
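The shape of the tuples produced by index() can be sketched as follows. This is only an illustration of the (termtext, frequency, encoded_value) structure: the whitespace tokenizer and the byte-string encoding are assumptions, since the real encoded value depends on the field's format.

```python
from collections import Counter


def toy_index(value):
    """Yield (termtext, frequency, encoded_value) tuples for a string."""
    # Assumed tokenization: lowercase, split on whitespace.
    counts = Counter(value.lower().split())
    for termtext in sorted(counts):
        frequency = counts[termtext]
        # Hypothetical encoding: the frequency as an ASCII byte string.
        encoded_value = str(frequency).encode("ascii")
        yield termtext, frequency, encoded_value


postings = list(toy_index("the quick the lazy the"))
print(postings)  # [('lazy', 1, b'1'), ('quick', 1, b'1'), ('the', 3, b'3')]
```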
Pre-made field types
-
class whoosh.fields.ID(stored=False, unique=False, field_boost=1.0)
Configured field type that indexes the entire value of the field as one
token. This is useful for data you don’t want to tokenize, such as the
path of a file.
| Parameter: | stored – Whether the value of this field is stored with the document. |
-
class whoosh.fields.IDLIST(stored=False, unique=False, expression=None, field_boost=1.0)
Configured field type for fields containing IDs separated by whitespace
and/or punctuation.
| Parameters: |
- stored – Whether the value of this field is stored with the document.
- unique – Whether the value of this field is unique per-document.
- expression – The regular expression object to use to extract tokens.
The default expression breaks tokens on CRs, LFs, tabs, spaces, commas,
and semicolons.
|
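The default tokenization described for expression can be approximated with a standard-library regular expression. The pattern below is an assumption chosen to match the description (tokens are runs of characters other than CR, LF, tab, space, comma, and semicolon); the names ID_TOKEN and split_ids are hypothetical.

```python
import re

# Assumed pattern: one or more characters that are not CR, LF, tab,
# space, comma, or semicolon.
ID_TOKEN = re.compile(r"[^\r\n\t ,;]+")


def split_ids(value):
    """Return the ID tokens found in value."""
    return ID_TOKEN.findall(value)


ids = split_ids("a1, b2;c3\td4\r\ne5  f6")
print(ids)  # ['a1', 'b2', 'c3', 'd4', 'e5', 'f6']
```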
-
class whoosh.fields.STORED
- Configured field type for fields you want to store but not index.
-
class whoosh.fields.KEYWORD(stored=False, lowercase=False, commas=False, scorable=False, unique=False, field_boost=1.0)
Configured field type for fields containing space-separated or comma-separated
keyword-like data (such as tags). The default is to not store positional information
(so phrase searching is not allowed in this field) and to not make the field scorable.
| Parameters: |
- stored – Whether to store the value of the field with the document.
- commas – Whether this is a comma-separated field. If this is False
(the default), it is treated as a space-separated field.
- scorable – Whether this field is scorable.
|
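The difference between space-separated and comma-separated keyword fields matters when a keyword contains a space. The following sketch shows the idea; split_keywords is a hypothetical helper, not the tokenizer that KEYWORD uses.

```python
def split_keywords(value, commas=False):
    """Split keyword-like data on commas or on whitespace."""
    if commas:
        parts = value.split(",")
    else:
        parts = value.split()
    # Trim surrounding whitespace and drop empty pieces.
    return [p.strip() for p in parts if p.strip()]


space_tags = split_keywords("python search index")
comma_tags = split_keywords("new york, san francisco", commas=True)
print(space_tags)  # ['python', 'search', 'index']
print(comma_tags)  # ['new york', 'san francisco']
```

With comma separation, multi-word keywords such as "new york" survive as single tokens; with space separation they would be split in two.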
-
class whoosh.fields.TEXT(analyzer=None, phrase=True, vector=None, stored=False, field_boost=1.0)
Configured field type for text fields (for example, the body text of an article). The
default is to store positional information to allow phrase searching. This field type
is always scorable.
| Parameters: |
- stored – Whether to store the value of this field with the document. Since
this field type generally contains a lot of text, you should avoid storing it
with the document unless you need to, for example to allow fast excerpts in the
search results.
- phrase – Whether to store positional information to allow phrase searching.
- analyzer – The analysis.Analyzer to use to index the field contents. See the
analysis module for more information. If you omit this argument, the field uses
analysis.StandardAnalyzer.
|
-
class whoosh.fields.NGRAM(minsize=2, maxsize=4, stored=False, field_boost=1.0)
Configured field that indexes text as N-grams. For example, with a field type
NGRAM(3,4), the value “hello” will be indexed as tokens
“hel”, “hell”, “ell”, “ello”, “llo”.
| Parameters: |
- stored – Whether to store the value of this field with the document. Since
this field type generally contains a lot of text, you should avoid storing it
with the document unless you need to, for example to allow fast excerpts in the
search results.
- minsize – The minimum length of the N-grams.
- maxsize – The maximum length of the N-grams.
|
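The NGRAM(3,4) example above can be reproduced with a few lines of plain Python. This is a sketch of the N-gram idea only; toy_ngrams is a hypothetical function, and the real field type may normalize or order tokens differently.

```python
def toy_ngrams(text, minsize, maxsize):
    """Return every substring of length minsize..maxsize, ordered by start position."""
    grams = []
    for start in range(len(text)):
        for size in range(minsize, maxsize + 1):
            # Skip sizes that would run past the end of the text.
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


grams = toy_ngrams("hello", 3, 4)
print(grams)  # ['hel', 'hell', 'ell', 'ello', 'llo']
```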
Exceptions
-
exception whoosh.fields.FieldConfigurationError
-
exception whoosh.fields.UnknownFieldError