Features

| Feature Name | Description |
| --- | --- |
| Creator Features | |
| Creator Name* | Username of article creator |
| Creator Days on Site | How long has article creator had a WP account? |
| Creator Num Edits | How many edits has the article creator made sitewide? |
| Creator Status | Creator’s account status (open or blocked) |
| Num other pages | Number of other kept pages created by article creator |
| Userpage | Does article creator have a userpage? |
| Page Features | |
| Title* | Article title |
| Createdate* | Date article was created |
| File Size! | Size of the entire file, including all revisions |
| Has Talk Page! | Does the article have an accompanying talk page? |
| Topic Features | |
| Num links to here | Number of other Wikipedia articles linking to the article |
| Num links in from Web | Number of external web pages linking to article |
| Pageviews** | Number of visits page has had |
| Num Hits | Number of search engine results for the page title |
| Article Features | |
| Num categories | Number of WP categories the article belongs to |
| Num images | Number of images in the article |
| Num references | Number of references in the article |
| Num sections | Number of sections in the article |
| Num out Wikilinks | Number of links in the article to other WP pages |
| Infobox* | What kind of infobox does the article contain? |
| Total Size in Bytes | Length, in bytes, of the final version of the article |
| Revision Features | |
| Num Revisions | Number of revisions to the article |
| Num registered edits | Number of edits made by registered users |
| Num anonymous edits | Number of edits made by anonymous users |
| Num unique Editors | Number of unique users who edited the article |
| Time to Delete | Number of days between article creation and proposal for deletion |
| More than half anon | Boolean; are more than half of the edits made by anon users? |
| Has main editor | Boolean; has one user created > 50% of the content? |
| Creator is main editor | Boolean; is the article creator the main editor? |
| Likelihood Autobio | String similarity between article title and creator’s username |
| Text* | Bag of all words in the final version of the article |
| Language Features | |
| Normalized noun count | # of nouns in article, normalized by article length |
| Normalized verb count | # of verbs in article, normalized by article length |
| Normalized adjective count | # of adjectives in article, normalized by article length |
| Normalized adverb count | # of adverbs in article, normalized by article length |
| FK reading level | Flesch-Kincaid reading level of the article |
| SMOG reading level | SMOG reading level index |
| CL level | Coleman-Liau reading level index |
| Level avg | Average of 5 reading level indexes |
| FK reading ease | Flesch-Kincaid reading ease measure of the article |

* Not used for classification because these features are extremely sparse.

** Only used in the New dataset because of a MediaWiki software error that miscounted pageviews in December 2011.

! Not used in the Original dataset classification because feature selection found that they reduced accuracy.
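
Several of the derived features above can be illustrated with a short Python sketch. This is only one possible reading of the table, not the procedure actually used: the table does not name the string-similarity measure behind Likelihood Autobio, so `difflib.SequenceMatcher` is an assumed stand-in; "content" for Has main editor is assumed to be a per-user contribution count (for example, bytes added); and the word, sentence, and syllable counts for the Flesch-Kincaid measures are assumed to be computed elsewhere. All function names are hypothetical.

```python
from difflib import SequenceMatcher


def more_than_half_anon(num_registered_edits, num_anonymous_edits):
    """More than half anon: True if anonymous edits exceed half of all edits."""
    total = num_registered_edits + num_anonymous_edits
    return total > 0 and num_anonymous_edits > total / 2


def main_editor(content_by_user):
    """Has main editor: return the user with > 50% of the content, else None.

    content_by_user maps username -> amount contributed (assumed unit: bytes).
    """
    total = sum(content_by_user.values())
    if total <= 0:
        return None
    for user, amount in content_by_user.items():
        if amount > total / 2:
            return user
    return None


def creator_is_main_editor(creator_name, content_by_user):
    """Creator is main editor: True if the main editor is the article creator."""
    return main_editor(content_by_user) == creator_name


def likelihood_autobio(title, creator_name):
    """Likelihood Autobio: similarity in [0, 1] between title and username.

    Assumption: case-insensitive SequenceMatcher ratio; the source does not
    specify which similarity measure was used.
    """
    return SequenceMatcher(None, title.lower(), creator_name.lower()).ratio()


def fk_reading_level(words, sentences, syllables):
    """Standard Flesch-Kincaid grade level formula from raw counts."""
    return 0.39 * words / sentences + 11.8 * syllables / words - 15.59


def fk_reading_ease(words, sentences, syllables):
    """Standard Flesch reading ease formula from raw counts."""
    return 206.835 - 1.015 * words / sentences - 84.6 * syllables / words
```

For instance, `likelihood_autobio("Jane Doe", "jane doe")` returns 1.0 under these assumptions, which would flag an article whose title matches its creator's username as a likely autobiography.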