File:Heaps' Law on "War and Peace".svg
Original file (SVG file, nominally 662 × 491 pixels, file size: 156 KB)
Captions
Summary
[edit]DescriptionHeaps' Law on "War and Peace".svg |
English: Verification of Heaps' law on War and Peace.
```python import nltk import urllib.request from collections import Counter import matplotlib.pyplot as plt import numpy as np
url = "http://www.gutenberg.org/files/2600/2600-0.txt" response = urllib.request.urlopen(url) long_txt = response.read().decode('utf8') import random
tokenizer = nltk.tokenize.RegexpTokenizer('\w+') tokens = tokenizer.tokenize(long_txt.lower()) tokens = tokens[940:]
total_words = np.arange(1, len(tokens) + 1) unique_words = np.zeros(len(tokens))
word_set = set() for i, token in enumerate(tokens): word_set.add(token) unique_words[i] = len(word_set)
log_total_words = np.log(total_words) log_unique_words = np.log(unique_words) beta, logK = np.polyfit(log_total_words, log_unique_words, 1) K = np.exp(logK)
print('K:', K) print('beta:', beta)
plt.figure(figsize=(8, 6)) plt.plot(total_words, unique_words, label='Empirical Data') plt.plot(total_words, K * total_words ** beta, '--', label=f'Heaps\' Law Fit: K={K:.2f}, beta={beta:.2f}')
tokenizer = nltk.tokenize.RegexpTokenizer('\w+') tokens = tokenizer.tokenize(long_txt.lower()) tokens = tokens[940:] random.shuffle(tokens)
total_words = np.arange(1, len(tokens) + 1) unique_words = np.zeros(len(tokens))
word_set = set() for i, token in enumerate(tokens): word_set.add(token) unique_words[i] = len(word_set)
log_total_words = np.log(total_words) log_unique_words = np.log(unique_words) beta, logK = np.polyfit(log_total_words, log_unique_words, 1) K = np.exp(logK)
print('K:', K) print('beta:', beta)
plt.plot(total_words, unique_words, label='Shuffled Empirical Data') plt.plot(total_words, K * total_words ** beta, '--', label=f'Heaps\' Law Fit for shuffled data: K={K:.2f}, beta={beta:.2f}')
|
Date | |
Source | Own work |
Author | Cosmia Nebula |
Licensing
[edit]- You are free:
- to share – to copy, distribute and transmit the work
- to remix – to adapt the work
- Under the following conditions:
- attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.
File history
Click on a date/time to view the file as it appeared at that time.
Date/Time | Thumbnail | Dimensions | User | Comment | |
---|---|---|---|---|---|
current | 22:54, 18 July 2023 | 662 × 491 (156 KB) | Cosmia Nebula (talk | contribs) | Uploaded while editing "Heaps' law" on en.wikipedia.org |
You cannot overwrite this file.
File usage on Commons
There are no pages that use this file.
File usage on other wikis
The following other wikis use this file:
- Usage on en.wikipedia.org
Metadata
This file contains additional information such as Exif metadata which may have been added by the digital camera, scanner, or software program used to create or digitize it. If the file has been modified from its original state, some details such as the timestamp may not fully reflect those of the original file. The timestamp is only as accurate as the clock in the camera, and it may be completely wrong.
Width | 529.210744pt |
---|---|
Height | 392.514375pt |