Thread: YoJoe.com
12-02-2022, 04:07 PM
sisco
Size depends on how much you archive. The archiving software has a lot of parameters you can set, including how deep it goes and how many links it follows. The software is similar to the Internet Archive's, so theoretically you could end up downloading every site Yo Joe links to, then all the sites those sites link to, and eventually fill up a drive.
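To see why a depth setting matters, here is a minimal Python sketch of a depth-limited crawl over a made-up link graph. It is not the archiving program's own code; the `LINKS` dictionary and the page names in it are hypothetical stand-ins for real pages.

```python
from collections import deque

def reachable_within(links, start, max_depth):
    """Return {page: depth} for pages reachable by following at most max_depth links."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        depth = seen[page]
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:
                seen[target] = depth + 1
                queue.append(target)
    return seen

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "site/home": ["site/figures", "partner/home"],
    "site/figures": ["site/figures/1982"],
    "partner/home": ["other/home"],
    "other/home": ["yet-another/home"],
}

shallow = reachable_within(LINKS, "site/home", 1)
deep = reachable_within(LINKS, "site/home", 3)
print(len(shallow), len(deep))
```

With a depth of 1 the crawl stops at the pages linked from the start page. At depth 3 it has already wandered through the partner site to two more unrelated sites, which is how an archive can balloon.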
If you set the parameters right, expect 5-20 GB depending on which part of the site you want. The archiver grabs everything in the HTML and makes copies of a lot of redundant things, so it adds up.
I suggest finding a small website with not much on it and doing a practice archive first, to get the archiver settings right for what you want and to mess around with the program before attempting to archive a big site like Yo Joe.
For example, I told the archiver to ignore the Yo Joe forum URLs, the marketplace, eBay, Facebook, and Google, and to not leave the Yo Joe domain. I added a lot of Yo Joe's advertisers' URLs to the exclusion list as well. I entered the Action Figure, Vehicle, and Comics section URLs as the URLs to archive instead of YoJoe.com. This kept the archive limited to the toys and comics instead of the Yo Joe front page news.
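The filtering described above can be sketched in Python as a single "is this URL in scope?" check. This is only an illustration of the idea, not how any particular archiver implements it. The domain `example.com`, the section paths, and the excluded hosts are placeholders, not the real Yo Joe URLs.

```python
from urllib.parse import urlsplit

# Placeholder values; substitute the real site and section URLs.
HOME_DOMAIN = "example.com"
SEEDS = [
    "https://www.example.com/figures/",
    "https://www.example.com/vehicles/",
    "https://www.example.com/comics/",
]
EXCLUDED_HOSTS = ["ebay.com", "facebook.com", "google.com"]
EXCLUDED_PREFIXES = [
    "https://www.example.com/forums/",
    "https://www.example.com/marketplace/",
]

def host_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)

def in_scope(url):
    """Decide whether the crawler should download this URL."""
    host = (urlsplit(url).hostname or "").lower()
    if any(host_matches(host, d) for d in EXCLUDED_HOSTS):
        return False  # explicitly excluded outside site
    if not host_matches(host, HOME_DOMAIN):
        return False  # never leave the home domain
    if any(url.startswith(p) for p in EXCLUDED_PREFIXES):
        return False  # excluded part of the home site
    # Only keep pages under one of the chosen section URLs.
    return any(url.startswith(s) for s in SEEDS)

print(in_scope("https://www.example.com/figures/1982/index.html"))
print(in_scope("https://www.example.com/forums/thread/1"))
```

The host exclusions are redundant once the crawler is told to stay on the home domain, but listing them anyway mirrors the belt-and-braces setup in the post.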
The archival process is also slow, so do it on a PC you can let run for a full 24 hours at least. It automatically throttles its speed so it doesn't overwhelm a site's bandwidth. It usually downloads at under 100kbs on most sites even if you tell the program not to use a speed limit, or it follows the site's robots.txt instructions for archiving, the same ones that the Internet Archive and Google's cache use. The program runs in the background and takes up very little of your PC's resources and very little of your network's bandwidth, so you can leave it running while gaming, surfing the net, watching videos, etc. It's a set-it-up-and-check-on-it-every-few-hours type of thing.
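For anyone curious what "following robots.txt" means in practice, here is a small Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents, the `MyArchiver` user-agent name, and the `example.com` URLs are invented for illustration; a real site's file will differ, and this is not the archiving program's own logic.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt contents, parsed from a string so no network is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forums/
Crawl-delay: 2
"""

def load_rules(robots_txt):
    """Parse robots.txt text into a rule object."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules

rules = load_rules(ROBOTS_TXT)
agent = "MyArchiver"  # hypothetical crawler name

# A polite crawler asks before each download...
print(rules.can_fetch(agent, "https://www.example.com/figures/"))
print(rules.can_fetch(agent, "https://www.example.com/forums/thread/1"))
# ...and waits this many seconds between requests if the site asks for it.
print(rules.crawl_delay(agent))
```

A requested delay between requests is one reason a polite crawl of a large site can take a day or more.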
You could probably get everything you want in less space by manually saving each page from your browser, but that takes a lot of time, and when you want to view it you will have to open each page individually from its folder, and the hyperlinks most likely will not work.
The benefit of an archiving program is that it's done automatically and makes a fully navigable website on your hard drive that works just like the original website: the hyperlinks take you to the web page or image file on your hard drive instead of to the web link. You can also copy the archive's master folder and move it to other PCs or hard drives, and it will still work the same.
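One common way to get that behavior is to rewrite each hyperlink as a relative path between the saved files. The Python sketch below shows the idea under that assumption; the post does not say how the program actually does it. The folder layout (one folder per host, `index.html` for directory URLs) and the `example.com` URLs are assumptions for illustration.

```python
import posixpath
from urllib.parse import urlsplit

def local_path(url):
    """Map a URL to a file path inside the archive folder (assumed layout)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # directory URLs are saved as index.html
    return parts.netloc.lower() + path

def relative_link(from_url, to_url):
    """Link text to write into the saved copy of from_url so it points at to_url."""
    source_dir = posixpath.dirname(local_path(from_url))
    return posixpath.relpath(local_path(to_url), start=source_dir)

page = "https://www.example.com/figures/"
image = "https://www.example.com/images/snake-eyes.jpg"
print(local_path(page))
print(relative_link(page, image))
```

Because the rewritten link is relative, it never mentions the drive or parent folder, so the whole archive folder can be copied elsewhere without breaking anything.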
Final note: the archive should stop automatically, but if it runs over 36 hours, or the URLs it shows as downloading are way off the website's domain, I'd hit cancel on it. When you hit cancel, it will quickly make a note of all the domains it had left to scan without downloading them, and make a working archive of the site that you can then browse on your drive. Past 36 hours it has most likely gotten everything you wanted from the site onto your hard drive and is just heading down the internet rabbit hole of hyperlinks. When you open the My Websites folder on the hard drive, it will have labeled folders for each domain it archived. A lot of these folders will be ads or partner sites that the bot followed or got sidetracked by. You can safely delete any folders that are not the website's own named folder, or that of an image hosting service if the site uses one for its images.
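The two rules of thumb above (when to cancel, and which folders to delete afterwards) can be written down as a short Python sketch. The function names, the folder names, and the `example` domains are all hypothetical. The code only reports which folders look safe to delete and removes nothing itself.

```python
from urllib.parse import urlsplit

def should_cancel(hours_running, current_url, home_domain, max_hours=36):
    """Cancel if the run exceeds max_hours or the crawler has left the home domain."""
    host = (urlsplit(current_url).hostname or "").lower()
    on_site = host == home_domain or host.endswith("." + home_domain)
    return hours_running > max_hours or not on_site

def folders_to_delete(folder_names, keep_domains):
    """Return archive folders that belong to neither the site nor its image host."""
    keep = {name.lower() for name in keep_domains}
    return sorted(name for name in folder_names if name.lower() not in keep)

# Hypothetical contents of the archive's top-level folder.
folders = ["www.example.com", "img.example-host.com",
           "ads.example.net", "partner.example.org"]
keep = ["www.example.com", "img.example-host.com"]

print(should_cancel(40, "https://www.example.com/figures/", "example.com"))
print(folders_to_delete(folders, keep))
```

Look over the returned list by eye before deleting anything, since a site may pull its images from a host you did not expect.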
Last edited by sisco; 12-02-2022 at 04:12 PM.