Archive for September, 2009

Delete files not in use

September 29th, 2009 jud No comments

Sometimes you need to clear out a directory of files but don’t want to delete any files in use by a program or user. This little script is the answer to those problems.

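There are two interesting items in this script. The first is that this directory had grown so large that the rm -rf * command could not process it, most likely because the shell expands * into one argument per file and the resulting list was too long to hand to rm. That is the reason I dump the listing to a temporary file. The second item is the $? within the test command, which holds the return value of the last command that was executed. By Unix convention a 0 is returned upon successful completion and any other value implies failure. fuser returns 0 when some process is using the file, so a non-zero value means the file is safe to delete. The test could also have been written without $? by negating the command directly:

if ! fuser /var/spool/filter/$I >/dev/null 2>&1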
then
code...
fi

Regardless, here is the script.

#!/bin/bash
#  2007-08-28  Jud Bishop
#  Released under the GPLv2.
#  This little script checks to make sure that a file is not in
#  use before it deletes the file or directory, which is why I
#  use -rf, so that directories are deleted too.  The reason I have
#  to write the directory listing to a file is that the directory
#  gets too large for rm -rf * to handle.

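ls /var/spool/filter >/tmp/purge-filter

# Read one name per line so that names containing spaces stay intact.
while IFS= read -r I
do
     # Never act on an empty name, which would point at the directory itself.
     [ -n "$I" ] || continue

     # fuser exits non-zero when no process is using the file.
     fuser "/var/spool/filter/$I" >/dev/null 2>&1
     if [ $? -ne 0 ]
     then
          #echo "Deleting /var/spool/filter/$I"
          rm -rf "/var/spool/filter/$I"
     fi
done </tmp/purge-filter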
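
As a variation, the same loop can be written without the temporary file and without $?. This is only a sketch under a few assumptions: the function name purge_unused is made up for illustration, the directory is passed as an argument, and fuser is installed. A for loop over a glob is expanded inside the shell rather than passed to a command, so it should not run into the argument limit that rm -rf * hits.

```shell
# Sketch: delete every entry in a directory that no process has open.
# purge_unused is a hypothetical helper name, not part of the original script.
purge_unused() {
    local dir="$1" entry

    # Refuse to run without fuser; otherwise every entry would look unused.
    command -v fuser >/dev/null 2>&1 || { echo "fuser not found" >&2; return 2; }

    for entry in "$dir"/*
    do
        # An empty directory leaves the pattern unexpanded; skip that case.
        [ -e "$entry" ] || [ -L "$entry" ] || continue

        # fuser exits 0 when some process is using the file, so negate it.
        if ! fuser "$entry" >/dev/null 2>&1
        then
            rm -rf -- "$entry"
        fi
    done
}
```

It would be called as purge_unused /var/spool/filter. Like ls in the script above, the * pattern skips dot files.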
Categories: Code, Linux Tags:

QoS Outline

September 25th, 2009 jud 1 comment

The point of this post is to outline where I am headed in my studies.  I have been reading three different books on QoS and am starting to believe I have gotten off track on what I need to know for ONT.  I’m not too worried about reading too much, but I am worried about retention and memorization for the test.  So this post is an outline of what is covered in the ONT book; I will then backfill notes in different posts from all three books.  My goal is to formalize my understanding of the topics and keep them in a logical order for when I go back through and start doing exam prep questions from the different books.

Chapter 2 — IP Quality of Service

QoS is affected by:

  • Bandwidth
  • Delay
  • Jitter
  • Packet Loss

QoS Models:

  • Best-Effort
  • IntServ
  • DiffServ

QoS Implementation:

  • Command Line
  • MQC — Cisco Recommended
  • AutoQoS
  • SDM QoS Wizard

Chapter 3 — Classification, Marking and NBAR

  • Classification
    • Header fields for marking
    • Done through something like an ACL
  • Marking
    • Setting fields in header
    • IP Header Fields
      • DSCP
      • Precedence
      • LAN CoS, 802.1Q/ISL
  • NBAR
  • QoS Service Class
  • Trust Boundaries

Chapter 4 — Congestion Management and Queuing

  • FIFO
  • PQ
  • RR
  • WRR
  • WFQ
  • CBWFQ
  • LLQ

Chapter 5 — Congestion Avoidance, Policing, Shaping and Link Efficiency Mechanisms

  • Congestion Avoidance
    • Tail Drop
    • RED
    • WRED
    • CBWRED
  • Traffic Shaping and Policing
    • Measuring Traffic Rates
  • Link Efficiency
    • L2 Compression
    • Header Compression
    • LFI

Chapter 6 – Implementing QoS Pre-Classify and Deploying End-to-End QoS

  • VPN
  • Pre-Classification
  • Deploying End-to-End

Chapter 7 — Implementing AutoQoS

  • Deploying AutoQoS
  • Verifying AutoQoS
  • Common AutoQoS problems
  • Interpreting and Modifying AutoQoS


Chapter 8 — Wireless QoS

Chapter 9 — Encryption and Authentication

Chapter 10 — WLAN Management

Categories: CCNP ONT Tags:

QoS Basics

September 22nd, 2009 jud No comments

QoS is managed fairness or managed unfairness. Hey, life is not fair, so do you expect your network to be fair?  Think of NBAR, Network Based Application Recognition: its whole point of existence is to recognize traffic from different applications and classify it accordingly.  It is the antithesis of net neutrality.  We use it in our network and I would expect every other networker to do the same, unless of course you work for an ISP.

The point of QoS is to shape the traffic according to its needs. QoS is a trade-off between what different flows need: a voice flow is affected by delay and jitter while a print job is not. Therefore, a trade-off is made so that voice calls will be intelligible and not choppy while printing is given a lower priority, but the effect should not be noticeable to the end user.

QoS is affected by:

  • Bandwidth
  • Delay
  • Jitter
  • Packet Loss

Bandwidth — Bits per second delivered across a medium.

QoS tools that affect bandwidth:

  • Compression – Some forms of compression happen before queuing, others after.
  • Call Admission Control (CAC) – CAC decides whether a network can accept new voice and video calls and how those calls can be routed.
  • Queuing Tools – Can reserve bandwidth for different classes of traffic.  Queuing tools create queues which are emptied according to a scheduling algorithm.  If an algorithm reserves bandwidth and multiple queues have packets waiting, IOS takes packets so that each queue gets its configured bandwidth. If only one queue has packets, it gets the full bandwidth until another queue is activated.

Delay –  There are multiple types of delay in a network.  Some are fixed and some are variable in their effects.  Here are a number of delay types:

  • Serialization delay – The time it takes to encode the bits of a packet onto the physical interface.  The formula is serialization delay = bits sent / link speed.  Serialization has a greater effect on slower links than on faster ones: a 10G interface takes far less time to encode a packet onto the wire than a 56K link.
  • Propagation delay – The time it takes to get a bit from one end of a link to the other.  The only variable that affects propagation delay is the length of the link.
  • Queuing delay – The time a packet spends waiting to be sent.  This usually means output queuing, as input queuing delay is generally negligible.
  • Forwarding/processing delay – The time between when a frame is received and when it is placed in the output queue.  It includes the time to process the route and forward the packet, but not the time spent in the output queue.
  • Shaping delay – Queues are served more slowly because of traffic shaping.  When shaping is implemented on a Frame Relay link it is usually to reduce the number of drops, but it must increase delay in order to accomplish that goal.
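
To make the serialization formula concrete, here is a small sketch that applies bits sent / link speed to a 1500-byte packet. The helper name serialization_delay_ms and the 1500-byte packet size are my own assumptions for the example; awk is used only because the shell has no floating point.

```shell
# Sketch: serialization delay in milliseconds = bits sent / link speed (bps) * 1000.
# serialization_delay_ms is a hypothetical helper name used only for this example.
serialization_delay_ms() {
    # $1: packet size in bytes, $2: link speed in bits per second
    awk -v bytes="$1" -v bps="$2" 'BEGIN { printf "%.3f\n", bytes * 8 / bps * 1000 }'
}

serialization_delay_ms 1500 56000          # 56K link
serialization_delay_ms 1500 10000000000    # 10G interface
```

The first call prints 214.286 and the second prints 0.001, which is why serialization delay only really matters on slow links.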

QoS tools that affect delay:

  • Queuing – Queuing is the most popular QoS tool, and involves giving some queues of packets priority over others.  It is similar to the first class and business class lines at the airport: a business class flyer who gets to the airport 30 minutes after you start waiting in line can be processed and on her way to the gate before you even see a ticket agent.
  • Link Fragmentation and Interleaving (LFI) – Allows a router to “weave” packets onto a link so that a low priority fragmented packet does not hold up a higher priority packet.  The higher priority packet can be interwoven onto the link between fragments of the lower priority packet.
  • Compression – Takes a packet, or packet header, and compresses the data so that it uses fewer bits.
  • Traffic Shaping – Traffic is delayed on a link so that it stays within the CIR, with the goal of reducing drops.

Jitter (delay variation) — When consecutive packets experience different amounts of delay.  Data applications tend to be much more forgiving of jitter than voice and video.

The same tools that affect delay also affect jitter.
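
As a rough illustration of delay variation, the sketch below takes one delay per line (in milliseconds) and prints the difference between each packet and the one before it. The helper name delay_variation and the sample numbers are made up for the example; this is a simple packet-to-packet difference, not any particular formal jitter metric.

```shell
# Sketch: print |delay[i] - delay[i-1]| for each pair of consecutive packets.
# delay_variation is a hypothetical helper; input is one delay per line.
delay_variation() {
    awk 'NR > 1 { v = $1 - prev; if (v < 0) v = -v; print v } { prev = $1 }'
}

# Four packets with made-up delays of 20, 22, 35 and 21 ms.
printf '%s\n' 20 22 35 21 | delay_variation
```

For the sample input this prints 2, 13 and 14, one per line. A voice stream would notice the 13 and 14 ms swings far more than a file transfer would.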

Packet Loss –  QoS cannot fix most of the reasons routers lose, drop or discard packets, but it can help to minimize the packets lost to full queues.

QoS tools that affect packet loss:

  • Random Early Detection (RED) – TCP uses windowing to restrict the amount of data sent without an acknowledgment.  When queues are not very full, RED does nothing to slow traffic; when queues are filling, RED discards some packets.  By dropping packets before the queue is full, RED tries to shrink the window of a sending application before a router’s queue fills.  RED manages the end of a queue while the queuing tool manages the front.
  • Queuing — Longer queues increase delay but avoid loss.  Put voice and video traffic in a shorter queue than printing, ftp or email traffic.

These notes are taken from the book Cisco QoS Exam Certification Guide, 2nd ed. by Odom and Cavanaugh.  I realize this book is for a different track than the ONT exam, but I did not like the depth of the ONT book with regard to QoS, so I am supplementing my reading.  Please note that I am not an expert and you should purchase this book for yourself rather than relying upon my garbled understanding of the topic and poorly paraphrased notes.  That being said, if you notice anything incorrect please add a comment; you will help my understanding of the topic and that of anyone else who happens to read this blog.

Categories: CCNP ONT, Routing Tags:

Hacking rommon

September 19th, 2009 jud 2 comments

Today I needed to hack a 3620 in order to get a certain image on it, and it took a while for me to find the information I needed. I guess my google-fu was not up to snuff, and my coworker came through. Because of that, I wanted to document my adventures to make it easier for others to do the same.

What we want to do is change the serial number of the router in order for the image to load. This is how you do it.

rommon 1 > cookie
cookie:
00 01 00 30 85 d7 e0 60 0a ff 73 18 50 12 00 20
25 11 89 64 b0 ff 01 02 09 ff ff ff ff ff 00 02

Each Cisco router has its own rommon password that is determined by the cookie. So next we need to calculate the password: fire up your trusty scientific calculator and add the first ten bytes of the cookie as five 16-bit hex numbers:

  00 01
+ 00 30
+ 85 d7
+ e0 60
+ 0a ff
-------
= 17167

The password is only four characters, so drop the most significant digit (keep only the low 16 bits) and the password is 7167.
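
If you do not have a hex calculator handy, the same sum can be done in the shell. This is a sketch that simply mirrors the arithmetic above; the function name rommon_password is my own, and I have only checked it against the two cookies shown in this post.

```shell
# Sketch: add the first five 16-bit words of the cookie and keep the low 16 bits.
# rommon_password is a hypothetical helper name; pass the cookie bytes as
# separate two-digit hex arguments.
rommon_password() {
    local sum=0 words=0
    while [ "$words" -lt 5 ] && [ "$#" -ge 2 ]
    do
        # Glue two bytes into one 16-bit hex number, e.g. 85 d7 -> 0x85d7.
        sum=$(( sum + 0x$1$2 ))
        shift 2
        words=$(( words + 1 ))
    done
    # Masking with 0xffff drops the leading digit of 17167.
    printf '%04x\n' "$(( sum & 0xffff ))"
}

rommon_password 00 01 00 30 85 d7 e0 60 0a ff
```

This prints 7167 for the 3620 cookie above. Feeding it the first ten bytes of the 2611XM cookie later in this post (01 01 00 11 92 74 d2 80 43 20) prints a926.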

rommon 2 > priv
Password: 7167
You now have access to the full set of monitor commands.
Warning: some commands will allow you to destroy your
configuration and/or system images and could render
the machine unbootable.

So now we can do some serious damage. In this next session, if there is nothing after the greater-than sign, just press enter. Also notice that the bytes of the cookie dump above map onto the fields in order:

rommon 3 > cookie
View/alter bytes of serial cookie by field --
Input hex byte(s) or: CR -> skip field; ? -> list values
interfaces: 00    (unknown)
>

vendor: 01    (cisco)
>

ethernet Hw address: 00 30 85 d7 e0 60
>

processor: 0a    (C3600)
>

unused 1: ff 73 18 50 12 00 20
>

BCD-packed 8-digit serial #: 25 11 89 64
> ff 11 44 55

unused 2: b0 ff 01 02 09 ff ff ff ff
>

capabilities (future): ff 00
>

cookie version #: 02
>
rommon 4 >

The binary coded decimal (BCD) serial number is the field we want to change, so we are changing the serial number from 25-11-89-64 to ff-11-44-55.
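
Packed BCD stores two decimal digits per byte, one per hex digit, which is why the serial number 25118964 shows up as the bytes 25 11 89 64. A small sketch of that packing follows; the helper name bcd_pack is my own and it only formats the digits, it does not talk to the router.

```shell
# Sketch: show an 8-digit decimal serial number as four packed-BCD bytes.
# bcd_pack is a hypothetical helper name used only for this example.
bcd_pack() {
    case $1 in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "expected exactly 8 decimal digits" >&2; return 1 ;;
    esac
    # Each pair of decimal digits is one byte when written in hex.
    printf '%s\n' "$1" | sed 's/../& /g; s/ $//'
}

bcd_pack 25118964
```

This prints 25 11 89 64. The replacement value used in this post starts with ff, which is not a pair of decimal digits, so it falls outside what this helper would produce.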

BCD-packed 8-digit serial #: 25 11 89 64
> ff 11 44 55

Time to test whether the change was made, so we reboot the router and make sure it took:

rommon 4 > reset

And after the reboot:

rommon 1 > cookie
cookie:
00 01 00 30 85 d7 e0 60 0a ff 73 18 50 12 00 20
ff 11 44 55 b0 ff 01 02 09 ff ff ff ff ff 00 02
rommon 2 >

Notice that the serial number is now changed.

Hacking the cookie of a 2600 is not as easy. This is how it is done. I am using a Cisco 2611XM, but it should be similar on any router in this class.

rommon 1 > cookie

cookie:
01 01 00 11 92 74 d2 80 43 20 00 ff 03 6b 00 20
00 00 00 00 00 00 00 00 4a 41 45 08 26 4d 53 50
51 03 01 00 00 00 00 00 00 ff ff ff 50 06 49 1d
ff 05 ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Finding the rommon password to get into the priv command works the same as above; however, editing the cookie is much different. For this particular router the sum is 0101 + 0011 + 9274 + d280 + 4320 = 1a926, so the priv password is a926.

rommon 2 > priv
Password: a926
You now have access to the full set of monitor commands.
Warning: some commands will allow you to destroy your
configuration and/or system images and could render
the machine unbootable.

The next section is how to change the serial number. The serial number of this router is 4a-41-45-08-26-4d-53-50-51. In order to change it you want to change all of the bytes on line 0x18-0x1F and the first two bytes of 0x20-0x27.

rommon 3 > cookie

View/alter bytes of serial cookie by field --
Input hex byte(s) or: CR -> skip field; ? -> list values
bytes 0x00-0x07: 01 01 00 11 92 74 d2 80
               >

bytes 0x08-0x0F: 43 20 00 ff 03 6b 00 20
               >

bytes 0x10-0x17: 00 00 00 00 00 00 00 00
               >

bytes 0x18-0x1F: 46 54 58 09 45 57 30 4d
               >

bytes 0x20-0x27: 59 03 01 00 00 00 00 00
               >

bytes 0x28-0x2F: 00 ff ff ff 50 06 49 1d
               >

bytes 0x30-0x37: ff 05 ff ff ff ff ff ff
               >

bytes 0x38-0x3F: ff ff ff ff ff ff ff ff
               >

bytes 0x40-0x47: ff ff ff ff ff ff ff ff
               >

bytes 0x48-0x4F: ff ff ff ff ff ff ff ff
               >

bytes 0x50-0x57: ff ff ff ff ff ff ff ff
               >

bytes 0x58-0x5F: ff ff ff ff ff ff ff ff
               >

bytes 0x60-0x67: ff ff ff ff ff ff ff ff
               >

bytes 0x68-0x6F: ff ff ff ff ff ff ff ff
               >

bytes 0x70-0x77: ff ff ff ff ff ff ff ff
               >

bytes 0x78-0x7F: ff ff ff ff ff ff ff ff
               >
rommon 4 >
Categories: Routing Tags:

Let this be a warning

September 18th, 2009 jud No comments

The other day I was reading about the decline of Yahoo, so let me tell you why I think Yahoo is on the skids. Some of you may know that I fight email quite a bit. It takes up a good 10% of my work week. When it’s going well, there are no problems and it’s 0%, but when it isn’t, $%#$@^@#^. And I interface with Exchange quite a bit, as we use AD for authentication and all managers/directors use Exchange for email and calendaring. It’s not like Microsoft wants to play nice with open source. (Side note: I just went to microsoft.com to see if they spell it MicroSoft or Microsoft.) You get the idea.

Today I was testing some email changes and wanted multiple recipients in my To: field, so I added my Yahoo email. I never use my Yahoo email account any more because it is stuffed with spam. Yahoo doesn’t care; my guess is they are like HotMail: if you want to bulk advertise that’s fine, just pay the fee. It drove me to GMail.

But this took the cake. Today I went to mail.yahoo.com to see what I got for testing and it didn’t even default to my Inbox any more. Are you kidding me? They sent me to some news story first and I had to click on Inbox? Hello… Where was the UI design person on that one?

If GMail goes the way of Yahoo mail I will leave them too. Let this be a warning.

Categories: Musings Tags:

Cisco Contract True-up

September 14th, 2009 admin No comments

On Mondays my colleague and I discuss the big things that need to be done for the week. He was going to work on the network maps and I was going to work on reconciling our network inventory against what Cisco has for us on the support contract.

The funniest thing about this post is that the first time I ran the comparison script, nothing matched up. I started looking for logic errors and couldn’t find any. Next I compared the serial numbers from Netdisco and Cisco, and they didn’t match. So I logged into a router and a switch: the Netdisco serial numbers were correct, but the Cisco serial numbers were not. I started to write an email to our Cisco contact and decided I had better call him because I didn’t want him blowing me off. He started looking at what he had sent me and noticed that he had sent me item numbers, not serial numbers. Then he told me the craziest thing: I was the first person to complain to him about not being given serial numbers, and he had been using the same macros in his spreadsheet for a long time. It makes you stop and think, considering they had quite a bit of inventory on our support contract that we don’t have.

First I got out one of my generic Netdisco queries and started looking at the tables in the Netdisco sql files. It seemed pretty straightforward, all the information I needed came from the devices table. I ran the query and messed around with the output format to get it to what I needed. Next I went to the WCS and hunted for an inventory report, which they had. So I ran the report and exported it to .csv format.

Here is the code for querying Netdisco to get inventory:

#!/usr/bin/perl

# query-netdisco.pl
# 2009-09-01 Judson Bishop
# Released under GPLv2
# Simple Netdisco query, output in .csv.
use DBI;

# Connect to DB
my $dbh = DBI->connect ( "dbi:ODBC:netdisco", "netdisco", "CHANGE_PASSWORD", {PrintError => 1})
or die "connection failed to database $DBI::errstr\n";

# Set up tracing
# This should only do something when you add prepare, execute,
# fetch and disconnect.
unlink '/tmp/trace.log' if -e '/tmp/trace.log';
DBI->trace( 2, '/tmp/trace.log' );

# Prepare query
my $sql_st = $dbh->prepare( "SELECT name, mac, model, os_ver, location, vendor, ip, serial FROM device" );
$sql_st->execute()
or die "Cannot execute SQL statement $DBI::errstr\n";

# Retrieve data
my @row;
while ( @row = $sql_st->fetchrow_array() )
{
    foreach (@row)
    {
        print $_ . ",";
    }
    print "\n";
}
warn "Data fetch terminated by error $DBI::errstr\n"
if $DBI::err;

# Disconnect
$dbh->disconnect or warn "disconnection failed $DBI::errstr\n";

exit();

Then the fun began. I had to figure out the common fields between Netdisco and the Cisco WCS inventory report. Here they are:

Netdisco    WCS
name        name
mac         mac
model       model
os_ver      software version
location    location
vendor      controller
ip          status
serial      serial

So it’s time to massage the data:

./query-netdisco.pl | grep -i cisco >circus-inventory.csv
cat wcs-inventory.csv | cut -d, -f 1,2,3,4,5,6,7,12,13 >>circus-inventory.csv

After saving the Cisco spreadsheet into a .csv file it was off to the races to make a short script comparing what was in our inventory and what Cisco believes we have.

#!/usr/bin/perl -w

# match.pl
# 2009-09-01 Jud Bishop
# Released under GPLv2.
# This script compares the output of cisco-inventory.csv and circus-inventory.csv for inventory.


# This is the order of the columns from Netdisco and the WCS inventory report:
# Name,Ethernet MAC,Model,Software Version,Location,Controller Name,ip,Serial Number

# This is the order from the Cisco spreadsheet:
# Contract #,Contract Status,Contract Type,Item Beg Date,Item End Date,Item Name,Serial #,Item Status,Item Type,Last Date of Support,Install Site Id,Install Site Name,Install Site Addr1,Install Site Addr2,Install Site City,Install Site State/Province,Install Site Zipcode,Install Site Country,Bill-To Id,Billto Name

my %circus;
my %cisco;

# Read in the local inventory.
open (FILE,"circus-inventory.csv") or die "Error: can't open file\n $!";
while (<FILE>)
{
    chomp;  
    my ($name,$mac,$model,$version,$location,$vendor,$ip,$serial) = split ',';
     $circus{$serial} = { 'name'=>$name, 'mac'=>$mac, 'model'=>$model, 'version'=>$version,                'location'=>$location, 'vendor'=>$vendor, 'ip'=>$ip, 'serial'=>$serial};
}
close FILE or die "Error: can't close file\n $!";

# Read in the cisco inventory.
open (FILE,"cisco-inventory.csv") or die "Error: can't open file\n $!";
while (<FILE>)
{
    chomp;
    my ($con_num,$con_stat,$con_type,$start_date,$end_date,$name,$serial,$status,
    $type,$end_support,$site_id,$site_name,$addr1,$addr2,$city,$state,$zip,
    $country,$bill_id,$bill_name) = split ',';

    $cisco{$serial} = { 'con_num'=>$con_num, 'con_stat'=>$con_stat, 'con_type'=>$con_type,
                            'start_date'=>$start_date, 'end_date'=>$end_date,'name'=>$name, 'serial'=>$serial,
                            'status'=>$status, 'type'=>$type, 'end_support'=>$end_support, 'site_id'=>$site_id,
                            'site_name'=>$site_name, 'addr1'=>$addr1, 'addr2'=>$addr2, 'city'=>$city, 'state'=>$state,
                             'zip'=>$zip, 'country'=>$country, 'bill_id'=>$bill_id, 'bill_name'=>$bill_name };
}
close FILE or die "Error: can't close file\n $!";

# Compare what Cisco has in their list to our inventory.
for my $key (keys %cisco)
{
    if(exists $circus{$key})
    {
        # We both have this serial number, so we agree.
        print "both $circus{$key}->{name},$circus{$key}->{mac}, $circus{$key}->{model},$circus{$key}->{serial}, $circus{$key}->{version},$circus{$key}->{location}, $circus{$key}->{ip}\n";
        # We delete the key from circus to make sure that when we write
        # the list later we know which are not in the contract.
        delete($circus{$key});
    } else {
        # Cisco has it in their inventory but we don't.
        print "cisco only $cisco{$key}->{serial},$cisco{$key}->{name}\n";
    }
}

for my $key (keys %circus)
{
   # We have it on our inventory but they don't have it in theirs.
   # This is the follow up from the delete above.
    print "circus only $circus{$key}->{name},$circus{$key}->{mac}, $circus{$key}->{model},$circus{$key}->{serial}, $circus{$key}->{version}, $circus{$key}->{location},$circus{$key}->{ip}\n";
}

exit();

Now make a file for each instance of inventory. Then it was just a matter of time comparing the instances. Notice that I included the IP addresses in the Netdisco inventory. The first time I compared we didn’t have IP addresses and it was a pain checking the map for each one. I decided it was easier to rewrite the script so that we could easily find the devices.

./match.pl >true-up.txt
grep '^cisco only' true-up.txt >cisco-only.txt
grep '^circus only' true-up.txt >circus-only.txt
grep '^both' true-up.txt >both.txt
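
For a quick check that only looks at serial numbers, the same comparison can be sketched with awk instead of Perl hashes. This is an alternative I am adding for illustration, not the script I actually used; the function name compare_serials is made up, and it assumes the column order from the comments in match.pl (serial in column 8 of the local inventory and column 7 of the Cisco spreadsheet). Like match.pl, it splits on plain commas, so a field that contains a comma will throw the columns off.

```shell
# Sketch: report which serial numbers are in both lists or in only one of them.
# compare_serials is a hypothetical helper name.
# $1: local inventory CSV (serial in column 8)
# $2: Cisco contract CSV (serial in column 7)
compare_serials() {
    awk -F, '
        NR == FNR    { ours[$8] = 1; next }                     # first file: remember our serials
        ($7 in ours) { print "both " $7; matched[$7] = 1; next } # second file: on both lists
                     { print "cisco only " $7 }                 # second file: unknown to us
        END {
            # Whatever was never matched exists only in our inventory.
            for (s in ours) if (!(s in matched)) print "circus only " s
        }
    ' "$1" "$2"
}
```

It would be run as compare_serials circus-inventory.csv cisco-inventory.csv. The order of the "circus only" lines is not defined, so pipe the output through sort if that matters.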
Categories: Code, Linux, Routing Tags:

DNS Check Zones Script

September 11th, 2009 admin Comments off

At the Circus there are a number of people that have access to the DNS servers and not everyone understands the full extent of the damage they can do when they make improper changes to the configuration or zone files. One time we had a serious outage because there was an error in a zone file and DNS was returning non-authoritative answers for our zones. As a result I wrote this check_zones script to check all of the zones and email me with the results each night.

For those who are learning shell scripting the interesting thing to notice about this script is that I am actually reading two variables at the end of the cat command. Sometimes I forget how to do this and this is one of my library scripts for this exact reason. Once again, you got it off the web, your mileage may vary.

#!/bin/bash

# 2007-02-14  Jud Bishop
# This script parses the /etc/named.conf file and checks every zone listed in it.
# Released under the GPL v2.

echo "" >/tmp/check_zone
echo "This is the list of bad zones." >>/tmp/check_zone

cat /etc/named.conf |egrep -w "zone|file" |cut -d \" -f 2 |sed '1~2 {N;s/\n/ /g}' |egrep -v "root|skip" |while read ZONE FILE
do
    #echo "zone $ZONE file $FILE"
    /usr/local/sbin/named-checkzone -k ignore "$ZONE" "/var/named/$FILE"
    if [ $? -ne 0 ]
    then
       echo "$ZONE BAD" >>/tmp/check_zone
    fi
done

echo "If there are no zones listed as BAD then there are no problems." >>/tmp/check_zone

cat /tmp/check_zone |mail -s "Zone Check" judson.bishop@circus.org
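
The two-variable read can be tried on its own, away from named.conf. The sketch below is a stand-in, not part of check_zones: the helper name pair_lines and the zone names are made up, and it uses paste to pair up the lines where my script uses sed with the 1~2 address, which is a GNU sed extension.

```shell
# Sketch: pair up alternating lines, then read each pair into two variables.
# pair_lines is a hypothetical helper; the zone names are made-up sample data.
pair_lines() {
    # paste with two "-" operands takes lines from stdin alternately,
    # so lines 1+2, 3+4, ... end up side by side, separated by a space.
    paste -d ' ' - -
}

printf '%s\n' example.org db.example.org example.net db.example.net |
pair_lines |
while read -r ZONE FILE
do
    # read splits each line on whitespace: first word to ZONE, the rest to FILE.
    echo "zone $ZONE file $FILE"
done
```

Each line that reaches the loop holds a zone name and a file name, and read assigns the first word to ZONE and everything after it to FILE.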
Categories: Code, Linux Tags:

CraigsList Crawler 3000

September 1st, 2009 jud 6 comments

Update 2010-08-26
I have made changes to the script below as a result of some requests. The output should be easier to read.

I should also point out how to find the categories. The usage example that is output when you run the script with no command line switches is only an example, the script will search any category under the “for sale” heading of CraigsList. For instance, under “for sale” is the category “antiques” and when I click on it the link is below.

http://atlanta.craigslist.org/atq/

The category is “atq” in the URL and that is what you would put to search the “antiques” category with this script. The same construct applies if you would like to search “appliances” or any other category.

 Usage: ./CLCrawler3000.pl category keyword
 Example: ./CLCrawler3000.pl sys "mac+mini"
 Categories:
 sys == computers
 tls == tools
 bik == bike
 sad == system admin jobs

So if I wanted to look for a Linux system administration job I would type in:

 ./CLCrawler3000.pl sad linux

And if I wanted an armoire in the antique category I would run the script with:

 ./CLCrawler3000.pl atq armoire
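
To see how the category and keyword end up in the request, here is a small sketch of the URL the script builds for each city. The function name build_search_url is made up for the example; the URL pattern is the one the script uses, and CraigsList may well have changed it since.

```shell
# Sketch: build the search URL for one city, category and keyword.
# build_search_url is a hypothetical helper name.
build_search_url() {
    # $1: city subdomain, $2: category code, $3: keyword
    local keyword
    # Spaces in the keyword become "+", as in "mac+mini".
    keyword=$(printf '%s' "$3" | tr ' ' '+')
    printf 'http://%s.craigslist.org/search/%s?query=%s\n' "$1" "$2" "$keyword"
}

build_search_url atlanta atq armoire
build_search_url huntsville sys "mac mini"
```

The first call prints http://atlanta.craigslist.org/search/atq?query=armoire and the second prints http://huntsville.craigslist.org/search/sys?query=mac+mini.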

Original Post
The name of this script was given by one of my work mates, Scott, when he started using it to search CraigsList. I wrote this script when I became frustrated with the functionality of CraigsList. I live in a small town and I wanted to search for items on CraigsList, however, I would have to search the larger cities around me in order to find items I needed. It didn’t matter to me whether I went to Atlanta, Birmingham or Huntsville, I was still going to have to drive, and when you are looking for bikes on CraigsList you might as well search all of Colorado, California and Texas. The script just grew from there.

I will say that CraigsList has changed its output format a couple of times since I wrote this script. I also have had to make changes depending upon the category I was searching. Like all scripts on the internet, your mileage may vary, but I hope you find this script as useful as I have.

I would also like to apologize for the code listing. I just used the simple code tag because more fancy highlighting did not look very good.

If you download the script and just run it from the command line, it will give you sample usage. It also outputs a file, clcrawler3000.html, which you can open in your web browser to view the results.

 Usage: ./CLCrawler3000.pl category keyword
 Example: ./CLCrawler3000.pl sys "mac+mini"
 Categories:
 sys == computers
 tls == tools
 bik == bike
 sad == system admin jobs

#!/usr/bin/perl

use strict;
use LWP::Simple;
use HTML::TokeParser;

die " Usage: $0 category keyword\n Example: $0 sys \"mac+mini\" \n Categories: \n sys == computers\n tls == tools\n bik == bike\n sad == system admin jobs\n " unless @ARGV;

# This is the category
my $cat = $ARGV[0] || "tls";

# This is the keyword you are looking for...
my $keyword =  $ARGV[1] || "surface+plate";

# This is the output file.
my $html = "clcrawler3000.html";

# Define the arrays for each state to be passed into craigslist search,
# by defining each state individually I can tailor my searches quicker.
my %states = (
    Alabama => [ qw(auburn bham columbusga huntsville mobile montgomery tuscaloosa) ],
    Florida => [ qw(daytona keys fortlauderdale fortmyers gainesville jacksonville lakeland miami ocala orlando pensacola sarasota spacecoast tallahassee tampa treasure westpalmbeach) ],
    Georgia => [ qw(atlanta columbusga athensga augusta macon savannah valdosta) ],
    Mississippi => [ qw(gulfport hattiesburg jackson northmiss) ],
    Kentucky => [ qw(bgky cincinnati huntington lexington louisville westky) ],
    SouthCarolina => [ qw(charleston columbia greenville hiltonhead myrtlebeach) ],
    Tennessee => [ qw(memphis chattanooga knoxville nashville tricities) ],
    Alaska => [ qw(anchorage) ],
    Arizona => [ qw(flagstaff phoenix prescott tucson yuma) ],
    Arkansas => [ qw(fayar fortsmith jonesboro littlerock memphis texarkana) ],
    California => [ qw(bakersfield chico fresno goldcountry humboldt inlandempire losangeles merced modesto monterey orangecounty palmsprings redding reno sacramento sandiego sfbay slo santabarbara stockton ventura visalia) ],
    Colorado => [ qw(boulder cosprings denver fortcollins pueblo rockies westslope)],
    Connecticut => [ qw(newlondon hartford newhaven nwct) ],
    Delaware => [ qw(delaware) ],
    DC => [ qw(washingtondc) ],
    Hawaii => [ qw(honolulu) ],
    Idaho => [ qw(boise eastidaho pullman spokane) ],
    Illinois => [ qw(bn carbondale chambana chicago peoria quadcities rockford springfield stlouis) ],
    Indiana => [ qw(bloomington evansville fortwayne indianapolis muncie southbend terrahaute tippecanoe chicago) ],
    Iowa => [ qw(ames cedarrapids desmoines dubuque iowacity omaha quadcities siouxcity) ],
    Kansas => [ qw(kansascity lawrence ksu topeka wichita) ],
    Louisiana => [ qw(batonrouge lafayette lakecharles neworleans shreveport) ],
    Maine => [ qw(maine) ],
    Maryland => [ qw(baltimore easternshore westmd) ],
    Massachusetts => [ qw(boston capecod southcoast westernmass worcester) ],
    Michigan => [ qw(annarbor centralmich detroit flint grandrapids jxn kalamazoo lansing nmi saginaw southbend up) ],
    Minnesota => [ qw(duluth fargo mankato minneapolis rmn stcloud) ],
    Missouri => [ qw(columbiamo joplin kansascity springfield stlouis) ],
    Montana => [ qw(montana) ],
    Nebraska => [ qw(grandisland lincoln omaha siouxcity) ],
    Nevada => [ qw(lasvegas reno)],
    NewHampshire => [ qw(nh) ],
    NewJersey => [ qw(cnj newjersey southjersey) ],
    NewMexico => [ qw(albuquerque lascruces roswell santafe) ],
    NewYork => [ qw(albany binghamton buffalo catskills chautauqua elmira hudsonvalley ithaca longisland newyork plattsburgh rochester syracuse utica watertown) ],
    NorthCarolina => [ qw(asheville boone charlotte eastnc fayetteville greensboro outerbanks raleigh wilmington winstonsalem) ],
    NorthDakota => [ qw(fargo nd) ],
    Ohio => [ qw(akroncanton athensohio cincinnati cleveland columbus dayton huntington limaohio mansfield parkersburg toledo wheeling youngstown) ],
    Oklahoma => [ qw(fortsmith lawton oklahomacity stillwater tulsa) ],
    Oregon => [ qw(bend corvallis eastoregon eugene medford oregoncoast portland salem) ],
    Pennsylvania => [ qw(altoona erie harrisburg lancaster allentown philadelphia pittsburgh poconos reading scranton pennstate york) ],
    RhodeIsland => [ qw(providence) ],
    SouthDakota => [ qw(sd) ],
    Texas => [ qw(dallas houston sanantonio austin beaumont brownsville) ],
    Utah => [ qw(logan ogden provo saltlakecity stgeorge) ],
    Vermont => [ qw(burlington) ],
    Virginia => [ qw(blacksburg charlottesville danville norfolk harrisonburg lynchburg richmond roanoke) ],
    Washington => [ qw(bellingham kpr pullman seattle spokane wenatchee yakima) ],
    WestVirginia => [ qw(charlestonwv huntington martinsburg morgantown parkersburg wheeling) ],
    Wisconsin => [ qw(appleton duluth eauclaire greenbay lacrosse madison milwaukee) ],
    Wyoming => [ qw(wyoming) ],
);

sub get_craigs {

    my $city = shift;

    # Download the page using get();.
    # my $content = get( "http://$city.craigslist.org/search/tls?query=$keyword" ) or die $!;
    print "city == $city\n";
    print "keyword == $keyword\n";
    print "category == $cat\n";
    print "http://$city.craigslist.org/search/$cat?query=$keyword \n";

    my $content = get( "http://$city.craigslist.org/search/$cat?query=$keyword" ) or die $!;

    # Split up the page blob into lines so that we can manipulate them.
    my @lines = split(/\n/, $content);

    foreach my $i (0 .. $#lines)
    {
        # This is the key to the whole program, the returned listings are in rows
        # This is the item listing.
        # I tested this on bikes.
#                <p class="row">
#                        <span class="ih" id="images:3n63o53l45O25V35W4a8q669e2752037a111f.jpg">&nbsp;</span>
#                         Aug 26 - <a href="http://auburn.craigslist.org/bik/1920996795.html">Gary Fisher Mountain Bike  -</a>
#                         $950<font size="-1"> (Auburn, AL)</font> <span class="p"> pic</span><br class="c">
#                </p>
        if (($lines[$i] =~ /href/) && ($lines[$i] =~ /$city/))
        {
            print "line == $lines[$i]\n";
            my $line = $lines[$i];
            print HTML "$line<br>\n";
        }
    }


}

#------------------------------------------------------------------------------
# This didn't really have to be a subroutine, just cleaning things up and making
# them modular.  Open the file.
#------------------------------------------------------------------------------
sub open_html_file {
        open (HTML,">$html")
        or die "Error: can't open $html \n $!";
}

#------------------------------------------------------------------------------
# Close the file.
#------------------------------------------------------------------------------
sub close_html_file {
        close HTML or die "Error: can't close $html\n $!";
}


#------------------------------------------------------------------------------
# Main.
#------------------------------------------------------------------------------

open_html_file();

# Write the HTML header
print HTML "<html>\n <head>\n <title>CraigsList Crawler 3000</title>\n </head>\n <body>\n <br>\n\n" ;

# Iterate through the hash of arrays
foreach my $key ( keys %states )
{
    print HTML "<br>$key<br>\n";
    foreach my $i ( 0 .. $#{ $states{$key} } )
    {
        print HTML "<br>$states{$key}[$i]<br>\n";
        get_craigs($states{$key}[$i]);
        sleep(5);
    }
        print "\n";
}


print HTML " </body>\n</html>\n" ;
close_html_file();
Categories: Code, Linux Tags: